-
Did I Vet You Before? Assessing the Chrome Web Store Vetting Process through Browser Extension Similarity
Authors:
José Miguel Moreno,
Narseo Vallina-Rodriguez,
Juan Tapiador
Abstract:
Web browsers, particularly Google Chrome and other Chromium-based browsers, have grown in popularity over the past decade, with browser extensions becoming an integral part of their ecosystem. These extensions can customize and enhance the user experience, providing functionality that ranges from ad blockers to, more recently, AI assistants. Given the ever-increasing importance of web browsers, distribution marketplaces for extensions play a key role in keeping users safe by vetting submissions that display abusive or malicious behavior. In this paper, we characterize the prevalence of malware and other infringing extensions in the Chrome Web Store (CWS), the largest distribution platform for this type of software. To do so, we introduce SimExt, a novel methodology for detecting similarly behaving extensions that leverages static and dynamic analysis, Natural Language Processing (NLP) and vector embeddings. Our study reveals significant gaps in the CWS vetting process, as 86% of infringing extensions are extremely similar to previously vetted items, and these extensions take months or even years to be removed. By characterizing the top kinds of infringing extensions, we find that 83% are New Tab Extensions (NTEs), and we raise some concerns about the consistency of the vetting labels assigned by CWS analysts. Our study also reveals that only 1% of malware extensions flagged by the CWS are detected as malicious by anti-malware engines, indicating a concerning gap between the threat landscape seen by CWS moderators and the detection capabilities of the threat intelligence community.
Submitted 1 June, 2024;
originally announced June 2024.
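The abstract says SimExt detects similarly behaving extensions using, among other things, vector embeddings. As a minimal sketch of only that final comparison step, assuming each extension has already been reduced to a numeric embedding, a new submission can be compared against previously vetted items by cosine similarity. The threshold, the extension IDs, and the toy three-dimensional vectors are hypothetical; the static/dynamic analysis and NLP stages are not modeled.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical cutoff; the abstract does not state SimExt's actual threshold.
SIMILARITY_THRESHOLD = 0.95


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_vetted(
    candidate: List[float], vetted: Dict[str, List[float]]
) -> Tuple[Optional[str], float]:
    """Brute-force nearest neighbour among previously vetted extensions."""
    best_id: Optional[str] = None
    best_sim = -1.0
    for ext_id, embedding in vetted.items():
        sim = cosine_similarity(candidate, embedding)
        if sim > best_sim:
            best_id, best_sim = ext_id, sim
    return best_id, best_sim


def near_duplicate_of(
    candidate: List[float],
    vetted: Dict[str, List[float]],
    threshold: float = SIMILARITY_THRESHOLD,
) -> Optional[str]:
    """Return the vetted extension the candidate nearly duplicates, if any."""
    ext_id, sim = nearest_vetted(candidate, vetted)
    return ext_id if sim >= threshold else None


# Toy 3-dimensional embeddings standing in for real extension embeddings.
vetted = {
    "removed-new-tab-ext": [0.90, 0.10, 0.00],
    "benign-ad-blocker": [0.00, 0.20, 0.98],
}
match = near_duplicate_of([0.88, 0.12, 0.01], vetted)
```

A brute-force scan is shown for clarity; at store scale an approximate nearest-neighbour index would likely be needed.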
-
MetaCURL: Non-stationary Concave Utility Reinforcement Learning
Authors:
Bianca Marin Moreno,
Margaux Brégère,
Pierre Gaillard,
Nadia Oudjane
Abstract:
We explore online learning in episodic loop-free Markov decision processes (MDPs) in non-stationary environments (changing losses and probability transitions). Our focus is on the Concave Utility Reinforcement Learning problem (CURL), an extension of classical RL for handling convex performance criteria in state-action distributions induced by agent policies. While various machine learning problems can be written as CURL, its non-linearity invalidates traditional Bellman equations. Despite recent solutions to classical CURL, none address non-stationary MDPs. This paper introduces MetaCURL, the first CURL algorithm for non-stationary MDPs. It employs a meta-algorithm that runs multiple instances of black-box algorithms over different intervals, aggregating their outputs via a sleeping expert framework. The key hurdle is partial information due to MDP uncertainty. Under partial information on the probability transitions (uncertainty and non-stationarity coming only from external noise, independent of agent state-action pairs), we achieve optimal dynamic regret without prior knowledge of MDP changes. Unlike approaches for RL, MetaCURL handles fully adversarial losses, not just stochastic ones. We believe our approach for managing non-stationarity with experts can be of interest to the RL community.
Submitted 30 May, 2024;
originally announced May 2024.
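MetaCURL aggregates the outputs of black-box instances that are only active over certain intervals through a sleeping expert framework. The sketch below illustrates one standard sleeping-experts variant on scalar predictions, assuming losses are supplied externally; MetaCURL itself aggregates policies, and its exact weighting and learning-rate choices are not given in the abstract. Class and method names are hypothetical.

```python
import math
from typing import Dict, List


class SleepingExperts:
    """Exponentially weighted aggregation over 'sleeping' experts.

    At each round only a subset of experts is awake (e.g. the black-box
    instances whose interval covers the current episode). Awake experts are
    reweighted multiplicatively, then rescaled so that their total weight is
    unchanged; asleep experts keep their weights.
    """

    def __init__(self, n_experts: int, learning_rate: float) -> None:
        self.weights: List[float] = [1.0 / n_experts] * n_experts
        self.eta = learning_rate

    def aggregate(self, predictions: Dict[int, float]) -> float:
        """Weighted average of awake experts' outputs (at least one must be awake)."""
        mass = sum(self.weights[i] for i in predictions)
        return sum(self.weights[i] * p for i, p in predictions.items()) / mass

    def update(self, losses: Dict[int, float]) -> None:
        """Multiplicative-weights update restricted to awake experts."""
        old_mass = sum(self.weights[i] for i in losses)
        for i, loss in losses.items():
            self.weights[i] *= math.exp(-self.eta * loss)
        new_mass = sum(self.weights[i] for i in losses)
        for i in losses:
            self.weights[i] *= old_mass / new_mass


agg = SleepingExperts(n_experts=3, learning_rate=1.0)
first = agg.aggregate({0: 0.0, 1: 1.0})  # expert 2 is asleep this round
agg.update({0: 0.0, 1: 1.0})
```

Preserving the awake experts' total mass is what lets an expert that was asleep re-enter later with the weight it had, rather than being penalized for rounds it did not play.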
-
A Guide to Tracking Phylogenies in Parallel and Distributed Agent-based Evolution Models
Authors:
Matthew Andres Moreno,
Anika Ranjan,
Emily Dolson,
Luis Zaman
Abstract:
Computer simulations are an important tool for studying the mechanics of biological evolution. In particular, in silico work with agent-based models provides an opportunity to collect high-quality records of ancestry relationships among simulated agents. Such phylogenies can provide insight into evolutionary dynamics within these simulations. Existing work generally tracks lineages directly, yielding an exact phylogenetic record of evolutionary history. However, direct tracking can be inefficient for large-scale, many-processor evolutionary simulations. An alternative approach to extracting phylogenetic information from simulation, one that scales more favorably, is post hoc estimation, akin to how bioinformaticians build phylogenies by assessing genetic similarities between organisms. Recently introduced "hereditary stratigraphy" algorithms provide a means for efficient inference of phylogenetic history from non-coding annotations on simulated organisms' genomes. A number of options exist for configuring hereditary stratigraphy methodology, but no work has yet tested how they impact reconstruction quality. To address this question, we surveyed reconstruction accuracy under alternate configurations across a matrix of evolutionary conditions varying in selection pressure, spatial structure, and ecological dynamics. We synthesize results from these experiments to suggest a prescriptive system of best practices for work with hereditary stratigraphy, ultimately guiding researchers in choosing appropriate instrumentation for large-scale simulation studies.
Submitted 16 May, 2024;
originally announced May 2024.
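Hereditary stratigraphy annotates genomes so that phylogenetic history can be inferred post hoc. Below is a minimal sketch assuming the simplest configuration, in which every generation deposits one random "differentia" value and no strata are ever pruned; the retention options whose configuration the paper studies are omitted, and the function names are hypothetical rather than any published library's API.

```python
import random
from typing import List

# A column is the list of random "differentia" deposited so far, one per
# generation; rank r holds the value deposited at generation r.
Column = List[int]


def found_column(rng: random.Random, bits: int = 64) -> Column:
    """Start a new lineage with its generation-0 stratum."""
    return [rng.getrandbits(bits)]


def deposit(column: Column, rng: random.Random, bits: int = 64) -> Column:
    """Offspring inherit the parent's strata and add one fresh differentia."""
    return column + [rng.getrandbits(bits)]


def estimate_mrca_generation(a: Column, b: Column) -> int:
    """Generation of the most recent common ancestor, or -1 if none is shared.

    A mismatch proves the lineages had split by that rank, but a match can be
    a spurious collision (probability 2**-bits per rank), so the estimate is
    exact unless such a collision occurs, in which case it is too recent.
    """
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared - 1


rng = random.Random(1)
ancestor = found_column(rng)
for _ in range(5):  # ancestor lives at generation 5
    ancestor = deposit(ancestor, rng)
left, right = ancestor, ancestor
for _ in range(3):
    left = deposit(left, rng)
for _ in range(4):
    right = deposit(right, rng)
```

Narrower differentia save memory per stratum but make spurious matches more likely, which is one example of the configuration trade-offs the paper evaluates.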
-
Phylotrack: C++ and Python libraries for in silico phylogenetic tracking
Authors:
Emily Dolson,
Santiago Rodriguez-Papa,
Matthew Andres Moreno
Abstract:
In silico evolution instantiates the processes of heredity, variation, and differential reproductive success (the three "ingredients" for evolution by natural selection) within digital populations of computational agents. Consequently, these populations undergo evolution, and can be used as virtual model systems for studying evolutionary dynamics. This experimental paradigm -- used across biological modeling, artificial life, and evolutionary computation -- complements research done using in vitro and in vivo systems by enabling experiments that would be impossible in the lab or field. One key benefit is complete, exact observability. For example, it is possible to perfectly record all parent-child relationships across simulation history, yielding complete phylogenies (ancestry trees). This information reveals when traits were gained or lost, and also facilitates inference of underlying evolutionary dynamics.
The Phylotrack project provides libraries for tracking and analyzing phylogenies in in silico evolution. The project is composed of 1) Phylotracklib: a header-only C++ library, developed under the umbrella of the Empirical project, and 2) Phylotrackpy: a Python wrapper around Phylotracklib, created with Pybind11. Both components supply a public-facing API to attach phylogenetic tracking to digital evolution systems, as well as a stand-alone interface for measuring a variety of popular phylogenetic topology metrics. The underlying design and C++ implementation prioritize efficiency, allowing fast generational turnover for agent populations numbering in the tens of thousands. Several explicit features (e.g., phylogeny pruning and abstraction) are provided for reducing the memory footprint of phylogenetic information.
Submitted 15 May, 2024;
originally announced May 2024.
-
Ecology, Spatial Structure, and Selection Pressure Induce Strong Signatures in Phylogenetic Structure
Authors:
Matthew Andres Moreno,
Santiago Rodriguez-Papa,
Emily Dolson
Abstract:
Evolutionary dynamics are shaped by a variety of fundamental, generic drivers, including spatial structure, ecology, and selection pressure. These drivers impact the trajectory of evolution, and have been hypothesized to influence phylogenetic structure. Here, we set out to assess (1) if spatial structure, ecology, and selection pressure leave detectable signatures in phylogenetic structure, (2) the extent, in particular, to which ecology can be detected and discerned in the presence of spatial structure, and (3) the extent to which these phylogenetic signatures generalize across evolutionary systems. To this end, we analyze phylogenies generated by manipulating spatial structure, ecology, and selection pressure within three computational models of varied scope and sophistication. We find that selection pressure, spatial structure, and ecology have characteristic effects on phylogenetic metrics, although these effects are complex and not always intuitive. Signatures have some consistency across systems when using equivalent taxonomic unit definitions (e.g., individual, genotype, species). Further, we find that sufficiently strong ecology can be detected in the presence of spatial structure. We also find that, while low-resolution phylogenetic reconstructions can bias some phylogenetic metrics, high-resolution reconstructions recapitulate them faithfully. Although our results suggest potential for evolutionary inference of spatial structure, ecology, and selection pressure through phylogenetic analysis, further methods development is needed to distinguish these drivers' phylometric signatures from each other and to appropriately normalize phylogenetic metrics. With such work, phylogenetic analysis could provide a versatile toolkit to study large-scale evolving populations.
Submitted 12 May, 2024;
originally announced May 2024.
-
Case Study of Novelty, Complexity, and Adaptation in a Multicellular System
Authors:
Matthew Andres Moreno,
Santiago Rodriguez Papa,
Charles Ofria
Abstract:
The continuing generation of novelty, complexity, and adaptation is well established as a core aspect of open-ended evolution. However, it has yet to be firmly established to what extent these phenomena are coupled and by what means they interact. In this work, we track the co-evolution of novelty, complexity, and adaptation in a case study from the DISHTINY simulation system, which is designed to study the evolution of digital multicellularity. In this case study, we describe ten qualitatively distinct multicellular morphologies, several of which exhibit asymmetrical growth and distinct life stages. We contextualize the evolutionary history of these morphologies with measurements of complexity and adaptation. Our case study suggests a loose -- sometimes divergent -- relationship can exist among novelty, complexity, and adaptation.
Submitted 12 May, 2024;
originally announced May 2024.
-
Trackable Island-model Genetic Algorithms at Wafer Scale
Authors:
Matthew Andres Moreno,
Connor Yang,
Emily Dolson,
Luis Zaman
Abstract:
Emerging ML/AI hardware accelerators, like the 850,000-processor Cerebras Wafer-Scale Engine (WSE), hold great promise to scale up the capabilities of evolutionary computation. However, challenges remain in maintaining visibility into underlying evolutionary processes while efficiently utilizing these platforms' large processor counts. Here, we focus on the problem of extracting phylogenetic information from digital evolution on the WSE platform. We present a tracking-enabled asynchronous island-based genetic algorithm (GA) framework for WSE hardware. Emulated and on-hardware GA benchmarks with a simple tracking-enabled agent model clock upwards of 1 million generations a minute for population sizes reaching 16 million. This pace enables quadrillions of evaluations a day. We validate phylogenetic reconstructions from these trials and demonstrate their suitability for inference of underlying evolutionary conditions. In particular, we demonstrate extraction of clear phylometric signals that differentiate wafer-scale runs with adaptive dynamics enabled versus disabled. Together, these benchmark and validation trials reflect strong potential for highly scalable evolutionary computation that is both efficient and observable. Kernel code implementing the island-model GA supports drop-in customization for any fixed-length genome content and fitness criteria, allowing it to be leveraged to advance research interests across the community.
Submitted 6 May, 2024;
originally announced May 2024.
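The island-model structure can be sketched as a serial, synchronous emulation in which islands evolve independently and periodically exchange migrants around a ring. This is not the WSE kernel: asynchronous on-hardware communication and phylogenetic tracking are omitted, and the OneMax-style fitness, elitism, binary tournaments, and all parameter values are illustrative assumptions. As the abstract describes for the kernel, genome length and fitness criterion are pluggable here.

```python
import random
from typing import Callable, List

Genome = List[int]


def island_model_ga(
    fitness: Callable[[Genome], float],
    genome_length: int = 32,
    n_islands: int = 4,
    island_size: int = 20,
    generations: int = 50,
    migration_interval: int = 5,
    mutation_rate: float = 0.02,
    seed: int = 0,
) -> List[List[Genome]]:
    """Serial, synchronous island-model GA over fixed-length bitstring genomes."""
    rng = random.Random(seed)
    islands = [
        [[rng.randint(0, 1) for _ in range(genome_length)] for _ in range(island_size)]
        for _ in range(n_islands)
    ]

    def tournament(pop: List[Genome]) -> Genome:
        a, b = rng.sample(pop, 2)  # binary tournament
        return a if fitness(a) >= fitness(b) else b

    def mutate(genome: Genome) -> Genome:
        return [bit ^ 1 if rng.random() < mutation_rate else bit for bit in genome]

    for gen in range(1, generations + 1):
        for k, pop in enumerate(islands):
            elite = max(pop, key=fitness)  # elitism: island's best survives as-is
            offspring = [mutate(tournament(pop)) for _ in range(island_size - 1)]
            islands[k] = [elite] + offspring
        if gen % migration_interval == 0:
            # Ring topology: each island's best replaces its neighbour's worst.
            emigrants = [max(pop, key=fitness) for pop in islands]
            for k, pop in enumerate(islands):
                worst = min(range(island_size), key=lambda i: fitness(pop[i]))
                pop[worst] = list(emigrants[k - 1])
    return islands


def best_fitness(islands: List[List[Genome]], fitness: Callable[[Genome], float]) -> float:
    return max(fitness(g) for pop in islands for g in pop)
```

Elitism plus replacing each recipient island's worst individual means the best fitness found never decreases, which keeps the sketch easy to sanity-check; for example, `best_fitness(island_model_ga(sum), sum)` runs it on OneMax.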
-
Trackable Agent-based Evolution Models at Wafer Scale
Authors:
Matthew Andres Moreno,
Connor Yang,
Emily Dolson,
Luis Zaman
Abstract:
Continuing improvements in computing hardware are poised to transform capabilities for in silico modeling of cross-scale phenomena underlying major open questions in evolutionary biology and artificial life, such as transitions in individuality, eco-evolutionary dynamics, and rare evolutionary events. Emerging ML/AI-oriented hardware accelerators, like the 850,000-processor Cerebras Wafer Scale Engine (WSE), hold particular promise. However, practical challenges remain in conducting informative evolution experiments that efficiently utilize these platforms' large processor counts. Here, we focus on the problem of extracting phylogenetic information from agent-based evolution on the WSE platform. This goal drove significant refinements to decentralized in silico phylogenetic tracking, reported here. These refinements yield order-of-magnitude performance gains. We also present an asynchronous island-based genetic algorithm (GA) framework for WSE hardware. Emulated and on-hardware GA benchmarks with a simple tracking-enabled agent model clock upwards of 1 million generations a minute for population sizes reaching 16 million agents. We validate phylogenetic reconstructions from these trials and demonstrate their suitability for inference of underlying evolutionary conditions. In particular, we demonstrate extraction, from wafer-scale simulation, of clear phylometric signals that differentiate runs with adaptive dynamics enabled versus disabled. Together, these benchmark and validation trials reflect strong potential for highly scalable agent-based evolution simulation that is both efficient and observable. Developed capabilities will bring entirely new classes of previously intractable research questions within reach, benefiting further explorations within the evolutionary biology and artificial life communities across a variety of emerging high-performance computing platforms.
Submitted 1 June, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Methods to Estimate Cryptic Sequence Complexity
Authors:
Matthew Andres Moreno
Abstract:
Complexity is a signature quality of interest in artificial life systems. Alongside other dimensions of assessment, it is common to quantify genome sites that contribute to fitness as a complexity measure. However, limitations to the sensitivity of fitness assays in models with implicit replication criteria involving rich biotic interactions introduce the possibility of difficult-to-detect "cryptic" adaptive sites, which contribute small fitness effects below the threshold of individual detectability or involve epistatic redundancies. Here, we propose three knockout-based assay procedures designed to quantify cryptic adaptive sites within digital genomes. We report initial tests of these methods on a simple genome model with explicitly configured site fitness effects. In these limited tests, estimation results reflect ground-truth cryptic sequence complexities well. The presented work provides initial steps toward the development of new methods and software tools that improve the resolution, rigor, and tractability of complexity analyses across alife systems, particularly those requiring expensive in situ assessments of organism fitness.
Submitted 31 May, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Algorithms for Efficient, Compact Online Data Stream Curation
Authors:
Matthew Andres Moreno,
Santiago Rodriguez Papa,
Emily Dolson
Abstract:
Data stream algorithms tackle operations on high-volume sequences of read-once data items. Data stream scenarios include inherently real-time systems like sensor networks and financial markets. They also arise in purely computational scenarios like ordered traversal of big data or long-running iterative simulations. In this work, we develop methods to maintain running archives of stream data that are temporally representative, a task we call "stream curation." Our approach contributes to the rich existing literature on data stream binning, which we extend by providing stateless (i.e., non-iterative) curation schemes that enable key optimizations to trim archive storage overhead and streamline processing of incoming observations. We also broaden support to cover new trade-offs between curated archive size and temporal coverage. We present a suite of five stream curation algorithms that span $\mathcal{O}(n)$, $\mathcal{O}(\log n)$, and $\mathcal{O}(1)$ orders of growth for retained data items. Within each order of growth, algorithms are provided to maintain even coverage across history or bias coverage toward more recent time points. More broadly, memory-efficient stream curation can boost the data stream mining capabilities of low-grade hardware in roles such as sensor nodes and data logging devices.
Submitted 29 February, 2024;
originally announced March 2024.
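The idea of a stateless curation rule can be illustrated with a deliberately simple scheme that is not one of the paper's five algorithms: after n items have arrived, keep every item whose rank is a multiple of a power-of-two stride that grows with n relative to a hypothetical target size. Because each stride divides the next, an item discarded once is never needed again, and at most twice the target size is retained with even coverage across history (an $\mathcal{O}(1)$ order of growth). All names are hypothetical.

```python
from typing import Dict, List


def stride(num_ingested: int, target_size: int) -> int:
    """Largest power of two that is <= max(1, num_ingested // target_size)."""
    q = max(1, num_ingested // target_size)
    return 1 << (q.bit_length() - 1)


def is_retained(rank: int, num_ingested: int, target_size: int) -> bool:
    """Stateless rule: keep item `rank` (0-based) once `num_ingested` items have arrived."""
    return rank % stride(num_ingested, target_size) == 0


def retained_ranks(num_ingested: int, target_size: int) -> List[int]:
    return [r for r in range(num_ingested) if is_retained(r, num_ingested, target_size)]


def ingest(archive: Dict[int, object], rank: int, value: object, target_size: int) -> None:
    """Process the item with the given rank, then drop items the rule no longer keeps."""
    n = rank + 1
    for r in [r for r in archive if not is_retained(r, n, target_size)]:
        del archive[r]
    if is_retained(rank, n, target_size):
        archive[rank] = value


archive: Dict[int, object] = {}
for t in range(1000):
    ingest(archive, t, f"obs-{t}", target_size=8)
```

Because `is_retained` depends only on the rank and the item count, the archive's contents can be recomputed (or checked) without replaying the stream, which is the kind of property the abstract credits with enabling storage and processing optimizations.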
-
Analysis of Phylogeny Tracking Algorithms for Serial and Multiprocess Applications
Authors:
Matthew Andres Moreno,
Santiago Rodriguez Papa,
Emily Dolson
Abstract:
Since the advent of modern bioinformatics, the challenging, multifaceted problem of reconstructing phylogenetic history from biological sequences has hatched perennial statistical and algorithmic innovation. Studies of the phylogenetic dynamics of digital, agent-based evolutionary models motivate a peculiar converse question: how best to engineer tracking to facilitate fast, accurate, and memory-efficient lineage reconstructions? Here, we formally describe procedures for phylogenetic analysis in both serial and distributed computing scenarios. With respect to the former, we demonstrate reference-counting-based pruning of extinct lineages. For the latter, we introduce a trie-based phylogenetic reconstruction approach for "hereditary stratigraphy" genome annotations. This process allows phylogenetic relationships between genomes to be inferred by comparing their similarities, akin to reconstruction of natural history from biological DNA sequences. Phylogenetic analysis capabilities significantly advance distributed agent-based simulations as a tool for evolutionary research, and also benefit application-oriented evolutionary computing. Such tracing could also extend to other digital artifacts that proliferate through replication, like digital media and computer viruses.
Submitted 4 March, 2024; v1 submitted 29 February, 2024;
originally announced March 2024.
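Reference-counting-based pruning of extinct lineages can be sketched as follows, assuming one taxon per organism: each taxon counts its living members and its retained children, and when both reach zero it is deleted and its parent's child count is decremented, repeating up the tree. Class and method names are hypothetical and do not reflect Phylotrack's API or the paper's exact formal procedure.

```python
from typing import Dict, Optional


class Taxon:
    """One node of the tracked phylogeny (here: one taxon per organism)."""

    def __init__(self, taxon_id: int, parent: Optional["Taxon"]) -> None:
        self.id = taxon_id
        self.parent = parent
        self.num_living = 0    # living organisms that belong to this taxon
        self.num_children = 0  # retained child taxa that still reference it
        if parent is not None:
            parent.num_children += 1


class PhylogenyTracker:
    def __init__(self) -> None:
        self.retained: Dict[int, Taxon] = {}
        self._next_id = 0

    def add_org(self, parent: Optional[Taxon]) -> Taxon:
        """Register a newborn organism as a new taxon under `parent`."""
        taxon = Taxon(self._next_id, parent)
        self._next_id += 1
        taxon.num_living = 1
        self.retained[taxon.id] = taxon
        return taxon

    def remove_org(self, taxon: Taxon) -> None:
        """Record a death, then prune the extinct branch bottom-up."""
        taxon.num_living -= 1
        node: Optional[Taxon] = taxon
        while node is not None and node.num_living == 0 and node.num_children == 0:
            del self.retained[node.id]
            if node.parent is not None:
                node.parent.num_children -= 1
            node = node.parent


tracker = PhylogenyTracker()
root = tracker.add_org(None)
a = tracker.add_org(root)
b = tracker.add_org(root)
c = tracker.add_org(a)
tracker.remove_org(root)  # root dies but still has living descendants: kept
tracker.remove_org(c)     # extinct leaf: pruned; pruning stops at living `a`
```

Only ancestors of currently living organisms are kept, so memory tracks the extant lineages rather than the full simulation history.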
-
Runtime phylogenetic analysis enables extreme subsampling for test-based problems
Authors:
Alexander Lalejini,
Marcos Sanson,
Jack Garbus,
Matthew Andres Moreno,
Emily Dolson
Abstract:
A phylogeny describes the evolutionary history of an evolving population. Evolutionary search algorithms can perfectly track the ancestry of candidate solutions, illuminating a population's trajectory through the search space. However, phylogenetic analyses are typically limited to post-hoc studies of search performance. We introduce phylogeny-informed subsampling, a new class of subsampling methods that exploit runtime phylogenetic analyses for solving test-based problems. Specifically, we assess two phylogeny-informed subsampling methods -- individualized random subsampling and ancestor-based subsampling -- on three diagnostic problems and ten genetic programming (GP) problems from program synthesis benchmark suites. Overall, we found that phylogeny-informed subsampling methods enable problem-solving success at extreme subsampling levels where other subsampling methods fail. For example, phylogeny-informed subsampling methods more reliably solved program synthesis problems when evaluating just one training case per-individual, per-generation. However, at moderate subsampling levels, phylogeny-informed subsampling generally performed no better than random subsampling on GP problems. Our diagnostic experiments show that phylogeny-informed subsampling improves diversity maintenance relative to random subsampling, but its effects on a selection scheme's capacity to rapidly exploit fitness gradients varied by selection scheme. Continued refinements of phylogeny-informed subsampling techniques offer a promising new direction for scaling up evolutionary systems to handle problems with many expensive-to-evaluate fitness criteria.
Submitted 2 February, 2024;
originally announced February 2024.
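The abstract names ancestor-based subsampling but does not spell out how it works. Purely as an illustrative assumption, one way a runtime phylogeny could inform subsampling is to estimate an individual's score on a test case it was not evaluated on by walking up its ancestry to the nearest ancestor that was evaluated on that case. All names and the depth limit below are hypothetical.

```python
from typing import Dict, Optional


def estimate_score(
    individual: int,
    test_case: int,
    parent_of: Dict[int, Optional[int]],
    evaluated: Dict[int, Dict[int, float]],
    max_depth: int = 10,
) -> Optional[float]:
    """Score on `test_case` from the individual itself or its nearest evaluated ancestor.

    Looks at most `max_depth` generations back; returns None if no ancestor
    within range was evaluated on that test case.
    """
    node: Optional[int] = individual
    for _ in range(max_depth + 1):
        if node is None:
            return None
        scores = evaluated.get(node, {})
        if test_case in scores:
            return scores[test_case]
        node = parent_of.get(node)
    return None


# Hypothetical lineage 0 -> 1 -> 2 -> 3, each evaluated on a small subset of cases.
parent_of = {0: None, 1: 0, 2: 1, 3: 2}
evaluated = {0: {7: 1.0}, 2: {5: 0.0}, 3: {9: 1.0}}
```

A depth limit bounds how stale an inherited score can be, since a distant ancestor's performance is a weaker proxy for its descendant's.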
-
Streamlining Advanced Taxi Assignment Strategies based on Legal Analysis
Authors:
Holger Billhardt,
José-Antonio Santos,
Alberto Fernández,
Mar Moreno,
Sascha Ossowski,
José A. Rodríguez
Abstract:
In recent years, many novel applications have appeared that promote the provision of services and activities in a collaborative manner. The key idea behind such systems is to take advantage of idle or underused capacities of existing resources in order to provide improved services that assist people in their daily tasks, with additional functionality, enhanced efficiency, and/or reduced cost. Particularly in the domain of urban transportation, many researchers have put forward novel ideas, which are then implemented and evaluated through prototypes that usually draw upon AI methods and tools. However, such proposals also bring up multiple non-technical issues that need to be identified and addressed adequately if such systems are ever meant to be applied to the real world. While, in practice, legal and ethical aspects related to such AI-based systems are seldom considered at the beginning of the research and development process, we argue that they not only restrict design decisions but can also help guide them. In this manuscript, we start from a prototype of a taxi coordination service that mediates between individual (and autonomous) taxis and potential customers. After representing key aspects of its operation in a semi-structured manner, we analyse its viability from the viewpoint of current legal restrictions and constraints, so as to identify additional non-functional requirements as well as options to address them. Then, we go one step further and actually modify the existing prototype to incorporate the previously identified recommendations. Performing experiments with this improved system helps us identify the most adequate option among several legally admissible alternatives.
Submitted 22 January, 2024;
originally announced January 2024.
-
MolecularWebXR: Multiuser discussions about chemistry and biology in immersive and inclusive VR
Authors:
Fabio J. Cortes Rodriguez,
Gianfranco Frattini,
Fernando Teixeira Pinto Meireles,
Danae A. Terrien,
Sergio Cruz-Leon,
Matteo Dal Peraro,
Eva Schier,
Diego M. Moreno,
Luciano A. Abriata
Abstract:
MolecularWebXR is our new website, built on WebXR, for education, science communication, and scientific peer discussion in chemistry and biology. It democratizes multi-user, inclusive virtual reality (VR) experiences that are deeply immersive for users wearing high-end headsets, yet allow participation by users with consumer devices such as smartphones, possibly inserted into cardboard goggles for immersivity, or even computers or tablets. Because it is entirely web-served and requires no installation, MolecularWebXR enables multiple users to simultaneously explore, communicate, and discuss chemistry and biology concepts in immersive 3D environments, manipulating objects with their bare hands, either present in the same real space or scattered throughout the globe thanks to built-in audio features. A series of preset rooms cover educational material on chemistry and structural biology, and an empty room can be populated with material prepared ad hoc using moleculARweb's VMD-based PDB2AR tool. We verified ease of use and versatility with users aged 12-80 in entirely virtual sessions or mixed real-virtual sessions at science outreach events, student instruction, scientific collaborations, and conference lectures. MolecularWebXR is available for free use without registration at https://molecularwebxr.org, and a blog post version of this preprint with embedded videos is available at https://go.epfl.ch/molecularwebxr-blog-post.
Submitted 1 November, 2023;
originally announced November 2023.
-
Learning From Peers: A Survey of Perception and Utilization of Online Peer Support Among Informal Dementia Caregivers
Authors:
Zhijun Yin,
Lauren Stratton,
Qingyuan Song,
Congning Ni,
Lijun Song,
Patricia A. Commiskey,
Qingxia Chen,
Monica Moreno,
Sam Fazio,
Bradley A. Malin
Abstract:
Informal dementia caregivers are those who care for a person living with dementia (PLWD) without receiving payment (e.g., family members, friends, or other unpaid caregivers). These informal caregivers are subject to substantial mental, physical, and financial burdens. Online communities enable these caregivers to exchange caregiving strategies and communicate experiences with other caregivers whom they generally do not know in real life. Research has demonstrated the benefits of peer support in online communities, but such studies are limited to caregivers who are already online users. In this paper, we designed and administered a survey to investigate the perception and utilization of online peer support among 140 informal dementia caregivers (100 of whom were online-community caregivers). Our findings show that accessing any online community is significantly associated only with caregivers' belief in the value of online peer support (p = 0.006). Moreover, 33 (83%) of the 40 non-online-community caregivers had a belief score above 24, the score obtained when the neutral option is selected for every belief question. The most commonly cited reasons for not accessing any online community were lack of time (14; 10%) and insufficient online information-searching skills (9; 6%). Our findings suggest that online peer support is valuable, but practical strategies are needed to assist informal dementia caregivers who have limited time or searching skills.
Submitted 31 August, 2023;
originally announced September 2023.
-
A Polystore Architecture Using Knowledge Graphs to Support Queries on Heterogeneous Data Stores
Authors:
Leonardo Guerreiro Azevedo,
Renan Francisco Santos Souza,
Elton F. de S. Soares,
Raphael M. Thiago,
Julio Cesar Cardoso Tesolin,
Ann C. Oliveira,
Marcio Ferreira Moreno
Abstract:
Modern applications commonly need to manage dataset types composed of heterogeneous data and schemas, making it difficult to access them in an integrated way. A single data store to manage heterogeneous data using a common data model is not effective in such a scenario, which results in the domain data being fragmented in the data stores that best fit their storage and access requirements (e.g., NoSQL, relational DBMS, or HDFS). Moreover, organizational workflows consume these fragments independently, and there is usually no explicit link among the fragments that would support an integrated view. The research challenge tackled by this work is to provide the means to query heterogeneous data residing on distinct data repositories that are not explicitly connected. We propose a federated database architecture that provides users with a single abstract global conceptual schema against which to write their queries, encapsulating data heterogeneity, location, and linkage by employing: (i) meta-models to represent the global conceptual schema, the local conceptual schemas of the remote data, and the mappings among them; and (ii) provenance to create explicit links among the consumed and generated data residing in separate datasets. We evaluated the architecture through its implementation as a polystore service, following a microservice architecture approach, in a scenario that simulates a real case in the Oil & Gas industry. We also compared the proposed architecture to a relational multidatabase system based on foreign data wrappers, measuring the user's cognitive load to write a query (or query complexity) and the query processing time. The results demonstrated that the proposed architecture allows users to write queries that are half as complex as those written for the relational multidatabase system, while adding no more than 30% overhead in query processing time.
Submitted 15 March, 2024; v1 submitted 7 August, 2023;
originally announced August 2023.
-
A Knowledge-Oriented Approach to Enhance Integration and Communicability in the Polkadot Ecosystem
Authors:
Marcio Ferreira Moreno,
Rafael Rossi de Mello Brandão
Abstract:
The Polkadot ecosystem is a disruptive and highly complex multi-chain architecture that poses challenges in terms of data analysis and communicability. Currently, there is a lack of standardized and holistic approaches to retrieve and analyze data across parachains and applications, making it difficult for general users and developers to access ecosystem data consistently. This paper proposes a conceptual framework that includes a domain ontology called POnto (a Polkadot Ontology) to address these challenges. POnto provides a structured representation of the ecosystem's concepts and relationships, enabling a formal understanding of the platform. The proposed knowledge-oriented approach enhances integration and communicability, enabling a wider range of users to participate in the ecosystem and facilitating the development of AI-based applications. The paper presents a case study methodology to validate the proposed framework, which includes expert feedback and insights from the Polkadot community. The POnto ontology and the roadmap for a query engine based on a Controlled Natural Language using the ontology provide valuable contributions to the growth and adoption of the Polkadot ecosystem in heterogeneous socio-technical environments.
Submitted 1 August, 2023;
originally announced August 2023.
-
Your Code is 0000: An Analysis of the Disposable Phone Numbers Ecosystem
Authors:
José Miguel Moreno,
Srdjan Matic,
Narseo Vallina-Rodriguez,
Juan Tapiador
Abstract:
Short Message Service (SMS) is a popular channel for online service providers to verify accounts and authenticate users registered to a particular service. Specialized applications, called Public SMS Gateways (PSGs), offer free Disposable Phone Numbers (DPNs) that can be used to receive SMS messages. DPNs allow users to protect their privacy when creating online accounts. However, they can also be abused for fraudulent activities and to bypass security mechanisms like Two-Factor Authentication (2FA). In this paper, we perform a large-scale and longitudinal study of the DPN ecosystem by monitoring 17,141 unique DPNs in 29 PSGs over the course of 12 months. Using a dataset of over 70M messages, we provide an overview of the ecosystem and study the different services that offer DPNs and their relationships. Next, we build a framework that (i) identifies and classifies the purpose of an SMS; and (ii) accurately attributes every message to more than 200 popular Internet services that require SMS for creating registered accounts. Our results indicate that the DPN ecosystem is globally used to support fraudulent account creation and access, and that this issue is ubiquitous and affects all major Internet platforms and specialized online services.
Submitted 26 June, 2023;
originally announced June 2023.
-
Phylogeny-informed fitness estimation
Authors:
Alexander Lalejini,
Matthew Andres Moreno,
Jose Guadalupe Hernandez,
Emily Dolson
Abstract:
Phylogenies (ancestry trees) depict the evolutionary history of an evolving population. In evolutionary computing, a phylogeny can reveal how an evolutionary algorithm steers a population through a search space, illuminating the step-by-step process by which any solutions evolve. Thus far, phylogenetic analyses have primarily been applied as post-hoc analyses used to deepen our understanding of existing evolutionary algorithms. Here, we investigate whether phylogenetic analyses can be used at runtime to augment parent selection procedures during an evolutionary search. Specifically, we propose phylogeny-informed fitness estimation, which exploits a population's phylogeny to estimate fitness evaluations. We evaluate phylogeny-informed fitness estimation in the context of the down-sampled lexicase and cohort lexicase selection algorithms on two diagnostic analyses and four genetic programming (GP) problems. Our results indicate that phylogeny-informed fitness estimation can mitigate the drawbacks of down-sampled lexicase, improving diversity maintenance and search space exploration. However, the extent to which phylogeny-informed fitness estimation improves problem-solving success for GP varies by problem, subsampling method, and subsampling level. This work serves as an initial step toward improving evolutionary algorithms by exploiting runtime phylogenetic analysis.
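To make the idea of estimating unevaluated fitness from a phylogeny concrete, here is a minimal sketch under a simple assumed rule: when down-sampling skips a test case for an individual, its score on that case is borrowed from the closest ancestor that was evaluated on it. The `Individual` class, its parent-pointer representation, `estimate_score`, and the `max_depth` cutoff are hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Individual:
    """Hypothetical phylogeny node: a parent pointer plus the test-case
    scores that were actually evaluated for this individual."""
    name: str
    parent: Optional["Individual"] = None
    scores: Dict[int, float] = field(default_factory=dict)


def estimate_score(ind: Individual, case: int, max_depth: int = 10) -> Optional[float]:
    """Return the evaluated score for `case`, or estimate it from the
    closest ancestor (at most `max_depth` steps up) evaluated on it."""
    node: Optional[Individual] = ind
    depth = 0
    while node is not None and depth <= max_depth:
        if case in node.scores:
            return node.scores[case]
        node = node.parent
        depth += 1
    return None  # no evaluated ancestor within range: no estimate


# The grandchild was only evaluated on case 1 in this generation's subsample.
root = Individual("root", scores={0: 1.0, 1: 0.0})
child = Individual("child", parent=root, scores={0: 0.5})
grandchild = Individual("grandchild", parent=child, scores={1: 1.0})
print(estimate_score(grandchild, 0))  # 0.5, borrowed from the parent
print(estimate_score(grandchild, 1))  # 1.0, evaluated directly
```

A selection scheme such as lexicase could then consult these estimates for the test cases that were not sampled, rather than ignoring them.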
Submitted 6 June, 2023;
originally announced June 2023.
-
Chrowned by an Extension: Abusing the Chrome DevTools Protocol through the Debugger API
Authors:
José Miguel Moreno,
Narseo Vallina-Rodriguez,
Juan Tapiador
Abstract:
The Chromium open-source project has become a fundamental piece of the Web as we know it today, with multiple vendors offering browsers based on its codebase. One of its most popular features is the possibility of altering or enhancing the browser functionality through third-party programs known as browser extensions. Extensions have access to a wide range of capabilities through the use of APIs exposed by Chromium. The Debugger API -- arguably the most powerful of such APIs -- allows extensions to use the Chrome DevTools Protocol (CDP), a capability-rich tool for debugging and instrumenting the browser. In this paper, we describe several vulnerabilities present in the Debugger API and in the granting of capabilities to extensions that can be used by an attacker to take control of the browser, escalate privileges, and break context isolation. We demonstrate their impact by introducing six attacks that allow an attacker to steal user information, monitor network traffic, modify site permissions (e.g., access to camera or microphone), bypass security interstitials without user intervention, and change the browser settings. Our attacks work in all major Chromium-based browsers as they are rooted at the core of the Chromium project. We reported our findings to the Chromium Development Team, who already fixed some of them and are currently working on fixing the remaining ones. We conclude by discussing how questionable design decisions, lack of public specifications, and an overpowered Debugger API have contributed to enabling these attacks, and propose mitigations.
Submitted 31 May, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
Reimagining Demand-Side Management with Mean Field Learning
Authors:
Bianca Marin Moreno,
Margaux Brégère,
Pierre Gaillard,
Nadia Oudjane
Abstract:
Integrating renewable energy into the power grid while balancing supply and demand is a complex issue, given its intermittent nature. Demand side management (DSM) offers solutions to this challenge. We propose a new method for DSM, in particular for the problem of controlling a large population of electrical devices to follow a desired consumption signal. We model it as a finite horizon Markovian mean field control problem. We develop a new algorithm, MD-MFC, which provides theoretical guarantees for convex and Lipschitz objective functions. What distinguishes MD-MFC from the existing load control literature is its effectiveness in directly solving the target tracking problem without resorting to regularization techniques on the main problem. A non-standard Bregman divergence on a mirror descent scheme allows dynamic programming to be used to obtain simple closed-form solutions. In addition, we show that general mean-field game algorithms can be applied to this problem, which expands the possibilities for addressing load control problems. We illustrate our claims with experiments on a realistic data set.
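For readers unfamiliar with mirror descent, the sketch below shows the standard entropic variant (KL divergence as the Bregman divergence) on a deliberately simplified, single-step tracking problem: choose a distribution `p` of devices over consumption levels so that mean consumption hits a target. This is not MD-MFC: it uses neither the paper's non-standard divergence nor its dynamic-programming step over a finite horizon, and the levels, target, step size, and iteration count are hypothetical.

```python
import math
from typing import List, Sequence


def entropic_mirror_descent(levels: Sequence[float], target: float,
                            step: float = 0.5, iters: int = 200) -> List[float]:
    """Minimize (sum_i levels[i] * p[i] - target)**2 over the probability simplex.

    With the KL divergence as the Bregman divergence, each mirror step is a
    multiplicative (exponentiated-gradient) update followed by renormalization.
    """
    n = len(levels)
    p = [1.0 / n] * n  # start from the uniform distribution over states
    for _ in range(iters):
        mean = sum(c * q for c, q in zip(levels, p))  # aggregate consumption
        grad = [2.0 * (mean - target) * c for c in levels]
        weights = [q * math.exp(-step * g) for q, g in zip(p, grad)]
        total = sum(weights)
        p = [w / total for w in weights]
    return p


# Hypothetical device consumption levels (kW) and a target mean of 1.5 kW.
levels = [0.0, 1.0, 2.0]
p = entropic_mirror_descent(levels, target=1.5)
print([round(q, 3) for q in p], sum(c * q for c, q in zip(levels, p)))
```

The multiplicative update keeps every iterate a valid probability distribution without an explicit projection, which is the main reason mirror descent is attractive for distribution-valued controls; the target must lie between the smallest and largest level to be reachable.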
Submitted 25 May, 2023; v1 submitted 16 February, 2023;
originally announced February 2023.
-
Reviewing War: Unconventional User Reviews as a Side Channel to Circumvent Information Controls
Authors:
José Miguel Moreno,
Sergio Pastrana,
Jens Helge Reelfs,
Pelayo Vallina,
Andriy Panchenko,
Georgios Smaragdakis,
Oliver Hohlfeld,
Narseo Vallina-Rodriguez,
Juan Tapiador
Abstract:
During the first days of the 2022 Russian invasion of Ukraine, Russia's media regulator blocked access to many global social media platforms and news sites, including Twitter, Facebook, and the BBC. To bypass the information controls set by Russian authorities, pro-Ukrainian groups explored unconventional ways to reach out to the Russian population, such as posting war-related content in the user reviews of Russian businesses available on Google Maps or Tripadvisor. This paper provides a first analysis of this new phenomenon by analyzing the creative strategies used to avoid state censorship. Specifically, we analyze reviews posted on these platforms from the beginning of the conflict to September 2022. We measure the channeling of war messages through user reviews in Tripadvisor and Google Maps, as well as in VK, a popular Russian social network. Our analysis of the content posted on these services reveals that users leveraged these platforms to seek and exchange humanitarian and travel advice, but also to disseminate disinformation and polarized messages. Finally, we analyze the response of platforms in terms of content moderation and their impact.
Submitted 1 February, 2023;
originally announced February 2023.
-
Augmenting a Physics-Informed Neural Network for the 2D Burgers Equation by Addition of Solution Data Points
Authors:
Marlon Sproesser Mathias,
Wesley Pereira de Almeida,
Marcel Rodrigues de Barros,
Jefferson Fialho Coelho,
Lucas Palmiro de Freitas,
Felipe Marino Moreno,
Caio Fabricio Deberaldini Netto,
Fabio Gagliardi Cozman,
Anna Helena Reali Costa,
Eduardo Aoun Tannuri,
Edson Satoshi Gomi,
Marcelo Dottori
Abstract:
We implement a Physics-Informed Neural Network (PINN) for solving the two-dimensional Burgers equations. This type of model can be trained with no previous knowledge of the solution; instead, it relies on evaluating the governing equations of the system at points of the physical domain. It is also possible to use points with a known solution during training. In this paper, we compare PINNs trained with different amounts of governing equation evaluation points and known solution points. Comparing models that were trained purely with known solution points to those that have also used the governing equations, we observe that the latter better respect the underlying physics overall. We also investigate how changing the number of each type of point affects the resulting models differently. Finally, we argue that the addition of the governing equations during training may provide a way to improve the overall performance of the model without relying on additional data, which is especially important for situations where the number of known solution points is limited.
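The two kinds of training points can be combined in a single objective. The sketch below assumes the standard form of the 2D viscous Burgers equations, u_t + u u_x + v u_y = ν(u_xx + u_yy) and likewise for v, and sums a mean squared PDE residual over governing-equation (collocation) points with a weighted mean squared error over known-solution points. Central finite differences stand in for the automatic differentiation a real PINN would use; `model` is any callable returning (u, v), and `data_weight`, the point sets, and ν are illustrative assumptions rather than the paper's settings. The field `exact` is a known closed-form solution of these equations (linear in x and y, so its Laplacian vanishes for any ν) and is used only as a sanity check.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]  # (t, x, y)
Model = Callable[[float, float, float], Tuple[float, float]]  # returns (u, v)


def burgers_residual(model: Model, t: float, x: float, y: float,
                     nu: float, h: float = 1e-3) -> Tuple[float, float]:
    """Residuals of both 2D viscous Burgers equations at one collocation point."""
    u, v = model(t, x, y)
    tp, tm = model(t + h, x, y), model(t - h, x, y)
    xp, xm = model(t, x + h, y), model(t, x - h, y)
    yp, ym = model(t, x, y + h), model(t, x, y - h)
    res = []
    for k, w in enumerate((u, v)):  # k = 0: u-equation, k = 1: v-equation
        w_t = (tp[k] - tm[k]) / (2 * h)
        w_x = (xp[k] - xm[k]) / (2 * h)
        w_y = (yp[k] - ym[k]) / (2 * h)
        lap = (xp[k] - 2 * w + xm[k] + yp[k] - 2 * w + ym[k]) / h ** 2
        res.append(w_t + u * w_x + v * w_y - nu * lap)
    return res[0], res[1]


def pinn_loss(model: Model, collocation: List[Point],
              data: List[Tuple[float, float, float, float, float]],
              nu: float, data_weight: float = 1.0) -> float:
    """Mean squared PDE residual plus weighted MSE on known-solution points."""
    physics = sum(ru ** 2 + rv ** 2
                  for ru, rv in (burgers_residual(model, t, x, y, nu)
                                 for t, x, y in collocation)) / len(collocation)
    misfit = 0.0
    for t, x, y, u_true, v_true in data:
        u, v = model(t, x, y)
        misfit += (u - u_true) ** 2 + (v - v_true) ** 2
    misfit = misfit / len(data) if data else 0.0
    return physics + data_weight * misfit


def exact(t: float, x: float, y: float) -> Tuple[float, float]:
    d = 1.0 - 2.0 * t * t
    return (x + y - 2 * x * t) / d, (x - y - 2 * y * t) / d


collocation = [(0.1 * i, 0.25 * j, 0.3) for i in range(3) for j in range(4)]
data = [(0.2, 0.5, 0.5) + exact(0.2, 0.5, 0.5)]
print(pinn_loss(exact, collocation, data, nu=0.01))  # close to zero
print(pinn_loss(lambda t, x, y: (x, y), collocation, data, nu=0.01))  # clearly positive
```

Dropping `collocation` reduces this to pure supervised fitting on known solution points, while an empty `data` list gives the purely physics-driven case the abstract contrasts it with.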
Submitted 18 January, 2023;
originally announced January 2023.
-
A Physics-Informed Neural Network to Model Port Channels
Authors:
Marlon S. Mathias,
Marcel R. de Barros,
Jefferson F. Coelho,
Lucas P. de Freitas,
Felipe M. Moreno,
Caio F. D. Netto,
Fabio G. Cozman,
Anna H. R. Costa,
Eduardo A. Tannuri,
Edson S. Gomi,
Marcelo Dottori
Abstract:
We describe a Physics-Informed Neural Network (PINN) that simulates the flow induced by the astronomical tide in a synthetic port channel, with dimensions based on the Santos - São Vicente - Bertioga Estuarine System. PINN models aim to combine the knowledge of physical systems and data-driven machine learning models. This is done by training a neural network to minimize the residuals of the governing equations at sample points. In this work, our flow is governed by the Navier-Stokes equations with some approximations. There are two main novelties in this paper. First, we design our model to assume that the flow is periodic in time, which is not feasible in conventional simulation methods. Second, we evaluate the benefit of resampling the function evaluation points during training, which has a near-zero computational cost and has been verified to improve the final model, especially for small batch sizes. Finally, we discuss some limitations of the approximations used in the Navier-Stokes equations regarding the modeling of turbulence and how it interacts with PINNs.
Submitted 20 December, 2022;
originally announced December 2022.
-
Best-Effort Communication Improves Performance and Scales Robustly on Conventional Hardware
Authors:
Matthew Andres Moreno,
Charles Ofria
Abstract:
Here, we test the performance and scalability of fully-asynchronous, best-effort communication on existing, commercially-available HPC hardware.
A first set of experiments tested whether best-effort communication strategies can benefit performance compared to the traditional perfect communication model. At high CPU counts, best-effort communication improved both the number of computational steps executed per unit time and the solution quality achieved within a fixed-duration run window.
Under the best-effort model, characterizing the distribution of quality of service across processing components and over time is critical to understanding the actual computation being performed. Additionally, a complete picture of scalability under the best-effort model requires analysis of how such quality of service fares at scale. To answer these questions, we designed and measured a suite of quality of service metrics: simulation update period, message latency, message delivery failure rate, and message delivery coagulation. Under a less communication-intensive benchmark parameterization, we found that median values for all quality of service metrics were stable when scaling from 64 to 256 processes. Under maximal communication intensity, we found only minor (and, in most cases, nil) degradation in median quality of service.
In an additional set of experiments, we tested the effect of an apparently faulty compute node on performance and quality of service. Despite extreme quality of service degradation among that node and its clique, median performance and quality of service remained stable.
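Two of the quality of service metrics listed above, message latency and message delivery failure rate, can be summarized from a per-message log. The sketch assumes a hypothetical log format of (send time, receive time) pairs in milliseconds, with `None` marking a message the best-effort transport never delivered; it does not reproduce the paper's measurement harness, and it omits update period and coagulation, whose exact definitions are not given here.

```python
from statistics import median
from typing import Dict, List, Optional, Tuple

# Hypothetical log entry: (send_time_ms, receive_time_ms); receive_time_ms is
# None when the best-effort transport dropped the message.
Record = Tuple[int, Optional[int]]


def qos_summary(log: List[Record]) -> Dict[str, Optional[float]]:
    """Median latency over delivered messages and the fraction not delivered."""
    latencies = [recv - sent for sent, recv in log if recv is not None]
    failures = len(log) - len(latencies)
    return {
        "median_latency_ms": median(latencies) if latencies else None,
        "delivery_failure_rate": failures / len(log) if log else 0.0,
    }


log = [(0, 2), (100, 103), (200, None), (300, 301)]
print(qos_summary(log))  # {'median_latency_ms': 2, 'delivery_failure_rate': 0.25}
```

Computing such summaries per process and per time window, then taking medians across them, is one way to track how quality of service is distributed across components and over time.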
Submitted 6 October, 2023; v1 submitted 20 November, 2022;
originally announced November 2022.
-
Enhancing Oceanic Variables Forecast in the Santos Channel by Estimating Model Error with Random Forests
Authors:
Felipe M. Moreno,
Caio F. D. Netto,
Marcel R. de Barros,
Jefferson F. Coelho,
Lucas P. de Freitas,
Marlon S. Mathias,
Luiz A. Schiaveto Neto,
Marcelo Dottori,
Fabio G. Cozman,
Anna H. R. Costa,
Edson S. Gomi,
Eduardo A. Tannuri
Abstract:
In this work, we improve forecasting of Sea Surface Height (SSH) and current velocity (speed and direction) in oceanic scenarios. We do so by resorting to Random Forests so as to predict the error of a numerical forecasting system developed for the Santos Channel in Brazil. We have used the Santos Operational Forecasting System (SOFS) and data collected in situ between the years of 2019 and 2021. In previous studies, we applied similar methods to current velocity at the channel entrance; in this work, we expand the application to improve the SSH forecast and include four other stations in the channel. We obtained an average reduction of 11.9% in forecasting Root-Mean-Square Error (RMSE) and 38.7% in bias with our approach. We also obtained an increase in the Index of Agreement (IOA) in 10 of the 14 combinations of forecasted variables and stations.
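The correction scheme and the three reported metrics can be illustrated with a small sketch. It assumes the learned error is defined as forecast minus observation, so the corrected forecast subtracts it, and it uses Willmott's index of agreement for IOA; the Random Forest itself is omitted, and `predicted_error`, like the observation and forecast values, is an illustrative stand-in rather than SOFS data.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root-mean-square error between forecast and observation."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def bias(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Mean forecast error (positive means the forecast is too high)."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)


def index_of_agreement(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Willmott's index of agreement d in [0, 1]; 1 means perfect agreement."""
    o_mean = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for p, o in zip(pred, obs))
    den = sum((abs(p - o_mean) + abs(o - o_mean)) ** 2 for p, o in zip(pred, obs))
    return 1.0 - num / den


obs = [1.0, 2.0, 3.0]
forecast = [1.5, 2.5, 3.5]         # numerical-model output (illustrative values)
predicted_error = [0.4, 0.5, 0.6]  # stand-in for the Random Forest's prediction
corrected = [f - e for f, e in zip(forecast, predicted_error)]
for name, series in (("forecast", forecast), ("corrected", corrected)):
    print(name, rmse(series, obs), bias(series, obs), index_of_agreement(series, obs))
```

Because the model learns only the residual of the numerical forecast, the physics-based system remains the primary predictor and the learned component acts as a post-processing correction.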
Submitted 22 July, 2022;
originally announced August 2022.
-
Softening online extremes organically and at scale
Authors:
Elvira Maria Restrepo,
Martin Moreno,
Lucia Illari,
Neil F. Johnson
Abstract:
Calls are escalating for social media platforms to do more to mitigate extreme online communities whose views can lead to real-world harms, e.g., mis/disinformation and distrust that increased Covid-19 fatalities, and now extend to monkeypox, unsafe baby formula alternatives, cancer, abortions, and climate change; white replacement that inspired the 2022 Buffalo shooter and will likely inspire others; anger that threatens elections, e.g., 2021 U.S. Capitol attack; notions of male supremacy that encourage abuse of women; anti-Semitism, anti-LGBTQ hate and QAnon conspiracies. But should 'doing more' mean doing more of the same, or something different? If so, what? Here we start by showing why platforms doing more of the same will not solve the problem. Specifically, our analysis of nearly 100 million Facebook users entangled over vaccines and now Covid and beyond, shows that the extreme communities' ecology has a hidden resilience to Facebook's removal interventions; that Facebook's messaging interventions are missing key audience sectors and getting ridiculed; that a key piece of these online extremes' narratives is being mislabeled as incorrect science; and that the threat of censorship is inciting the creation of parallel presences on other platforms with potentially broader audiences. We then demonstrate empirically a new solution that can soften online extremes organically without having to censor or remove communities or their content, or check or correct facts, or promote any preventative messaging, or seek a consensus. This solution can be automated at scale across social media platforms quickly and with minimal cost.
Submitted 29 May, 2022;
originally announced July 2022.
-
Modeling Oceanic Variables with Dynamic Graph Neural Networks
Authors:
Caio F. D. Netto,
Marcel R. de Barros,
Jefferson F. Coelho,
Lucas P. de Freitas,
Felipe M. Moreno,
Marlon S. Mathias,
Marcelo Dottori,
Fábio G. Cozman,
Anna H. R. Costa,
Edson S. Gomi,
Eduardo A. Tannuri
Abstract:
Researchers typically resort to numerical methods to understand and predict ocean dynamics, a key task in mastering environmental phenomena. Such methods may not be suitable in scenarios where the topographic map is complex, knowledge about the underlying processes is incomplete, or the application is time critical. On the other hand, if ocean dynamics are observed, they can be exploited by recent machine learning methods. In this paper we describe a data-driven method to predict environmental variables such as current velocity and sea surface height in the region of the Santos-São Vicente-Bertioga Estuarine System on the southeastern coast of Brazil. Our model exploits both temporal and spatial inductive biases by joining state-of-the-art sequence models (LSTM and Transformers) and relational models (Graph Neural Networks) in an end-to-end framework that learns both the temporal features and the spatial relationship shared among observation sites. We compare our results with the Santos Operational Forecasting System (SOFS). Experiments show that better results are attained by our model, while maintaining flexibility and little domain knowledge dependency.
Submitted 25 June, 2022;
originally announced June 2022.
-
Agile-CMMI Alignment: CMMI V2.0 Contributions and To-dos for Organizations
Authors:
Valeria Henriquez,
Ana M. Moreno,
Jose A. Calvo-Manzano,
Tomas San Feliu
Abstract:
CMMI and Agile can work together. Over 80% of CMMI appraisals in 2018 were conducted at agile organizations, even though pre-2018 CMMI versions do not provide guidelines for agile contexts. A number of experience reports and research studies address the alignment between the two approaches but also pinpoint open tactical and organizational challenges. CMMI V2.0, published in 2018, was designed to be understandable, accessible, and flexible. It was intended to be integrated with other methodologies such as Agile. In this paper, we discuss to what extent the new CMMI V2.0 addresses the existing Agile-CMMI alignment challenges. We identify the two most significant CMMI V2.0 artifacts for this purpose: the context-specific sections provided for most of the practice areas, and the value statements linked to the practices. We analyze how they contribute to each of the existing challenges and highlight important issues that organizations still need to tackle regarding this alignment.
Submitted 1 October, 2021;
originally announced October 2021.
-
Matchmaker, Matchmaker, Make Me a Match: Geometric, Variational, and Evolutionary Implications of Criteria for Tag Affinity
Authors:
Matthew Andres Moreno,
Alexander Lalejini,
Charles Ofria
Abstract:
Genetic programming and artificial life systems commonly employ tag-matching schemes to determine interactions between model components. However, the implications of criteria used to determine affinity between tags with respect to constraints on emergent connectivity, canalization of changes to connectivity under mutation, and evolutionary dynamics have not been considered. We highlight differences between tag-matching criteria with respect to geometric constraint and variation generated under mutation. We find that tag-matching criteria can influence the rate of adaptive evolution and the quality of evolved solutions. Better understanding of the geometric, variational, and evolutionary properties of tag-matching criteria will facilitate more effective incorporation of tag matching into genetic programming and artificial life systems. By showing that tag-matching criteria influence connectivity patterns and evolutionary dynamics, our findings also raise fundamental questions about the properties of tag-matching systems in nature.
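As an illustration of how the choice of tag-matching criterion alone can change which components interact, the sketch below compares two common distance-based criteria on fixed-width bitstring tags: Hamming distance and absolute integer difference. The tag width, the example tags, and the `best_match` helper are hypothetical; the paper's exact set of criteria and parameters may differ.

```python
from typing import Callable, Sequence

Metric = Callable[[int, int], int]


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two bitstring tags."""
    return bin(a ^ b).count("1")


def integer_distance(a: int, b: int) -> int:
    """Absolute difference between tags read as unsigned integers."""
    return abs(a - b)


def best_match(query: int, tags: Sequence[int], metric: Metric) -> int:
    """Tag with the highest affinity, i.e. the smallest distance to `query`."""
    return min(tags, key=lambda tag: metric(query, tag))


query = 0b0000_0000
tags = [0b1000_0000, 0b0000_0111]
print(best_match(query, tags, hamming_distance))  # 128: differs in one bit
print(best_match(query, tags, integer_distance))  # 7: numerically closest
```

The criteria also differ under mutation: flipping a single bit always changes a Hamming distance by exactly 1, but can change an integer distance by up to 2**i for bit i, so the same point mutation can have very different effects on connectivity depending on the criterion.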
Submitted 10 August, 2021;
originally announced August 2021.
-
SignalGP-Lite: Event Driven Genetic Programming Library for Large-Scale Artificial Life Applications
Authors:
Matthew Andres Moreno,
Santiago Rodriguez Papa,
Alexander Lalejini,
Charles Ofria
Abstract:
Event-driven genetic programming representations have been shown to outperform traditional imperative representations on interaction-intensive problems. The event-driven approach organizes genome content into modules that are triggered in response to environmental signals, simplifying simulation design and implementation. Existing work developing event-driven genetic programming methodology has largely used the SignalGP library, which caters to traditional program synthesis applications. The SignalGP-Lite library enables larger-scale artificial life experiments with streamlined agents by reducing control flow overhead and trading run-time flexibility for better performance due to compile-time configuration. Here, we report benchmarking experiments that show an 8x to 30x speedup. We also report solution quality equivalent to SignalGP on two benchmark problems originally developed to test the ability of evolved programs to respond to a large number of signals and to modulate signal response based on context.
Submitted 1 August, 2021;
originally announced August 2021.
-
Conduit: A C++ Library for Best-effort High Performance Computing
Authors:
Matthew Andres Moreno,
Santiago Rodriguez Papa,
Charles Ofria
Abstract:
Developing software to effectively take advantage of growth in parallel and distributed processing capacity poses significant challenges. Traditional programming techniques allow a user to assume that execution, message passing, and memory are always kept synchronized. However, maintaining this consistency becomes increasingly costly at scale. One proposed strategy is "best-effort computing", which relaxes synchronization and hardware reliability requirements, accepting nondeterminism in exchange for efficiency. Although many programming languages and frameworks aim to facilitate software development for high performance applications, existing tools do not directly provide a prepackaged best-effort interface. The Conduit C++ Library aims to provide such an interface for convenient implementation of software that uses best-effort inter-thread and inter-process communication. Here, we describe the motivation, objectives, design, and implementation of the library. Benchmarks on a communication-intensive graph coloring problem and a compute-intensive digital evolution simulation show that Conduit's best-effort model can improve scaling efficiency and solution quality, particularly in a distributed, multi-node context.
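To make "relaxes synchronization ... accepting nondeterminism in exchange for efficiency" concrete, here is a rough single-threaded sketch of a best-effort channel. It is not Conduit's C++ interface: the class `BestEffortChannel` and its methods are hypothetical, and the drop-oldest policy is only one possible best-effort choice. A real inter-thread version would also need synchronization primitives or a lock-free buffer.

```python
from collections import deque
from typing import Deque, Generic, List, Optional, TypeVar

T = TypeVar("T")


class BestEffortChannel(Generic[T]):
    """Hypothetical bounded channel: sends never block. When the buffer is full,
    the oldest message is dropped instead of waiting for the receiver."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.buffer: Deque[T] = deque(maxlen=capacity)
        self.dropped = 0  # how many messages were lost to overwriting

    def try_put(self, msg: T) -> None:
        if len(self.buffer) == self.buffer.maxlen:
            self.dropped += 1  # the oldest message is about to be evicted
        self.buffer.append(msg)  # deque(maxlen=...) evicts from the left

    def get_latest(self) -> Optional[T]:
        """Return the newest message and discard older ones, or None if empty."""
        if not self.buffer:
            return None
        latest = self.buffer[-1]
        self.buffer.clear()
        return latest

    def drain(self) -> List[T]:
        """Return every buffered message, oldest first, and empty the buffer."""
        items = list(self.buffer)
        self.buffer.clear()
        return items
```

Putting five messages into a channel of capacity 2 leaves only the last two and records three drops. The sender made progress at the cost of losing stale data. This is the kind of trade-off best-effort computing accepts.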
Submitted 21 May, 2021;
originally announced May 2021.
-
Exploring Evolved Multicellular Life Histories in an Open-Ended Digital Evolution System
Authors:
Matthew Andres Moreno,
Charles Ofria
Abstract:
Evolutionary transitions occur when previously independent replicating entities unite to form more complex individuals. Such transitions have profoundly shaped natural evolutionary history and occur in two forms: fraternal transitions involve lower-level entities that are kin (e.g., transitions to multicellularity or to eusocial colonies), while egalitarian transitions involve unrelated individuals (e.g., the origins of mitochondria). The necessary conditions and evolutionary mechanisms for these transitions to arise continue to be fruitful targets of scientific interest. Here, we examine a range of fraternal transitions in populations of open-ended self-replicating computer programs. These digital cells were allowed to form and replicate kin groups by selectively adjoining or expelling daughter cells. The capability to recognize kin-group membership enabled preferential communication and cooperation between cells. We repeatedly observed group-level traits that are characteristic of a fraternal transition. These included reproductive division of labor, resource sharing within kin groups, resource investment in offspring groups, asymmetrical behaviors mediated by messaging, morphological patterning, and adaptive apoptosis. We report eight case studies from replicates where transitions occurred and explore the diverse range of adaptive evolved multicellular strategies.
Submitted 20 April, 2021;
originally announced April 2021.
-
Tag-based regulation of modules in genetic programming improves context-dependent problem solving
Authors:
Alexander Lalejini,
Matthew Andres Moreno,
Charles Ofria
Abstract:
We introduce and experimentally demonstrate the utility of tag-based genetic regulation, a new genetic programming (GP) technique that allows programs to dynamically adjust which code modules to express. Tags are evolvable labels that provide a flexible mechanism for referencing code modules. Tag-based genetic regulation extends existing tag-based naming schemes to allow programs to "promote" and "repress" code modules in order to alter expression patterns. This extension allows evolution to structure a program as a gene regulatory network where modules are regulated based on instruction executions. We demonstrate the functionality of tag-based regulation on a range of program synthesis problems. We find that tag-based regulation improves problem-solving performance on context-dependent problems; that is, problems where programs must adjust how they respond to current inputs based on prior inputs. Indeed, the system could not evolve solutions to some context-dependent problems until regulation was added. Our implementation of tag-based genetic regulation is not universally beneficial, however. We identify scenarios where the correct response to a particular input never changes, rendering tag-based regulation an unneeded functionality that can sometimes impede adaptive evolution. Tag-based genetic regulation broadens our repertoire of techniques for evolving more dynamic genetic programs and can easily be incorporated into existing tag-enabled GP systems.
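The "promote" and "repress" mechanism can be pictured with a small sketch. Here regulation is assumed to act as a multiplicative weight on each module's raw tag-match score. That formula is an assumption, and the paper's actual regulation rule may differ. `RegulatedModules`, `promote`, `repress`, `call`, and the doubling `step` are hypothetical names and values.

```python
from typing import Dict, List

TAG_BITS = 8  # assumed tag width


def similarity(a: int, b: int) -> float:
    """Fraction of matching bits between two TAG_BITS-wide tags."""
    mask = (1 << TAG_BITS) - 1
    return 1.0 - bin((a ^ b) & mask).count("1") / TAG_BITS


class RegulatedModules:
    """Hypothetical model: each module has a tag and a regulation weight that
    scales its match score. Promoting raises the weight; repressing lowers it."""

    def __init__(self, tags: List[int], step: float = 2.0) -> None:
        self.tags = tags
        self.weights: Dict[int, float] = {i: 1.0 for i in range(len(tags))}
        self.step = step

    def _closest(self, tag: int) -> int:
        # Regulation targets the module whose tag best matches the given tag.
        return max(range(len(self.tags)), key=lambda i: similarity(self.tags[i], tag))

    def promote(self, tag: int) -> None:
        self.weights[self._closest(tag)] *= self.step

    def repress(self, tag: int) -> None:
        self.weights[self._closest(tag)] /= self.step

    def call(self, tag: int) -> int:
        """Index of the module expressed in response to `tag`, after regulation."""
        return max(range(len(self.tags)),
                   key=lambda i: similarity(self.tags[i], tag) * self.weights[i])
```

With module tags `0b00000000` and `0b00000011`, the input `0b00000001` matches both equally and calls module 0. After `promote(0b00000011)`, the same input calls module 1. After two calls to `repress(0b00000011)`, it calls module 0 again. This is the context dependence the abstract describes: the response to a current input depends on earlier regulatory actions.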
Submitted 9 July, 2021; v1 submitted 16 December, 2020;
originally announced December 2020.
-
Workflow Provenance in the Lifecycle of Scientific Machine Learning
Authors:
Renan Souza,
Leonardo G. Azevedo,
Vítor Lourenço,
Elton Soares,
Raphael Thiago,
Rafael Brandão,
Daniel Civitarese,
Emilio Vital Brazil,
Marcio Moreno,
Patrick Valduriez,
Marta Mattoso,
Renato Cerqueira,
Marco A. S. Netto
Abstract:
Machine Learning (ML) has already fundamentally changed several businesses. More recently, it has also been profoundly impacting the computational science and engineering domains, like geoscience, climate science, and health science. In these domains, users need to perform comprehensive data analyses combining scientific data and ML models to meet critical requirements, such as reproducibility, model explainability, and experiment data understanding. However, scientific ML is multidisciplinary, heterogeneous, and affected by the physical constraints of the domain, making such analyses even more challenging. In this work, we leverage workflow provenance techniques to build a holistic view to support the lifecycle of scientific ML. We contribute (i) a characterization of the lifecycle and a taxonomy for data analyses; (ii) design principles to build this view, with a W3C PROV compliant data representation and a reference system architecture; and (iii) lessons learned after an evaluation in an Oil & Gas case using an HPC cluster with 393 nodes and 946 GPUs. The experiments show that the principles enable queries that integrate domain semantics with ML models while keeping overhead low (<1%), scaling well, and accelerating queries by an order of magnitude under certain workloads compared with not using our representation.
Submitted 25 August, 2021; v1 submitted 30 September, 2020;
originally announced October 2020.
-
Autonomous Driving: Framework for Pedestrian Intention Estimation in a Real World Scenario
Authors:
Walter Morales Alvarez,
Francisco Miguel Moreno,
Oscar Sipele,
Nikita Smirnov,
Cristina Olaverri-Monreal
Abstract:
Rapid advancements in driver-assistance technology will lead to the integration of fully autonomous vehicles on our roads that will interact with other road users. To address the problem that driverless vehicles make interaction through eye contact impossible, we describe a framework for estimating the crossing intentions of pedestrians in order to reduce the uncertainty that the lack of eye contact between road users creates. The framework was deployed in a real vehicle and tested with three experimental cases that showed a variety of communication messages to pedestrians in a shared space scenario. Results from the performed field tests showed the feasibility of the presented approach.
Submitted 22 February, 2021; v1 submitted 4 June, 2020;
originally announced June 2020.
-
Managing Data Lineage of O&G Machine Learning Models: The Sweet Spot for Shale Use Case
Authors:
Raphael Thiago,
Renan Souza,
L. Azevedo,
E. Soares,
Rodrigo Santos,
Wallas Santos,
Max De Bayser,
M. Cardoso,
M. Moreno,
Renato Cerqueira
Abstract:
Machine Learning (ML) has increased its role, becoming essential in several industries. However, questions around training data lineage, such as "where has the dataset used to train this model come from?", the introduction of several new data protection laws, and the need to meet data governance requirements have hindered the adoption of ML models in the real world. In this paper, we discuss how data lineage can benefit the lifecycle of ML models built to discover sweet spots for shale oil and gas production, a major application in the Oil and Gas (O&G) industry.
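The lineage question quoted above ("where has the dataset used to train this model come from?") can be answered by walking provenance records backwards. The sketch below borrows the W3C PROV relation names `wasGeneratedBy` and `used` only as plain dictionaries. It is not a PROV serialization and not the authors' system. The class `LineageGraph` and all entity and activity names in the usage example are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List


class LineageGraph:
    """Minimal lineage store using PROV-style relations as dictionaries."""

    def __init__(self) -> None:
        self.was_generated_by: Dict[str, str] = {}  # entity -> producing activity
        self.used: Dict[str, List[str]] = {}        # activity -> input entities

    def record(self, activity: str, inputs: List[str], outputs: List[str]) -> None:
        self.used.setdefault(activity, []).extend(inputs)
        for out in outputs:
            self.was_generated_by[out] = activity

    def upstream(self, entity: str) -> List[str]:
        """All entities the given entity was (transitively) derived from,
        in breadth-first order."""
        seen: List[str] = []
        queue: Deque[str] = deque([entity])
        while queue:
            current = queue.popleft()
            activity = self.was_generated_by.get(current)
            if activity is None:
                continue  # a source entity with no recorded generation
            for src in self.used.get(activity, []):
                if src not in seen:
                    seen.append(src)
                    queue.append(src)
        return seen
```

For instance, after recording `clean` (raw well logs to cleaned logs), `featurize` (cleaned logs plus a seismic cube to features) and `train` (features to model), `upstream("sweet_spot_model")` returns the features, the cleaned logs, the seismic cube, and finally the raw well logs.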
Submitted 10 March, 2020;
originally announced March 2020.
-
Effective Integration of Symbolic and Connectionist Approaches through a Hybrid Representation
Authors:
Marcio Moreno,
Daniel Civitarese,
Rafael Brandao,
Renato Cerqueira
Abstract:
In this paper, we present our position on a neural-symbolic integration strategy, arguing in favor of a hybrid representation to promote effective integration. This representation differs fundamentally from others because its entities aim to represent AI models in general, making it possible to describe both non-symbolic and symbolic knowledge, the integration between them, and their corresponding processors. Moreover, the entities also support representing workflows, leveraging traceability to keep track of every change applied to models and their related entities (e.g., data or concepts) throughout the lifecycle of the models.
Submitted 18 December, 2019;
originally announced December 2019.
-
Managing Machine Learning Workflow Components
Authors:
Marcio Moreno,
Vítor Lourenço,
Sandro Rama Fiorini,
Polyana Costa,
Rafael Brandão,
Daniel Civitarese,
Renato Cerqueira
Abstract:
Machine Learning Workflows (MLWfs) have become an essential and disruptive approach to problem-solving across several industries. However, developing MLWfs can be complicated, difficult, time-consuming, and error-prone. To handle this problem, in this paper, we introduce machine learning workflow management (MLWfM) as a technique to aid the development and reuse of MLWfs and their components through three aspects: representation, execution, and creation. More precisely, we discuss our approach to structuring MLWfs' components and their metadata to aid the retrieval and reuse of components in new MLWfs. We also consider the execution of these components within a tool. The hybrid knowledge representation, called Hyperknowledge, frames our methodology, supporting all three aspects of MLWfM. To validate our approach, we show a practical use case in the Oil & Gas industry.
Submitted 25 September, 2020; v1 submitted 9 December, 2019;
originally announced December 2019.
-
Bridging the Gap between Semantics and Multimedia Processing
Authors:
Marcio Ferreira Moreno,
Guilherme Lima,
Rodrigo Costa Mesquita Santos,
Roberto Azevedo,
Markus Endler
Abstract:
In this paper, we give an overview of the semantic gap problem in multimedia and discuss how machine learning and symbolic AI can be combined to narrow this gap. We describe the gap in terms of a classical architecture for multimedia processing and discuss a structured approach to bridge it. This approach combines machine learning (for mapping signals to objects) and symbolic AI (for linking objects to meanings). Our main goal is to raise awareness and discuss the challenges involved in this structured approach to multimedia understanding, especially in view of the latest developments in machine learning and symbolic AI.
Submitted 2 December, 2019; v1 submitted 25 November, 2019;
originally announced November 2019.
-
An Introduction to Symbolic Artificial Intelligence Applied to Multimedia
Authors:
Guilherme Lima,
Rodrigo Costa,
Marcio Ferreira Moreno
Abstract:
In this chapter, we give an introduction to symbolic artificial intelligence (AI) and discuss its relation and application to multimedia. We begin by defining what symbolic AI is, what distinguishes it from non-symbolic approaches, such as machine learning, and how it can be used in the construction of advanced multimedia applications. We then introduce description logic (DL) and use it to discuss symbolic representation and reasoning. DL is the logical underpinning of OWL, the most successful family of ontology languages. After discussing DL, we present OWL and related Semantic Web technologies, such as RDF and SPARQL. We conclude the chapter by discussing a hybrid model for multimedia representation, called Hyperknowledge. Throughout the text, we make references to technologies and extensions specifically designed to solve the kinds of problems that arise in multimedia representation.
Submitted 28 November, 2019; v1 submitted 21 November, 2019;
originally announced November 2019.
-
Multimedia Search and Temporal Reasoning
Authors:
Marcio Ferreira Moreno,
Rodrigo Costa Mesquita Santos,
Wallas Henrique Sousa dos Santos,
Sandro Rama Fiorini,
Reinaldo Mozart da Gama Silva
Abstract:
Properly modelling dynamic information that changes over time is still an open issue. Most modern knowledge bases are unable to represent relationships that are valid only during a given time interval. In this work, we revisit a previous extension to the hyperknowledge framework that deals with temporal facts, and we propose a temporal query language and engine. We validate our proposal through a qualitative analysis of the modelling of a real-world use case in the Oil & Gas industry.
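A "relationship that is valid only during a given time interval" can be modelled, in its simplest form, as a triple annotated with a half-open validity interval. The sketch below is a hypothetical illustration, not the hyperknowledge temporal extension or its query language. `TemporalFact`, `query`, the half-open `[start, end)` convention, and the example well and operator names are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TemporalFact:
    subject: str
    relation: str
    obj: str
    start: float          # inclusive start of validity
    end: Optional[float]  # exclusive end of validity; None means still valid

    def holds_at(self, t: float) -> bool:
        return self.start <= t and (self.end is None or t < self.end)


def query(facts: List[TemporalFact], subject: str, relation: str, t: float) -> List[str]:
    """Objects related to `subject` by `relation` at time `t`."""
    return [f.obj for f in facts
            if f.subject == subject and f.relation == relation and f.holds_at(t)]
```

If a well was operated by one company from 2010 to 2015 and by another from 2015 onward, a query at 2012 returns only the first operator, while a query at 2015 returns only the second. A knowledge base that stores the two facts as timeless triples would instead return both at any time.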
Submitted 19 November, 2019;
originally announced November 2019.
-
Provenance Data in the Machine Learning Lifecycle in Computational Science and Engineering
Authors:
Renan Souza,
Leonardo Azevedo,
Vítor Lourenço,
Elton Soares,
Raphael Thiago,
Rafael Brandão,
Daniel Civitarese,
Emilio Vital Brazil,
Marcio Moreno,
Patrick Valduriez,
Marta Mattoso,
Renato Cerqueira,
Marco A. S. Netto
Abstract:
Machine Learning (ML) has become essential in several industries. In Computational Science and Engineering (CSE), the complexity of the ML lifecycle comes from the large variety of data, scientists' expertise, tools, and workflows. If data are not tracked properly during the lifecycle, it becomes infeasible to recreate an ML model from scratch or to explain to stakeholders how it was created. The main limitation of provenance tracking solutions is that they cannot capture and integrate the provenance of domain and ML data processed in the multiple workflows of the lifecycle while keeping the provenance capture overhead low. To handle this problem, in this paper we contribute a detailed characterization of provenance data in the ML lifecycle in CSE; a new provenance data representation, called PROV-ML, built on top of W3C PROV and ML Schema; and extensions to a system that tracks provenance from multiple workflows to address the characteristics of ML and CSE and to allow provenance queries with a standard vocabulary. We show a practical use in a real case in the Oil and Gas industry, along with its evaluation using 48 GPUs in parallel.
Submitted 21 October, 2019; v1 submitted 9 October, 2019;
originally announced October 2019.
-
General Fragment Model for Information Artifacts
Authors:
Sandro Rama Fiorini,
Wallas Sousa dos Santos,
Rodrigo Costa Mesquita,
Guilherme Ferreira Lima,
Marcio F. Moreno
Abstract:
The use of semantic descriptions in data-intensive domains requires a systematic model for linking semantic descriptions with their manifestations in fragments of heterogeneous information and data objects. Such information heterogeneity requires a fragment model that is general enough to support the specification of anchors from conceptual models to multiple types of information artifacts. While diverse anchoring models have been proposed in the literature, they usually focus on audiovisual information. We propose a generalized fragment model that can be instantiated for different kinds of information artifacts. Our objective is to systematize the way in which fragments and anchors can be described in conceptual models, without committing to a specific vocabulary.
Submitted 9 September, 2019;
originally announced September 2019.
-
BirdNet: a 3D Object Detection Framework from LiDAR information
Authors:
Jorge Beltran,
Carlos Guindel,
Francisco Miguel Moreno,
Daniel Cruzado,
Fernando Garcia,
Arturo de la Escalera
Abstract:
Understanding driving situations regardless of the conditions of the traffic scene is a cornerstone on the path towards autonomous vehicles; however, although common sensor setups already include complementary devices such as LiDAR or radar, most research on perception systems has traditionally focused on computer vision. We present a LiDAR-based 3D object detection pipeline entailing three stages. First, laser information is projected into a novel cell encoding for bird's eye view projection. Next, both the object's location on the plane and its heading are estimated through a convolutional neural network originally designed for image processing. Finally, 3D oriented detections are computed in a post-processing phase. Experiments on the KITTI dataset show that the proposed framework achieves state-of-the-art results among comparable methods. Further tests with different LiDAR sensors in real scenarios assess the multi-device capabilities of the approach.
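The first pipeline stage, projecting laser points into a per-cell bird's eye view encoding, can be sketched as follows. The three channels used here (maximum height, mean intensity, and density as the count divided by the busiest cell's count) are assumptions for illustration. The paper's actual encoding, including how density is normalized, may differ. The function name `bev_encode` and the region parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float, float]  # x (forward), y (left), z (up), intensity
Cell = Tuple[float, float, float]          # (max height, mean intensity, density)


def bev_encode(points: Sequence[Point], cell: float,
               x_max: float, y_half: float) -> List[List[Cell]]:
    """Project points in [0, x_max) x [-y_half, y_half) onto a 2D grid."""
    rows = int(math.ceil(x_max / cell))
    cols = int(math.ceil(2 * y_half / cell))
    max_z = [[-math.inf] * cols for _ in range(rows)]
    sum_i = [[0.0] * cols for _ in range(rows)]
    count = [[0] * cols for _ in range(rows)]
    for x, y, z, inten in points:
        if not (0.0 <= x < x_max and -y_half <= y < y_half):
            continue  # outside the region of interest
        r = min(int(x // cell), rows - 1)            # clamp guards float edge cases
        c = min(int((y + y_half) // cell), cols - 1)
        count[r][c] += 1
        sum_i[r][c] += inten
        max_z[r][c] = max(max_z[r][c], z)
    peak = max((n for row in count for n in row), default=0) or 1
    return [
        [(max_z[r][c] if count[r][c] else 0.0,
          sum_i[r][c] / count[r][c] if count[r][c] else 0.0,
          count[r][c] / peak)
         for c in range(cols)]
        for r in range(rows)
    ]
```

The result is an image-like tensor of shape rows x cols x 3. This is why a convolutional network originally designed for image processing can be applied in the next stage.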
Submitted 3 May, 2018;
originally announced May 2018.
-
Evaluating Accessible Synchronous CMC Applications
Authors:
Fernando G. Lobo,
Marielba Zacarias,
Paulo A. Condado,
Teresa Romão,
Rui Godinho,
Manuel Moreno
Abstract:
This paper proposes a more comprehensive evaluation methodology to measure the usability and user experience qualities of accessible synchronous computer-mediated communication (CMC) applications. The methodology goes beyond current practices by evaluating how the interaction between a user and a product influences the user experience of those at the other endpoint of the communication channel. A major contribution is the proposal of a user test in which one participant tries to guess whether the other participant has a disability. The proposed test is inspired by the Turing Test and stems from user requirements elicited from a group of individuals with motor and speech disabilities. These ideas are tested and validated with two examples of synchronous communication applications.
Submitted 16 February, 2011; v1 submitted 7 May, 2010;
originally announced May 2010.
-
Improving the Performance of PieceWise Linear Separation Incremental Algorithms for Practical Hardware Implementations
Authors:
Alejandro Chinea Manrique De Lara,
Juan Manuel Moreno,
Arostegui Jordi Madrenas,
Joan Cabestany
Abstract:
In this paper we shall review the common problems associated with Piecewise Linear Separation incremental algorithms. These neural models yield poor performance on some classification problems, due to the evolving schemes used to construct the resulting networks. To avoid this undesirable behavior, we shall propose a modification criterion. It is based upon the definition of a function that provides information about the quality of the network growth process during the learning phase. This function is evaluated periodically as the network structure evolves and, as we shall show through exhaustive benchmarks, permits a considerable improvement in the performance (measured in terms of network complexity and generalization capabilities) of the networks generated by these incremental models.
Submitted 21 December, 2007;
originally announced December 2007.