Search | arXiv e-print repository

Diffusion posterior sampling for simulation-based inference in tall data settings

Authors: Julia Linhart, Gabriel Victorino Cardoso, Alexandre Gramfort, Sylvain Le Corff, Pedro L. C. Rodrigues

Abstract: Determining which parameters of a non-linear model best describe a set of experimental data is a fundamental problem in science and it has gained much traction lately with the rise of complex large-scale simulators. The likelihood of such models is typically intractable, which is why classical MCMC methods cannot be used. Simulation-based inference (SBI) stands out in this context by only requiring a dataset of simulations to train deep generative models capable of approximating the posterior distribution that relates input parameters to a given observation. In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model. The proposed method is built upon recent developments from the flourishing score-based diffusion literature and allows one to estimate the tall data posterior distribution, while simply using information from a score network trained for a single context observation. We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.

Submitted 7 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: 49 pages, 24 figures, 3 tables, 2 algorithms, 12 appendices, in proceedings
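The tall-data setting above rests on a standard consequence of Bayes' rule: if the observations x_1, ..., x_n are conditionally independent given θ, then p(θ | x_1, ..., x_n) ∝ p(θ)^(1-n) ∏_i p(θ | x_i). The tall posterior score is therefore the sum of the single-observation posterior scores plus (1 - n) times the prior score. The sketch below checks this identity on a hypothetical Gaussian conjugate model (the prior variance TAU2 and noise variance SIGMA2 are assumptions chosen for illustration), where every score has a closed form. It only covers the noise-free case. The noised scores used inside a diffusion sampler do not compose this simply, and the sketch makes no attempt to reproduce the authors' method.

```python
import numpy as np

# Hypothetical conjugate model (illustration only, not taken from the paper):
# prior theta ~ N(0, TAU2); each observation x_i | theta ~ N(theta, SIGMA2).
TAU2 = 4.0
SIGMA2 = 1.0


def prior_score(theta):
    """Gradient of log N(theta; 0, TAU2) with respect to theta."""
    return -theta / TAU2


def single_posterior_score(theta, x):
    """Score of p(theta | x) for ONE observation (closed form here;
    in SBI a trained score network would approximate it)."""
    prec = 1.0 / TAU2 + 1.0 / SIGMA2
    mean = (x / SIGMA2) / prec
    return -(theta - mean) * prec


def tall_posterior_score_exact(theta, xs):
    """Score of p(theta | x_1, ..., x_n), computed directly from the conjugate posterior."""
    n = len(xs)
    prec = 1.0 / TAU2 + n / SIGMA2
    mean = (np.sum(xs) / SIGMA2) / prec
    return -(theta - mean) * prec


def tall_posterior_score_composed(theta, xs):
    """Compose single-observation scores:
    sum_i score(theta | x_i) + (1 - n) * prior score(theta)."""
    n = len(xs)
    total = sum(single_posterior_score(theta, x) for x in xs)
    return total + (1 - n) * prior_score(theta)


rng = np.random.default_rng(0)
xs = rng.normal(1.5, 1.0, size=10)      # ten synthetic observations
thetas = np.linspace(-3.0, 3.0, 7)      # points at which to compare scores
print(np.allclose(tall_posterior_score_composed(thetas, xs),
                  tall_posterior_score_exact(thetas, xs)))  # expected: True
```

With n = 1 the correction term vanishes and the composed score reduces to the single-observation score, which is a quick sanity check on the (1 - n) factor.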

arXiv:2403.13831 [pdf]

Dual-sided transparent display

Authors: Suman Halder, Yunho Shin, Yidan Peng, Long Wang, Liye Duan, Paul Schmalenberg, Guangkui Qin, Yuxi Gao, Ercan M. Dede, Deng-Ke Yang, Sean P. Rodrigues

Abstract: In the past decade, display technology has been reimagined to meet the needs of the virtual world. By mapping information onto a scene through a transparent display, users can simultaneously visualize both the real world and layers of virtual elements. However, advances in augmented reality (AR) technology have primarily focused on wearable gear or personal devices. Here we present a single display capable of delivering visual information to observers positioned on either side of the transparent device. This dual-sided display system employs a polymer stabilized liquid crystal waveguide technology to achieve a transparency window of 65% while offering active-matrix control. An early-stage prototype exhibits full-color information via time-sequential processing of a red-green-blue (RGB) light-emitting diode (LED) strip. The dual-sided display provides a perspective on transparent mediums as display devices for human-centric and service-related experiences that can support both enhanced bi-directional user interactions and new media platforms.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.07454 [pdf, other]

Fast, accurate and lightweight sequential simulation-based inference using Gaussian locally linear mappings

Authors: Henrik Häggström, Pedro L. C. Rodrigues, Geoffroy Oudoumanessah, Florence Forbes, Umberto Picchini

Abstract: Bayesian inference for complex models with an intractable likelihood can be tackled using algorithms performing many calls to computer simulators. These approaches are collectively known as "simulation-based inference" (SBI). Recent SBI methods have made use of neural networks (NN) to provide approximate, yet expressive constructs for the unavailable likelihood function and the posterior distribution. However, the trade-off between accuracy and computational demand leaves much space for improvement. In this work, we propose an alternative that provides both approximations to the likelihood and the posterior distribution, using structured mixtures of probability distributions. Our approach produces accurate posterior inference when compared to state-of-the-art NN-based SBI methods, even for multimodal posteriors, while exhibiting a much smaller computational footprint. We illustrate our results on several benchmark models from the SBI literature and on a biological model of the translation kinetics after mRNA transfection.

Submitted 22 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: 69 pages, 66 figures: new case study added (Biological model of the translation kinetics after mRNA transfection)

arXiv:2401.02469 [pdf, other]

doi 10.1016/j.teler.2024.100116

Modern Computing: Vision and Challenges

Authors: Sukhpal Singh Gill, Huaming Wu, Panos Patros, Carlo Ottaviani, Priyansh Arora, Victor Casamayor Pujol, David Haunschild, Ajith Kumar Parlikad, Oktay Cetinkaya, Hanan Lutfiyya, Vlado Stankovski, Ruidong Li, Yuemin Ding, Junaid Qadir, Ajith Abraham, Soumya K. Ghosh, Houbing Herbert Song, Rizos Sakellariou, Omer Rana, Joel J. P. C. Rodrigues, Salil S. Kanhere, Schahram Dustdar, Steve Uhlig, Kotagiri Ramamohanarao, Rajkumar Buyya

Abstract: Over the past six decades, the computing systems field has experienced significant transformations, profoundly impacting society with transformational developments, such as the Internet and the commodification of computing. Underpinned by technological advancements, computer systems, far from being static, have been continuously evolving and adapting to cover multifaceted societal niches. This has led to new paradigms such as cloud, fog, edge computing, and the Internet of Things (IoT), which offer fresh economic and creative opportunities. Nevertheless, this rapid change poses complex research challenges, especially in maximizing potential and enhancing functionality. As such, to maintain an economical level of performance that meets ever-tighter requirements, one must understand the drivers of new model emergence and expansion, and how contemporary challenges differ from past ones. To that end, this article investigates and assesses the factors influencing the evolution of computing systems, covering established systems and architectures as well as newer developments, such as serverless computing, quantum computing, and on-device AI on edge devices. Trends emerge when one traces technological trajectory, which includes the rapid obsolescence of frameworks due to business and technical constraints, a move towards specialized systems and models, and varying approaches to centralized and decentralized control.
This comprehensive review of modern computing systems looks ahead to the future of research in the field, highlighting key challenges and emerging trends, and underscoring their importance in cost-effectively driving technological progress.

Submitted 4 January, 2024; originally announced January 2024.

Comments: Preprint submitted to Telematics and Informatics Reports, Elsevier (2024)

Journal ref: Elsevier Telematics and Informatics Reports, Volume 13, March 2024

arXiv:2311.11895 [pdf]

Controlled Natural Languages for Specifying Business Intelligence Applications

Authors: Pedro das Neves Rodrigues, Alberto Rodrigues da Silva

Abstract: This study examines the use of controlled natural languages (CNLs) to specify business intelligence (BI) application requirements. Two varieties of CNLs, CNL-BI and ITLingo ASL (ASL), were employed. A hypothetical BI application, MEDBuddy-BI, was developed for the National Health Service (NHS) to demonstrate how the languages can be used. MEDBuddy-BI leverages patient data, including interactions and appointments, to improve healthcare services. The research outlines the application of CNL-BI and ASL in BI. It details how these languages effectively describe complex data, user interfaces, and various BI application functions, using the MEDBuddy-BI running example.

Submitted 21 November, 2023; v1 submitted 20 November, 2023; originally announced November 2023.

Comments: 29 pages, 13 figures, 5 tables. New version of the publication to fix a cross reference error to the Appendix section

arXiv:2310.03121 [pdf]

OpenMM 8: Molecular Dynamics Simulation with Machine Learning Potentials

Authors: Peter Eastman, Raimondas Galvelis, Raúl P. Peláez, Charlles R. A. Abreu, Stephen E. Farr, Emilio Gallicchio, Anton Gorenko, Michael M. Henry, Frank Hu, Jing Huang, Andreas Krämer, Julien Michel, Joshua A. Mitchell, Vijay S. Pande, João PGLM Rodrigues, Jaime Rodriguez-Guerra, Andrew C. Simmonett, Sukrit Singh, Jason Swails, Philip Turner, Yuanqing Wang, Ivy Zhang, John D. Chodera, Gianni De Fabritiis, Thomas E. Markland

Abstract: Machine learning plays an important and growing role in molecular simulation. The newest version of the OpenMM molecular dynamics toolkit introduces new features to support the use of machine learning potentials. Arbitrary PyTorch models can be added to a simulation and used to compute forces and energy. A higher-level interface allows users to easily model their molecules of interest with general purpose, pretrained potential functions. A collection of optimized CUDA kernels and custom PyTorch operations greatly improves the speed of simulations. We demonstrate these features on simulations of cyclin-dependent kinase 8 (CDK8) and the green fluorescent protein (GFP) chromophore in water. Taken together, these features make it practical to use machine learning to improve the accuracy of simulations at only a modest increase in cost.

Submitted 29 November, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: 16 pages, 5 figures

ACM Class: J.2; J.3

arXiv:2306.03580 [pdf, other]

L-C2ST: Local Diagnostics for Posterior Approximations in Simulation-Based Inference

Authors: Julia Linhart, Alexandre Gramfort, Pedro L. C. Rodrigues

Abstract: Many recent works in simulation-based inference (SBI) rely on deep generative models to approximate complex, high-dimensional posterior distributions. However, evaluating whether or not these approximations can be trusted remains a challenge. Most approaches evaluate the posterior estimator only in expectation over the observation space. This limits their interpretability and is not sufficient to identify for which observations the approximation can be trusted or should be improved. Building upon the well-known classifier two-sample test (C2ST), we introduce L-C2ST, a new method that allows for a local evaluation of the posterior estimator at any given observation. It offers theoretically grounded and easy to interpret -- e.g. graphical -- diagnostics, and unlike C2ST, does not require access to samples from the true posterior. In the case of normalizing flow-based posterior estimators, L-C2ST can be specialized to offer better statistical power, while being computationally more efficient. On standard SBI benchmarks, L-C2ST provides comparable results to C2ST and outperforms alternative local approaches such as coverage tests based on highest predictive density (HPD). We further highlight the importance of local evaluation and the benefit of interpretability of L-C2ST on a challenging application from computational neuroscience.

Submitted 9 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: 27 pages, 6 figures, 3 tables, 6 appendices, NeurIPS 2023
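L-C2ST builds on the classifier two-sample test (C2ST). As a rough illustration of that underlying idea only (not of L-C2ST's local variant, which the sketch does not implement), the code below trains a classifier to separate two sample sets and reports its held-out accuracy: values close to 0.5 suggest the two distributions cannot be told apart, while values well above 0.5 indicate a mismatch. The plain-NumPy logistic regression on linear and squared features, the 50/50 train/test split, and the hypothetical function name `c2st_accuracy` are assumptions made to keep the example dependency-free; practical C2ST implementations usually rely on a more flexible classifier such as a small neural network.

```python
import numpy as np


def c2st_accuracy(samples_p, samples_q, n_steps=500, lr=0.1, seed=0):
    """Classifier two-sample test sketch.

    samples_p, samples_q: 2-D arrays of shape (n_samples, dim).
    Returns the held-out accuracy of a logistic regression trained to label
    samples_p as 0 and samples_q as 1. Accuracy near 0.5 means the
    classifier cannot distinguish the two sample sets.
    """
    rng = np.random.default_rng(seed)
    X = np.vstack([samples_p, samples_q])
    y = np.concatenate([np.zeros(len(samples_p)), np.ones(len(samples_q))])
    # Add squared features so the classifier can also pick up variance differences.
    X = np.hstack([X, X ** 2])

    # Shuffle, then split into equal train / test halves.
    idx = rng.permutation(len(y))
    X, y = X[idx], y[idx]
    half = len(y) // 2
    X_tr, y_tr, X_te, y_te = X[:half], y[:half], X[half:], y[half:]

    # Standardize features using training-set statistics only.
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0) + 1e-12
    X_tr, X_te = (X_tr - mu) / sd, (X_te - mu) / sd

    # Full-batch gradient descent on the logistic (cross-entropy) loss.
    w = np.zeros(X_tr.shape[1])
    b = 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(X_tr @ w + b)))
        grad = p - y_tr
        w -= lr * X_tr.T @ grad / len(y_tr)
        b -= lr * grad.mean()

    # Predict class 1 when the logit is positive; compare with held-out labels.
    pred = (X_te @ w + b) > 0.0
    return float(np.mean(pred == y_te))


rng = np.random.default_rng(1)
a = rng.normal(size=(2000, 2))            # reference samples
b = rng.normal(size=(2000, 2))            # same distribution
c = rng.normal(loc=1.0, size=(2000, 2))   # mean-shifted distribution
print(c2st_accuracy(a, b))  # typically close to 0.5
print(c2st_accuracy(a, c))  # typically well above 0.5
```

Holding out half of the data matters: accuracy measured on the training half would be optimistically biased even when the two distributions are identical.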

arXiv:2211.09602 [pdf, other]

Validation Diagnostics for SBI algorithms based on Normalizing Flows

Authors: Julia Linhart, Alexandre Gramfort, Pedro L. C. Rodrigues

Abstract: Building on the recent trend of new deep generative models known as Normalizing Flows (NF), simulation-based inference (SBI) algorithms can now efficiently accommodate arbitrary complex and high-dimensional data distributions. The development of appropriate validation methods however has fallen behind. Indeed, most of the existing metrics either require access to the true posterior distribution, or fail to provide theoretical guarantees on the consistency of the inferred approximation beyond the one-dimensional setting. This work proposes easy to interpret validation diagnostics for multi-dimensional conditional (posterior) density estimators based on NF. It also offers theoretical guarantees based on results of local consistency. The proposed workflow can be used to check, analyse and guarantee consistent behavior of the estimator. The method is illustrated with a challenging example that involves tightly coupled parameters in the context of computational neuroscience. This work should help the design of better specified models or drive the development of novel SBI-algorithms, hence allowing to build up trust on their ability to address important questions in experimental science.

Submitted 24 November, 2022; v1 submitted 17 November, 2022; originally announced November 2022.

Comments: 7 pages, 2 figures, 1 appendix, published at "Machine Learning and the Physical Sciences" workshop (NeurIPS 2022): https://ml4physicalsciences.github.io/2022/

arXiv:2208.02606 [pdf, other]

doi 10.1016/j.jocs.2022.101811

TunaOil: A Tuning Algorithm Strategy for Reservoir Simulation Workloads

Authors: Felipe Albuquerque Portella, David Buchaca Prats, José Roberto Pereira Rodrigues, Josep Lluís Berral

Abstract: Reservoir simulations for petroleum fields and seismic imaging are known as the most demanding workloads for high-performance computing (HPC) in the oil and gas (O&G) industry. The optimization of the simulator numerical parameters plays a vital role as it could save considerable computational efforts. State-of-the-art optimization techniques are based on running numerous simulations, specific for that purpose, to find good parameter candidates. However, using such an approach is highly costly in terms of time and computing resources. This work presents TunaOil, a new methodology to enhance the search for optimal numerical parameters of reservoir flow simulations using a performance model. In the O&G industry, it is common to use ensembles of models in different workflows to reduce the uncertainty associated with forecasting O&G production. We leverage the runs of those ensembles in such workflows to extract information from each simulation and optimize the numerical parameters in their subsequent runs. To validate the methodology, we implemented it in a history matching (HM) process that uses a Kalman filter algorithm to adjust an ensemble of reservoir models to match the observed data from the real field. We mine past execution logs from many simulations with different numerical configurations and build a machine learning model based on extracted features from the data.
These features range from properties of the reservoir models themselves, such as the number of active cells, to statistics of the simulation's behavior, such as the number of iterations of the linear solver. A sampling technique is used to query the oracle to find the numerical parameters that can reduce the elapsed time without significantly impacting the quality of the results. Our experiments show that the predictions can improve the overall HM workflow runtime on average by 31%.

Submitted 4 August, 2022; originally announced August 2022.

Comments: 21 pages, 9 figures. Preprint submitted to Journal of Computational Science

ACM Class: J.2; C.4; I.2

arXiv:2111.08693 [pdf, other]

Inverting brain grey matter models with likelihood-free inference: a tool for trustable cytoarchitecture measurements

Authors: Maëliss Jallais, Pedro Luiz Coelho Rodrigues, Alexandre Gramfort, Demian Wassermann

Abstract: Effective characterisation of the brain grey matter cytoarchitecture with quantitative sensitivity to soma density and volume remains an unsolved challenge in diffusion MRI (dMRI). Solving the problem of relating the dMRI signal with cytoarchitectural characteristics calls for the definition of a mathematical model that describes brain tissue via a handful of physiologically-relevant parameters and an algorithm for inverting the model. To address this issue, we propose a new forward model, specifically a new system of equations, requiring a few relatively sparse b-shells. We then apply modern tools from Bayesian analysis known as likelihood-free inference (LFI) to invert our proposed model. As opposed to other approaches from the literature, our algorithm yields not only an estimation of the parameter vector $θ$ that best describes a given observed data point $x_0$, but also a full posterior distribution $p(θ|x_0)$ over the parameter space. This enables a richer description of the model inversion, providing indicators such as credible intervals for the estimated parameters and a complete characterization of the parameter regions where the model may present indeterminacies. We approximate the posterior distribution using deep neural density estimators, known as normalizing flows, and fit them using a set of repeated simulations from the forward model. We validate our approach on simulations using dmipy and then apply the whole pipeline on two publicly available datasets.

Submitted 4 May, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

Journal ref: Journal of Machine Learning for Biomedical Imaging, Melba editors, 2022, pp.1-27

arXiv:2107.05684 [pdf, other]

Accenture at CheckThat! 2021: Interesting claim identification and ranking with contextually sensitive lexical training data augmentation

Authors: Evan Williams, Paul Rodrigues, Sieu Tran

Abstract: This paper discusses the approach used by the Accenture Team for CLEF2021 CheckThat! Lab, Task 1, to identify whether a claim made in social media would be interesting to a wide audience and should be fact-checked. Twitter training and test data were provided in English, Arabic, Spanish, Turkish, and Bulgarian. Claims were to be classified (check-worthy/not check-worthy) and ranked in priority order for the fact-checker. Our method used deep neural network transformer models with contextually sensitive lexical augmentation applied on the supplied training datasets to create additional training samples. This augmentation approach improved the performance for all languages. Overall, our architecture and data augmentation pipeline produced the best submitted system for Arabic, and performance scales according to the quantity of provided training data for English, Spanish, Turkish, and Bulgarian. This paper investigates the deep neural network architectures for each language as well as the provided data to examine why the approach worked so effectively for Arabic, and discusses additional data augmentation measures that could be useful to this problem.

Submitted 12 July, 2021; originally announced July 2021.

Comments: To Appear As: Evan Williams, Paul Rodrigues, Sieu Tran. Accenture at CheckThat! 2021: Interesting claim identification and ranking with contextually sensitive lexical training data augmentation. In: Faggioli et al. Working Notes of CLEF 2021-Conference and Labs of the Evaluation Forum. Bucharest, Romania. 21-24 September 2021

arXiv:2106.05086 [pdf, ps, other]

Design and Implementation of 5G eHealth Systems, Technologies, Use Cases and Future Challenges

Authors: Di Zhang, Joel J. P. C. Rodrigues, Yunkai Zhai, Takuro Sato

Abstract: Fifth generation (5G) aims to connect massive devices with even higher reliability, lower latency and even faster transmission speed, which are vital for implementing the e-health systems. However, the current efforts on 5G e-health systems are still not enough to accomplish its full blueprint. In this article, we first discuss the related technologies from physical layer, upper layer and cross layer perspectives on designing the 5G e-health systems. We afterwards elaborate two use cases according to our implementations, i.e., 5G e-health systems for remote health and 5G e-health systems for Covid-19 pandemic containment. We finally envision the future research trends and challenges of 5G e-health systems.

Submitted 10 July, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

arXiv:2102.06477 [pdf, other]

HNPE: Leveraging Global Parameters for Neural Posterior Estimation

Authors: Pedro L. C. Rodrigues, Thomas Moreau, Gilles Louppe, Alexandre Gramfort

Abstract: Inferring the parameters of a stochastic model based on experimental observations is central to the scientific method. A particularly challenging setting is when the model is strongly indeterminate, i.e. when distinct sets of parameters yield identical observations. This arises in many practical situations, such as when inferring the distance and power of a radio source (is the source close and weak or far and strong?) or when estimating the amplifier gain and underlying brain activity of an electrophysiological experiment. In this work, we present hierarchical neural posterior estimation (HNPE), a novel method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters. Our method extends recent developments in simulation-based inference (SBI) based on normalizing flows to Bayesian hierarchical models. We validate quantitatively our proposal on a motivating example amenable to analytical solutions and then apply it to invert a well known non-linear model from computational neuroscience, using both simulated and real EEG data.

Submitted 9 November, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

arXiv:2012.02807 [pdf, other]

Learning summary features of time series for likelihood free inference

Authors: Pedro L. C. Rodrigues, Alexandre Gramfort

Abstract: There has been an increasing interest from the scientific community in using likelihood-free inference (LFI) to determine which parameters of a given simulator model could best describe a set of experimental data. Despite exciting recent results and a wide range of possible applications, an important bottleneck of LFI when applied to time series data is the necessity of defining a set of summary features, often hand-tailored based on domain knowledge. In this work, we present a data-driven strategy for automatically learning summary features from univariate time series and apply it to signals generated from autoregressive-moving-average (ARMA) models and the Van der Pol Oscillator. Our results indicate that learning summary features from data can compete and even outperform LFI methods based on hand-crafted values such as autocorrelation coefficients even in the linear case.

Submitted 4 December, 2020; originally announced December 2020.
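The abstract contrasts learned summary features with hand-crafted ones such as autocorrelation coefficients. The sketch below shows what such a hand-crafted baseline can look like: it simulates a hypothetical ARMA(1,1) series and computes sample autocorrelations at the first few lags. The function names, the unit-variance Gaussian noise, and the lag count are illustrative assumptions, not details from the paper. For this process the theoretical lag-1 autocorrelation is (1 + φθ)(φ + θ) / (1 + 2φθ + θ²), which gives a convenient check.

```python
import numpy as np


def simulate_arma11(phi, theta, n, rng):
    """Simulate x_t = phi * x_{t-1} + e_t + theta * e_{t-1}, with e_t ~ N(0, 1).

    Starts from x_0 = 0; the short transient is negligible for long series.
    """
    e = rng.normal(size=n + 1)
    x = np.zeros(n + 1)
    for t in range(1, n + 1):
        x[t] = phi * x[t - 1] + e[t] + theta * e[t - 1]
    return x[1:]


def autocorr_features(x, max_lag=5):
    """Hand-crafted summary statistics: sample autocorrelations at lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])


rng = np.random.default_rng(0)
series = simulate_arma11(phi=0.5, theta=0.3, n=20000, rng=rng)
features = autocorr_features(series, max_lag=3)
rho1 = (1 + 0.5 * 0.3) * (0.5 + 0.3) / (1 + 2 * 0.5 * 0.3 + 0.3 ** 2)
print(features, rho1)  # features[0] should lie close to rho1 (about 0.66)
```

In an LFI pipeline, a vector like `features` would stand in for the raw series when comparing simulations with the observed data; the paper's point is that such fixed statistics may discard information that a learned feature extractor can retain.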

arXiv:2011.08273 [pdf]

Machine Learning and Soil Humidity Sensing: Signal Strength Approach

Authors: Lea Dujić Rodić, Tomislav Županović, Toni Perković, Petar Šolić, Joel J. P. C. Rodrigues

Abstract: The IoT vision of ubiquitous and pervasive computing gives rise to future smart irrigation systems comprising physical and digital world. Smart irrigation ecosystem combined with Machine Learning can provide solutions that successfully solve the soil humidity sensing task in order to ensure optimal water usage. Existing solutions are based on data received from the power hungry/expensive sensors that are transmitting the sensed data over the wireless channel. Over time, the systems become difficult to maintain, especially in remote areas due to the battery replacement issues with large number of devices. Therefore, a novel solution must provide an alternative, cost and energy effective device that has unique advantage over the existing solutions. This work explores a concept of a novel, low-power, LoRa-based, cost-effective system which achieves humidity sensing using Deep learning techniques that can be employed to sense soil humidity with the high accuracy simply by measuring signal strength of the given underground beacon device.

Submitted 16 November, 2020; originally announced November 2020.

arXiv:2009.02431 [pdf, ps, other]

Accenture at CheckThat! 2020: If you say so: Post-hoc fact-checking of claims using transformer-based models

Authors: Evan Williams, Paul Rodrigues, Valerie Novak

Abstract: We introduce the strategies used by the Accenture Team for the CLEF2020 CheckThat! Lab, Task 1, on English and Arabic. This shared task evaluated whether a claim in social media text should be professionally fact checked. To a journalist, a statement presented as fact, which would be of interest to a large audience, requires professional fact-checking before dissemination. We utilized BERT and RoBERTa models to identify claims in social media text a professional fact-checker should review, and rank these in priority order for the fact-checker. For the English challenge, we fine-tuned a RoBERTa model and added an extra mean pooling layer and a dropout layer to enhance generalizability to unseen text. For the Arabic task, we fine-tuned Arabic-language BERT models and demonstrate the use of back-translation to amplify the minority class and balance the dataset. The work presented here was scored 1st place in the English track, and 1st, 2nd, 3rd, and 4th place in the Arabic track.

Submitted 4 September, 2020; originally announced September 2020.

Comments: To Appear As: Evan Williams, Paul Rodrigues, Valerie Novak. Accenture at CheckThat! 2020: If you say so: Post-hoc fact-checking of claims using transformer-based models. In: Cappellato et al. Working Notes of CLEF 2020-Conference and Labs of the Evaluation Forum. Thessaloniki, Greece. 22-25 September 2020

arXiv:2008.06310 [pdf]

doi 10.1109/THMS.2014.2325837

Improving Smart Conference Participation through Socially-Aware Recommendation

Authors: Nana Yaw Asabere, Feng Xia, Wei Wang, Joel J. P. C. Rodrigues, Filippo Basso, Jianhua Ma

Abstract: This research addresses recommending presentation sessions at smart conferences to participants. We propose a venue recommendation algorithm, Socially-Aware Recommendation of Venues and Environments (SARVE). SARVE computes correlation and social characteristic information of conference participants. In order to model a recommendation process using distributed community detection, SARVE further integrates the current context of both the smart conference community and participants. SARVE recommends presentation sessions that may be of high interest to each participant. We evaluate SARVE using a real-world dataset. In our experiments, we compare SARVE to two related state-of-the-art methods, namely Context-Aware Mobile Recommendation Services (CAMRS) and the Conference Navigator (Recommender) Model. Our experimental results show that, in terms of the evaluation metrics used (precision, recall, and F-measure), SARVE achieves more reliable and favorable social (relations and context) recommendation results.

Submitted 8 August, 2020; originally announced August 2020.

Comments: 12 pages, 9 figures. arXiv admin note: substantial text overlap with arXiv:1312.6808

Journal ref: IEEE Transactions on Human-Machine Systems, 44(5): 689-700, 2014

arXiv:1905.05182 [pdf]

Building Brain Invaders: EEG data of an experimental validation

Authors: Gijsbrecht Van Veen, Alexandre Barachant, Anton Andreev, Grégoire Cattan, Pedro Coelho Rodrigues, Marco Congedo

Abstract: We describe the experimental procedures for a dataset that we have made publicly available at https://doi.org/10.5281/zenodo.2649006 in mat and csv formats. This dataset contains electroencephalographic (EEG) recordings of 25 subjects testing the Brain Invaders (Congedo, 2011), a visual P300 Brain-Computer Interface inspired by the famous vintage video game Space Invaders (Taito, Tokyo, Japan). The visual P300 is an event-related potential elicited by a visual stimulation, peaking 240-600 ms after stimulus onset. EEG data were recorded by 16 electrodes in an experiment that took place in the GIPSA-lab, Grenoble, France, in 2012 (Van Veen, 2013 and Congedo, 2013). Python code for manipulating the data is available at https://github.com/plcrodrigues/py.BI.EEG.2012-GIPSA. The ID of this dataset is BI.EEG.2012-GIPSA.

Submitted 13 May, 2019; originally announced May 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1904.09111

arXiv:1904.12626 [pdf, other]

doi 10.32614/RJ-2020-021

tsmp: An R Package for Time Series with Matrix Profile

Authors: Francisco Bischoff, Pedro Pereira Rodrigues

Abstract: This article describes tsmp, an R package that implements the matrix profile concept for time series. The tsmp package is a toolkit that allows all-pairs similarity joins, motif, discords and chains discovery, semantic segmentation, etc. Here we describe how the tsmp package may be used by showing some of the use-cases from the original articles and evaluate the algorithm speed in the R environment. This package can be downloaded at https://CRAN.R-project.org/package=tsmp.

Submitted 18 April, 2019; originally announced April 2019.
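tsmp itself is an R package; the sketch below is a language-neutral illustration (written in Python, and not tsmp's API) of the quantity it computes. For every length-m subsequence, the matrix profile stores the z-normalized Euclidean distance to its nearest non-trivial match, and the matrix profile index stores where that match starts. This brute-force version is quadratic and only meant for small series; the exclusion zone of half the window length is an assumed default.

```python
import math
from typing import List, Sequence, Tuple


def znorm(seq: Sequence[float]) -> List[float]:
    """Z-normalise a subsequence; a constant subsequence maps to all zeros."""
    m = len(seq)
    mu = sum(seq) / m
    sd = math.sqrt(sum((v - mu) ** 2 for v in seq) / m)
    if sd == 0.0:
        return [0.0] * m
    return [(v - mu) / sd for v in seq]


def naive_matrix_profile(ts: Sequence[float], m: int) -> Tuple[List[float], List[int]]:
    """Brute-force matrix profile and matrix profile index for window length m."""
    n = len(ts) - m + 1
    if m < 2 or n < 2:
        raise ValueError("series too short for this window")
    subs = [znorm(ts[i:i + m]) for i in range(n)]
    exclusion = max(1, m // 2)  # skip trivial matches that overlap the query
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) < exclusion:
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index


# A series that repeats every 4 samples: the window starting at 0 matches the one at 4.
profile, index = naive_matrix_profile([0, 1, 2, 1, 0, 1, 2, 1, 0], m=4)
print(profile[0], index[0])  # 0.0 4
```

Low profile values mark motifs (repeated shapes) and high values mark discords (anomalies); production implementations replace the inner loop with FFT-based or incremental distance computations.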

arXiv:1904.09111 [pdf]

Brain Invaders Adaptive versus Non-Adaptive P300 Brain-Computer Interface dataset

Authors: Erwan Vaineau, Alexandre Barachant, Anton Andreev, Pedro C. Rodrigues, Grégoire Cattan, Marco Congedo

Abstract: We describe the experimental procedures for a dataset that we have made publicly available at https://doi.org/10.5281/zenodo.1494163 in mat and csv formats. This dataset contains electroencephalographic (EEG) recordings of 24 subjects doing a visual P300 Brain-Computer Interface experiment on PC. The visual P300 is an event-related potential elicited by visual stimulation, peaking 240-600 ms after stimulus onset. The experiment was designed to compare the use of a P300-based brain-computer interface on a PC with and without adaptive calibration using Riemannian geometry. The brain-computer interface is based on electroencephalography (EEG). EEG data were recorded using 16 electrodes. Data were recorded during an experiment taking place in the GIPSA-lab, Grenoble, France, in 2013 (Congedo, 2013). Python code for manipulating the data is available at https://github.com/plcrodrigues/py.BI.EEG.2013-GIPSA. The ID of this dataset is BI.EEG.2013-GIPSA.

Submitted 19 April, 2019; originally announced April 2019.

arXiv:1904.01171 [pdf, other]

An Efficient Blockchain-based Hierarchical Authentication Mechanism for Energy Trading in V2G Environment

Authors: Sahil Garg, Kuljeet Kaur, Georges Kaddoum, François Gagnon, Joel J. P. C. Rodrigues

Abstract: Vehicle-to-grid (V2G) networks have emerged as a new technology in modern electric power transmission networks. They allow a bi-directional flow of communication and electricity between electric vehicles (EVs) and the Smart Grid (SG), in order to provide more sophisticated energy trading. However, due to the huge amount of trading data involved and the presence of untrusted entities in the visiting networks, the underlying V2G infrastructure suffers from various security and privacy challenges. Although several solutions have been proposed in the literature to address these problems, issues like lack of mutual authentication and anonymity, inability to protect against several attack vectors, generation of huge overhead, and dependency on centralized infrastructures make security and privacy issues even more challenging. To address the above-mentioned problems, in this paper, we propose a blockchain-oriented hierarchical authentication mechanism for rewarding EVs. The overall process is broadly classified into the following phases: 1) System Initialization, 2) Registration, 3) Hierarchical Mutual Authentication, and 4) Consensus; blockchain's distributed ledger is employed for transaction execution in distributed V2G environments, while Elliptic curve cryptography (ECC) is used for hierarchical authentication. The designed hierarchical authentication mechanism preserves the anonymity of EVs and supports mutual authentication between EVs, charging stations (CSs) and the central aggregator (CAG). Additionally, it incurs only minimal communication and computational overhead on resource-constrained EVs. Further, formal security verification of the proposed scheme with the widely accepted Automated Validation of Internet Security Protocols and Applications (AVISPA) tool validates its safety against different security attacks.

Submitted 1 April, 2019; originally announced April 2019.

Comments: Accepted for publication in IEEE ICC 2019 Workshop on Research Advancements in Future Networking Technologies (RAFNET)

arXiv:1904.00609 [pdf]

Passive Head-Mounted Display Music-Listening EEG dataset

Authors: Grégoire Cattan, Pedro C. Rodrigues, Marco Congedo

Abstract: We describe the experimental procedures for a dataset that we have made publicly available at https://doi.org/10.5281/zenodo.2617084 in mat (MathWorks, Natick, USA) and csv formats. This dataset contains electroencephalographic recordings of 12 subjects listening to music with and without a passive head-mounted display, that is, a head-mounted display which does not include any electronics with the exception of a smartphone. The electroencephalographic headset consisted of 16 electrodes. Data were recorded during a pilot experiment taking place in the GIPSA-lab, Grenoble, France, in 2017 (Cattan et al., 2018). Python code for manipulating the data is available at https://github.com/plcrodrigues/py.PHMDML.EEG.2017-GIPSA. The ID of this dataset is PHMDML.EEG.2017-GIPSA.

Submitted 2 April, 2019; v1 submitted 1 April, 2019; originally announced April 2019.

arXiv:1903.11297 [pdf]

Dataset of an EEG-based BCI experiment in Virtual Reality and on a Personal Computer

Authors: Grégoire Cattan, A. Andreev, P. Rodrigues, M. Congedo

Abstract: We describe the experimental procedures for a dataset that we have made publicly available at https://doi.org/10.5281/zenodo.2605204 in mat (MathWorks, Natick, USA) and csv formats. This dataset contains electroencephalographic recordings of 21 subjects doing a visual P300 experiment on PC (personal computer) and VR (virtual reality). The visual P300 is an event-related potential elicited by a visual stimulation, peaking 240-600 ms after stimulus onset. The experiment was designed to compare the use of a P300-based brain-computer interface on a PC and with a virtual reality headset, concerning the physiological, subjective and performance aspects. The brain-computer interface is based on electroencephalography (EEG). EEG was recorded using 16 electrodes. The virtual reality headset consisted of a passive head-mounted display, that is, a head-mounted display which does not include any electronics with the exception of a smartphone. This experiment was carried out at GIPSA-lab (University of Grenoble Alpes, CNRS, Grenoble-INP) in 2018, and promoted by the IHMTEK Company (Interaction Homme-Machine Technologie). The study was approved by the Ethical Committee of the University of Grenoble Alpes (Comité d'Ethique pour la Recherche Non-Interventionnelle). Python code for manipulating the data is available at https://github.com/plcrodrigues/py.VR.EEG.2018-GIPSA. The ID of this dataset is VR.EEG.2018-GIPSA.

Submitted 27 March, 2019; originally announced March 2019.

arXiv:1902.09291 [pdf, other]

MIRA: A Computational Neuro-Based Cognitive Architecture Applied to Movie Recommender Systems

Authors: Mariana B. Santos, Amanda M. Lima, Lucas A. Silva, Felipe S. Vargas, Guilherme A. Wachs-Lopes, Paulo S. Rodrigues

Abstract: In many respects, the human mind remains poorly understood by neuroscience. Nevertheless, for decades the scientific community has proposed computational models that try to simulate its parts, specific applications, or its behavior in different situations. The most complete model in this line is undoubtedly the LIDA model, proposed by Stan Franklin to serve as a generic computational architecture for several applications. The present project applies the LIDA model to movie recommendation: the resulting model, called MIRA (Movie Intelligent Recommender Agent), achieved precision similar to that of a traditional model under the same test conditions. Moreover, tests with volunteers further supported these precision results, again indicating the model's performance as a cognitive model when executed with small data volumes. Considering that the proposed model behaved similarly to the traditional models under conditions expected to be similar for natural systems, MIRA supports the applicability of LIDA as a path to be followed for the study and generation of computational agents inspired by neural behaviors.

Submitted 27 February, 2019; v1 submitted 25 February, 2019; originally announced February 2019.

arXiv:1805.10181 [pdf]

doi 10.1021/acs.nanolett.8b03171

A Generative Model for Inverse Design of Metamaterials

Authors: Zhaocheng Liu, Dayu Zhu, Sean P. Rodrigues, Kyu-Tae Lee, Wenshan Cai

Abstract: The advent of two-dimensional metamaterials in recent years has ushered in a revolutionary means to manipulate the behavior of light on the nanoscale. The effective parameters of these architected materials render unprecedented control over the optical properties of light, thereby eliciting previously unattainable applications in flat lenses, holographic imaging, and emission control among others. The design of such structures, to date, has relied on the expertise of an optical scientist to guide a progression of electromagnetic simulations that iteratively solve Maxwell's equations until a locally optimized solution can be attained. In this work, we identify a solution to circumvent this intuition-guided design by means of a deep learning architecture. When fed an input set of optical spectra, the constructed generative network assimilates a candidate pattern from a user-defined dataset of geometric structures in order to match the input spectra. The generated metamaterial patterns demonstrate high fidelity, yielding equivalent optical spectra at an average accuracy of about 0.9. This approach reveals an opportunity to expedite the discovery and design of metasurfaces for tailored optical responses in a systematic, inverse-design manner.

Submitted 25 May, 2018; originally announced May 2018.

Comments: 15 pages, 4 figures

arXiv:1611.07910 [pdf, other]

Map-aided Dead-reckoning --- A Study on Locational Privacy in Insurance Telematics

Authors: Johan Wahlström, Isaac Skog, João G. P. Rodrigues, Peter Händel, Ana Aguiar

Abstract: We present a particle-based framework for estimating the position of a vehicle using map information and measurements of speed. Two measurement functions are considered. The first is based on the assumption that the lateral force on the vehicle does not exceed critical limits derived from physical constraints. The second is based on the assumption that the driver approaches a target speed derived from the speed limits along the upcoming trajectory. Performance evaluations of the proposed method indicate that end destinations can often be estimated with an accuracy on the order of 100 m. These results expose the sensitivity and commercial value of data collected in many of today's insurance telematics programs, and thereby have privacy implications for millions of policyholders. We end by discussing the strengths and weaknesses of different methods for anonymization and privacy preservation in telematics programs.

Submitted 14 November, 2016; originally announced November 2016.
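The first measurement function rests on a simple physical bound: a vehicle travelling at speed v along a path of curvature κ experiences a lateral acceleration of v²κ, which is limited by physical constraints. Below is a minimal sketch of how such a bound could reweight map-constrained particles, assuming a hard cut-off and an illustrative limit of 4 m/s² (the abstract gives neither the actual likelihood nor the limits); the `Particle` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Particle:
    curvature: float  # 1/m, road curvature at this hypothesised map position
    weight: float


def lateral_force_update(particles: List[Particle], speed: float,
                         a_max: float = 4.0) -> List[Particle]:
    """Zero out particles whose road curvature would imply |v^2 * kappa| > a_max,
    then renormalise the remaining weights so they sum to 1."""
    for p in particles:
        if speed ** 2 * abs(p.curvature) > a_max:
            p.weight = 0.0
    total = sum(p.weight for p in particles)
    if total == 0.0:
        raise ValueError("every particle violates the lateral-force bound")
    for p in particles:
        p.weight /= total
    return particles


# At 20 m/s, a gentle curve (radius 1000 m) is plausible, a sharp one (radius 20 m) is not.
cloud = [Particle(curvature=0.001, weight=0.5), Particle(curvature=0.05, weight=0.5)]
print([p.weight for p in lateral_force_update(cloud, speed=20.0)])  # [1.0, 0.0]
```

A hard cut-off is the simplest choice; a smooth likelihood that decays as v²κ approaches the limit would avoid discarding particles because of small speed-measurement errors.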

arXiv:1605.07116 [pdf, other]

A Formal Evaluation of PSNR as Quality Measurement Parameter for Image Segmentation Algorithms

Authors: Fernando A. Fardo, Victor H. Conforto, Francisco C. de Oliveira, Paulo S. Rodrigues

Abstract: Quality evaluation of image segmentation algorithms is still a subject of debate and research. Currently, there is no generic metric that can be reliably applied to any algorithm. This article evaluates PSNR (Peak Signal-to-Noise Ratio), a metric which has been used to evaluate threshold level selection as well as the number of thresholds in the case of multi-level segmentation. The results obtained in this study suggest that PSNR is not an adequate quality measure for segmentation algorithms.

Submitted 23 May, 2016; originally announced May 2016.

Comments: 11 pages, 8 figures
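For reference, PSNR compares a test image with a reference through the mean squared error: PSNR = 10·log10(MAX² / MSE), where MAX is the largest possible pixel value (255 for 8-bit images). A minimal sketch on flattened pixel lists; the function name and the 8-bit default are illustrative assumptions.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(reference) != len(test) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Two binary label maps (one image row), each wrong in exactly one pixel.
reference = [0, 0, 0, 255, 255, 255]
shifted_boundary = [0, 0, 255, 255, 255, 255]  # boundary moved by one pixel
stray_pixel = [255, 0, 0, 255, 255, 255]       # isolated false island
print(psnr(reference, shifted_boundary) == psnr(reference, stray_pixel))  # True
```

Because PSNR depends only on the total squared pixel error, it gives these two qualitatively different mistakes the same score; it has no notion of region shape or boundary location, which is one intuitive reason it can be a poor fit for judging segmentations.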

arXiv:1605.00452 [pdf, other]

Fourier Analysis and q-Gaussian Functions: Analytical and Numerical Results

Authors: Paulo Sérgio Silva Rodrigues, Gilson Antonio Giraldi

Abstract: It is widely accepted in signal processing that the Gaussian kernel and its partial derivatives enable the development of robust algorithms for feature detection. Fourier analysis and convolution theory play a central role in such development. In this paper we collect theoretical elements to follow this avenue, but using the q-Gaussian kernel, a nonextensive generalization of the Gaussian one. First, we review some theoretical elements behind the one-dimensional q-Gaussian and its Fourier transform. Then, we consider the two-dimensional q-Gaussian and highlight the issues behind the analytical computation of its Fourier transform. We analyze the q-Gaussian kernel in the space and Fourier domains using the concepts of space window, cut-off frequency, and the Heisenberg inequality.

Submitted 2 May, 2016; originally announced May 2016.

Comments: 31 pages, 6 figures

Report number: arXiv:1549348
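A common way to write the unnormalized q-Gaussian uses the Tsallis q-exponential, e_q(u) = [1 + (1 − q)u]^(1/(1 − q)) where the bracket is positive (and 0 otherwise), with e_q(u) approaching exp(u) as q approaches 1; the kernel is then e_q(−x²/(2σ²)). The sketch below evaluates that form; the σ parameterization is an assumption and may differ from the one used in the paper. It shows the two regimes that make the kernel interesting: compact support for q < 1 and heavier-than-Gaussian tails for q > 1.

```python
import math


def q_gaussian_kernel(x: float, sigma: float = 1.0, q: float = 1.0) -> float:
    """Unnormalised q-Gaussian e_q(-x^2 / (2 sigma^2)); q = 1 gives exp(-x^2 / (2 sigma^2))."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    u = -x * x / (2.0 * sigma * sigma)
    if math.isclose(q, 1.0):
        return math.exp(u)
    base = 1.0 + (1.0 - q) * u
    if base <= 0.0:
        return 0.0  # only reachable for q < 1: compact support
    return base ** (1.0 / (1.0 - q))


for q in (0.5, 1.0, 2.0):
    print(q, [round(q_gaussian_kernel(x, q=q), 4) for x in (0.0, 1.0, 3.0)])
# 0.5 [1.0, 0.5625, 0.0]     (zero for |x| >= 2 when sigma = 1)
# 1.0 [1.0, 0.6065, 0.0111]  (ordinary Gaussian)
# 2.0 [1.0, 0.6667, 0.1818]  (heavier tail than the Gaussian)
```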

arXiv:1601.03976 [pdf, other]

Modeling and Analysis of Converged Network-Cloud Services

Authors: Eduardo Hargreaves, Paulo H De Aguiar Rodrigues, Daniel S. Menasché

Abstract: Networks connecting distributed cloud services through multiple data centers are called cloud networks. These types of networks play a crucial role in cloud computing, and a holistic performance evaluation is essential before planning a converged network-cloud environment. We analyze a specific case where some resources can be centralized in one datacenter or distributed among multiple data centers. The economy of scale in centralizing resources in a single pool can be outweighed by an increase in communication costs. We propose an analytical model to evaluate tradeoffs in terms of application requirements, usage patterns, number of resources and communication costs. We numerically evaluate the proposed model in a case study inspired by the oil and gas industry, indicating how to cope with the tradeoff between the statistical multiplexing advantages of centralization and the corresponding increase in communication infrastructure costs.

Submitted 15 January, 2016; originally announced January 2016.

Comments: XIII Workshop em Clouds e Aplicações (WCGA2015)

arXiv:1412.2070 [pdf, other]

SenseMyCity: Crowdsourcing an Urban Sensor

Authors: João G. P. Rodrigues, Ana Aguiar, João Barros

Abstract: People treat smartphones as a second skin, having them around nearly 24/7 and constantly interacting with them. Although smartphones are used mainly for personal communication, social networking and web browsing, they have many connectivity capabilities, and are at the same time equipped with a wide range of embedded sensors. Additionally, Bluetooth connectivity can be leveraged to collect data from external sensors, greatly extending the sensing capabilities. However, massive data-gathering using smartphones still poses many architectural challenges, such as limited battery and processing power, and possibly connectivity costs. This article describes SenseMyCity (SMC), an Internet of Things mobile urban sensor that is extensible and fully configurable. The platform consists of an app, a back office and a front office. The SMC app can collect data from embedded sensors, such as GPS, Wi-Fi, accelerometer, magnetometer, etc., as well as from external Bluetooth sensors, ranging from On-Board Diagnostics gathering data from vehicles to wearable cardiac sensors. Adding support for new internal or external sensors is straightforward due to the modular architecture. Data transmission to our servers can occur either on demand or in real time, while keeping costs down by only using the configured type of Internet connectivity. We discuss our experience implementing the platform and using it to conduct longitudinal studies with many users. Further, we present results on bandwidth utilization and energy consumption for different sensors and sampling rates. Finally, we show two use cases: mapping fuel consumption and user stress extracted from cardiac sensors.

Submitted 5 December, 2014; originally announced December 2014.

Comments: 10 pages, 11 figures

arXiv:1410.8553 [pdf, other]

A random forest system combination approach for error detection in digital dictionaries

Authors: Michael Bloodgood, Peng Ye, Paul Rodrigues, David Zajic, David Doermann

Abstract: When digitizing a print bilingual dictionary, whether via optical character recognition or manual entry, it is inevitable that errors are introduced into the electronic version that is created. We investigate automating the process of detecting errors in an XML representation of a digitized print dictionary using a hybrid approach that combines rule-based, feature-based, and language model-based methods. We investigate combining methods and show that using random forests is a promising approach. We find that in isolation, unsupervised methods rival the performance of supervised methods. Random forests typically require training data so we investigate how we can apply random forests to combine individual base methods that are themselves unsupervised without requiring large amounts of training data. Experiments reveal empirically that a relatively small amount of data is sufficient and can potentially be further reduced through specific selection criteria.

Submitted 30 October, 2014; originally announced October 2014.

Comments: 9 pages, 7 figures, 10 tables; appeared in Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data, April 2012

ACM Class: I.2.7; I.2.6; I.5.1; I.5.4

Journal ref: In Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data, pages 78-86, Avignon, France, April 2012. Association for Computational Linguistics

arXiv:1410.8149 [pdf]

Detecting Structural Irregularity in Electronic Dictionaries Using Language Modeling

Authors: Paul Rodrigues, David Zajic, David Doermann, Michael Bloodgood, Peng Ye

Abstract: Dictionaries are often developed using tools that save to Extensible Markup Language (XML)-based standards. These standards often allow high-level repeating elements to represent lexical entries, and utilize descendants of these repeating elements to represent the structure within each lexical entry, in the form of an XML tree. In many cases, dictionaries are published that have errors and inconsistencies that are expensive to find manually. This paper discusses a method for dictionary writers to quickly audit structural regularity across entries in a dictionary by using statistical language modeling. The approach learns the patterns of XML nodes that could occur within an XML tree, and then calculates the probability of each XML tree in the dictionary against these patterns to look for entries that diverge from the norm.

Submitted 29 October, 2014; originally announced October 2014.

Comments: 6 pages, 2 figures, 11 tables; appeared in Proceedings of Electronic Lexicography in the 21st Century (eLex), November 2011

ACM Class: I.2.7; I.2.6; I.5.1; I.5.4

Journal ref: In Proceedings of Electronic Lexicography in the 21st Century (eLex), pages 227-232, Bled, Slovenia, November 2011. Trojina, Institute for Applied Slovene Studies
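A minimal sketch of the idea, assuming the simplest possible "language model" over XML structure: each entry is flattened into the sequence of its descendant tags in document order (a deliberately crude flattening that ignores depth), a bigram model with add-one smoothing is trained on all entries, and entries with the lowest average log-probability are flagged. The paper's actual features and smoothing are not specified in the abstract; the element names below are invented for illustration.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List, Sequence, Set, Tuple


def tag_sequence(entry: ET.Element) -> List[str]:
    """Descendant tags in document order, framed by start/end markers."""
    return ["<s>"] + [el.tag for el in entry.iter()][1:] + ["</s>"]


def train_bigrams(seqs: Sequence[List[str]]) -> Tuple[Counter, Counter, Set[str]]:
    bigrams: Counter = Counter()
    history: Counter = Counter()
    vocab: Set[str] = set()
    for seq in seqs:
        vocab.update(seq)
        for a, b in zip(seq, seq[1:]):
            bigrams[(a, b)] += 1
            history[a] += 1
    return bigrams, history, vocab


def avg_log_prob(seq: List[str], bigrams: Counter, history: Counter,
                 vocab_size: int, k: float = 1.0) -> float:
    """Mean add-k smoothed log P(tag | previous tag) over the sequence."""
    pairs = list(zip(seq, seq[1:]))
    return sum(math.log((bigrams[(a, b)] + k) / (history[a] + k * vocab_size))
               for a, b in pairs) / len(pairs)


xml = """<dictionary>
  <entry><form/><pos/><sense/></entry>
  <entry><form/><pos/><sense/></entry>
  <entry><form/><pos/><sense/></entry>
  <entry><pos/><form/><sense/></entry>
</dictionary>"""
entries = ET.fromstring(xml).findall("entry")
seqs = [tag_sequence(e) for e in entries]
bigrams, history, vocab = train_bigrams(seqs)
scores = [avg_log_prob(s, bigrams, history, len(vocab)) for s in seqs]
print(min(range(len(scores)), key=scores.__getitem__))  # 3: the entry with pos before form
```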

arXiv:1410.7787 [pdf]

Correcting Errors in Digital Lexicographic Resources Using a Dictionary Manipulation Language

Authors: David Zajic, Michael Maxwell, David Doermann, Paul Rodrigues, Michael Bloodgood

Abstract: We describe a paradigm for combining manual and automatic error correction of noisy structured lexicographic data. Modifications to the structure and underlying text of the lexicographic data are expressed in a simple, interpreted programming language. Dictionary Manipulation Language (DML) commands identify nodes by unique identifiers, and manipulations are performed using simple commands such as create, move, set text, etc. Corrected lexicons are produced by applying sequences of DML commands to the source version of the lexicon. DML commands can be written manually to repair one-off errors or generated automatically to correct recurring problems. We discuss advantages of the paradigm for the task of editing digital bilingual dictionaries.

Submitted 28 October, 2014; originally announced October 2014.

Comments: 5 pages, 3 figures, 1 table; appeared in Proceedings of Electronic Lexicography in the 21st Century (eLex), November 2011

Journal ref: In Proceedings of Electronic Lexicography in the 21st Century (eLex), pages 297-301, Bled, Slovenia, November 2011. Trojina, Institute for Applied Slovene Studies
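The abstract names only the flavour of DML commands (create, move, set text) and says that nodes are addressed by unique identifiers. The toy interpreter below illustrates that paradigm on an ElementTree whose nodes carry `id` attributes; the command spellings (`create`, `move`, `settext`), their argument order and the `id` attribute are hypothetical and are not the actual DML syntax.

```python
import shlex
import xml.etree.ElementTree as ET


def apply_dml(root: ET.Element, script: str) -> ET.Element:
    """Apply one command per line to the tree in place; '#' starts a comment line."""
    for raw in script.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        op, *args = shlex.split(line)
        # Re-index after every command, since earlier commands may change the tree.
        by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
        parent_of = {child: parent for parent in root.iter() for child in parent}
        if op == "create":  # create NEW_ID TAG PARENT_ID
            new_id, tag, parent_id = args
            if new_id in by_id:
                raise ValueError(f"duplicate id {new_id!r}")
            ET.SubElement(by_id[parent_id], tag, {"id": new_id})
        elif op == "move":  # move NODE_ID NEW_PARENT_ID
            node_id, new_parent_id = args
            node, new_parent = by_id[node_id], by_id[new_parent_id]
            if any(el is new_parent for el in node.iter()):
                raise ValueError("cannot move a node into itself or its descendants")
            parent_of[node].remove(node)
            new_parent.append(node)
        elif op == "settext":  # settext NODE_ID TEXT
            node_id, text = args
            by_id[node_id].text = text
        else:
            raise ValueError(f"unknown command {op!r}")
    return root


lexicon = ET.fromstring(
    '<lexicon><entry id="e1"><form id="f1">hous</form></entry><entry id="e2"/></lexicon>'
)
apply_dml(lexicon, """
# fix a typo, add a missing sense, and re-attach a misplaced form
settext f1 house
create s1 sense e1
move f1 e2
""")
print(ET.tostring(lexicon, encoding="unicode"))
```

Keeping corrections as a replayable script, rather than editing the XML directly, is what lets hand-written one-off fixes and machine-generated bulk fixes be applied to the same source lexicon.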

arXiv:1312.6808 [pdf]

doi 10.1109/UIC-ATC.2013.81

Socially-Aware Venue Recommendation for Conference Participants

Authors: Feng Xia, Nana Yaw Asabere, Joel J. P. C. Rodrigues, Filippo Basso, Nakema Deonauth, Wei Wang

Abstract: Academic conferences now host very large numbers of presentations spread across different sessions. This situation makes it difficult for researchers (especially junior researchers) to attend the right presentation session(s) for effective collaboration. In this paper, we propose an innovative venue recommendation algorithm to enhance smart conference participation. Our proposed algorithm, Social Aware Recommendation of Venues and Environments (SARVE), computes the Pearson correlation and social characteristic information of conference participants. SARVE further incorporates the current context of both the smart conference community and participants in order to model a recommendation process using distributed community detection. Through the integration of the above computations and techniques, we are able to recommend presentation sessions of active participant presenters that may be of high interest to a particular participant. We evaluate SARVE using a real-world dataset. Our experimental results demonstrate that SARVE outperforms other state-of-the-art methods.

Submitted 24 December, 2013; originally announced December 2013.

MSC Class: 68P20

ACM Class: H.3.3; H.1.2

Journal ref: The 10th IEEE International Conference on Ubiquitous Intelligence and Computing (UIC), Vietri sul Mare, Italy, December 2013

arXiv:1311.5904 [pdf, ps, other]

doi 10.1016/j.jpdc.2014.08.001

The IceProd Framework: Distributed Data Processing for the IceCube Neutrino Observatory

Authors: M. G. Aartsen, R. Abbasi, M. Ackermann, J. Adams, J. A. Aguilar, M. Ahlers, D. Altmann, C. Arguelles, J. Auffenberg, X. Bai, M. Baker, S. W. Barwick, V. Baum, R. Bay, J. J. Beatty, J. Becker Tjus, K. -H. Becker, S. BenZvi, P. Berghaus, D. Berley, E. Bernardini, A. Bernhard, D. Z. Besson, G. Binder, D. Bindig, et al. (262 additional authors not shown)

Abstract: IceCube is a one-gigaton instrument located at the geographic South Pole, designed to detect cosmic neutrinos, identify the particle nature of dark matter, and study high-energy neutrinos themselves. Simulation of the IceCube detector and processing of data require a significant amount of computational resources. IceProd is a distributed management system based on Python, XML-RPC and GridFTP. It is driven by a central database in order to coordinate and administer production of simulations and processing of data produced by the IceCube detector. IceProd runs as a separate layer on top of other middleware and can take advantage of a variety of computing resources, including grids and batch systems such as CREAM, Condor, and PBS. This is accomplished by a set of dedicated daemons that process job submission in a coordinated fashion through the use of middleware plugins that serve to abstract the details of job submission and job management from the framework.

Submitted 22 August, 2014; v1 submitted 22 November, 2013; originally announced November 2013.

Journal ref: Journal of Parallel & Distributed Computing 75:198, 2015

arXiv:cs/0508002 [pdf, ps, other]

Methods for Analytical Understanding of Agent-Based Modeling of Complex Systems

Authors: Gilson A. Giraldi, Luis C. da Costa, Adilson V. Xavier, Paulo S. Rodrigues

Abstract: Von Neumann's work on universal machines, together with advances in hardware, has allowed the simulation of dynamical systems through large sets of interacting agents. This bottom-up approach tries to derive global properties of a complex system from local interaction rules and agent behaviour. Traditionally, such systems are modeled and simulated through top-down methods based on differential equations. Agent-Based Modeling has the advantage of simplicity and low computational cost. However, unlike differential equations, there is no standard way to express agent behaviour. Moreover, it is not clear how to analytically predict the results obtained by the simulation. In this paper we survey some of these methods. Formal methods, such as Stochastic Process Algebras, have been used to express agent behaviour. Such an approach is useful if the global properties of interest can be expressed as a function of stochastic time series. However, if space variables must be considered, the focus must change. In this case, multiscale techniques based on the Chapman-Enskog expansion have been used to establish the connection between the microscopic dynamics and the macroscopic observables. We also use data mining techniques, such as Principal Component Analysis (PCA), to study agent systems like Cellular Automata. With the help of these tools we discuss a simple society model, a Lattice Gas Automaton for fluid modeling, and knowledge discovery in CA databases.
In addition, we show the capabilities of NetLogo, a software package for agent-based simulation of complex systems, and describe our experience with it.

Submitted 30 July, 2005; originally announced August 2005.

arXiv:cs/0507012 [pdf, ps, other]

Lattice Gas Cellular Automata for Computational Fluid Animation

Authors: Gilson A. Giraldi, Adilson V. Xavier, Antonio L. Apolinario Jr, Paulo S. Rodrigues

Abstract: The past two decades have seen rapid growth in physically-based modeling of fluids for computer graphics applications. In this area, a common top-down approach is to model the fluid dynamics with the Navier-Stokes equations and apply numerical techniques such as Finite Differences or Finite Elements for the simulation. In this paper we focus on fluid modeling through Lattice Gas Cellular Automata (LGCA) for computer graphics applications. LGCA are discrete models based on point particles that move on a lattice according to suitable, simple rules in order to mimic full molecular dynamics. Using the Chapman-Enskog expansion, a known multiscale technique in this area, it can be demonstrated that the Navier-Stokes model can be reproduced by the LGCA technique. Thus, LGCA provide a fluid model that does not require the solution of complicated equations. We therefore combine the low computational cost of LGCA with its ability to mimic realistic fluid dynamics to develop a new animation framework for computer graphics applications. In this work, we discuss the theoretical elements of our proposal and show experimental results.

Submitted 5 July, 2005; originally announced July 2005.

arXiv:cs/0502095 [pdf]

Gradient Vector Flow Models for Boundary Extraction in 2D Images

Authors: Gilson A. Giraldi, Leandro S. Marturelli, Paulo S. Rodrigues

Abstract: The Gradient Vector Flow (GVF) is a vector diffusion approach based on Partial Differential Equations (PDEs). This method has been applied together with snake models for boundary extraction in medical image segmentation. The key idea is to use a diffusion-reaction PDE to generate a new external force field that makes snake models less sensitive to initialization and improves the snake's ability to move into boundary concavities. In this paper, we first review basic results about convergence and numerical analysis of usual GVF schemes. We point out that GVF presents numerical problems due to discontinuities in image intensity. We consider this point from a practical viewpoint, showing that the GVF parameters must satisfy a relationship in order to improve numerical convergence. We also present an analytical study of how GVF depends on the parameter values. Furthermore, we observe that the method can be used for multiply connected domains simply by imposing suitable boundary conditions. In the experimental results we verify these theoretical points and demonstrate the utility of GVF in a segmentation approach we have developed based on snakes.

Submitted 20 July, 2005; v1 submitted 28 February, 2005; originally announced February 2005.

Comments: 8 pages, 11 figures

Showing 1–38 of 38 results for author: Rodrigues, P