Search | arXiv e-print repository

Are Large Language Models Moral Hypocrites? A Study Based on Moral Foundations

Authors: José Luiz Nunes, Guilherme F. C. F. Almeida, Marcelo de Araujo, Simone D. J. Barbosa

Abstract: Large language models (LLMs) have taken centre stage in debates on Artificial Intelligence. Yet there remains a gap in how to assess LLMs' conformity to important human values. In this paper, we investigate whether state-of-the-art LLMs, GPT-4 and Claude 2.1 (Gemini Pro and LLAMA 2 did not generate valid results) are moral hypocrites. We employ two research instruments based on the Moral Foundations Theory: (i) the Moral Foundations Questionnaire (MFQ), which investigates which values are considered morally relevant in abstract moral judgements; and (ii) the Moral Foundations Vignettes (MFVs), which evaluate moral cognition in concrete scenarios related to each moral foundation. We characterise conflicts in values between these different abstractions of moral evaluation as hypocrisy. We found that both models displayed reasonable consistency within each instrument compared to humans, but they displayed contradictory and hypocritical behaviour when we compared the abstract values present in the MFQ to the evaluation of concrete moral violations of the MFV.

Submitted 17 May, 2024; originally announced May 2024.

Comments: 13 pages, 4 figures, 2 tables

arXiv:2310.18159 [pdf, other]

DESiRED -- Dynamic, Enhanced, and Smart iRED: A P4-AQM with Deep Reinforcement Learning and In-band Network Telemetry

Authors: Leandro C. de Almeida, Washington Rodrigo Dias da Silva, Thiago C. Tavares, Rafael Pasquini, Chrysa Papagianni, Fábio L. Verdi

Abstract: Active Queue Management (AQM) is a mechanism employed to alleviate transient congestion in network device buffers, such as routers and switches. Traditional AQM algorithms use fixed thresholds, like target delay or queue occupancy, to compute random packet drop probabilities. A very small target delay can increase packet losses and reduce link utilization, while a large target delay may increase queueing delays while lowering drop probability. Due to dynamic network traffic characteristics, where traffic fluctuations can lead to significant queue variations, maintaining a fixed threshold AQM may not suit all applications. Consequently, we explore the question: What is the ideal threshold (target delay) for AQMs? In this work, we introduce DESiRED (Dynamic, Enhanced, and Smart iRED), a P4-based AQM that leverages precise network feedback from In-band Network Telemetry (INT) to feed a Deep Reinforcement Learning (DRL) model. This model dynamically adjusts the target delay based on rewards that maximize application Quality of Service (QoS). We evaluate DESiRED in a realistic P4-based test environment running an MPEG-DASH service. Our findings demonstrate up to a 90x reduction in video stall and a 42x increase in high-resolution video playback quality when the target delay is adjusted dynamically by DESiRED.

Submitted 27 October, 2023; originally announced October 2023.

Comments: Preprint (Computer Networks under review)

arXiv:2310.18088 [pdf, other]

iRED: A disaggregated P4-AQM fully implemented in programmable data plane hardware

Authors: Leandro C. de Almeida, Rafael Pasquini, Chrysa Papagianni, Fábio L. Verdi

Abstract: Routers employ queues to temporarily hold packets when the scheduler cannot immediately process them. Congestion occurs when the arrival rate of packets exceeds the processing capacity, leading to increased queueing delay. Over time, Active Queue Management (AQM) strategies have focused on directly draining packets from queues to alleviate congestion and reduce queuing delay. On Programmable Data Plane (PDP) hardware, AQMs traditionally reside in the Egress pipeline due to the availability of queue delay information there. We argue that this approach wastes the router's resources because the dropped packet has already consumed the entire pipeline of the device. In this work, we propose ingress Random Early Detection (iRED), a more efficient approach that addresses the Egress drop problem. iRED is a disaggregated P4-AQM fully implemented in programmable data plane hardware and also supports Low Latency, Low Loss, and Scalable Throughput (L4S) framework, saving device pipeline resources by dropping packets in the Ingress block. To evaluate iRED, we conducted three experiments using a Tofino2 programmable switch: i) An in-depth analysis of state-of-the-art AQMs on PDP hardware, using 12 different network configurations varying in bandwidth, Round-Trip Time (RTT), and Maximum Transmission Unit (MTU). The results demonstrate that iRED can significantly reduce router resource consumption, with up to a 10x reduction in memory usage, 12x fewer processing cycles, and 8x less power consumption for the same traffic load; ii) A performance evaluation regarding the L4S framework. The results prove that iRED achieves fairness in bandwidth usage for different types of traffic (classic and scalable); iii) A comprehensive analysis of the QoS in a real setup of DASH technology. iRED demonstrated up to a 2.34x improvement in FPS and a 4.77x increase in the video player buffer fill.

Submitted 27 October, 2023; originally announced October 2023.

Comments: Preprint (TNSM under review)

arXiv:2308.01264 [pdf]

doi 10.1016/j.artint.2024.104145

Exploring the psychology of LLMs' Moral and Legal Reasoning

Authors: Guilherme F. C. F. Almeida, José Luiz Nunes, Neele Engelmann, Alex Wiegmann, Marcelo de Araújo

Abstract: Large language models (LLMs) exhibit expert-level performance in tasks across a wide range of different domains. Ethical issues raised by LLMs and the need to align future versions makes it important to know how state of the art models reason about moral and legal issues. In this paper, we employ the methods of experimental psychology to probe into this question. We replicate eight studies from the experimental literature with instances of Google's Gemini Pro, Anthropic's Claude 2.1, OpenAI's GPT-4, and Meta's Llama 2 Chat 70b. We find that alignment with human responses shifts from one experiment to another, and that models differ amongst themselves as to their overall alignment, with GPT-4 taking a clear lead over all other models we tested. Nonetheless, even when LLM-generated responses are highly correlated to human responses, there are still systematic differences, with a tendency for models to exaggerate effects that are present among humans, in part by reducing variance. This recommends caution with regards to proposals of replacing human participants with current state-of-the-art LLMs in psychological research and highlights the need for further research about the distinctive aspects of machine psychology.

Submitted 4 March, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

Journal ref: Exploring the psychology of LLMs' moral and legal reasoning. Artificial Intelligence, Volume 224, 2024

arXiv:2305.08346 [pdf]

doi 10.1016/j.infsof.2022.107142

Negative Effects of Gamification in Education Software: Systematic Mapping and Practitioner Perceptions

Authors: Clauvin Almeida, Marcos Kalinowski, Anderson Uchoa, Bruno Feijo

Abstract: Context: While most research shows positive effects of gamification, the focus on its adverse effects is considerably smaller and further understanding is needed. Objective: To provide a comprehensive overview on research reporting negative effects of game design elements and to provide insights into the awareness of developers on these effects and into how they could be considered in practice. Method: We conducted a systematic mapping study of the negative effects of game design elements on education/learning systems. We also held a focus group discussion with developers of a gamified software, discussing the mapping study results with regard to their awareness and perceptions on the reported negative effects in practice. Results: The mapping study revealed 87 papers reporting undesired effects of game design elements. We found that badges, leaderboards, competitions, and points are the game design elements most often reported as causing negative effects. The most cited negative effects were lack of effect, worsened performance, motivational issues, lack of understanding, and irrelevance. The ethical issues of gaming the system and cheating were also often reported. As part of our results, we map the relations between game design elements and the negative effects that they may cause. The focus group revealed that developers were not aware of many of the possible negative effects and that they consider this type of information useful. The discussion revealed their agreement on some of those potential negative effects and also some positive counterparts. Conclusions: Gamification, when properly applied, can have positive effects on education/learning software. However, gamified software is also prone to generate harmful effects. Revealing and discussing potentially negative effects can help to make more informed decisions considering their trade-off with respect to the expected benefits.

Submitted 15 May, 2023; originally announced May 2023.

Journal ref: Information and Software Technology, Volume 156, April 2023, 107142

arXiv:2211.11928 [pdf, ps, other]

A case study of proactive auto-scaling for an ecommerce workload

Authors: Marcella Medeiros Siqueira Coutinho de Almeida, Thiago Emmanuel Pereira, Fabio Morais

Abstract: Preliminary data obtained from a partnership between the Federal University of Campina Grande and an ecommerce company indicates that some applications have issues when dealing with variable demand. This happens because a delay in scaling resources leads to performance degradation and, in literature, is a matter usually treated by improving the auto-scaling. To better understand the current state-of-the-art on this subject, we re-evaluate an auto-scaling algorithm proposed in the literature, in the context of ecommerce, using a long-term real workload. Experimental results show that our proactive approach is able to achieve an accuracy of up to 94 percent and led the auto-scaling to a better performance than the reactive approach currently used by the ecommerce company.

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2209.01900 [pdf, other]

doi 10.3390/math11010074

A Robust Learning Methodology for Uncertainty-aware Scientific Machine Learning models

Authors: Erbet Costa Almeida, Carine de Menezes Rebello, Marcio Fontana, Leizer Schnitman, Idelfonso Bessa dos Reis Nogueira

Abstract: Robust learning is an important issue in Scientific Machine Learning (SciML). There are several works in the literature addressing this topic. However, there is an increasing demand for methods that can simultaneously consider all the different uncertainty components involved in SciML model identification. Hence, this work proposes a comprehensive methodology for uncertainty evaluation of the SciML that also considers several possible sources of uncertainties involved in the identification process. The uncertainties considered in the proposed method are the absence of theory and causal models, the sensitiveness to data corruption or imperfection, and the computational effort. Therefore, it was possible to provide an overall strategy for the uncertainty-aware models in the SciML field. The methodology is validated through a case study, developing a Soft Sensor for a polymerization reactor. The results demonstrated that the identified Soft Sensor are robust for uncertainties, corroborating with the consistency of the proposed approach.

Submitted 5 September, 2022; originally announced September 2022.

Comments: 23 pages

MSC Class: 68T07; ACM Class: J.2

arXiv:2107.11966 [pdf, other]

When SRv6 meets 5G Core: Implementation and Deployment of a Network Service Chaining Function in SmartNICs

Authors: Guilherme Matos, Fabio Luciano Verdi, Luis Miguel Contreras, Leandro C. de Almeida

Abstract: Currently, we have witnessed a myriad of solutions that benefit from programmable hardware. The 5G Core (5GC) can and should also benefit from such paradigm to offload certain functions to the dataplane. In this work, we designed and implemented a P4-based solution for traffic identification and chaining using the Netronome Agilo SmartNIC. The solution here presented is deployed in-between the RAN and UPF (User Plane Function) so that traffic coming from the RAN is identified and chained using SRv6 based on different rules defined by the control plane. The traffic identification and the construction of the SRv6 list of segments are done entirely in the SmartNIC. A minimalist Proof-of-Concept (PoC) was deployed and evaluated to show that this function is perfectly capable to build service function chainings in a transparent and efficient way.

Submitted 26 July, 2021; originally announced July 2021.

Comments: 2021 P4 Workshop

arXiv:2106.00799 [pdf, other]

doi 10.1016/j.isprsjprs.2021.07.001

Multi-task fully convolutional network for tree species mapping in dense forests using small training hyperspectral data

Authors: Laura Elena Cué La Rosa, Camile Sothe, Raul Queiroz Feitosa, Cláudia Maria de Almeida, Marcos Benedito Schimalski, Dario Augusto Borges Oliveira

Abstract: This work proposes a multi-task fully convolutional architecture for tree species mapping in dense forests from sparse and scarce polygon-level annotations using hyperspectral UAV-borne data. Our model implements a partial loss function that enables dense tree semantic labeling outcomes from non-dense training samples, and a distance regression complementary task that enforces tree crown boundary constraints and substantially improves the model performance. Our multi-task architecture uses a shared backbone network that learns common representations for both tasks and two task-specific decoders, one for the semantic segmentation output and one for the distance map regression. We report that introducing the complementary task boosts the semantic segmentation performance compared to the single-task counterpart in up to 11% reaching an average user's accuracy of 88.63% and an average producer's accuracy of 88.59%, achieving state-of-art performance for tree species classification in tropical forests.

Submitted 6 September, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

Comments: Full version of preprint accepted at ISPRS Journal of Photogrammetry and Remote Sensing

arXiv:2104.05516 [pdf, other]

Machine-checked ZKP for NP-relations: Formally Verified Security Proofs and Implementations of MPC-in-the-Head

Authors: José Carlos Bacelar Almeida, Manuel Barbosa, Karim Eldefrawy, Stéphane Graham-Lengrand, Hugo Pacheco, Vitor Pereira

Abstract: MPC-in-the-Head (MitH) is a general framework that allows constructing efficient Zero Knowledge protocols for general NP-relations from secure multiparty computation (MPC) protocols. In this paper we give the first machine-checked implementation of this transformation. We begin with an EasyCrypt formalization of MitH that preserves the modular structure of MitH and can be instantiated with arbitrary MPC protocols that satisfy standard notions of security, which allows us to leverage an existing machine-checked secret-sharing-based MPC protocol development. The resulting concrete ZK protocol is proved secure and correct in EasyCrypt. Using a recently developed code extraction mechanism for EasyCrypt we synthesize a formally verified implementation of the protocol, which we benchmark to get an indication of the overhead associated with our formalization choices and code extraction mechanism.

Submitted 19 May, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

arXiv:2102.00461 [pdf, other]

Multilingual Email Zoning

Authors: Bruno Jardim, Ricardo Rei, Mariana S. C. Almeida

Abstract: The segmentation of emails into functional zones (also dubbed email zoning) is a relevant preprocessing step for most NLP tasks that deal with emails. However, despite the multilingual character of emails and their applications, previous literature regarding email zoning corpora and systems was developed essentially for English. In this paper, we analyse the existing email zoning corpora and propose a new multilingual benchmark composed of 625 emails in Portuguese, Spanish and French. Moreover, we introduce OKAPI, the first multilingual email segmentation model based on a language agnostic sentence encoder. Besides generalizing well for unseen languages, our model is competitive with current English benchmarks, and reached new state-of-the-art performances for domain adaptation tasks in English.

Submitted 13 February, 2021; v1 submitted 31 January, 2021; originally announced February 2021.

Comments: Accepted at EACL 2021 SRW (https://sites.google.com/view/eaclsrw2021/home); 6 pages with 2 Figures and 8 Tables, plus references; Cleverly Multilingual Zoning Corpus available at https://github.com/cleverly-ai/multilingual-email-zoning

arXiv:2009.10181 [pdf, other]

doi 10.1109/ACCESS.2021.3077415

Towards Image-based Automatic Meter Reading in Unconstrained Scenarios: A Robust and Efficient Approach

Authors: Rayson Laroca, Alessandra B. Araujo, Luiz A. Zanlorensi, Eduardo C. de Almeida, David Menotti

Abstract: Existing approaches for image-based Automatic Meter Reading (AMR) have been evaluated on images captured in well-controlled scenarios. However, real-world meter reading presents unconstrained scenarios that are way more challenging due to dirt, various lighting conditions, scale variations, in-plane and out-of-plane rotations, among other factors. In this work, we present an end-to-end approach for AMR focusing on unconstrained scenarios. Our main contribution is the insertion of a new stage in the AMR pipeline, called corner detection and counter classification, which enables the counter region to be rectified -- as well as the rejection of illegible/faulty meters -- prior to the recognition stage. We also introduce a publicly available dataset, called Copel-AMR, that contains 12,500 meter images acquired in the field by the service company's employees themselves, including 2,500 images of faulty meters or cases where the reading is illegible due to occlusions. Experimental evaluation demonstrates that the proposed system, which has three networks operating in a cascaded mode, outperforms all baselines in terms of recognition rate while still being quite efficient. Moreover, as very few reading errors are tolerated in real-world applications, we show that our AMR system achieves impressive recognition rates (i.e., > 99%) when rejecting readings made with lower confidence values.

Submitted 12 May, 2021; v1 submitted 21 September, 2020; originally announced September 2020.

Journal ref: IEEE Access, vol. 9, pp. 67569-67584, 2021

arXiv:2006.13897 [pdf, other]

From form to information: Analysing built environments in different spatial cultures

Authors: Vinicius M. Netto, Edgardo Brigatti, Caio Cacholas

Abstract: Cities are different around the world, but does this fact have any relation to culture? The idea that urban form embodies idiosyncrasies related to cultural identities captures the imagination of many in urban studies, but it is an assumption yet to be carefully examined. Approaching spatial configurations in the built environment as a proxy of urban culture, this paper searches for differences potentially consistent with specific regional cultures or cultures of planning in urban development. It does so focusing on the elementary components shaping cities: buildings and how they are aggregated in cellular complexes of built form. Exploring Shannon's work, we introduce an entropy measure to analyse the probability distribution of cellular arrangements in built form systems. We apply it to downtown areas of 45 cities from different regions of the world as a similarity measure to compare and cluster cities potentially consistent with specific spatial cultures. Findings suggest a classification scheme that sheds further light on what we call the "cultural hypothesis": the possibility that different cultures and regions find different ways of ordering space.

Submitted 26 June, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

Comments: 19 pages, 10 figures

arXiv:2004.13675 [pdf, other]

Cloud Network Slicing: A systematic mapping study from scientific publications

Authors: Leandro C. de Almeida, Paulo Ditarso Maciel Jr, Fábio L. Verdi

Abstract: Cloud Network Slicing is a new research area that brings together cloud computing and network slicing in an end-to-end environment. In this context, understanding the existing scientific contributions and gaps is crucial to driving new research in this field. This article presents a complete quantitative analysis of scientific publications on the Cloud Network Slicing, based on a systematic mapping study. The results indicate the situation of the last ten years in the research area, presenting data such as industry involvement, most cited articles, most active researchers, publications over the years, main places of publication, as well as well-developed areas and gaps. Future guidelines for scientific research are also discussed.

Submitted 4 May, 2020; v1 submitted 28 April, 2020; originally announced April 2020.

arXiv:2003.06634 [pdf, ps, other]

Text Similarity Using Word Embeddings to Classify Misinformation

Authors: Caio Almeida, Débora Santos

Abstract: Fake news is a growing problem in the last years, especially during elections. It's hard work to identify what is true and what is false among all the user generated content that circulates every day. Technology can help with that work and optimize the fact-checking process. In this work, we address the challenge of finding similar content in order to be able to suggest to a fact-checker articles that could have been verified before and thus avoid that the same information is verified more than once. This is especially important in collaborative approaches to fact-checking where members of large teams will not know what content others have already fact-checked.

Submitted 14 March, 2020; originally announced March 2020.

arXiv:2002.05988 [pdf, other]

doi 10.1145/3394486.3403361

Interleaved Sequence RNNs for Fraud Detection

Authors: Bernardo Branco, Pedro Abreu, Ana Sofia Gomes, Mariana S. C. Almeida, João Tiago Ascensão, Pedro Bizarro

Abstract: Payment card fraud causes multibillion dollar losses for banks and merchants worldwide, often fueling complex criminal activities. To address this, many real-time fraud detection systems use tree-based models, demanding complex feature engineering systems to efficiently enrich transactions with historical data while complying with millisecond-level latencies. In this work, we do not require those expensive features by using recurrent neural networks and treating payments as an interleaved sequence, where the history of each card is an unbounded, irregular sub-sequence. We present a complete RNN framework to detect fraud in real-time, proposing an efficient ML pipeline from preprocessing to deployment. We show that these feature-free, multi-sequence RNNs outperform state-of-the-art models saving millions of dollars in fraud detection and using fewer computational resources.

Submitted 17 June, 2020; v1 submitted 14 February, 2020; originally announced February 2020.

Comments: 9 pages, 4 figures, to appear in SIGKDD'20 Industry Track

arXiv:1510.09092 [pdf, ps, other]

Formalization of context-free language theory

Authors: Marcus V. M. Ramos, Ruy J. G. B. de Queiroz, Nelma Moreira, José Carlos Bacelar Almeida

Abstract: Context-free language theory is a subject of high importance in computer language processing technology as well as in formal language theory. This paper presents a formalization, using the Coq proof assistant, of fundamental results related to context-free grammars and languages. These include closure properties (union, concatenation and Kleene star), grammar simplification (elimination of useless symbols, inaccessible symbols, empty rules and unit rules) and the existence of a Chomsky Normal Form for context-free grammars.

Submitted 30 October, 2015; originally announced October 2015.

arXiv:1510.04748 [pdf, ps, other]

Formalization of the pumping lemma for context-free languages

Authors: Marcus V. M. Ramos, Ruy J. G. B. de Queiroz, Nelma Moreira, José Carlos Bacelar Almeida

Abstract: Context-free languages (CFLs) are highly important in computer language processing technology as well as in formal language theory. The Pumping Lemma is a property that is valid for all context-free languages, and is used to show the existence of non context-free languages. This paper presents a formalization, using the Coq proof assistant, of the Pumping Lemma for context-free languages.

Submitted 15 October, 2015; originally announced October 2015.

arXiv:1210.2687 [pdf, ps, other]

doi 10.1109/TIP.2013.2258354

Deconvolving Images with Unknown Boundaries Using the Alternating Direction Method of Multipliers

Authors: Mariana S. C. Almeida, Mário A. T. Figueiredo

Abstract: The alternating direction method of multipliers (ADMM) has recently sparked interest as a flexible and efficient optimization tool for imaging inverse problems, namely deconvolution and reconstruction under non-smooth convex regularization. ADMM achieves state-of-the-art speed by adopting a divide and conquer strategy, wherein a hard problem is split into simpler, efficiently solvable sub-problems (e.g., using fast Fourier or wavelet transforms, or simple proximity operators). In deconvolution, one of these sub-problems involves a matrix inversion (i.e., solving a linear system), which can be done efficiently (in the discrete Fourier domain) if the observation operator is circulant, i.e., under periodic boundary conditions. This paper extends ADMM-based image deconvolution to the more realistic scenario of unknown boundary, where the observation operator is modeled as the composition of a convolution (with arbitrary boundary conditions) with a spatial mask that keeps only pixels that do not depend on the unknown boundary. The proposed approach also handles, at no extra cost, problems that combine the recovery of missing pixels (i.e., inpainting) with deconvolution. We show that the resulting algorithms inherit the convergence guarantees of ADMM and illustrate its performance on non-periodic deblurring (with and without inpainting of interior pixels) under total-variation and frame-based regularization.

Submitted 7 March, 2013; v1 submitted 9 October, 2012; originally announced October 2012.

Comments: Submitted to the IEEE Transactions on Image Processing in August 2012

MSC Class: 68U10 ACM Class: I.4.4

arXiv:1209.6580 [pdf, other]

Testing MapReduce-Based Systems

Authors: João Eugenio Marynowski, Michel Albonico, Eduardo Cunha de Almeida, Gerson Sunyé

Abstract: MapReduce (MR) is the most popular solution to build applications for large-scale data processing. These applications are often deployed on large clusters of commodity machines, where failures happen constantly due to bugs, hardware problems, and outages. Testing MR-based systems is hard, since a great test-harness effort is needed to execute distributed test cases upon failures. In this paper, we present a novel testing solution to tackle this issue called HadoopTest. This solution is based on a scalable harness approach, where distributed tester components are hung around each map and reduce worker (i.e., node). Testers are allowed to stimulate each worker to inject failures into them, monitor their behavior, and validate testing results. HadoopTest was used to test two applications bundled into Hadoop, the Apache open source MapReduce implementation. Our initial implementation demonstrates promising results, with HadoopTest coordinating test cases across distributed MapReduce workers, and finding bugs.

Submitted 7 February, 2013; v1 submitted 28 September, 2012; originally announced September 2012.

arXiv:1206.6273 [pdf, other]

IACTalks: an on-line archive of astronomy-related seminars

Authors: Johan H. Knapen, Jorge A. Pérez Prieto, Tariq Shahbaz, Anna Ferré-Mateu, Nicola Caon, Cristina Ramos Almeida, Brandon Tingley, Valentina Luridiana, Inés Flores-Cacho, Orlagh Creevey, Arturo Manchado Torres, Ignacio Trujillo, Maria Rosa Zapatero Osorio, Francisco Sánchez Martínez, Francisco López Molina, Gabriel Pérez Díaz, Miguel Briganti, Inés Bonet

Abstract: We present IACTalks, a free and open access seminars archive (http://iactalks.iac.es) aimed at promoting astronomy and the exchange of ideas by providing high-quality scientific seminars to the astronomical community. The archive of seminars and talks given at the Instituto de Astrofísica de Canarias goes back to 2008. Over 360 talks and seminars are now freely available by streaming over the internet. We describe the user interface, which includes two video streams, one showing the speaker, the other the presentation. A search function is available, and seminars are indexed by keywords and in some cases by series, such as special training courses or the 2011 Winter School of Astrophysics, on secular evolution of galaxies. The archive is made available as an open resource, to be used by scientists and the public.

Submitted 27 June, 2012; originally announced June 2012.

Comments: 2 pages, 2 figures

arXiv:1108.0729 [pdf]

Estudo de Viabilidade de uma Plataforma de Baixo Custo para Data Warehouse (Feasibility Study of a Low-Cost Platform for Data Warehousing)

Authors: Eduardo Cunha de Almeida

Abstract: Corporations often need tools to improve their decision making in a competitive market. In general, these tools are based on data warehouse platforms to manage and analyze large amounts of data. However, several of these corporations do not have enough resources to buy such platforms because of the high cost. This work is dedicated to a feasibility study of a low-cost platform for data warehousing. We consider as a low-cost platform the use of open source software like the PostgreSQL database system and the GNU/Linux operating system. We verify the feasibility of this platform by executing two benchmarks that simulate a data warehouse workload. The workload reproduces a multi-user environment with the execution of complex queries, which include aggregations, nested sub-queries, multi-joins, in-line views and more. Considering the results, we were able to highlight some problems in the PostgreSQL database system and discuss improvements in the context of data warehousing.

Submitted 2 August, 2011; originally announced August 2011.

Comments: Masters dissertation, 90 pages, 2004. (Advisor: Marcos Sfair Sunyé); Masters dissertation, Universidade Federal do Paraná, 2004

ACM Class: H.2.7

Showing 1–22 of 22 results for author: Almeida, C