-
Are Large Language Models Moral Hypocrites? A Study Based on Moral Foundations
Authors:
José Luiz Nunes,
Guilherme F. C. F. Almeida,
Marcelo de Araujo,
Simone D. J. Barbosa
Abstract:
Large language models (LLMs) have taken centre stage in debates on Artificial Intelligence. Yet there remains a gap in how to assess LLMs' conformity to important human values. In this paper, we investigate whether state-of-the-art LLMs, GPT-4 and Claude 2.1 (Gemini Pro and LLAMA 2 did not generate valid results) are moral hypocrites. We employ two research instruments based on the Moral Foundations Theory: (i) the Moral Foundations Questionnaire (MFQ), which investigates which values are considered morally relevant in abstract moral judgements; and (ii) the Moral Foundations Vignettes (MFVs), which evaluate moral cognition in concrete scenarios related to each moral foundation. We characterise conflicts in values between these different abstractions of moral evaluation as hypocrisy. We found that both models displayed reasonable consistency within each instrument compared to humans, but they displayed contradictory and hypocritical behaviour when we compared the abstract values present in the MFQ to the evaluation of concrete moral violations of the MFV.
Submitted 17 May, 2024;
originally announced May 2024.
-
DESiRED -- Dynamic, Enhanced, and Smart iRED: A P4-AQM with Deep Reinforcement Learning and In-band Network Telemetry
Authors:
Leandro C. de Almeida,
Washington Rodrigo Dias da Silva,
Thiago C. Tavares,
Rafael Pasquini,
Chrysa Papagianni,
Fábio L. Verdi
Abstract:
Active Queue Management (AQM) is a mechanism employed to alleviate transient congestion in network device buffers, such as routers and switches. Traditional AQM algorithms use fixed thresholds, like target delay or queue occupancy, to compute random packet drop probabilities. A very small target delay can increase packet losses and reduce link utilization, while a large target delay may increase queueing delays while lowering drop probability. Due to dynamic network traffic characteristics, where traffic fluctuations can lead to significant queue variations, maintaining a fixed threshold AQM may not suit all applications. Consequently, we explore the question: "What is the ideal threshold (target delay) for AQMs?" In this work, we introduce DESiRED (Dynamic, Enhanced, and Smart iRED), a P4-based AQM that leverages precise network feedback from In-band Network Telemetry (INT) to feed a Deep Reinforcement Learning (DRL) model. This model dynamically adjusts the target delay based on rewards that maximize application Quality of Service (QoS). We evaluate DESiRED in a realistic P4-based test environment running an MPEG-DASH service. Our findings demonstrate up to a 90x reduction in video stall and a 42x increase in high-resolution video playback quality when the target delay is adjusted dynamically by DESiRED.
Submitted 27 October, 2023;
originally announced October 2023.
-
iRED: A disaggregated P4-AQM fully implemented in programmable data plane hardware
Authors:
Leandro C. de Almeida,
Rafael Pasquini,
Chrysa Papagianni,
Fábio L. Verdi
Abstract:
Routers employ queues to temporarily hold packets when the scheduler cannot immediately process them. Congestion occurs when the arrival rate of packets exceeds the processing capacity, leading to increased queueing delay. Over time, Active Queue Management (AQM) strategies have focused on directly draining packets from queues to alleviate congestion and reduce queueing delay. On Programmable Data Plane (PDP) hardware, AQMs traditionally reside in the Egress pipeline due to the availability of queue delay information there. We argue that this approach wastes the router's resources because the dropped packet has already consumed the entire pipeline of the device. In this work, we propose ingress Random Early Detection (iRED), a more efficient approach that addresses the Egress drop problem. iRED is a disaggregated P4-AQM fully implemented in programmable data plane hardware that also supports the Low Latency, Low Loss, and Scalable Throughput (L4S) framework, saving device pipeline resources by dropping packets in the Ingress block. To evaluate iRED, we conducted three experiments using a Tofino2 programmable switch: i) An in-depth analysis of state-of-the-art AQMs on PDP hardware, using 12 different network configurations varying in bandwidth, Round-Trip Time (RTT), and Maximum Transmission Unit (MTU). The results demonstrate that iRED can significantly reduce router resource consumption, with up to a 10x reduction in memory usage, 12x fewer processing cycles, and 8x less power consumption for the same traffic load; ii) A performance evaluation regarding the L4S framework. The results show that iRED achieves fairness in bandwidth usage for different types of traffic (classic and scalable); iii) A comprehensive analysis of the QoS in a real setup of DASH technology. iRED demonstrated up to a 2.34x improvement in FPS and a 4.77x increase in the video player buffer fill.
Submitted 27 October, 2023;
originally announced October 2023.
-
Exploring the psychology of LLMs' Moral and Legal Reasoning
Authors:
Guilherme F. C. F. Almeida,
José Luiz Nunes,
Neele Engelmann,
Alex Wiegmann,
Marcelo de Araújo
Abstract:
Large language models (LLMs) exhibit expert-level performance in tasks across a wide range of different domains. Ethical issues raised by LLMs and the need to align future versions make it important to know how state-of-the-art models reason about moral and legal issues. In this paper, we employ the methods of experimental psychology to probe into this question. We replicate eight studies from the experimental literature with instances of Google's Gemini Pro, Anthropic's Claude 2.1, OpenAI's GPT-4, and Meta's Llama 2 Chat 70b. We find that alignment with human responses shifts from one experiment to another, and that models differ amongst themselves as to their overall alignment, with GPT-4 taking a clear lead over all other models we tested. Nonetheless, even when LLM-generated responses are highly correlated to human responses, there are still systematic differences, with a tendency for models to exaggerate effects that are present among humans, in part by reducing variance. This recommends caution with regard to proposals of replacing human participants with current state-of-the-art LLMs in psychological research and highlights the need for further research about the distinctive aspects of machine psychology.
Submitted 4 March, 2024; v1 submitted 2 August, 2023;
originally announced August 2023.
-
Negative Effects of Gamification in Education Software: Systematic Mapping and Practitioner Perceptions
Authors:
Clauvin Almeida,
Marcos Kalinowski,
Anderson Uchoa,
Bruno Feijo
Abstract:
Context: While most research shows positive effects of gamification, the focus on its adverse effects is considerably smaller and further understanding is needed. Objective: To provide a comprehensive overview on research reporting negative effects of game design elements and to provide insights into the awareness of developers on these effects and into how they could be considered in practice. Method: We conducted a systematic mapping study of the negative effects of game design elements on education/learning systems. We also held a focus group discussion with developers of a gamified software, discussing the mapping study results with regard to their awareness and perceptions on the reported negative effects in practice. Results: The mapping study revealed 87 papers reporting undesired effects of game design elements. We found that badges, leaderboards, competitions, and points are the game design elements most often reported as causing negative effects. The most cited negative effects were lack of effect, worsened performance, motivational issues, lack of understanding, and irrelevance. The ethical issues of gaming the system and cheating were also often reported. As part of our results, we map the relations between game design elements and the negative effects that they may cause. The focus group revealed that developers were not aware of many of the possible negative effects and that they consider this type of information useful. The discussion revealed their agreement on some of those potential negative effects and also some positive counterparts. Conclusions: Gamification, when properly applied, can have positive effects on education/learning software. However, gamified software is also prone to generate harmful effects. Revealing and discussing potentially negative effects can help to make more informed decisions considering their trade-off with respect to the expected benefits.
Submitted 15 May, 2023;
originally announced May 2023.
-
A case study of proactive auto-scaling for an ecommerce workload
Authors:
Marcella Medeiros Siqueira Coutinho de Almeida,
Thiago Emmanuel Pereira,
Fabio Morais
Abstract:
Preliminary data obtained from a partnership between the Federal University of Campina Grande and an ecommerce company indicates that some applications have issues when dealing with variable demand. This happens because a delay in scaling resources leads to performance degradation and, in literature, is a matter usually treated by improving the auto-scaling. To better understand the current state-of-the-art on this subject, we re-evaluate an auto-scaling algorithm proposed in the literature, in the context of ecommerce, using a long-term real workload. Experimental results show that our proactive approach is able to achieve an accuracy of up to 94 percent and led the auto-scaling to a better performance than the reactive approach currently used by the ecommerce company.
Submitted 21 November, 2022;
originally announced November 2022.
-
A Robust Learning Methodology for Uncertainty-aware Scientific Machine Learning models
Authors:
Erbet Costa Almeida,
Carine de Menezes Rebello,
Marcio Fontana,
Leizer Schnitman,
Idelfonso Bessa dos Reis Nogueira
Abstract:
Robust learning is an important issue in Scientific Machine Learning (SciML). There are several works in the literature addressing this topic. However, there is an increasing demand for methods that can simultaneously consider all the different uncertainty components involved in SciML model identification. Hence, this work proposes a comprehensive methodology for uncertainty evaluation of SciML models that also considers several possible sources of uncertainty involved in the identification process. The uncertainties considered in the proposed method are the absence of theory and causal models, the sensitivity to data corruption or imperfection, and the computational effort. Therefore, it was possible to provide an overall strategy for uncertainty-aware models in the SciML field. The methodology is validated through a case study, developing a Soft Sensor for a polymerization reactor. The results demonstrated that the identified Soft Sensor is robust to uncertainties, corroborating the consistency of the proposed approach.
Submitted 5 September, 2022;
originally announced September 2022.
-
When SRv6 meets 5G Core: Implementation and Deployment of a Network Service Chaining Function in SmartNICs
Authors:
Guilherme Matos,
Fabio Luciano Verdi,
Luis Miguel Contreras,
Leandro C. de Almeida
Abstract:
Currently, we have witnessed a myriad of solutions that benefit from programmable hardware. The 5G Core (5GC) can and should also benefit from such paradigm to offload certain functions to the dataplane. In this work, we designed and implemented a P4-based solution for traffic identification and chaining using the Netronome Agilo SmartNIC. The solution here presented is deployed in-between the RAN and UPF (User Plane Function) so that traffic coming from the RAN is identified and chained using SRv6 based on different rules defined by the control plane. The traffic identification and the construction of the SRv6 list of segments are done entirely in the SmartNIC. A minimalist Proof-of-Concept (PoC) was deployed and evaluated to show that this function is perfectly capable to build service function chainings in a transparent and efficient way.
Submitted 26 July, 2021;
originally announced July 2021.
-
Multi-task fully convolutional network for tree species mapping in dense forests using small training hyperspectral data
Authors:
Laura Elena Cué La Rosa,
Camile Sothe,
Raul Queiroz Feitosa,
Cláudia Maria de Almeida,
Marcos Benedito Schimalski,
Dario Augusto Borges Oliveira
Abstract:
This work proposes a multi-task fully convolutional architecture for tree species mapping in dense forests from sparse and scarce polygon-level annotations using hyperspectral UAV-borne data. Our model implements a partial loss function that enables dense tree semantic labeling outcomes from non-dense training samples, and a distance regression complementary task that enforces tree crown boundary constraints and substantially improves the model performance. Our multi-task architecture uses a shared backbone network that learns common representations for both tasks and two task-specific decoders, one for the semantic segmentation output and one for the distance map regression. We report that introducing the complementary task boosts the semantic segmentation performance by up to 11% compared to the single-task counterpart, reaching an average user's accuracy of 88.63% and an average producer's accuracy of 88.59%, and achieving state-of-the-art performance for tree species classification in tropical forests.
Submitted 6 September, 2021; v1 submitted 1 June, 2021;
originally announced June 2021.
-
Machine-checked ZKP for NP-relations: Formally Verified Security Proofs and Implementations of MPC-in-the-Head
Authors:
José Carlos Bacelar Almeida,
Manuel Barbosa,
Karim Eldefrawy,
Stéphane Graham-Lengrand,
Hugo Pacheco,
Vitor Pereira
Abstract:
MPC-in-the-Head (MitH) is a general framework that allows constructing efficient Zero Knowledge protocols for general NP-relations from secure multiparty computation (MPC) protocols. In this paper we give the first machine-checked implementation of this transformation. We begin with an EasyCrypt formalization of MitH that preserves the modular structure of MitH and can be instantiated with arbitrary MPC protocols that satisfy standard notions of security, which allows us to leverage an existing machine-checked secret-sharing-based MPC protocol development. The resulting concrete ZK protocol is proved secure and correct in EasyCrypt. Using a recently developed code extraction mechanism for EasyCrypt we synthesize a formally verified implementation of the protocol, which we benchmark to get an indication of the overhead associated with our formalization choices and code extraction mechanism.
Submitted 19 May, 2021; v1 submitted 12 April, 2021;
originally announced April 2021.
-
Multilingual Email Zoning
Authors:
Bruno Jardim,
Ricardo Rei,
Mariana S. C. Almeida
Abstract:
The segmentation of emails into functional zones (also dubbed email zoning) is a relevant preprocessing step for most NLP tasks that deal with emails. However, despite the multilingual character of emails and their applications, previous literature regarding email zoning corpora and systems was developed essentially for English.
In this paper, we analyse the existing email zoning corpora and propose a new multilingual benchmark composed of 625 emails in Portuguese, Spanish and French. Moreover, we introduce OKAPI, the first multilingual email segmentation model based on a language agnostic sentence encoder. Besides generalizing well for unseen languages, our model is competitive with current English benchmarks, and reached new state-of-the-art performances for domain adaptation tasks in English.
Submitted 13 February, 2021; v1 submitted 31 January, 2021;
originally announced February 2021.
-
Towards Image-based Automatic Meter Reading in Unconstrained Scenarios: A Robust and Efficient Approach
Authors:
Rayson Laroca,
Alessandra B. Araujo,
Luiz A. Zanlorensi,
Eduardo C. de Almeida,
David Menotti
Abstract:
Existing approaches for image-based Automatic Meter Reading (AMR) have been evaluated on images captured in well-controlled scenarios. However, real-world meter reading presents unconstrained scenarios that are way more challenging due to dirt, various lighting conditions, scale variations, in-plane and out-of-plane rotations, among other factors. In this work, we present an end-to-end approach for AMR focusing on unconstrained scenarios. Our main contribution is the insertion of a new stage in the AMR pipeline, called corner detection and counter classification, which enables the counter region to be rectified -- as well as the rejection of illegible/faulty meters -- prior to the recognition stage. We also introduce a publicly available dataset, called Copel-AMR, that contains 12,500 meter images acquired in the field by the service company's employees themselves, including 2,500 images of faulty meters or cases where the reading is illegible due to occlusions. Experimental evaluation demonstrates that the proposed system, which has three networks operating in a cascaded mode, outperforms all baselines in terms of recognition rate while still being quite efficient. Moreover, as very few reading errors are tolerated in real-world applications, we show that our AMR system achieves impressive recognition rates (i.e., > 99%) when rejecting readings made with lower confidence values.
Submitted 12 May, 2021; v1 submitted 21 September, 2020;
originally announced September 2020.
-
From form to information: Analysing built environments in different spatial cultures
Authors:
Vinicius M. Netto,
Edgardo Brigatti,
Caio Cacholas
Abstract:
Cities are different around the world, but does this fact have any relation to culture? The idea that urban form embodies idiosyncrasies related to cultural identities captures the imagination of many in urban studies, but it is an assumption yet to be carefully examined. Approaching spatial configurations in the built environment as a proxy of urban culture, this paper searches for differences potentially consistent with specific regional cultures or cultures of planning in urban development. It does so focusing on the elementary components shaping cities: buildings and how they are aggregated in cellular complexes of built form. Exploring Shannon's work, we introduce an entropy measure to analyse the probability distribution of cellular arrangements in built form systems. We apply it to downtown areas of 45 cities from different regions of the world as a similarity measure to compare and cluster cities potentially consistent with specific spatial cultures. Findings suggest a classification scheme that sheds further light on what we call the "cultural hypothesis": the possibility that different cultures and regions find different ways of ordering space.
Submitted 26 June, 2020; v1 submitted 24 June, 2020;
originally announced June 2020.
-
Cloud Network Slicing: A systematic mapping study from scientific publications
Authors:
Leandro C. de Almeida,
Paulo Ditarso Maciel Jr,
Fábio L. Verdi
Abstract:
Cloud Network Slicing is a new research area that brings together cloud computing and network slicing in an end-to-end environment. In this context, understanding the existing scientific contributions and gaps is crucial to driving new research in this field. This article presents a complete quantitative analysis of scientific publications on Cloud Network Slicing, based on a systematic mapping study. The results indicate the situation of the last ten years in the research area, presenting data such as industry involvement, most cited articles, most active researchers, publications over the years, main places of publication, as well as well-developed areas and gaps. Future guidelines for scientific research are also discussed.
Submitted 4 May, 2020; v1 submitted 28 April, 2020;
originally announced April 2020.
-
Text Similarity Using Word Embeddings to Classify Misinformation
Authors:
Caio Almeida,
Débora Santos
Abstract:
Fake news is a growing problem in the last years, especially during elections. It's hard work to identify what is true and what is false among all the user generated content that circulates every day. Technology can help with that work and optimize the fact-checking process. In this work, we address the challenge of finding similar content in order to be able to suggest to a fact-checker articles that could have been verified before and thus avoid that the same information is verified more than once. This is especially important in collaborative approaches to fact-checking where members of large teams will not know what content others have already fact-checked.
Submitted 14 March, 2020;
originally announced March 2020.
-
Interleaved Sequence RNNs for Fraud Detection
Authors:
Bernardo Branco,
Pedro Abreu,
Ana Sofia Gomes,
Mariana S. C. Almeida,
João Tiago Ascensão,
Pedro Bizarro
Abstract:
Payment card fraud causes multibillion dollar losses for banks and merchants worldwide, often fueling complex criminal activities. To address this, many real-time fraud detection systems use tree-based models, demanding complex feature engineering systems to efficiently enrich transactions with historical data while complying with millisecond-level latencies.
In this work, we do not require those expensive features by using recurrent neural networks and treating payments as an interleaved sequence, where the history of each card is an unbounded, irregular sub-sequence. We present a complete RNN framework to detect fraud in real-time, proposing an efficient ML pipeline from preprocessing to deployment.
We show that these feature-free, multi-sequence RNNs outperform state-of-the-art models saving millions of dollars in fraud detection and using fewer computational resources.
Submitted 17 June, 2020; v1 submitted 14 February, 2020;
originally announced February 2020.
-
Formalization of context-free language theory
Authors:
Marcus V. M. Ramos,
Ruy J. G. B. de Queiroz,
Nelma Moreira,
José Carlos Bacelar Almeida
Abstract:
Context-free language theory is a subject of high importance in computer language processing technology as well as in formal language theory. This paper presents a formalization, using the Coq proof assistant, of fundamental results related to context-free grammars and languages. These include closure properties (union, concatenation and Kleene star), grammar simplification (elimination of useless symbols, inaccessible symbols, empty rules and unit rules) and the existence of a Chomsky Normal Form for context-free grammars.
Submitted 30 October, 2015;
originally announced October 2015.
-
Formalization of the pumping lemma for context-free languages
Authors:
Marcus V. M. Ramos,
Ruy J. G. B. de Queiroz,
Nelma Moreira,
José Carlos Bacelar Almeida
Abstract:
Context-free languages (CFLs) are highly important in computer language processing technology as well as in formal language theory. The Pumping Lemma is a property that is valid for all context-free languages, and is used to show the existence of non-context-free languages. This paper presents a formalization, using the Coq proof assistant, of the Pumping Lemma for context-free languages.
Submitted 15 October, 2015;
originally announced October 2015.
-
Deconvolving Images with Unknown Boundaries Using the Alternating Direction Method of Multipliers
Authors:
Mariana S. C. Almeida,
Mário A. T. Figueiredo
Abstract:
The alternating direction method of multipliers (ADMM) has recently sparked interest as a flexible and efficient optimization tool for imaging inverse problems, namely deconvolution and reconstruction under non-smooth convex regularization. ADMM achieves state-of-the-art speed by adopting a divide and conquer strategy, wherein a hard problem is split into simpler, efficiently solvable sub-problems (e.g., using fast Fourier or wavelet transforms, or simple proximity operators). In deconvolution, one of these sub-problems involves a matrix inversion (i.e., solving a linear system), which can be done efficiently (in the discrete Fourier domain) if the observation operator is circulant, i.e., under periodic boundary conditions. This paper extends ADMM-based image deconvolution to the more realistic scenario of unknown boundary, where the observation operator is modeled as the composition of a convolution (with arbitrary boundary conditions) with a spatial mask that keeps only pixels that do not depend on the unknown boundary. The proposed approach also handles, at no extra cost, problems that combine the recovery of missing pixels (i.e., inpainting) with deconvolution. We show that the resulting algorithms inherit the convergence guarantees of ADMM and illustrate its performance on non-periodic deblurring (with and without inpainting of interior pixels) under total-variation and frame-based regularization.
Submitted 7 March, 2013; v1 submitted 9 October, 2012;
originally announced October 2012.
-
Testing MapReduce-Based Systems
Authors:
João Eugenio Marynowski,
Michel Albonico,
Eduardo Cunha de Almeida,
Gerson Sunyé
Abstract:
MapReduce (MR) is the most popular solution for building large-scale data processing applications. These applications are often deployed on large clusters of commodity machines, where failures happen constantly due to bugs, hardware problems, and outages. Testing MR-based systems is hard, since a substantial test-harness effort is needed to execute distributed test cases in the presence of failures. In this paper, we present HadoopTest, a novel testing solution that tackles this issue. The solution is based on a scalable harness approach, in which distributed tester components are attached to each map and reduce worker (i.e., node). Testers can stimulate each worker to inject failures, monitor its behavior, and validate test results. HadoopTest was used to test two applications bundled with Hadoop, the Apache open source MapReduce implementation. Our initial implementation shows promising results, with HadoopTest coordinating test cases across distributed MapReduce workers and finding bugs.
Submitted 7 February, 2013; v1 submitted 28 September, 2012;
originally announced September 2012.
-
IACTalks: an on-line archive of astronomy-related seminars
Authors:
Johan H. Knapen,
Jorge A. Pérez Prieto,
Tariq Shahbaz,
Anna Ferré-Mateu,
Nicola Caon,
Cristina Ramos Almeida,
Brandon Tingley,
Valentina Luridiana,
Inés Flores-Cacho,
Orlagh Creevey,
Arturo Manchado Torres,
Ignacio Trujillo,
Maria Rosa Zapatero Osorio,
Francisco Sánchez Martínez,
Francisco López Molina,
Gabriel Pérez Díaz,
Miguel Briganti,
Inés Bonet
Abstract:
We present IACTalks, a free, open-access seminar archive (http://iactalks.iac.es) aimed at promoting astronomy and the exchange of ideas by providing high-quality scientific seminars to the astronomical community. The archive of seminars and talks given at the Instituto de Astrofísica de Canarias goes back to 2008. Over 360 talks and seminars are now freely available for streaming over the internet. We describe the user interface, which includes two video streams, one showing the speaker and the other the presentation. A search function is available, and seminars are indexed by keyword and, in some cases, by series, such as special training courses or the 2011 Winter School of Astrophysics on the secular evolution of galaxies. The archive is made available as an open resource, to be used by scientists and the public.
Submitted 27 June, 2012;
originally announced June 2012.
-
Estudo de Viabilidade de uma Plataforma de Baixo Custo para Data Warehouse (Feasibility Study of a Low-Cost Platform for Data Warehousing)
Authors:
Eduardo Cunha de Almeida
Abstract:
Corporations often need tools to improve their decision making in a competitive market. In general, these tools are based on data warehouse platforms that manage and analyze large amounts of data. However, many of these corporations cannot afford such platforms because of their high cost. This work is a feasibility study of a low-cost platform for data warehousing. We consider a low-cost platform to be one built on open source software, such as the PostgreSQL database system and the GNU/Linux operating system. We verify the feasibility of this platform by executing two benchmarks that simulate a data warehouse workload. The workload reproduces a multi-user environment executing complex queries that include aggregations, nested subqueries, multi-way joins, inline views, and more. Based on the results, we highlight some problems in the PostgreSQL database system and discuss improvements in the context of data warehousing.
Submitted 2 August, 2011;
originally announced August 2011.