-
ProcessGPT: Transforming Business Process Management with Generative Artificial Intelligence
Authors:
Amin Beheshti,
Jian Yang,
Quan Z. Sheng,
Boualem Benatallah,
Fabio Casati,
Schahram Dustdar,
Hamid Reza Motahari Nezhad,
Xuyun Zhang,
Shan Xue
Abstract:
Generative Pre-trained Transformer (GPT) is a state-of-the-art machine learning model capable of generating human-like text through natural language processing (NLP). GPT is trained on massive amounts of text data and uses deep learning techniques to learn patterns and relationships within the data, enabling it to generate coherent and contextually appropriate text. This position paper proposes using GPT technology to generate new process models when/if needed. We introduce ProcessGPT as a new technology that has the potential to enhance decision-making in data-centric and knowledge-intensive processes. ProcessGPT can be designed by training a generative pre-trained transformer model on a large dataset of business process data. This model can then be fine-tuned on specific process domains and trained to generate process flows and make decisions based on context and user input. The model can be integrated with NLP and machine learning techniques to provide insights and recommendations for process improvement. Furthermore, the model can automate repetitive tasks and improve process efficiency while enabling knowledge workers to communicate analysis findings, supporting evidence, and make decisions. ProcessGPT can revolutionize business process management (BPM) by offering a powerful tool for process augmentation, automation and improvement. Finally, we demonstrate how ProcessGPT can be a powerful tool for augmenting data engineers in maintaining data ecosystem processes within large bank organizations. Our scenario highlights the potential of this approach to improve efficiency, reduce costs, and enhance the quality of business operations through the automation of data-centric and knowledge-intensive processes. These results underscore the promise of ProcessGPT as a transformative technology for organizations looking to improve their process workflows.
Submitted 28 May, 2023;
originally announced June 2023.
-
Crowdsourcing Diverse Paraphrases for Training Task-oriented Bots
Authors:
Jorge Ramírez,
Auday Berro,
Marcos Baez,
Boualem Benatallah,
Fabio Casati
Abstract:
A prominent approach to build datasets for training task-oriented bots is crowd-based paraphrasing. Current approaches, however, assume the crowd would naturally provide diverse paraphrases or focus only on lexical diversity. In this WiP we addressed an overlooked aspect of diversity, introducing an approach for guiding the crowdsourcing process towards paraphrases that are syntactically diverse.
Submitted 20 September, 2021;
originally announced September 2021.
-
On the state of reporting in crowdsourcing experiments and a checklist to aid current practices
Authors:
Jorge Ramírez,
Burcu Sayin,
Marcos Baez,
Fabio Casati,
Luca Cernuzzi,
Boualem Benatallah,
Gianluca Demartini
Abstract:
Crowdsourcing is being increasingly adopted as a platform to run studies with human subjects. Running a crowdsourcing experiment involves several choices and strategies to successfully port an experimental design into an otherwise uncontrolled research environment, e.g., sampling crowd workers, mapping experimental conditions to micro-tasks, or ensuring quality contributions. While several guidelines inform researchers in these choices, guidance on how and what to report from crowdsourcing experiments has been largely overlooked. If under-reported, implementation choices constitute variability sources that can affect the experiment's reproducibility and prevent a fair assessment of research outcomes. In this paper, we examine the current state of reporting of crowdsourcing experiments and offer guidance to address associated reporting issues. We start by identifying sensible implementation choices, relying on existing literature and interviews with experts, to then extensively analyze the reporting of 171 crowdsourcing experiments. Informed by this process, we propose a checklist for reporting crowdsourcing experiments.
Submitted 9 September, 2021; v1 submitted 28 July, 2021;
originally announced July 2021.
-
A Query Language for Summarizing and Analyzing Business Process Data
Authors:
Amin Beheshti,
Boualem Benatallah,
Hamid Reza Motahari-Nezhad,
Samira Ghodratnama,
Farhad Amouzgar
Abstract:
In modern enterprises, Business Processes (BPs) are realized over a mix of workflows, IT systems, Web services and direct collaborations of people. Accordingly, process data (i.e., BP execution data such as logs containing events, interaction messages and other process artifacts) is scattered across several systems and data sources, and increasingly shows all the typical properties of Big Data. Understanding the execution of process data is challenging as key business insights remain hidden in the interactions among process entities: most objects are interconnected, forming complex, heterogeneous but often semi-structured networks. In the context of business processes, we consider the Big Data problem as a massive number of interconnected data islands from personal, shared and business data. We present a framework to model process data as graphs, i.e., Process Graph, and present abstractions to summarize the process graph and to discover concept hierarchies for entities based on both data objects and their interactions in process graphs. We present a language, namely BP-SPARQL, for the explorative querying and understanding of process graphs from various user perspectives. We have implemented a scalable architecture for querying, exploration and analysis of process graphs. We report on experiments performed on both synthetic and real-world datasets that show the viability and efficiency of the approach.
Submitted 23 May, 2021;
originally announced May 2021.
-
An Internet of Things Service Roadmap
Authors:
Athman Bouguettaya,
Quan Z. Sheng,
Boualem Benatallah,
Azadeh Ghari Neiat,
Sajib Mistry,
Aditya Ghose,
Surya Nepal,
Lina Yao
Abstract:
We propose a roadmap for leveraging the tremendous opportunities the Internet of Things (IoT) has to offer. We argue that the combination of the recent advances in service computing and IoT technology provides a unique framework for innovations not yet envisaged, as well as the emergence of yet-to-be-developed IoT applications. This roadmap covers: emerging novel IoT services, articulation of major research directions, and suggestion of a roadmap to guide the IoT and service computing community to address key IoT service challenges.
Submitted 1 February, 2021;
originally announced March 2021.
-
On the impact of predicate complexity in crowdsourced classification tasks
Authors:
Jorge Ramírez,
Marcos Baez,
Fabio Casati,
Luca Cernuzzi,
Boualem Benatallah,
Ekaterina A. Taran,
Veronika A. Malanina
Abstract:
This paper explores and offers guidance on a specific and relevant problem in task design for crowdsourcing: how to formulate a complex question used to classify a set of items. In micro-task markets, classification is still among the most popular tasks. We situate our work in the context of information retrieval and multi-predicate classification, i.e., classifying a set of items based on a set of conditions. Our experiments cover a wide range of tasks and domains, and also consider crowd workers alone and in tandem with machine learning classifiers. We provide empirical evidence into how the resulting classification performance is affected by different predicate formulation strategies, emphasizing the importance of predicate formulation as a task design dimension in crowdsourcing.
Submitted 17 November, 2020; v1 submitted 5 November, 2020;
originally announced November 2020.
-
Challenges and strategies for running controlled crowdsourcing experiments
Authors:
Jorge Ramírez,
Marcos Baez,
Fabio Casati,
Luca Cernuzzi,
Boualem Benatallah
Abstract:
This paper reports on the challenges and lessons we learned while running controlled experiments in crowdsourcing platforms. Crowdsourcing is becoming an attractive technique to engage a diverse and large pool of subjects in experimental research, allowing researchers to achieve levels of scale and completion times that would otherwise not be feasible in lab settings. However, the scale and flexibility come at the cost of multiple and sometimes unknown sources of bias and confounding factors that arise from technical limitations of crowdsourcing platforms and from the challenges of running controlled experiments in the "wild". In this paper, we take our experience in running systematic evaluations of task design as a motivating example to explore, describe, and quantify the potential impact of running uncontrolled crowdsourcing experiments and derive possible coping strategies. Among the challenges identified, we can mention sampling bias, controlling the assignment of subjects to experimental conditions, learning effects, and reliability of crowdsourcing results. According to our empirical studies, the impact of potential biases and confounding factors can amount to a 38% loss in the utility of the data collected in uncontrolled settings; and it can significantly change the outcome of experiments. These issues ultimately inspired us to implement CrowdHub, a system that sits on top of major crowdsourcing platforms and allows researchers and practitioners to run controlled crowdsourcing projects.
Submitted 5 November, 2020;
originally announced November 2020.
-
Chatbot integration in few patterns
Authors:
Marcos Baez,
Florian Daniel,
Fabio Casati,
Boualem Benatallah
Abstract:
Chatbots are software agents that are able to interact with humans in natural language. Their intuitive interaction paradigm is expected to significantly reshape the software landscape of tomorrow, while already today chatbots are invading a multitude of scenarios and contexts. This article takes a developer's perspective, identifies a set of architectural patterns that capture different chatbot integration scenarios, and reviews state-of-the-art development aids.
Submitted 18 September, 2020; v1 submitted 7 September, 2020;
originally announced September 2020.
-
Automatic Generation of Chatbots for Conversational Web Browsing
Authors:
Pietro Chittò,
Marcos Baez,
Florian Daniel,
Boualem Benatallah
Abstract:
In this paper, we describe the foundations for generating a chatbot out of a website equipped with simple, bot-specific HTML annotations. The approach is part of what we call conversational web browsing, i.e., a dialog-based, natural language interaction with websites. The goal is to enable users to use content and functionality accessible through rendered UIs by "talking to websites" instead of by operating the graphical UI using keyboard and mouse. The chatbot mediates between the user and the website, operates its graphical UI on behalf of the user, and informs the user about the state of interaction. We describe the conceptual vocabulary and annotation format, the supporting conversational middleware and techniques, and the implementation of a demo able to deliver conversational web browsing experiences through Amazon Alexa.
Submitted 21 October, 2020; v1 submitted 19 August, 2020;
originally announced August 2020.
-
CrowdHub: Extending crowdsourcing platforms for the controlled evaluation of tasks designs
Authors:
Jorge Ramírez,
Simone Degiacomi,
Davide Zanella,
Marcos Baez,
Fabio Casati,
Boualem Benatallah
Abstract:
We present CrowdHub, a tool for running systematic evaluations of task designs on top of crowdsourcing platforms. The goal is to support the evaluation process, avoiding potential experimental biases that, according to our empirical studies, can amount to 38% loss in the utility of the collected dataset in uncontrolled settings. Using CrowdHub, researchers can map their experimental design and automate the complex process of managing task execution over time while controlling for returning workers and crowd demographics, thus reducing bias, increasing utility of collected data, and making more efficient use of a limited pool of subjects.
Submitted 10 September, 2019; v1 submitted 6 September, 2019;
originally announced September 2019.
-
Understanding the Impact of Text Highlighting in Crowdsourcing Tasks
Authors:
Jorge Ramírez,
Marcos Baez,
Fabio Casati,
Boualem Benatallah
Abstract:
Text classification is one of the most common goals of machine learning (ML) projects, and also one of the most frequent human intelligence tasks in crowdsourcing platforms. ML has mixed success in such tasks depending on the nature of the problem, while crowd-based classification has proven to be surprisingly effective, but can be expensive. Recently, hybrid text classification algorithms, combining human computation and machine learning, have been proposed to improve accuracy and reduce costs. One way to do so is to have ML highlight or emphasize portions of text that it believes to be more relevant to the decision. Humans can then rely only on this text or read the entire text if the highlighted information is insufficient. In this paper, we investigate if and under what conditions highlighting selected parts of the text can (or cannot) improve classification cost and/or accuracy, and in general how it affects the process and outcome of the human intelligence tasks. We study this through a series of crowdsourcing experiments running over different datasets and with task designs imposing different cognitive demands. Our findings suggest that highlighting is effective in reducing classification effort but does not improve accuracy - and in fact, low-quality highlighting can decrease it.
Submitted 6 September, 2019;
originally announced September 2019.
-
DARec: Deep Domain Adaptation for Cross-Domain Recommendation via Transferring Rating Patterns
Authors:
Feng Yuan,
Lina Yao,
Boualem Benatallah
Abstract:
Cross-domain recommendation has long been one of the major topics in recommender systems. Recently, various deep models have been proposed to transfer the learned knowledge across domains, but most of them focus on extracting abstract transferable features from auxiliary contents, e.g., images and review texts, and the patterns in the rating matrix itself are rarely touched. In this work, inspired by the concept of domain adaptation, we propose a deep domain adaptation model (DARec) that is capable of extracting and transferring patterns from rating matrices only, without relying on any auxiliary information. We empirically demonstrate on public datasets that our method achieves the best performance among several state-of-the-art alternative cross-domain recommendation models.
Submitted 26 May, 2019;
originally announced May 2019.
-
Combining Crowd and Machines for Multi-predicate Item Screening
Authors:
Evgeny Krivosheev,
Fabio Casati,
Marcos Baez,
Boualem Benatallah
Abstract:
This paper discusses how crowd and machine classifiers can be efficiently combined to screen items that satisfy a set of predicates. We show that this is a recurring problem in many domains, present machine-human (hybrid) algorithms that screen items efficiently and estimate the gain over human-only or machine-only screening in terms of performance and cost. We further show how, given a new classification problem and a set of classifiers of unknown accuracy for the problem at hand, we can identify how to manage the cost-accuracy trade off by progressively determining if we should spend budget to obtain test data (to assess the accuracy of the given classifiers), or to train an ensemble of classifiers, or whether we should leverage the existing machine classifiers with the crowd, and in this case how to efficiently combine them based on their estimated characteristics to obtain the classification. We demonstrate that the techniques we propose obtain significant cost/accuracy improvements with respect to the leading classification algorithms.
Submitted 1 April, 2019;
originally announced April 2019.
-
Software Expert Discovery via Knowledge Domain Embeddings in a Collaborative Network
Authors:
Chaoran Huang,
Lina Yao,
Xianzhi Wang,
Boualem Benatallah,
Xiang Zhang
Abstract:
Community Question Answering (CQA) websites can be regarded as the most prominent venues for knowledge sharing and the most effective way of exchanging knowledge at present. Because a massive number of users participate online and generate a huge amount of data, managing this knowledge systematically can be challenging. Expert recommendation is one of the major challenges, as it highlights users in CQA with potential expertise, which may help match unresolved questions with existing high-quality answers and, at the same time, may serve external services such as human resource systems as another reference for evaluating their candidates. In this work, we propose an approach to discovering experts in CQA websites. We take advantage of recent distributed word representation technology to help summarize text chunks and, from a semantic view, exploit the relationships between natural language phrases to extract latent knowledge domains. Within each domain, users' expertise is determined from their historical performance, and a ranking can be computed to give recommendations accordingly. In particular, Stack Overflow is chosen as our dataset to test and evaluate our work, where experiments demonstrate the competence of our approach.
Submitted 26 October, 2018;
originally announced October 2018.
-
Adversarial Collaborative Auto-encoder for Top-N Recommendation
Authors:
Feng Yuan,
Lina Yao,
Boualem Benatallah
Abstract:
During the past decade, model-based recommendation methods have evolved from latent factor models to neural network-based models. Most of these techniques mainly focus on improving the overall performance, such as the root mean square error for rating predictions and hit ratio for top-N recommendation, where the users' feedback is considered as the ground-truth. However, in real-world applications, the users' feedback is possibly contaminated by imperfect user behaviours, namely, careless preference selection. Such data contamination poses challenges for the design of robust recommendation methods. In this work, to address the above issue, we propose a general adversarial training framework for neural network-based recommendation models, which improves both the model robustness and the overall performance. We point out the tradeoffs between performance and robustness enhancement with detailed instructions on how to strike a balance. Specifically, we implement our approach on the collaborative auto-encoder, followed by experiments on three publicly available datasets: MovieLens-1M, Ciao, and FilmTrust. We show that our approach outperforms highly competitive state-of-the-art recommendation methods. In addition, we carry out a thorough analysis of the noise impacts, as well as the complex interactions between model nonlinearity and noise levels. Through simple modifications, our adversarial training framework can be applied to a host of neural network-based models whose robustness and performance are both expected to be enhanced.
Submitted 16 August, 2018;
originally announced August 2018.
-
Expert Recommendation via Tensor Factorization with Regularizing Hierarchical Topical Relationships
Authors:
Chaoran Huang,
Lina Yao,
Xianzhi Wang,
Boualem Benatallah,
Shuai Zhang,
Manqing Dong
Abstract:
Knowledge acquisition and exchange are generally crucial yet costly for both businesses and individuals, especially when the knowledge concerns various areas. Question Answering Communities offer an opportunity for sharing knowledge at a low cost, where community users, many of whom are domain experts, can potentially provide high-quality solutions to a given problem. In this paper, we propose a framework for finding experts across multiple collaborative networks. We employ the recent techniques of tree-guided learning (via tensor decomposition) and matrix factorization to explore user expertise from past voted posts. Tensor decomposition enables us to leverage the latent expertise of users, and the posts and related tags help identify the related areas. The final result is an expertise score for every user on every knowledge area. We experiment on Stack Exchange Networks, a set of question answering websites on different topics with a huge group of users and posts. Experiments show our proposed approach produces steady and premium outputs.
Submitted 6 August, 2018; v1 submitted 3 August, 2018;
originally announced August 2018.
-
A Survey on Expert Recommendation in Community Question Answering
Authors:
Xianzhi Wang,
Chaoran Huang,
Lina Yao,
Boualem Benatallah,
Manqing Dong
Abstract:
Community question answering (CQA) represents the type of Web applications where people can exchange knowledge via asking and answering questions. One significant challenge of most real-world CQA systems is the lack of effective matching between questions and the potential good answerers, which adversely affects efficient knowledge acquisition and circulation. On the one hand, a requester might experience many low-quality answers without receiving a quality response in a brief time; on the other hand, an answerer might face numerous new questions without being able to identify their questions of interest quickly. In this situation, expert recommendation emerges as a promising technique to address the above issues. Instead of passively waiting for users to browse and find their questions of interest, an expert recommendation method raises the attention of users to the appropriate questions actively and promptly. The past few years have witnessed considerable efforts that address the expert recommendation problem from different perspectives. These methods all have their issues that need to be resolved before the advantages of expert recommendation can be fully embraced. In this survey, we first present an overview of the research efforts and state-of-the-art techniques for expert recommendation in CQA. We next summarize and compare the existing methods concerning their advantages and shortcomings, followed by discussing the open issues and future research directions.
Submitted 15 July, 2018;
originally announced July 2018.
-
GrCAN: Gradient Boost Convolutional Autoencoder with Neural Decision Forest
Authors:
Manqing Dong,
Lina Yao,
Xianzhi Wang,
Boualem Benatallah,
Shuai Zhang
Abstract:
Random forest and deep neural network are two schools of effective classification methods in machine learning. While the random forest is robust irrespective of the data domain, the deep neural network has advantages in handling high dimensional data. In view that a differentiable neural decision forest can be added to the neural network to fully exploit the benefits of both models, in our work, we further combine convolutional autoencoder with neural decision forest, where autoencoder has its advantages in finding the hidden representations of the input data. We develop a gradient boost module and embed it into the proposed convolutional autoencoder with neural decision forest to improve the performance. The idea of gradient boost is to learn and use the residual in the prediction. In addition, we design a structure to learn the parameters of the neural decision forest and gradient boost module at contiguous steps. The extensive experiments on several public datasets demonstrate that our proposed model achieves good efficiency and prediction performance compared with a series of baseline methods.
Submitted 24 June, 2018; v1 submitted 21 June, 2018;
originally announced June 2018.
-
CrowdRev: A platform for Crowd-based Screening of Literature Reviews
Authors:
Jorge Ramirez,
Evgeny Krivosheev,
Marcos Baez,
Fabio Casati,
Boualem Benatallah
Abstract:
In this paper and demo we present a crowd and crowd+AI based system, called CrowdRev, supporting the screening phase of literature reviews and achieving the same quality as author classification at a fraction of the cost, and near-instantly. CrowdRev makes it easy for authors to leverage the crowd, and ensures that no money is wasted even in the face of difficult papers or criteria: if the system detects that the task is too hard for the crowd, it just gives up trying (for that paper, or for that criteria, or altogether), without wasting money and never compromising on quality.
Submitted 31 May, 2018;
originally announced May 2018.
-
A Unified Knowledge Representation and Context-aware Recommender System in Internet of Things
Authors:
Yinhao Li,
Awa Alqahtani,
Ellis Solaiman,
Charith Perera,
Prem Prakash Jayaraman,
Boualem Benatallah,
Rajiv Ranjan
Abstract:
Within the rapidly developing Internet of Things (IoT), numerous and diverse physical devices, Edge devices, Cloud infrastructure, and their quality of service requirements (QoS), need to be represented within a unified specification in order to enable rapid IoT application development, monitoring, and dynamic reconfiguration. But heterogeneities among different configuration knowledge representation models pose limitations for acquisition, discovery and curation of configuration knowledge for coordinated IoT applications. This paper proposes a unified data model to represent IoT resource configuration knowledge artifacts. It also proposes IoT-CANE (Context-Aware recommendatioN systEm) to facilitate incremental knowledge acquisition and declarative context driven knowledge recommendation.
Submitted 24 May, 2018; v1 submitted 10 May, 2018;
originally announced May 2018.
-
Opinion Fraud Detection via Neural Autoencoder Decision Forest
Authors:
Manqing Dong,
Lina Yao,
Xianzhi Wang,
Boualem Benatallah,
Chaoran Huang,
Xiaodong Ning
Abstract:
Online reviews play an important role in influencing buyers' daily purchase decisions. However, fake and meaningless reviews, which cannot reflect users' genuine purchase experience and opinions, widely exist on the Web and pose great challenges for users to make the right choices. Therefore, it is desirable to build a fair model that evaluates the quality of products by distinguishing spamming reviews. We present an end-to-end trainable unified model to leverage the appealing properties of the autoencoder and the random forest. A stochastic decision tree model is implemented to guide the global parameter learning process. Extensive experiments were conducted on a large Amazon review dataset. The proposed model consistently outperforms a series of compared methods.
Submitted 9 May, 2018;
originally announced May 2018.
-
Crowd-based Multi-Predicate Screening of Papers in Literature Reviews
Authors:
Evgeny Krivosheev,
Fabio Casati,
Boualem Benatallah
Abstract:
Systematic literature reviews (SLRs) are one of the most common and useful forms of scientific research and publication. Tens of thousands of SLRs are published each year, and this rate is growing across all fields of science. Performing an accurate, complete and unbiased SLR is however a difficult and expensive endeavor. This is true in general for all phases of a literature review, and in particular for the paper screening phase, where authors filter a set of potentially in-scope papers based on a number of exclusion criteria. To address the problem, in recent years the research community has begun to explore the use of the crowd to allow for a faster, accurate, cheaper and unbiased screening of papers. Initial results show that crowdsourcing can be effective, even for relatively complex reviews. In this paper we derive and analyze a set of strategies for crowd-based screening, and show that an adaptive strategy, which continuously re-assesses the statistical properties of the problem to minimize the number of votes needed to take decisions for each paper, significantly outperforms a number of non-adaptive approaches in terms of cost and accuracy. We validate both applicability and results of the approach through a set of crowdsourcing experiments, and discuss properties of the problem and algorithms that we believe to be generally of interest for classification problems where items are classified via a series of successive tests (as often happens in medicine).
Submitted 21 March, 2018;
originally announced March 2018.
-
Crowd-Machine Collaboration for Item Screening
Authors:
Evgeny Krivosheev,
Bahareh Harandizadeh,
Fabio Casati,
Boualem Benatallah
Abstract:
In this paper we describe how crowd and machine classifiers can be efficiently combined to screen items that satisfy a set of predicates. We show that this is a recurring problem in many domains, present machine-human (hybrid) algorithms that screen items efficiently, and estimate the gain over human-only or machine-only screening in terms of performance and cost.
Submitted 21 March, 2018;
originally announced March 2018.
-
Quality Control in Crowdsourcing: A Survey of Quality Attributes, Assessment Techniques and Assurance Actions
Authors:
Florian Daniel,
Pavel Kucherbaev,
Cinzia Cappiello,
Boualem Benatallah,
Mohammad Allahbakhsh
Abstract:
Crowdsourcing enables one to leverage the intelligence and wisdom of potentially large groups of individuals toward solving problems. Common problems approached with crowdsourcing are labeling images, translating or transcribing text, providing opinions or ideas, and similar - all tasks that computers are not good at or where they may even fail altogether. The introduction of humans into computations and/or everyday work, however, also poses critical, novel challenges in terms of quality control, as the crowd is typically composed of people with unknown and very diverse abilities, skills, interests, personal objectives and technological resources. This survey studies quality in the context of crowdsourcing along several dimensions, so as to define and characterize it and to understand the current state of the art. Specifically, this survey derives a quality model for crowdsourcing tasks, identifies the methods and techniques that can be used to assess the attributes of the model, and the actions and strategies that help prevent and mitigate quality problems. An analysis of how these features are supported by the state of the art further identifies open issues and informs an outlook on hot future research directions.
Submitted 8 January, 2018;
originally announced January 2018.
-
Programming Bots by Synthesizing Natural Language Expressions into API Invocations
Authors:
Shayan Zamanirad,
Boualem Benatallah,
Moshe Chai Barukh,
Fabio Casati,
Carlos Rodriguez
Abstract:
At present, bots are still in their preliminary stages of development. Many are relatively simple, or developed ad-hoc for a very specific use-case. For this reason, they are typically programmed manually, or utilize machine-learning classifiers to interpret a fixed set of user utterances. In reality, real-world conversations with humans require support for dynamically capturing users' expressions. Moreover, bots will derive immeasurable value from being programmed to invoke APIs for their results. Today, within the Web and Mobile development community, complex applications are being strung together with a few lines of code -- all made possible by APIs. Yet, developers today are not as empowered to program bots in much the same way. To overcome this, we introduce BotBase, a bot programming platform that dynamically synthesizes natural language user expressions into API invocations. Our solution is two-faceted: firstly, we construct an API knowledge graph to encode and evolve APIs; secondly, leveraging the above, we apply techniques in NLP, ML and Entity Recognition to perform the required synthesis from natural language user expressions into API calls.
Submitted 15 November, 2017;
originally announced November 2017.
-
Crowdsourcing Paper Screening in Systematic Literature Reviews
Authors:
Evgeny Krivosheev,
Fabio Casati,
Valentina Caforio,
Boualem Benatallah
Abstract:
Literature reviews allow scientists to stand on the shoulders of giants, showing promising directions, summarizing progress, and pointing out existing challenges in research. At the same time, conducting a systematic literature review is a laborious and consequently expensive process. In the last decade, there have been a few studies on crowdsourcing in literature reviews. This paper explores the feasibility of crowdsourcing for facilitating the literature review process in terms of results, time and effort, and identifies which crowdsourcing strategies provide the best results based on the budget available. In particular we focus on the screening phase of the literature review process and we contribute and assess methods for identifying the size of tests, labels required per paper, and classification functions, as well as methods to split the crowdsourcing process into phases to improve results. Finally, we present our findings based on experiments run on Crowdflower.
Submitted 15 September, 2017;
originally announced September 2017.
-
Data Curation APIs
Authors:
Seyed-Mehdi-Reza Beheshti,
Alireza Tabebordbar,
Boualem Benatallah,
Reza Nouri
Abstract:
Understanding and analyzing big data is firmly recognized as a powerful and strategic priority. For deeper interpretation of and better intelligence with big data, it is important to transform raw data (unstructured, semi-structured and structured data sources, e.g., text, video, image data sets) into curated data: contextualized data and knowledge that is maintained and made available for use by end-users and applications. In particular, data curation acts as the glue between raw data and analytics, providing an abstraction layer that relieves users from time consuming, tedious and error prone curation tasks. In this context, the data curation process becomes a vital analytics asset for increasing added value and insights.
In this paper, we identify and implement a set of curation APIs and make them available (on GitHub) to researchers and developers to assist them in transforming their raw data into curated data. The curation APIs enable developers to easily add features - such as extracting keywords, parts of speech, and named entities such as Persons, Locations, Organizations, Companies, Products, Diseases, Drugs, etc.; providing synonyms and stems for extracted information items leveraging lexical knowledge bases for the English language such as WordNet; linking extracted entities to external knowledge bases such as Google Knowledge Graph and Wikidata; discovering similarity among the extracted information items, such as calculating similarity between string, number, date and time data; classifying, sorting and categorizing data into various types, forms or any other distinct class; and indexing structured and unstructured data - into their applications.
Submitted 10 December, 2016;
originally announced December 2016.
-
Big Data Analytics Using Cloud and Crowd
Authors:
Mohammad Allahbakhsh,
Saeed Arbabi,
Hamid-Reza Motahari-Nezhad,
Boualem Benatallah
Abstract:
The increasing use of social and human-enabled systems in people's daily lives, on the one hand, and the fast growth of mobile and smartphone technologies, on the other, have generated a tremendous amount of data, also referred to as big data, and a need to analyze these data, i.e., big data analytics. Recently, a trend has emerged to incorporate human computing power into big data analytics to address some shortcomings of existing big data analytics, such as dealing with semi-structured or unstructured data. Including the crowd in big data analytics creates new challenges such as security, privacy and availability issues.
In this paper, we study hybrid human-machine big data analytics and propose a framework to study these systems from the point of view of crowd involvement. We identify some open issues in the area and propose a set of research directions for the future of big data analytics.
Submitted 16 April, 2016;
originally announced April 2016.
-
Unveiling Contextual Similarity of Things via Mining Human-Thing Interactions in the Internet of Things
Authors:
Lina Yao,
Quan Z. Sheng,
Anne H. H. Ngu,
Xue Li,
Boualem Benatallah
Abstract:
With recent advances in radio-frequency identification (RFID), wireless sensor networks, and Web services, physical things are becoming an integral part of the emerging ubiquitous Web. Finding correlations of ubiquitous things is a crucial prerequisite for many important applications such as things search, discovery, classification, recommendation, and composition. This article presents DisCor-T, a novel graph-based method for discovering underlying connections of things via mining the rich content embodied in human-thing interactions in terms of user, temporal and spatial information. We model this information using two graphs, namely a spatio-temporal graph and a social graph. Then, random walk with restart (RWR) is applied to find proximities among things, and a relational graph of things (RGT) indicating implicit correlations of things is learned. The correlation analysis lays a solid foundation contributing to improved effectiveness in things management. To demonstrate the utility, we develop a flexible feature-based classification framework on top of RGT and perform a systematic case study. Our evaluation exhibits the strength and feasibility of the proposed approach.
Submitted 17 July, 2017; v1 submitted 24 December, 2015;
originally announced December 2015.
-
Up in the Air: When Homes Meet the Web of Things
Authors:
Lina Yao,
Quan Z. Sheng,
Boualem Benatallah,
Schahram Dustdar,
Xianzhi Wang,
Ali Shemshadi,
Anne H. H. Ngu
Abstract:
The emerging Internet of Things (IoT) will comprise billions of Web-enabled objects (or "things") where such objects can sense, communicate, compute and potentially actuate. The Web of Things (WoT) is essentially the embodiment of the evolution from systems linking digital documents to systems relating digital information to real-world physical items. It is widely understood that significant technical challenges exist in developing applications in the WoT environment. In this paper, we report our practical experience in the design and development of a smart home system in a WoT environment. Our system provides a layered framework for managing and sharing the information produced by physical things as well as the residents. We particularly focus on a research prototype named WITS, which helps the elderly live independently and safely in their own homes, with minimal support from the decreasing number of individuals in the working-age population. WITS enables unobtrusive monitoring of elderly people in a real-world, inhabited home environment, by leveraging WoT technologies in building context-aware, personalized services.
Submitted 18 July, 2017; v1 submitted 19 December, 2015;
originally announced December 2015.
-
Big Data and Cross-Document Coreference Resolution: Current State and Future Opportunities
Authors:
Seyed-Mehdi-Reza Beheshti,
Srikumar Venugopal,
Seung Hwan Ryu,
Boualem Benatallah,
Wei Wang
Abstract:
Information Extraction (IE) is the task of automatically extracting structured information from unstructured/semi-structured machine-readable documents. Among various IE tasks, extracting actionable intelligence from an ever-increasing amount of data depends critically upon Cross-Document Coreference Resolution (CDCR) - the task of identifying entity mentions across multiple documents that refer to the same underlying entity. Recently, document datasets of the order of peta-/tera-bytes have raised many challenges for performing effective CDCR, such as scaling to large numbers of mentions and limited representational power. The problem of analysing such datasets is called "big data". The aim of this paper is to provide readers with an understanding of the central concepts, subtasks, and the current state of the art in the CDCR process. We provide an assessment of existing tools/techniques for CDCR subtasks and highlight big data challenges in each of them to help readers identify important and outstanding issues for further investigation. Finally, we provide concluding remarks and discuss possible directions for future work.
Submitted 14 November, 2013;
originally announced November 2013.
-
Extending SPARQL to Support Entity Grouping and Path Queries
Authors:
Seyed-Mehdi-Reza Beheshti,
Sherif Sakr,
Boualem Benatallah,
Hamid Reza Motahari-Nezhad
Abstract:
The ability to efficiently find subgraphs and paths in a large graph that are relevant to a given query is important in many applications including scientific data analysis, social networks, and business intelligence. Currently, there is little support and there are no efficient approaches for expressing and executing such queries. This paper proposes a data model and a query language to address this problem. The contributions include supporting the construction and selection of: (i) folder nodes, representing a set of related entities, and (ii) path nodes, representing a set of paths in which a path is the transitive relationship of two or more entities in the graph. Folders and paths can be stored and used for future queries. We introduce FPSPARQL, which is an extension of SPARQL supporting folder and path nodes. We have implemented a query engine that supports FPSPARQL, and the evaluation results show its viability and efficiency for querying large graph datasets.
Submitted 21 November, 2012;
originally announced November 2012.
-
Temporal Provenance Model (TPM): Model and Query Language
Authors:
Seyed-Mehdi-Reza Beheshti,
Hamid Reza Motahari-Nezhad,
Boualem Benatallah
Abstract:
Provenance refers to the documentation of an object's lifecycle. This documentation (often represented as a graph) should include all the information necessary to reproduce a certain piece of data or the process that led to it. In a dynamic world, as data changes, it is important to be able to get a piece of data as it was, and its provenance graph, at a certain point in time. Supporting time-aware provenance querying is challenging and requires: (i) explicitly representing the time information in the provenance graphs, and (ii) providing abstractions and efficient mechanisms for time-aware querying of provenance graphs over an ever-growing volume of data. The existing provenance models treat time as a second-class citizen (i.e. as an optional annotation). This makes time-aware querying of provenance data inefficient and sometimes inaccessible. We introduce an extended provenance graph model to explicitly represent time as an additional dimension of provenance data. We also provide a query language, novel abstractions and efficient mechanisms to query and analyze timed provenance graphs. The main contributions of the paper include: (i) proposing a Temporal Provenance Model (TPM) as a timed provenance model; and (ii) introducing two concepts for analyzing and querying TPM graphs: timed folders, as containers of related sets of objects and their provenance relationships over time, and timed paths, to represent the evolution of objects' tracing information over time. We have implemented the approach on top of FPSPARQL, a query engine for large graphs, and have evaluated it for querying TPM models. The evaluation shows the viability and efficiency of our approach.
Submitted 21 November, 2012;
originally announced November 2012.
-
An Analytic Approach to People Evaluation in Crowdsourcing Systems
Authors:
Mohammad Allahbakhsh,
Aleksandar Ignjatovic,
Boualem Benatallah,
Seyed-Mehdi-Reza Beheshti,
Norman Foo,
Elisa Bertino
Abstract:
Worker selection is a significant and challenging issue in crowdsourcing systems. Such selection is usually based on an assessment of the reputation of the individual workers participating in such systems. However, assessing the credibility and adequacy of such calculated reputation is a real challenge. In this paper, we propose an analytic model which leverages the values of the tasks completed, the credibility of the evaluators of the results of the tasks and time of evaluation of the results of these tasks in order to calculate an accurate and credible reputation rank of participating workers and fairness rank for evaluators. The model has been implemented and experimentally validated.
Submitted 13 November, 2012;
originally announced November 2012.
-
Detecting, Representing and Querying Collusion in Online Rating Systems
Authors:
Mohammad Allahbakhsh,
Aleksandar Ignjatovic,
Boualem Benatallah,
Seyed-Mehdi-Reza Beheshti,
Norman Foo,
Elisa Bertino
Abstract:
Online rating systems are subject to malicious behaviors, mainly the posting of unfair rating scores. Users may try to individually or collaboratively promote or demote a product. Collaborative unfair rating, or 'collusion', is more damaging than individual unfair rating. Although collusion detection in general has been widely studied, identifying collusion groups in online rating systems is less studied and needs more investigation. In this paper, we study the impact of collusion in online rating systems and assess their susceptibility to collusion attacks. The proposed model uses a frequent itemset mining algorithm to detect candidate collusion groups. Then, several indicators are used for identifying collusion groups and for estimating how damaging such colluding groups might be. Also, we propose an algorithm for finding possible collusive subgroups inside larger groups which are not identified as collusive. The model has been implemented and we present results of an experimental evaluation of our methodology.
Submitted 2 November, 2012;
originally announced November 2012.
-
Programming Cloud Resource Orchestration Framework: Operations and Research Challenges
Authors:
Rajiv Ranjan,
Boualem Benatallah
Abstract:
The emergence of cloud computing over the past five years is potentially one of the breakthrough advances in the history of computing. It delivers hardware and software resources as virtualization-enabled services, freeing administrators from the burden of worrying about low-level implementation or system administration details. Although cloud computing offers considerable opportunities for users (e.g. application developers, governments, new startups, administrators, consultants, scientists, business analysts, etc.), such as no up-front investment, lower operating costs, and infinite scalability, it has many unique research challenges that need to be carefully addressed in the future. In this paper, we present a survey on key cloud computing concepts, resource abstractions, and programming operations for orchestrating resources and associated research challenges, wherever applicable.
Submitted 18 June, 2012; v1 submitted 10 April, 2012;
originally announced April 2012.