-
Fake News Detection: It's All in the Data!
Authors:
Soveatin Kuntur,
Anna Wróblewska,
Marcin Paprzycki,
Maria Ganzha
Abstract:
This comprehensive survey serves as an indispensable resource for researchers embarking on the journey of fake news detection. By highlighting the pivotal role of dataset quality and diversity, it underscores the significance of these elements in the effectiveness and robustness of detection models. The survey meticulously outlines the key features of datasets, various labeling systems employed, and prevalent biases that can impact model performance. Additionally, it addresses critical ethical issues and best practices, offering a thorough overview of the current state of available datasets. Our contribution to this field is further enriched by the provision of a GitHub repository, which consolidates publicly accessible datasets into a single, user-friendly portal. This repository is designed to facilitate and stimulate further research and development efforts aimed at combating the pervasive issue of fake news.
Submitted 2 July, 2024;
originally announced July 2024.
-
Intelligent Interface: Enhancing Lecture Engagement with Didactic Activity Summaries
Authors:
Anna Wróblewska,
Marcel Witas,
Kinga Frańczak,
Arkadiusz Kniaź,
Siew Ann Cheong,
Tan Seng Chee,
Janusz Hołyst,
Marcin Paprzycki
Abstract:
Recently, multiple applications of machine learning have been introduced. They include various possibilities arising when image analysis methods are applied to, broadly understood, video streams. In this context, a novel tool has been developed for academic educators to enhance the teaching process by automating, summarizing, and offering prompt feedback on conducting lectures. The implemented prototype utilizes machine learning-based techniques to recognise selected didactic and behavioural features of teachers within lecture video recordings.
Specifically, users (teachers) can upload their lecture videos, which are preprocessed and analysed using machine learning models. Next, users can view summaries of recognized didactic features through interactive charts and tables. Additionally, stored ML-based prediction results support comparisons between lectures based on their didactic content. The developed application applies text-based models trained on lecture transcriptions, with transcription quality enhanced by adopting an automatic speech recognition solution. Furthermore, the system offers flexibility for (future) integration of new/additional machine-learning models and software modules for image and video analysis.
Submitted 20 June, 2024;
originally announced June 2024.
-
RDF Stream Taxonomy: Systematizing RDF Stream Types in Research and Practice
Authors:
Piotr Sowinski,
Pawel Szmeja,
Maria Ganzha,
Marcin Paprzycki
Abstract:
Over the years, RDF streaming was explored in research and practice from many angles, resulting in a wide range of RDF stream definitions. This variety presents a major challenge in discussing and integrating streaming systems, due to the lack of a common language. This work attempts to address this critical research gap, by systematizing RDF stream types present in the literature in a novel taxonomy. The proposed RDF Stream Taxonomy (RDF-STaX) is embodied in an OWL 2 DL ontology that follows the FAIR principles, making it readily applicable in practice. Extensive documentation and additional resources are provided, to foster the adoption of the ontology. Three use cases for the ontology are presented with accompanying competency questions, demonstrating the usefulness of the resource. Additionally, this work introduces a novel nanopublications dataset, which serves as a collaborative, living state-of-the-art review of RDF streaming. The results of a multifaceted evaluation of the resource are presented, testing its logical validity, use case coverage, and adherence to the community's best practices, while also comparing it to other works. RDF-STaX is expected to help drive innovation in RDF streaming, by fostering scientific discussion, cooperation, and tool interoperability.
Submitted 27 June, 2024; v1 submitted 24 November, 2023;
originally announced November 2023.
-
RiverBench: an Open RDF Streaming Benchmark Suite
Authors:
Piotr Sowinski,
Maria Ganzha,
Marcin Paprzycki
Abstract:
RDF streaming has been explored by the Semantic Web community from many angles, resulting in multiple task formulations and streaming methods. However, for many existing formulations of the problem, reliably benchmarking streaming solutions has been challenging due to the lack of well-described and appropriately diverse benchmark datasets. Existing datasets and evaluations, except a few notable cases, suffer from unclear streaming task scopes, underspecified benchmarks, and errors in the data. To address these issues, we propose RiverBench, an open and collaborative RDF streaming benchmark suite. RiverBench leverages continuous, community-driven processes, established best practices (e.g., FAIR), and built-in quality guarantees. The suite distributes datasets in a common, accessible format, with clear documentation, licensing, and machine-readable metadata. The current release includes a diverse collection of non-synthetic datasets generated by the Semantic Web community, representing many applications of RDF data streaming, all major task formulations, and emerging RDF features (RDF-star). Finally, we present a list of research applications for the suite, demonstrating its versatility and value even beyond the realm of RDF streaming.
Submitted 27 November, 2023; v1 submitted 10 May, 2023;
originally announced May 2023.
-
Application of genetic algorithm to load balancing in networks with a homogeneous traffic flow
Authors:
Marek Bolanowski,
Alicja Gerka,
Andrzej Paszkiewicz,
Maria Ganzha,
Marcin Paprzycki
Abstract:
The concept of extended cloud requires efficient network infrastructure to support ecosystems reaching from the edge to the cloud(s). Standard approaches to network load balancing deliver static solutions that are insufficient for the extended clouds, where network loads change often. To address this issue, a genetic algorithm-based load optimizer is proposed and implemented. Next, its performance is experimentally evaluated and it is shown that it outperforms other existing solutions.
Submitted 18 April, 2023;
originally announced April 2023.
-
Towards Edge-Cloud Architectures for Personal Protective Equipment Detection
Authors:
Jaroslaw Legierski,
Kajetan Rachwal,
Piotr Sowinski,
Wojciech Niewolski,
Przemyslaw Ratuszek,
Zbigniew Kopertowski,
Marcin Paprzycki,
Maria Ganzha
Abstract:
Detecting Personal Protective Equipment in images and video streams is a relevant problem in ensuring the safety of construction workers. In this contribution, an architecture enabling live image recognition of such equipment is proposed. The solution is deployable in two settings -- edge-cloud and edge-only. The system was tested on an active construction site, as a part of a larger scenario, within the scope of the ASSIST-IoT H2020 project. To determine the feasibility of the edge-only variant, a model for counting people wearing safety helmets was developed using the YOLOX method. It was found that an edge-only deployment is possible for this use case, given the hardware infrastructure available on site. In the preliminary evaluation, several important observations were made, that are crucial to the further development and deployment of the system. Future work will include an in-depth investigation of performance aspects of the two architecture variants.
Submitted 4 January, 2023;
originally announced January 2023.
-
Efficiency of REST and gRPC realizing communication tasks in microservice-based ecosystems
Authors:
Marek Bolanowski,
Kamil Żak,
Andrzej Paszkiewicz,
Maria Ganzha,
Marcin Paprzycki,
Piotr Sowiński,
Ignacio Lacalle,
Carlos E. Palau
Abstract:
The aim of this contribution is to analyse practical aspects of the use of REST APIs and gRPC to realize communication tasks in applications in microservice-based ecosystems. On the basis of performed experiments, classes of communication tasks, for which a given technology performs data transfer more efficiently, have been established. This, in turn, allows formulation of criteria for the selection of appropriate communication methods for communication tasks to be performed in an application using microservices-based architecture.
Submitted 1 August, 2022;
originally announced August 2022.
-
Introducing Federated Learning into Internet of Things ecosystems -- preliminary considerations
Authors:
Karolina Bogacka,
Katarzyna Wasielewska-Michniewska,
Marcin Paprzycki,
Maria Ganzha,
Anastasiya Danilenka,
Lambis Tassakos,
Eduardo Garro
Abstract:
Federated learning (FL) was proposed to facilitate the training of models in a distributed environment. It supports the protection of (local) data privacy and uses local resources for model training. Until now, the majority of research has been devoted to "core issues", such as adaptation of machine learning algorithms to FL, data privacy protection, or dealing with the effects of uneven data distribution between clients. This contribution is anchored in a practical use case, where FL is to be actually deployed within an Internet of Things ecosystem. Hence, somewhat different issues that need to be considered, beyond popular considerations found in the literature, are identified. Moreover, an architecture that enables the building of flexible, and adaptable, FL solutions is introduced.
Submitted 15 July, 2022;
originally announced July 2022.
-
Efficient RDF Streaming for the Edge-Cloud Continuum
Authors:
Piotr Sowinski,
Katarzyna Wasielewska-Michniewska,
Maria Ganzha,
Wieslaw Pawlowski,
Pawel Szmeja,
Marcin Paprzycki
Abstract:
With the ongoing, gradual shift of large-scale distributed systems towards the edge-cloud continuum, the need arises for software solutions that are universal, scalable, practical, and grounded in well-established technologies. Simultaneously, semantic technologies, especially in the streaming context, are becoming increasingly important for enabling interoperability in edge-cloud systems. However, in recent years, the field of semantic data streaming has been stagnant, and there are no available solutions that would fit those requirements. To fill this gap, in this contribution, a novel end-to-end RDF streaming approach is proposed (named Jelly). The method is simple to implement, yet very elastic, and designed to fit a wide variety of use cases. Its practical performance is evaluated in a series of experiments, including end-to-end throughput and latency measurements. It is shown that Jelly achieves vastly superior performance to the currently available approaches. The presented method makes significant progress towards enabling high-performance semantic data processing in a wide variety of applications, including future edge-cloud systems. Moreover, this study opens up the possibility of applying and evaluating the method in real-life scenarios, which will be the focus of further research.
Submitted 10 July, 2022;
originally announced July 2022.
-
StatMix: Data augmentation method that relies on image statistics in federated learning
Authors:
Dominik Lewy,
Jacek Mańdziuk,
Maria Ganzha,
Marcin Paprzycki
Abstract:
Availability of large amounts of annotated data is one of the pillars of deep learning success. Although numerous big datasets have been made available for research, this is often not the case in real-life applications (e.g. companies are not able to share data due to GDPR or concerns related to intellectual property rights protection). Federated learning (FL) is a potential solution to this problem, as it enables training a global model on data scattered across multiple nodes, without sharing local data itself. However, even FL methods pose a threat to data privacy, if not handled properly. Therefore, we propose StatMix, an augmentation approach that uses image statistics, to improve results of FL scenario(s). StatMix is empirically tested on CIFAR-10 and CIFAR-100, using two neural network architectures. In all FL experiments, application of StatMix improves the average accuracy, compared to the baseline training (with no use of StatMix). Some improvement can also be observed in non-FL setups.
Submitted 8 July, 2022;
originally announced July 2022.
-
Using adversarial images to improve outcomes of federated learning for non-IID data
Authors:
Anastasiya Danilenka,
Maria Ganzha,
Marcin Paprzycki,
Jacek Mańdziuk
Abstract:
One of the important problems in federated learning is how to deal with unbalanced data. This contribution introduces a novel technique designed to deal with label skewed non-IID data, using adversarial inputs, created by the I-FGSM method. Adversarial inputs guide the training process and allow the Weighted Federated Averaging to give more importance to clients with 'selected' local label distributions. Experimental results, gathered from image classification tasks, for MNIST and CIFAR-10 datasets, are reported and analyzed.
Submitted 16 June, 2022;
originally announced June 2022.
-
Ontology Reuse: the Real Test of Ontological Design
Authors:
Piotr Sowinski,
Katarzyna Wasielewska-Michniewska,
Maria Ganzha,
Marcin Paprzycki,
Costin Badica
Abstract:
Reusing ontologies in practice is still very challenging, especially when multiple ontologies are (jointly) involved. Moreover, despite recent advances, the realization of systematic ontology quality assurance remains a difficult problem. In this work, the quality of thirty biomedical ontologies, and the Computer Science Ontology, is investigated from the perspective of a practical use case. Special scrutiny is given to cross-ontology references, which are vital for combining ontologies. Diverse methods to detect potential issues are proposed, including natural language processing and network analysis. Moreover, several suggestions for improving ontologies and their quality assurance processes are presented. It is argued that while the advancing automatic tools for ontology quality assurance are crucial for ontology improvement, they will not solve the problem entirely. It is ontology reuse that is the ultimate method for continuously verifying and improving ontology quality, as well as for guiding its future development. Specifically, multiple issues can be found and fixed primarily through practical and diverse ontology reuse scenarios.
Submitted 6 July, 2022; v1 submitted 5 May, 2022;
originally announced May 2022.
-
An Energy Aware Clustering Scheme for 5G-enabled Edge Computing based IoMT Framework
Authors:
Jitendra Kumar Samriya,
Mohit Kumar,
Maria Ganzha,
Marcin Paprzycki,
Marek Bolanowski,
Andrzej Paszkiewicz
Abstract:
In recent years, 5G network systems have started to offer communication infrastructure for Internet of Things (IoT) applications, especially for health care service providers. In smart health care systems, edge computing-enabled Internet of Medical Things (IoMT) is an innovative technology that provides online health care monitoring facilities to patients. Here, energy consumption, along with extending the lifespan of the biosensor network, is a key concern. In this contribution, a Chicken Swarm Optimization algorithm, based on an Energy Efficient Multi-objective clustering scheme, is applied in the context of an IoMT system. An effective fitness function is designed for cluster head selection, using multiple objectives, such as residual energy, queuing delay, communication cost, link quality and node centrality. Simulated outcomes of the proposed scheme are compared with the existing schemes in terms of parameters such as cluster formation time, energy consumption, network lifetime, throughput and propagation delay.
Submitted 14 April, 2022;
originally announced April 2022.
-
Applying machine learning to predict behavior of bus transport in Warsaw, Poland
Authors:
Łukasz Pałys,
Maria Ganzha,
Marcin Paprzycki
Abstract:
Nowadays, it is possible to collect precise data describing movements of public transport. Specifically, for each bus (or tram) geoposition data can be regularly collected. This includes data for all buses in Warsaw, Poland. Moreover, this data can be downloaded and analyzed. In this context, one of the simplest questions is: can a model be built to represent the behavior of buses and predict their delays? This work provides initial results of our attempt to answer this question.
Submitted 9 April, 2022;
originally announced April 2022.
-
Practical Aspects of Zero-Shot Learning
Authors:
Elie Saad,
Marcin Paprzycki,
Maria Ganzha
Abstract:
One of the important areas of machine learning research is zero-shot learning. It is applied when a properly labeled training data set is not available. A number of zero-shot algorithms have been proposed and experimented with. However, none of them seems to be the "overall winner". In situations like this, it may be possible to develop a meta-classifier that would combine "best aspects" of individual classifiers and outperform all of them. In this context, the goal of this contribution is twofold. First, multiple state-of-the-art zero-shot learning methods are compared for standard benchmark datasets. Second, multiple meta-classifiers are suggested and experimentally compared (for the same datasets).
Submitted 28 March, 2022;
originally announced March 2022.
-
Topical Classification of Food Safety Publications with a Knowledge Base
Authors:
Piotr Sowinski,
Katarzyna Wasielewska-Michniewska,
Maria Ganzha,
Marcin Paprzycki
Abstract:
The vast body of scientific publications presents an increasing challenge of finding those that are relevant to a given research question, and making informed decisions on their basis. This becomes extremely difficult without the use of automated tools. Here, one possible area for improvement is automatic classification of publication abstracts according to their topic. This work introduces a novel, knowledge base-oriented publication classifier. The proposed method focuses on achieving scalability and easy adaptability to other domains. Classification speed and accuracy are shown to be satisfactory, in the very demanding field of food safety. Further development and evaluation of the method is needed, as the proposed approach shows much potential.
Submitted 4 January, 2022; v1 submitted 2 January, 2022;
originally announced January 2022.
-
Exploring usability of Reddit in data science and knowledge processing
Authors:
Jan Sawicki,
Maria Ganzha,
Marcin Paprzycki,
Amelia Bădică
Abstract:
This contribution argues that Reddit, as a massive, categorized, open-access dataset, is a useful data source, for "almost any topic". Hence, it can be used in data science, e.g. for knowledge exploration. This statement is backed up by the presented analysis, based on 180 manually annotated papers, related to Reddit itself, and data acquired from popular databases of scientific papers. Finally, an open source tool is introduced, which provides easy access to Reddit resources, and an exploratory data analysis of how Reddit covers selected topics. These functions can be used as a prelude analysis to a broader exploration of Reddit's applicability.
Submitted 14 April, 2023; v1 submitted 5 October, 2021;
originally announced October 2021.
-
Semantic Access Control for Privacy Management of Personal Sensing in Smart Cities
Authors:
Michał Drozdowicz,
Maria Ganzha,
Marcin Paprzycki
Abstract:
Personal and home sensors generate valuable information that could be used in Smart Cities. Unfortunately, typically, this data is locked out and used only by the application/system developer. While vendors are to blame, one should consider also the "binary nature" of data access. Specifically, either the owner has full control over her data (e.g. in a "closed system"), or she completely loses control, when the data is "opened". In this context, we propose a semantic technologies-based authorization and privacy control framework that enables the user to maintain flexible, yet manageable data access control policies. The proposed approach is described in detail, including implementation and testing.
Submitted 8 January, 2021;
originally announced January 2021.
-
A Review of Platforms for the Development of Agent Systems
Authors:
Constantin-Valentin Pal,
Florin Leon,
Marcin Paprzycki,
Maria Ganzha
Abstract:
Agent-based computing is an active field of research with the goal of building autonomous software or hardware entities. This task is often facilitated by the use of dedicated, specialized frameworks. For almost thirty years, many such agent platforms have been developed. Meanwhile, some of them have been abandoned, others continue their development and new platforms are released. This paper presents an up-to-date review of the existing agent platforms and also a historical perspective of this domain. It aims to serve as a reference point for people interested in developing agent systems. This work details the main characteristics of the included agent platforms, together with links to specific projects where they have been used. It distinguishes between the active platforms and those no longer under development or with unclear status. It also classifies the agent platforms as general purpose ones, free or commercial, and specialized ones, which can be used for particular types of applications.
Submitted 17 July, 2020;
originally announced July 2020.
-
Grid Security and Integration with Minimal Performance Degradation
Authors:
Sugata Sanyal,
Rangarajan A. Vasudevan,
Ajith Abraham,
Marcin Paprzycki
Abstract:
Computational grids are believed to be the ultimate framework to meet the growing computational needs of the scientific community. Here, the processing power of geographically distributed resources working under different ownerships, having their own access policy, cost structure and the like, is logically coupled to make them perform as a unified resource. The continuous increase in the availability of high-bandwidth communication, as well as powerful computers built of low-cost components, further enhances the chances of computational grids becoming a reality. However, the question of grid security remains one of the important open research issues. Here, we present some novel ideas about how to implement grid security, without appreciable performance degradation in grids. A suitable alternative to the computationally expensive encryption is suggested, which uses a key for message authentication. Methods of secure transfer and exchange of the required key(s) are also discussed.
Submitted 19 November, 2011;
originally announced November 2011.
-
Traffic Accident Analysis Using Decision Trees and Neural Networks
Authors:
Miao M. Chong,
Ajith Abraham,
Marcin Paprzycki
Abstract:
The costs of fatalities and injuries due to traffic accidents have a great impact on society. This paper presents our research to model the severity of injury resulting from traffic accidents using artificial neural networks and decision trees. We have applied them to an actual data set obtained from the National Automotive Sampling System (NASS) General Estimates System (GES). Experimental results reveal that in all the cases the decision tree outperforms the neural network. Our research analysis also shows that the three most important factors in fatal injury are: driver's seat belt usage, light condition of the roadway, and driver's alcohol usage.
Submitted 15 May, 2004;
originally announced May 2004.
-
Data Mining Approach for Analyzing Call Center Performance
Authors:
Marcin Paprzycki,
Ajith Abraham,
Ruiyuan Guo
Abstract:
The aim of our research was to apply well-known data mining techniques (such as linear neural networks, multi-layered perceptrons, probabilistic neural networks, classification and regression trees, support vector machines and finally a hybrid decision tree neural network approach) to the problem of predicting the quality of service in call centers, based on the performance data actually collected in a call center of a large insurance company. Our aim was two-fold. First, to compare the performance of models built using the above-mentioned techniques and, second, to analyze the characteristics of the input sensitivity in order to better understand the relationship between the performance evaluation process and the actual performance and in this way help improve the performance of call centers. In this paper we summarize our findings.
Submitted 4 May, 2004;
originally announced May 2004.