-
Brain Tumor Segmentation in MRI Images with 3D U-Net and Contextual Transformer
Authors:
Thien-Qua T. Nguyen,
Hieu-Nghia Nguyen,
Thanh-Hieu Bui,
Thien B. Nguyen-Tat,
Vuong M. Ngo
Abstract:
This research presents an enhanced approach for precise segmentation of brain tumor masses in magnetic resonance imaging (MRI) using an advanced 3D-UNet model combined with a Context Transformer (CoT). The proposed model extends the CoT architecture to a 3D format and integrates it smoothly with the base model to exploit the complex contextual information found in MRI scans, emphasizing how elements rely on each other across an extended spatial range. The proposed model synchronizes tumor mass characteristics from CoT, mutually reinforcing feature extraction and facilitating the precise capture of detailed tumor mass structures, including location, size, and boundaries. Experimental results show outstanding segmentation performance for the proposed method in comparison to current state-of-the-art approaches, with Dice scores of 82.0%, 81.5%, and 89.0% for Enhancing Tumor, Tumor Core, and Whole Tumor, respectively, on BraTS2019.
Submitted 11 July, 2024;
originally announced July 2024.
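The Dice score quoted in the abstract above measures the overlap between a predicted mask and a reference mask. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `dice_score`, the flat 0/1 mask representation, and the toy values are assumptions made for illustration.

```python
def dice_score(pred, truth):
    """Dice = 2*|P and T| / (|P| + |T|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy flattened masks (hypothetical values, not BraTS data).
predicted = [0, 1, 1, 1, 0, 0, 1, 0]
reference = [0, 1, 1, 0, 0, 1, 1, 0]
print(dice_score(predicted, reference))  # 3 shared voxels, 4 + 4 foreground voxels
```

In a real 3D volume the masks would be flattened per tumor sub-region (Enhancing Tumor, Tumor Core, Whole Tumor) before applying the same formula.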
-
Automating Attendance Management in Human Resources: A Design Science Approach Using Computer Vision and Facial Recognition
Authors:
Bao-Thien Nguyen-Tat,
Minh-Quoc Bui,
Vuong M. Ngo
Abstract:
Haar Cascade is a cost-effective and user-friendly machine learning-based algorithm for detecting objects in images and videos. Unlike Deep Learning algorithms, which typically require significant resources and expensive computing costs, it uses simple image processing techniques like edge detection and Haar features that are easy to comprehend and implement. By combining Haar Cascade with OpenCV2 on an embedded computer like the NVIDIA Jetson Nano, this system can accurately detect and match faces in a database for attendance tracking. This system aims to achieve several specific objectives that set it apart from existing solutions. It leverages Haar Cascade, enriched with carefully selected Haar features, such as Haar-like wavelets, and employs advanced edge detection techniques. These techniques enable precise face detection and matching in both images and videos, contributing to high accuracy and robust performance. By doing so, it minimizes manual intervention and reduces errors, thereby strengthening accountability. Additionally, the integration of OpenCV2 and the NVIDIA Jetson Nano optimizes processing efficiency, making it suitable for resource-constrained environments. This system caters to a diverse range of educational institutions, including schools, colleges, vocational training centers, and various workplace settings such as small businesses, offices, and factories. ... The system's affordability and efficiency democratize attendance management technology, making it accessible to a broader audience. Consequently, it has the potential to transform attendance tracking and management practices, ultimately leading to heightened productivity and accountability. In conclusion, this system represents a groundbreaking approach to attendance tracking and management...
Submitted 21 May, 2024;
originally announced May 2024.
-
Graph-Based Optimisation of Network Expansion in a Dockless Bike Sharing System
Authors:
Mark Roantree,
Niamh Murphi,
Dinh Viet Cuong,
Vuong Minh Ngo
Abstract:
Bike-sharing systems (BSSs) are deployed in over a thousand cities worldwide and play an important role in many urban transportation systems. BSSs alleviate congestion, reduce pollution and promote physical exercise. It is essential to explore the spatiotemporal patterns of bike-sharing demand, as well as the factors that influence these patterns, in order to optimise system operational efficiency. In this study, an optimised geo-temporal graph is constructed using trip data from Moby Bikes, a dockless BSS operator. The process of optimising the graph unveiled prime locations for erecting new stations during future expansions of the BSS. The Louvain algorithm, a community detection technique, is employed to uncover usage patterns at different levels of temporal granularity. The community detection results reveal largely self-contained sub-networks that exhibit similar usage patterns at their respective levels of temporal granularity. Overall, this study reinforces that BSSs are intrinsically spatiotemporal systems, with community presence driven by spatiotemporal dynamics. These findings may aid operators in improving redistribution efficiency.
Submitted 28 March, 2024;
originally announced April 2024.
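The Louvain algorithm mentioned in the abstract above searches for a partition of the graph that maximises modularity. The sketch below only computes the modularity of a given partition on an unweighted, undirected graph; it is not the Louvain procedure itself and not the authors' pipeline. The station names and the helper name `modularity` are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints in the same community
    degree = defaultdict(int)  # summed node degree per community
    for u, v in edges:
        degree[community_of[u]] += 1
        degree[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            intra[community_of[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy trip graph: two tightly connected groups of stations joined by one trip.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"),
         ("C", "D")]
split = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
merged = {station: 0 for station in split}

print(round(modularity(edges, split), 3))   # two communities
print(round(modularity(edges, merged), 3))  # everything in one community
```

The split partition scores higher than the merged one, which is the kind of signal a community detection method uses to separate largely self-contained sub-networks.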
-
Structural Textile Pattern Recognition and Processing Based on Hypergraphs
Authors:
Vuong M. Ngo,
Sven Helmer,
Nhien-An Le-Khac,
M-Tahar Kechadi
Abstract:
The humanities, like many other areas of society, are currently undergoing major changes in the wake of digital transformation. However, in order to make collections of digitised material in this area easily accessible, we often still lack adequate search functionality. For instance, digital archives for textiles offer keyword search, which is fairly well understood, and arrange their content following a certain taxonomy, but search functionality at the level of thread structure is still missing. To facilitate clustering and search, we introduce an approach for recognising similar weaving patterns based on their structures for textile archives. We first represent textile structures using hypergraphs and extract multisets of k-neighbourhoods describing weaving patterns from these graphs. Then, the resulting multisets are clustered using various distance measures and various clustering algorithms (K-Means for simplicity and hierarchical agglomerative algorithms for precision). We evaluate the different variants of our approach experimentally, showing that it can be implemented efficiently (meaning it has linear complexity), and demonstrate its quality for querying and clustering datasets containing large textile samples. As, to the best of our knowledge, this is the first practical approach for explicitly modelling complex and irregular weaving patterns usable for retrieval, we aim at establishing a solid baseline.
Submitted 20 March, 2021;
originally announced March 2021.
-
Crop Knowledge Discovery Based on Agricultural Big Data Integration
Authors:
Vuong M. Ngo,
M-Tahar Kechadi
Abstract:
Nowadays, agricultural data can be generated through various sources, such as the Internet of Things (IoT), sensors, satellites, weather stations, robots, farm equipment, agricultural laboratories, farmers, government agencies and agribusinesses. The analysis of this big data enables farmers, companies and agronomists to extract high-value business and scientific knowledge, improving their operational processes and product quality. However, before analysing this data, different data sources need to be normalised, homogenised and integrated into a unified data representation. In this paper, we propose an agricultural data integration method using a constellation schema which is designed to be flexible enough to incorporate other datasets and big data models. We also apply some methods to extract knowledge with a view to improving crop yield; these include finding suitable quantities of soil properties, herbicides and insecticides for both increasing crop yield and protecting the environment.
Submitted 10 March, 2020;
originally announced March 2020.
-
Data Warehouse and Decision Support on Integrated Crop Big Data
Authors:
V. M. Ngo,
N. A. Le-Khac,
M. T. Kechadi
Abstract:
In recent years, precision agriculture has become very popular. The introduction of modern information and communication technologies for collecting and processing agricultural data has revolutionised agricultural practices. This started a while ago (early 20th century) and is driven by the low cost of collecting data about everything, from information on fields such as seed, soil, fertiliser and pests, to weather data and drone and satellite images. In particular, agricultural data mining today is considered a Big Data application in terms of volume, variety, velocity and veracity. Hence, it leads to challenges in processing vast amounts of complex and diverse information to extract useful knowledge for farmers, agronomists and other businesses. It is a key foundation for establishing a crop intelligence platform, which will enable efficient resource management and high-quality agronomy decision making and recommendations. In this paper, we designed and implemented a continental-level agricultural data warehouse (ADW). ADW is characterised by its (1) flexible schema; (2) data integration from multiple real agricultural datasets; (3) data science and business intelligence support; (4) high performance; (5) high storage; (6) security; (7) governance and monitoring; (8) consistency, availability and partition tolerance; (9) cloud compatibility. We also evaluate the performance of ADW and present some complex queries to extract and return necessary knowledge about crop management.
Submitted 12 April, 2021; v1 submitted 9 March, 2020;
originally announced March 2020.
-
Designing and Implementing Data Warehouse for Agricultural Big Data
Authors:
Vuong M. Ngo,
Nhien-An Le-Khac,
M-Tahar Kechadi
Abstract:
In recent years, precision agriculture that uses modern information and communication technologies has become very popular. Raw and semi-processed agricultural data are usually collected through various sources, such as the Internet of Things (IoT), sensors, satellites, weather stations, robots, farm equipment, farmers and agribusinesses. Moreover, agricultural datasets are very large, complex, unstructured, heterogeneous, non-standardized, and inconsistent. Hence, agricultural data mining is considered a Big Data application in terms of volume, variety, velocity and veracity. It is a key foundation for establishing a crop intelligence platform, which will enable resource-efficient agronomy decision making and recommendations. In this paper, we designed and implemented a continental-level agricultural data warehouse by combining Hive, MongoDB and Cassandra. Our data warehouse offers: (1) flexible schema; (2) data integration from multiple real agricultural datasets; (3) data science and business intelligence support; (4) high performance; (5) high storage; (6) security; (7) governance and monitoring; (8) replication and recovery; (9) consistency, availability and partition tolerance; (10) distributed and cloud deployment. We also evaluate the performance of our data warehouse.
Submitted 29 May, 2019;
originally announced May 2019.
-
Using Entity Relations for Opinion Mining of Vietnamese Comments
Authors:
P. T. Nguyen,
L. T. Le,
V. M. Ngo,
P. M. Nguyen
Abstract:
In this paper, we propose several novel techniques to extract and mine opinions from Vietnamese customer reviews of a number of products traded on e-commerce sites in Vietnam. The assessment is based on the emotional level of customers towards a specific product such as a mobile phone or laptop. We exploit the features of these products because customers are very interested in them and many such products are available in the Vietnamese e-commerce market. From this, the likes and dislikes of customers regarding the examined products can be determined.
Submitted 16 May, 2019;
originally announced May 2019.
-
Machine Learning based English Sentiment Analysis
Authors:
T. N. T. Tran,
L. K. N. Nguyen,
V. M. Ngo
Abstract:
Sentiment analysis, or opinion mining, aims to determine the attitudes, judgments and opinions of customers regarding a product or a service. Such a system helps manufacturers or service providers know how satisfied customers are with their products or services, so that they can make appropriate adjustments. We use a popular machine learning method, the Support Vector Machine, combined with the library in the Waikato Environment for Knowledge Analysis (WEKA) to build a Java web program that analyzes the sentiment of English comments belonging to one of four types of women's products: dresses, handbags, shoes and rings. We developed and tested our system with a training set of 300 comments and a test set of 400 comments. The experimental precision, recall and F-measure of the system are 89.3%, 95.0% and 92.1% for positive comments; 97.1%, 78.5% and 86.8% for negative comments; and 76.7%, 86.2% and 81.2% for neutral comments.
Submitted 16 May, 2019;
originally announced May 2019.
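The F-measure figures in the abstract above can be checked from the reported precision and recall, assuming the balanced F1 form (the harmonic mean of the two). The snippet is a small worked check under that assumption, not the authors' evaluation code; the helper name `f_measure` is hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) pairs as reported in the abstract.
reported = {
    "positive": (89.3, 95.0),
    "negative": (97.1, 78.5),
    "neutral": (76.7, 86.2),
}
for label, (p, r) in reported.items():
    print(label, round(f_measure(p, r), 1))
```

Rounded to one decimal place, the three values agree with the F-measures given in the abstract for positive, negative and neutral comments.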
-
Detecting Vietnamese Opinion Spam
Authors:
T. H. H Duong,
T. D. Vu,
V. M. Ngo
Abstract:
Recently, Vietnamese Natural Language Processing has been researched by experts in academia and business. However, existing papers have focused only on information classification or extraction from documents. Nowadays, with the rapid development of e-commerce websites, forums and social networks, products, people, organizations or wonders are the targets of comments or reviews by network communities. Many people use those reviews to make decisions. However, many people or organizations use reviews to mislead readers. Therefore, it is necessary to detect such bad behavior in reviews. In this paper, we study this problem and propose an appropriate method for classifying Vietnamese reviews as spam or non-spam. The accuracy of our method is up to 90%.
Submitted 9 May, 2019;
originally announced May 2019.
-
A Similarity Measure for Weaving Patterns in Textiles
Authors:
Sven Helmer,
Vuong M. Ngo
Abstract:
We propose a novel approach for measuring the similarity between weaving patterns that can provide similarity-based search functionality for textile archives. We represent textile structures using hypergraphs and extract multisets of k-neighborhoods from these graphs. The resulting multisets are then compared using Jaccard coefficients, Hamming distances, and cosine measures. We evaluate the different variants of our similarity measure experimentally, showing that it can be implemented efficiently and illustrating its quality by using it to cluster and query a data set containing more than a thousand textile samples.
Submitted 10 October, 2018;
originally announced October 2018.
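The abstract above compares multisets of k-neighborhoods using Jaccard coefficients. The sketch below shows one common generalisation of the Jaccard coefficient to multisets (sum of minimum counts over sum of maximum counts); the abstract does not state the exact definition used, so this form, the function name `multiset_jaccard`, and the toy neighborhood labels are assumptions.

```python
from collections import Counter


def multiset_jaccard(a, b):
    """Jaccard coefficient on multisets: sum of min counts / sum of max counts."""
    a, b = Counter(a), Counter(b)
    intersection = sum((a & b).values())  # element-wise minimum of counts
    union = sum((a | b).values())         # element-wise maximum of counts
    return 1.0 if union == 0 else intersection / union


# Hypothetical neighborhood labels extracted from two weaving patterns.
pattern_a = ["over-under", "over-under", "under-over", "float"]
pattern_b = ["over-under", "under-over", "under-over"]
print(multiset_jaccard(pattern_a, pattern_b))  # 2 shared occurrences out of 5
```

Using `Counter` keeps repeated neighborhoods, so two patterns that share the same structures in different proportions are not treated as identical.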
-
Discovering Latent Information By Spreading Activation Algorithm For Document Retrieval
Authors:
Vuong M. Ngo
Abstract:
Syntactic search relies on keywords contained in a query to find suitable documents. So, documents that do not contain the keywords but contain information related to the query are not retrieved. Spreading activation is an algorithm for finding latent information in a query by exploiting relations between nodes in an associative network or semantic network. However, the classical spreading activation algorithm uses all relations of a node in the network, which adds unsuitable information to the query. In this paper, we propose a novel approach for semantic text search, called query-oriented-constrained spreading activation, that only uses relations relating to the content of the query to find truly related information. Experiments on a benchmark dataset show that, in terms of the MAP measure, our search engine is 18.9% and 43.8% better than the syntactic search and the search using the classical constrained spreading activation, respectively.
KEYWORDS: Information Retrieval, Ontology, Semantic Search, Spreading Activation
Submitted 29 July, 2018;
originally announced August 2018.
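The idea of restricting spreading activation to relations that appear in the query, as described in the abstract above, can be illustrated with a small sketch. This is not the authors' algorithm: the decay factor, the firing threshold, the graph encoding, and the toy network are all assumptions chosen for illustration.

```python
def spread_activation(graph, seeds, allowed_relations, decay=0.5, threshold=0.2):
    """Propagate activation from seed nodes along allowed relation types only.

    graph maps a node to a list of (relation, neighbour) pairs.
    Returns a dict mapping each reached node to its activation level.
    """
    activation = dict.fromkeys(seeds, 1.0)
    frontier = list(seeds)
    while frontier:
        node = frontier.pop()
        passed_on = activation[node] * decay
        if passed_on < threshold:
            continue  # too weak to activate neighbours
        for relation, neighbour in graph.get(node, []):
            if relation not in allowed_relations:
                continue  # constraint: skip relations unrelated to the query
            if passed_on > activation.get(neighbour, 0.0):
                activation[neighbour] = passed_on
                frontier.append(neighbour)
    return activation


# Toy semantic network (hypothetical facts and relation names).
graph = {
    "Hanoi": [("capital_of", "Vietnam"), ("near", "Red River")],
    "Vietnam": [("member_of", "ASEAN")],
}
print(spread_activation(graph, ["Hanoi"], {"capital_of", "member_of"}))
print(spread_activation(graph, ["Hanoi"], {"capital_of"}))
```

With both relation types allowed, activation reaches "Vietnam" and then "ASEAN"; restricting the relations to `capital_of` stops at "Vietnam", and "Red River" is never added in either case.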
-
Opinion Spam Recognition Method for Online Reviews using Ontological Features
Authors:
L. H. Nguyen,
N. T. H. Pham,
V. M. Ngo
Abstract:
Nowadays, many people use opinions on social media to decide whether to buy products or services. Opinion spam detection is a hard problem because fake reviews can be written by organizations as well as individuals for different purposes. They write fake reviews to mislead readers or automated detection systems, promoting target products or damaging their reputations. In this paper, we propose a new approach using a knowledge-based ontology to detect opinion spam with high accuracy (higher than 75%).
Keywords: Opinion spam, Fake review, E-commercial, Ontology.
Submitted 29 July, 2018;
originally announced July 2018.
-
Combining Named Entities with WordNet and Using Query-Oriented Spreading Activation for Semantic Text Search
Authors:
Vuong M. Ngo,
Tru H. Cao,
Tuan M. V. Le
Abstract:
Purely keyword-based text search is not satisfactory because named entities and WordNet words are also important elements to define the content of a document or a query in which they occur. Named entities have ontological features, namely, their aliases, classes, and identifiers. Words in WordNet also have ontological features, namely, their synonyms, hypernyms, hyponyms, and senses. Those features of concepts may be hidden from their textual appearance. Besides, there are related concepts that do not appear in a query, but can bring out the meaning of the query if they are added. We propose an ontology-based generalized Vector Space Model for semantic text search. It exploits ontological features of named entities and WordNet words, and develops a query-oriented spreading activation algorithm to expand queries. In addition, it combines and utilizes advantages of different ontologies for semantic annotation and searching. Experiments on a benchmark dataset show that, in terms of the MAP measure, our model is 42.5% better than the purely keyword-based model, and 32.3% and 15.9% better than the ones using only WordNet or named entities, respectively.
Keywords: semantic search, spreading activation, ontology, named entity, WordNet.
Submitted 20 July, 2018;
originally announced July 2018.
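Several abstracts in this listing, including the one above, report results in terms of the MAP measure (mean average precision). The following is a minimal sketch of the standard definition, assuming binary relevance judgments; the function names and the toy document identifiers are hypothetical and the code is not taken from the papers.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k over the ranks k at which a relevant document appears."""
    relevant_ids = set(relevant_ids)
    if not relevant_ids:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


def mean_average_precision(runs):
    """runs is a list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two toy queries with hypothetical document ids.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6", "d7"], {"d6"}),              # AP = 1/2
]
print(round(mean_average_precision(runs), 4))
```

A statement such as "42.5% better in terms of MAP" is then a relative improvement of one system's MAP over another's on the same query set.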
-
Exploring Combinations of Ontological Features and Keywords for Text Retrieval
Authors:
Tru H. Cao,
Khanh C. Le,
Vuong M. Ngo
Abstract:
Named entities have been considered and combined with keywords to enhance information retrieval performance. However, there is not yet a formal and complete model that takes into account entity names, classes, and identifiers together. Our work explores various adaptations of the traditional Vector Space Model that combine different ontological features with keywords, and in different ways. It shows better performance of the proposed models as compared to the keyword-based Lucene, and their advantages for both text retrieval and representation of documents and queries.
Submitted 20 July, 2018;
originally announced July 2018.
-
A Generalized Vector Space Model for Ontology-Based Information Retrieval
Authors:
Vuong M. Ngo,
Tru H. Cao
Abstract:
Named entities (NE) are objects that are referred to by names such as people, organizations and locations. Named entities and keywords are important to the meaning of a document. We propose a generalized vector space model that combines named entities and keywords. In the model, we take into account different ontological features of named entities, namely, aliases, classes and identifiers. Moreover, we use entity classes to represent the latent information of interrogative words in Wh-queries, which are ignored in traditional keyword-based searching. We have implemented and tested the proposed model on a TREC dataset, as presented and discussed in the paper.
Submitted 20 July, 2018;
originally announced July 2018.
-
Semantic Document Clustering on Named Entity Features
Authors:
Tru H. Cao,
Vuong M. Ngo,
Dung T. Hong,
Tho T. Quan
Abstract:
Keyword-based information processing has limitations due to its simple treatment of words. In this paper, we introduce named entities, which are the key elements defining document semantics and in many cases are of interest to users, as objectives in document clustering. First, the traditional keyword-based vector space model is adapted with vectors defined over spaces of entity names, types, name-type pairs, and identifiers, instead of keywords. Then, hierarchical document clustering can be performed using the similarity measure defined as the cosines of the vectors representing documents. Experimental results are presented and discussed. Clustering documents by information of named entities could be useful for managing web-based learning materials with respect to related objects.
Submitted 20 July, 2018;
originally announced July 2018.
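The cosine similarity between document vectors defined over entity features, as described in the abstract above, can be sketched with sparse dictionaries. The weights, the choice of (name, type) pairs as vector dimensions, and the function name `cosine` are illustrative assumptions, not the paper's weighting scheme.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy documents represented over (entity name, entity type) pairs.
doc_a = {("Hanoi", "City"): 2.0, ("Vietnam", "Country"): 1.0}
doc_b = {("Hanoi", "City"): 1.0, ("Mekong", "River"): 1.0}
print(round(cosine(doc_a, doc_b), 4))
```

The same function applies unchanged if the dictionary keys are entity names, types, or identifiers instead of name-type pairs, which is what makes the vector space adaptation straightforward to vary.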
-
Ontology-Based Query Expansion with Latently Related Named Entities for Semantic Text Search
Authors:
Vuong M. Ngo,
Tru H. Cao
Abstract:
Traditional information retrieval systems represent documents and queries by keyword sets. However, the content of a document or a query is mainly defined by both the keywords and the named entities occurring in it. Named entities have ontological features, namely, their aliases, classes, and identifiers, which are hidden from their textual appearance. Besides, the meaning of a query may imply latent named entities that are related to the apparent ones in the query. We propose an ontology-based generalized vector space model for semantic text search. It exploits ontological features of named entities and their latently related ones to reveal the semantics of documents and queries. We also propose a framework to combine different ontologies to take their complementary advantages for semantic annotation and searching. Experiments on a benchmark dataset show that our model achieves better search quality than other ones.
Submitted 15 July, 2018;
originally announced July 2018.
-
Discovering Latent Concepts and Exploiting Ontological Features for Semantic Text Search
Authors:
Vuong M. Ngo,
Tru H. Cao
Abstract:
Named entities and WordNet words are important in defining the content of a text in which they occur. Named entities have ontological features, namely, their aliases, classes, and identifiers. WordNet words also have ontological features, namely, their synonyms, hypernyms, hyponyms, and senses. Those features of concepts may be hidden from their textual appearance. Besides, there are related concepts that do not appear in a query, but can bring out the meaning of the query if they are added. Traditional constrained spreading activation algorithms use all relations of a node in the network, which adds unsuitable information to the query. In contrast, we only use relations represented in the query. We propose an ontology-based generalized Vector Space Model for semantic text search. It discovers relevant latent concepts in a query by relation constrained spreading activation. Besides, to represent a word having more than one possible direct sense, it combines the most specific common hypernym of the remaining undisambiguated multi-senses with the form of the word. Experiments on a benchmark dataset in terms of the MAP measure for the retrieval performance show that our model is 41.9% and 29.3% better than the purely keyword-based model and the traditional constrained spreading activation model, respectively.
Submitted 15 July, 2018;
originally announced July 2018.
-
Semantic Search by Latent Ontological Features
Authors:
Tru H. Cao,
Vuong M. Ngo
Abstract:
Both named entities and keywords are important in defining the content of a text in which they occur. In particular, people often use named entities in information search. However, named entities have ontological features, namely, their aliases, classes, and identifiers, which are hidden from their textual appearance. We propose ontology-based extensions of the traditional Vector Space Model that explore different combinations of those latent ontological features with keywords for text retrieval. Our experiments on benchmark datasets show better search quality of the proposed models as compared to the purely keyword-based model, and their advantages for both text retrieval and representation of documents and queries.
Submitted 15 July, 2018;
originally announced July 2018.
-
WordNet-Based Information Retrieval Using Common Hypernyms and Combined Features
Authors:
Vuong M. Ngo,
Tru H. Cao,
Tuan M. V. Le
Abstract:
Text search based on lexical matching of keywords is not satisfactory due to polysemous and synonymous words. Semantic search that exploits word meanings, in general, improves search performance. In this paper, we survey WordNet-based information retrieval systems, which employ a word sense disambiguation method to process queries and documents. The problem is that in many cases a word has more than one possible direct sense, and picking only one of them may give a wrong sense for the word. Moreover, the previous systems use only word forms to represent word senses and their hypernyms. We propose a novel approach that uses the most specific common hypernym of the remaining undisambiguated multi-senses of a word, as well as combined WordNet features to represent word meanings. Experiments on a benchmark dataset show that, in terms of the MAP measure, our search engine is 17.7% better than the lexical search, and at least 9.4% better than all surveyed search systems using WordNet.
Keywords: ontology, word sense disambiguation, semantic annotation, semantic search.
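The idea of representing an ambiguous word by the most specific common hypernym of its remaining senses can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy taxonomy, the sense labels, and the function names are hypothetical, each sense has a single parent (real WordNet synsets may have several), and this is not the authors' implementation.

```python
# Hypothetical toy taxonomy: each node maps to its single hypernym (parent).
TOY_HYPERNYM = {
    "bass.sea_bass": "fish",
    "bass.freshwater_bass": "fish",
    "fish": "animal",
    "animal": "entity",
    "bass.instrument": "musical_instrument",
    "musical_instrument": "artifact",
    "artifact": "entity",
}


def hypernym_path(sense, parent_of):
    """Return the chain from a sense up to the root, most specific node first."""
    path = [sense]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


def most_specific_common_hypernym(senses, parent_of):
    """Return the deepest node shared by the hypernym paths of all given senses."""
    if not senses:
        return None
    paths = [hypernym_path(s, parent_of) for s in senses]
    common = set(paths[0]).intersection(*(set(p) for p in paths[1:]))
    # Walking upward from the first sense, the first shared node is the deepest one.
    for node in paths[0]:
        if node in common:
            return node
    return None


# Two senses that disambiguation could not separate share the hypernym "fish".
fish_like = most_specific_common_hypernym(
    ["bass.sea_bass", "bass.freshwater_bass"], TOY_HYPERNYM
)
# Adding the unrelated instrument sense pushes the answer up to the root.
all_senses = most_specific_common_hypernym(
    ["bass.sea_bass", "bass.freshwater_bass", "bass.instrument"], TOY_HYPERNYM
)
```

In this sketch the word would be indexed under the shared hypernym rather than under one arbitrarily chosen sense, which is the intuition behind the approach described in the abstract.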
Submitted 15 July, 2018;
originally announced July 2018.
-
An Efficient Data Warehouse for Crop Yield Prediction
Authors:
Vuong M. Ngo,
Nhien-An Le-Khac,
M-Tahar Kechadi
Abstract:
Nowadays, precision agriculture, combined with modern information and communications technologies, is becoming more common in agricultural activities such as automated irrigation systems, precision planting, variable rate applications of nutrients and pesticides, and agricultural decision support systems. In the latter, crop management data analysis, based on machine learning and data mining, focuses mainly on how to efficiently forecast and improve crop yield. In recent years, raw and semi-processed agricultural data have usually been collected using sensors, robots, satellites, weather stations, farm equipment, farmers and agribusinesses, while the Internet of Things (IoT) should deliver the promise of wirelessly connecting objects and devices in the agricultural ecosystem. Agricultural data typically captures information about farming entities and operations. Every farming entity encapsulates an individual farming concept, such as field, crop, seed, soil, temperature, humidity, pest, and weed. Agricultural datasets are spatial, temporal, complex, heterogeneous, non-standardized, and very large. In particular, agricultural data is considered Big Data in terms of volume, variety, velocity and veracity. Designing and developing a data warehouse for precision agriculture is a key foundation for establishing a crop intelligence platform, which will enable resource-efficient agronomy decision making and recommendations. Some of the requirements for such an agricultural data warehouse are privacy, security, and real-time access among its stakeholders (e.g., farmers, farm equipment manufacturers, agribusinesses, co-operative societies, customers and possibly Government agencies). However, there are currently very few reports in the literature that focus on the design of efficient data warehouses with a view to enabling Agricultural Big Data analysis and data mining. In this paper ...
Submitted 26 June, 2018;
originally announced July 2018.