Search | arXiv e-print repository

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

Authors: Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James V. Miranda, Jennifer Santoso, Elyanah Aco, Akhdan Fadhilah, Jonibek Mansurov, Joseph Marvin Imperial, Onno P. Kampman, Joel Ruben Antony Moniz, Muhammad Ravi Shulthan Habibi, Frederikus Hudi, Railey Montalan, Ryan Ignatius, Joanito Agili Lopo, William Nixon, Börje F. Karlsson, James Jaya, Ryandito Diandaru, Yuze Gao, Patrick Amadeus, Bin Wang, Jan Christian Blaise Cruz, Chenxi Whitehouse, et al. (36 additional authors not shown)

Abstract: Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts, images, and audio datasets from SEA, compromising the quality of AI models for SEA languages. Evaluating models for SEA languages is challenging due to the scarcity of high-quality datasets, compounded by the dominance of English training data, raising concerns about potential cultural misrepresentation. To address these challenges, we introduce SEACrowd, a collaborative initiative that consolidates a comprehensive resource hub that fills the resource gap by providing standardized corpora in nearly 1,000 SEA languages across three modalities. Through our SEACrowd benchmarks, we assess the quality of AI models on 36 indigenous languages across 13 tasks, offering valuable insights into the current AI landscape in SEA. Furthermore, we propose strategies to facilitate greater AI advancements, maximizing potential utility and resource equity for the future of AI in SEA.

Submitted 14 June, 2024; originally announced June 2024.

Comments: https://github.com/SEACrowd

arXiv:2406.05967 [pdf, other]

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

Authors: David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja, Bontu Fufa Balcha, Chenxi Whitehouse, Christian Salamea, Dan John Velasco, David Ifeoluwa Adelani, David Le Meur, Emilio Villa-Cueva, Fajri Koto, Fauzan Farooqui, Frederico Belcavello, Ganzorig Batnasan, Gisela Vallejo, Grainne Caulfield, Guido Ivetta, Haiyue Song, et al. (50 additional authors not shown)

Abstract: Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recent efforts have tried to increase the number of languages covered on VQA datasets, they still lack diversity in low-resource languages. More importantly, although these datasets often extend their linguistic range via translation or some other approaches, they usually keep images the same, resulting in narrow cultural representation. To address these limitations, we construct CVQA, a new Culturally-diverse multilingual Visual Question Answering benchmark, designed to cover a rich set of languages and cultures, where we engage native speakers and cultural experts in the data collection process. As a result, CVQA includes culturally-driven images and questions from across 28 countries on four continents, covering 26 languages with 11 scripts, providing a total of 9k questions. We then benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models. This benchmark can serve as a probing evaluation suite for assessing the cultural capability and bias of multimodal models and hopefully encourage more research efforts toward increasing cultural awareness and linguistic diversity in this field.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2404.02565 [pdf, other]

Spatial Summation of Localized Pressure for Haptic Sensory Prostheses

Authors: Sreela Kodali, Cihualpilli Camino Cruz, Thomas C. Bulea, Kevin S. Rao, Diana Bharucha-Goebel, Alexander T. Chesler, Carsten G. Bonnemann, Allison M. Okamura

Abstract: A host of medical conditions, including amputations, diabetes, stroke, and genetic disease, result in loss of touch sensation. Because most types of sensory loss have no pharmacological treatment or rehabilitative therapy, we propose a haptic sensory prosthesis that provides substitutive feedback. The wrist and forearm are compelling locations for feedback due to available skin area and not occluding the hands, but have reduced mechanoreceptor density compared to the fingertips. Focusing on localized pressure as the feedback modality, we hypothesize that we can improve on prior devices by invoking a wider range of stimulus intensity using multiple points of pressure to evoke spatial summation, which is the cumulative perceptual experience from multiple points of stimuli. We conducted a preliminary perceptual test to investigate this idea and found that the just noticeable difference is reduced with two points of pressure compared to one, motivating future work using spatial summation in sensory prostheses.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 2 pages, 2 figures, 2024 IEEE Haptics Symposium Work-in-Progress Paper

arXiv:2403.07769 [pdf]

Transforming Competition into Collaboration: The Revolutionary Role of Multi-Agent Systems and Language Models in Modern Organizations

Authors: Carlos Jose Xavier Cruz

Abstract: This article explores the dynamic influence of computational entities based on multi-agent systems theory (SMA) combined with large language models (LLM), which are characterized by their ability to simulate complex human interactions, as a possibility to revolutionize human user interaction from the use of specialized artificial agents to support everything from operational organizational processes to strategic decision making based on applied knowledge and human orchestration. Previous investigations reveal that there are limitations, particularly in the autonomous approach of artificial agents, especially when dealing with new challenges and pragmatic tasks such as inducing logical reasoning and problem solving. It is also considered that traditional techniques, such as the stimulation of chains of thoughts, require explicit human guidance. In our approach we employ agents developed from large language models (LLM), each with distinct prototyping that considers behavioral elements, driven by strategies that stimulate the generation of knowledge based on the use case proposed in the scenario (role-play) business, using a discussion approach between agents (guided conversation). We demonstrate the potential of developing agents useful for organizational strategies, based on multi-agent system theories (SMA) and innovative uses based on large language models (LLM based), offering a differentiated and adaptable experiment to different applications, complexities, domains, and capabilities from LLM.

Submitted 15 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2401.06161 [pdf]

Trustworthy human-centric based Automated Decision-Making Systems

Authors: Marcelino Cabrera, Carlos Cruz, Pavel Novoa-Hernández, David A. Pelta, José Luis Verdegay

Abstract: Automated Decision-Making Systems (ADS) have become pervasive across various fields, activities, and occupations, to enhance performance. However, this widespread adoption introduces potential risks, including the misuse of ADS. Such misuse may manifest when ADS is employed in situations where it is unnecessary or when essential requirements, conditions, and terms are overlooked, leading to unintended consequences. This research paper presents a thorough examination of the implications, distinctions, and ethical considerations associated with digitalization, digital transformation, and the utilization of ADS in contemporary society and future contexts. Emphasis is placed on the imperative need for regulation, transparency, and ethical conduct in the deployment of ADS.

Submitted 22 December, 2023; originally announced January 2024.

Comments: 16 pages, 1 Table

arXiv:2310.16322 [pdf, other]

Samsung R&D Institute Philippines at WMT 2023

Authors: Jan Christian Blaise Cruz

Abstract: In this paper, we describe the constrained MT systems submitted by Samsung R&D Institute Philippines to the WMT 2023 General Translation Task for two directions: en$\rightarrow$he and he$\rightarrow$en. Our systems comprise Transformer-based sequence-to-sequence models that are trained with a mix of best practices: comprehensive data preprocessing pipelines, synthetic backtranslated data, and the use of noisy channel reranking during online decoding. Our models perform comparably to, and sometimes outperform, strong baseline unconstrained systems such as mBART50 M2M and NLLB 200 MoE despite having significantly fewer parameters on two public benchmarks: FLORES-200 and NTREX-128.

Submitted 24 October, 2023; originally announced October 2023.

Comments: To appear in Proceedings of the Eighth Conference on Machine Translation 2023 (WMT)

arXiv:2308.05609 [pdf, ps, other]

LASIGE and UNICAGE solution to the NASA LitCoin NLP Competition

Authors: Pedro Ruas, Diana F. Sousa, André Neves, Carlos Cruz, Francisco M. Couto

Abstract: Biomedical Natural Language Processing (NLP) tends to become cumbersome for most researchers, frequently due to the amount and heterogeneity of text to be processed. To address this challenge, the industry is continuously developing highly efficient tools and creating more flexible engineering solutions. This work presents the integration between industry data engineering solutions for efficient data processing and academic systems developed for Named Entity Recognition (LasigeUnicage_NER) and Relation Extraction (BiOnt). Our design reflects an integration of those components with external knowledge in the form of additional training data from other datasets and biomedical ontologies. We used this pipeline in the 2022 LitCoin NLP Challenge, where our team LasigeUnicage was awarded the 7th Prize out of approximately 200 participating teams, reflecting a successful collaboration between the academia (LASIGE) and the industry (Unicage). The software supporting this work is available at https://github.com/lasigeBioTM/Litcoin-Lasige_Unicage.

Submitted 10 August, 2023; originally announced August 2023.

arXiv:2307.10296 [pdf, other]

Towards Automated Semantic Segmentation in Mammography Images

Authors: Cesar A. Sierra-Franco, Jan Hurtado, Victor de A. Thomaz, Leonardo C. da Cruz, Santiago V. Silva, Alberto B. Raposo

Abstract: Mammography images are widely used to detect non-palpable breast lesions or nodules, preventing cancer and providing the opportunity to plan interventions when necessary. The identification of some structures of interest is essential to make a diagnosis and evaluate image adequacy. Thus, computer-aided detection systems can be helpful in assisting medical interpretation by automatically segmenting these landmark structures. In this paper, we propose a deep learning-based framework for the segmentation of the nipple, the pectoral muscle, the fibroglandular tissue, and the fatty tissue on standard-view mammography images. We introduce a large private segmentation dataset and extensive experiments considering different deep-learning model architectures. Our experiments demonstrate accurate segmentation performance on varied and challenging cases, showing that this framework can be integrated into clinical practice.

Submitted 18 July, 2023; originally announced July 2023.

Comments: 6 pages

arXiv:2307.01548 [pdf, other]

Knowledge Graph for NLG in the context of conversational agents

Authors: Hussam Ghanem, Massinissa Atmani, Christophe Cruz

Abstract: The use of knowledge graphs (KGs) enhances the accuracy and comprehensiveness of the responses provided by a conversational agent. While generating answers during conversations consists in generating text from these KGs, it is still regarded as a challenging task that has gained significant attention in recent years. In this document, we provide a review of different architectures used for knowledge graph-to-text generation including: Graph Neural Networks, the Graph Transformer, and linearization with seq2seq models. We discuss the advantages and limitations of each architecture and conclude that the choice of architecture will depend on the specific requirements of the task at hand. We also highlight the importance of considering constraints such as execution time and model validity, particularly in the context of conversational agents. Based on these constraints and the availability of labeled data for the domains of DAVI, we choose to use seq2seq Transformer-based models (PLMs) for the Knowledge Graph-to-Text Generation task. We aim to refine benchmark datasets of kg-to-text generation on PLMs and to explore the emotional and multilingual dimensions in our future work. Overall, this review provides insights into the different approaches for knowledge graph-to-text generation and outlines future directions for research in this area.

Submitted 4 July, 2023; originally announced July 2023.

Journal ref: French Regional Conference on Complex Systems (FRCCS 2023), May 2023, Le Havre, France

arXiv:2306.15898 [pdf, other]

doi 10.24963/ijcai.2023/531

Pseudo-Labeling Enhanced by Privileged Information and Its Application to In Situ Sequencing Images

Authors: Marzieh Haghighi, Mario C. Cruz, Erin Weisbart, Beth A. Cimini, Avtar Singh, Julia Bauman, Maria E. Lozada, Sanam L. Kavari, James T. Neal, Paul C. Blainey, Anne E. Carpenter, Shantanu Singh

Abstract: Various strategies for label-scarce object detection have been explored by the computer vision research community. These strategies mainly rely on assumptions that are specific to natural images and not directly applicable to the biological and biomedical vision domains. For example, most semi-supervised learning strategies rely on a small set of labeled data as a confident source of ground truth. In many biological vision applications, however, the ground truth is unknown and indirect information might be available in the form of noisy estimations or orthogonal evidence. In this work, we frame a crucial problem in spatial transcriptomics - decoding barcodes from In-Situ-Sequencing (ISS) images - as a semi-supervised object detection (SSOD) problem. Our proposed framework incorporates additional available sources of information into a semi-supervised learning framework in the form of privileged information. The privileged information is incorporated into the teacher's pseudo-labeling in a teacher-student self-training iteration. Although the available privileged information could be data domain specific, we have introduced a general strategy of pseudo-labeling enhanced by privileged information (PLePI) and exemplified the concept using ISS images, as well as on the COCO benchmark using extra evidence provided by CLIP.

Submitted 27 June, 2023; originally announced June 2023.

Comments: This paper has been accepted for publication at IJCAI 2023

Journal ref: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI), Main Track, Pages 4775-4784, 2023

arXiv:2306.10034 [pdf]

Unlocking Insights into Business Trajectories with Transformer-based Spatio-temporal Data Analysis

Authors: Muhammad Arslan, Christophe Cruz

Abstract: The world of business is constantly evolving and staying ahead of the curve requires a deep understanding of market trends and performance. This article addresses this requirement by modeling business trajectories using news articles data.

Submitted 7 June, 2023; originally announced June 2023.

Comments: Presented in the conference Spatial Analysis and GEOmatics 2023 SAGEO

arXiv:2306.07046 [pdf]

Imbalanced Multi-label Classification for Business-related Text with Moderately Large Label Spaces

Authors: Muhammad Arslan, Christophe Cruz

Abstract: In this study, we compared the performance of four different methods for multi-label text classification using a specific imbalanced business dataset. The four methods we evaluated were fine-tuned BERT, Binary Relevance, Classifier Chains, and Label Powerset. The results show that fine-tuned BERT outperforms the other three methods by a significant margin, achieving high values of accuracy, F1 Score, Precision, and Recall. Binary Relevance also performs well on this dataset, while Classifier Chains and Label Powerset demonstrate relatively poor performance. These findings highlight the effectiveness of fine-tuned BERT for multi-label text classification tasks, and suggest that it may be a useful tool for businesses seeking to analyze complex and multifaceted texts.

Submitted 12 June, 2023; originally announced June 2023.

Journal ref: https://easychair.org/smart-program/FRCCS2023/2023-06-01.html

arXiv:2305.14235 [pdf, other]

Multilingual Large Language Models Are Not (Yet) Code-Switchers

Authors: Ruochen Zhang, Samuel Cahyawijaya, Jan Christian Blaise Cruz, Genta Indra Winata, Alham Fikri Aji

Abstract: Multilingual Large Language Models (LLMs) have recently shown great capabilities in a wide range of tasks, exhibiting state-of-the-art performance through zero-shot or few-shot prompting methods. While there have been extensive studies on their abilities in monolingual tasks, the investigation of their potential in the context of code-switching (CSW), the practice of alternating languages within an utterance, remains relatively uncharted. In this paper, we provide a comprehensive empirical analysis of various multilingual LLMs, benchmarking their performance across four tasks: sentiment analysis, machine translation, summarization and word-level language identification. Our results indicate that despite multilingual LLMs exhibiting promising outcomes in certain tasks using zero or few-shot prompting, they still underperform in comparison to fine-tuned models of much smaller scales. We argue that current "multilingualism" in LLMs does not inherently imply proficiency with code-switching texts, calling for future research to bridge this discrepancy.

Submitted 23 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted at EMNLP 2023

arXiv:2303.13592 [pdf, other]

Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages

Authors: Zheng-Xin Yong, Ruochen Zhang, Jessica Zosa Forde, Skyler Wang, Arjun Subramonian, Holy Lovenia, Samuel Cahyawijaya, Genta Indra Winata, Lintang Sutawika, Jan Christian Blaise Cruz, Yin Lin Tan, Long Phan, Rowena Garcia, Thamar Solorio, Alham Fikri Aji

Abstract: While code-mixing is a common linguistic practice in many parts of the world, collecting high-quality and low-cost code-mixed data remains a challenge for natural language processing (NLP) research. The recent proliferation of Large Language Models (LLMs) compels one to ask: how capable are these systems in generating code-mixed data? In this paper, we explore prompting multilingual LLMs in a zero-shot manner to generate code-mixed data for seven languages in South East Asia (SEA), namely Indonesian, Malay, Chinese, Tagalog, Vietnamese, Tamil, and Singlish. We find that publicly available multilingual instruction-tuned models such as BLOOMZ and Flan-T5-XXL are incapable of producing texts with phrases or clauses from different languages. ChatGPT exhibits inconsistent capabilities in generating code-mixed texts, wherein its performance varies depending on the prompt template and language pairing. For instance, ChatGPT generates fluent and natural Singlish texts (an English-based creole spoken in Singapore), but for the English-Tamil language pair, the system mostly produces grammatically incorrect or semantically meaningless utterances. Furthermore, it may erroneously introduce languages not specified in the prompt. Based on our investigation, existing multilingual LLMs exhibit a wide range of proficiency in code-mixed data generation for SEA languages. As such, we advise against using LLMs in this context without extensive human checks.

Submitted 12 September, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: Updating Authors

arXiv:2301.05122 [pdf, other]

Quantum algorithm for finding minimum values in a Quantum Random Access Memory

Authors: Anton S. Albino, Lucas Q. Galvão, Ethan Hansen, Mauro Q. Nooblath Neto, Clebson Cruz

Abstract: Finding the minimum value in an unordered database is a common and fundamental task in computer science. However, the optimal classical deterministic algorithm can find the minimum value with a time complexity that grows linearly with the number of elements in the database. In this paper, we present the proposal of a quantum algorithm for finding the minimum value of a database, which is quadratically faster than its best classical analogs. We assume a Quantum Random Access Memory (QRAM) that stores values from a database and perform an iterative search based on an oracle whose role is to limit the searched values by controlling the states of the most significant qubits. A complexity analysis was performed in order to demonstrate the advantage of this quantum algorithm over its classical counterparts. Furthermore, we demonstrate how the proposed algorithm would be used in an unsupervised machine learning task through a quantum version of the K-means algorithm.

Submitted 12 January, 2023; originally announced January 2023.

arXiv:2212.13656 [pdf, other]

Smart meter data processing: a showcase for simple and efficient textual processing

Authors: Miguel Ferreira, André Neves, Rodrigo Gorjão, Carlos Cruz, Miguel L. Pardal

Abstract: The increase in the production and collection of data from devices is an ongoing trend due to the roll-out of more cyber-physical applications. Smart meters, because of their importance in power grids, are a class of such devices whose produced data requires meticulous processing. In this paper, we use Unicage, a data processing system based on classic Unix shell scripting, that delivers excellent performance in a simple package. We use this methodology to process smart meter data in XML format, subjected to the constraints posed by a real use case. We develop a solution that parses, validates and performs a simple aggregation of 27 million XML files in less than 10 minutes. We present a study of the solution as well as the benefits of its adoption.

Submitted 27 December, 2022; originally announced December 2022.

Comments: 11 pages, 5 figures, 1 table, 9 listings. Accepted after review for the 1st Workshop on High-Performance and Reliable Big Data (HPBD 2021), which was held virtually on September 20th 2021, and was co-located with the 40th International Symposium on Reliable Distributed Systems (SRDS 2021)

arXiv:2205.00952 [pdf, other]

Leaf Tar Spot Detection Using RGB Images

Authors: Sriram Baireddy, Da-Young Lee, Carlos Gongora-Canul, Christian D. Cruz, Edward J. Delp

Abstract: Tar spot disease is a fungal disease that appears as a series of black circular spots containing spores on corn leaves. Tar spot has proven to be an impactful disease in terms of reducing crop yield. To quantify disease progression, experts usually have to visually phenotype leaves from the plant. This process is very time-consuming and is difficult to incorporate in any high-throughput phenotyping system. Deep neural networks could provide quick, automated tar spot detection with sufficient ground truth. However, manually labeling tar spots in images to serve as ground truth is also tedious and time-consuming. In this paper we first describe an approach that uses automated image analysis tools to generate ground truth images that are then used for training a Mask R-CNN. We show that a Mask R-CNN can be used effectively to detect tar spots in close-up images of leaf surfaces. We additionally show that the Mask R-CNN can also be used for in-field images of whole leaves to capture the number of tar spots and area of the leaf infected by the disease.

Submitted 2 May, 2022; originally announced May 2022.

arXiv:2204.03251 [pdf, other]

Towards Automatic Construction of Filipino WordNet: Word Sense Induction and Synset Induction Using Sentence Embeddings

Authors: Dan John Velasco, Axel Alba, Trisha Gail Pelagio, Bryce Anthony Ramirez, Unisse Chua, Briane Paul Samson, Jan Christian Blaise Cruz, Charibeth Cheng

Abstract: Wordnets are indispensable tools for various natural language processing applications. Unfortunately, wordnets get outdated, and producing or updating wordnets can be slow and costly in terms of time and resources. This problem intensifies for low-resource languages. This study proposes a method for word sense induction and synset induction using only two linguistic resources, namely, an unlabeled corpus and a sentence embeddings-based language model. The resulting sense inventory and synonym sets can be used in automatically creating a wordnet. We applied this method on a corpus of Filipino text. The sense inventory and synsets were evaluated by matching them with the sense inventory of the machine translated Princeton WordNet, as well as comparing the synsets to the Filipino WordNet. This study empirically shows that 30% of the induced word senses are valid and 40% of the induced synsets are valid, of which 20% are novel synsets.

Submitted 19 October, 2023; v1 submitted 7 April, 2022; originally announced April 2022.

Comments: To appear in SEALP 2023. Formerly titled "Automatic WordNet Construction using Word Sense Induction through Sentence Embeddings"

arXiv:2204.02653 [pdf, ps, other]

Using Synthetic Data for Conversational Response Generation in Low-resource Settings

Authors: Gabriel Louis Tan, Adrian Paule Ty, Schuyler Ng, Denzel Adrian Co, Jan Christian Blaise Cruz, Charibeth Cheng

Abstract: Response generation is a task in natural language processing (NLP) where a model is trained to respond to human statements. Conversational response generators take this one step further with the ability to respond within the context of previous responses. While there are existing techniques for training such models, they all require an abundance of conversational data which are not always available for low-resource languages. In this research, we make three contributions. First, we released the first Filipino conversational dataset collected from a popular Philippine online forum, which we named the PEx Conversations Dataset. Second, we introduce a data augmentation (DA) methodology for Filipino data by employing a Tagalog RoBERTa model to increase the size of the existing corpora. Lastly, we published the first Filipino conversational response generator capable of generating responses related to the previous 3 responses. With the supplementary synthetic data, we were able to improve the performance of the response generator by up to 12.2% in BERTScore, 10.7% in perplexity, and 11.7% in content word usage as compared to training with zero synthetic data.

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2111.10513 [pdf, other]

Data Processing Matters: SRPH-Konvergen AI's Machine Translation System for WMT'21

Authors: Lintang Sutawika, Jan Christian Blaise Cruz

Abstract: In this paper, we describe the submission of the joint Samsung Research Philippines-Konvergen AI team for the WMT'21 Large Scale Multilingual Translation Task - Small Track 2. We submit a standard Seq2Seq Transformer model to the shared task without any training or architecture tricks, relying mainly on the strength of our data preprocessing techniques to boost performance. Our final submission model scored 22.92 average BLEU on the FLORES-101 devtest set, and scored 22.97 average BLEU on the contest's hidden test set, ranking us sixth overall. Despite using only a standard Transformer, our model ranked first in Indonesian to Javanese, showing that data preprocessing matters equally, if not more, than cutting edge model architectures and training techniques.

Submitted 19 November, 2021; originally announced November 2021.

Comments: In Proceedings of the Sixth Conference on Machine Translation (WMT)

arXiv:2111.06053 [pdf, other]

Improving Large-scale Language Models and Resources for Filipino

Authors: Jan Christian Blaise Cruz, Charibeth Cheng

Abstract: In this paper, we improve on existing language resources for the low-resource Filipino language in two ways. First, we outline the construction of the TLUnified dataset, a large-scale pretraining corpus that serves as an improvement over smaller existing pretraining datasets for the language in terms of scale and topic variety. Second, we pretrain new Transformer language models following the RoBERTa pretraining technique to supplant existing models trained with small corpora. Our new RoBERTa models show significant improvements over existing Filipino models in three benchmark datasets with an average gain of 4.47% test accuracy across the three classification tasks of varying difficulty.

Submitted 11 November, 2021; originally announced November 2021.

Comments: Resources are available at blaisecruz.com/resources

arXiv:2105.12949 [pdf, other]

A Survey on Interactive Reinforcement Learning: Design Principles and Open Challenges

Authors: Christian Arzate Cruz, Takeo Igarashi

Abstract: Interactive reinforcement learning (RL) has been successfully used in various applications in different fields, which has also motivated HCI researchers to contribute in this area. In this paper, we survey interactive RL to empower human-computer interaction (HCI) researchers with the technical background in RL needed to design new interaction techniques and propose new applications. We elucidate the roles played by HCI researchers in interactive RL, identifying ideas and promising research directions. Furthermore, we propose generic design principles that will provide researchers with a guide to effectively implement interactive RL applications.

Submitted 27 May, 2021; originally announced May 2021.

arXiv:2105.12944 [pdf, other]

MarioMix: Creating Aligned Playstyles for Bots with Interactive Reinforcement Learning

Authors: Christian Arzate Cruz, Takeo Igarashi

Abstract: In this paper, we propose a generic framework that enables game developers without knowledge of machine learning to create bot behaviors with playstyles that align with their preferences. Our framework is based on interactive reinforcement learning (RL), and we used it to create a behavior authoring tool called MarioMix. This tool enables non-experts to create bots with varied playstyles for the game titled Super Mario Bros. The main interaction procedure of MarioMix consists of presenting short clips of gameplay displaying precomputed bots with different playstyles to end-users. Then, end-users can select the bot with the playstyle that behaves as intended. We evaluated MarioMix by incorporating input from game designers working in the industry.

Submitted 27 May, 2021; originally announced May 2021.

arXiv:2105.12938 [pdf, other]

Interactive Explanations: Diagnosis and Repair of Reinforcement Learning Based Agent Behaviors

Authors: Christian Arzate Cruz, Takeo Igarashi

Abstract: Reinforcement learning techniques successfully generate convincing agent behaviors, but it is still difficult to tailor the behavior to align with a user's specific preferences. What is missing is a communication method for the system to explain the behavior and for the user to repair it. In this paper, we present a novel interaction method that uses interactive explanations using templates of natural language as a communication method. The main advantage of this interaction method is that it enables a two-way communication channel between users and the agent; the bot can explain its thinking procedure to the users, and the users can communicate their behavior preferences to the bot using the same interactive explanations. In this manner, the thinking procedure of the bot is transparent, and users can provide corrections to the bot that include a suggested action to take, a goal to achieve, and the reasons behind these decisions. We tested our proposed method in a clone of the video game named Super Mario Bros., and the results demonstrate that our interactive explanation approach is effective at diagnosing and repairing bot behaviors.

Submitted 27 May, 2021; originally announced May 2021.

arXiv:2010.11574 [pdf, other]

Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets

Authors: Jan Christian Blaise Cruz, Jose Kristian Resabal, James Lin, Dan John Velasco, Charibeth Cheng

Abstract: Transformers represent the state-of-the-art in Natural Language Processing (NLP) in recent years, proving effective even in tasks done in low-resource languages. While pretrained transformers for these languages can be made, it is challenging to measure their true performance and capacity due to the lack of hard benchmark datasets, as well as the difficulty and cost of producing them. In this paper, we present three contributions: First, we propose a methodology for automatically producing Natural Language Inference (NLI) benchmark datasets for low-resource languages using published news articles. Through this, we create and release NewsPH-NLI, the first sentence entailment benchmark dataset in the low-resource Filipino language. Second, we produce new pretrained transformers based on the ELECTRA technique to further alleviate the resource scarcity in Filipino, benchmarking them on our dataset against other commonly-used transfer learning techniques. Lastly, we perform analyses on transfer learning techniques to shed light on their true performance when operating in low-data domains through the use of degradation tests.

Submitted 13 August, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

Comments: To appear in PRICAI 2021. Formerly titled "Investigating the True Performance of Transformers in Low-Resource Languages: A Case Study in Automatic Corpus Creation." Code and data available at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks

arXiv:2005.02068 [pdf, other]

Establishing Baselines for Text Classification in Low-Resource Languages

Authors: Jan Christian Blaise Cruz, Charibeth Cheng

Abstract: While transformer-based finetuning techniques have proven effective in tasks that involve low-resource, low-data environments, a lack of properly established baselines and benchmark datasets makes it hard to compare different approaches that are aimed at tackling the low-resource setting. In this work, we provide three contributions. First, we introduce two previously unreleased datasets as benchmark datasets for text classification and low-resource multilabel text classification for the low-resource language Filipino. Second, we pretrain better BERT and DistilBERT models for use within the Filipino setting. Third, we introduce a simple degradation test that benchmarks a model's resistance to performance degradation as the number of training samples is reduced. We analyze our pretrained model's degradation speeds and look towards the use of this method for comparing models aimed at operating within the low-resource setting. We release all our models and datasets for the research community to use.

Submitted 5 May, 2020; originally announced May 2020.

Comments: We release all our models, finetuning code, and data at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks

arXiv:2005.01107 [pdf, other]

Simplifying Paragraph-level Question Generation via Transformer Language Models

Authors: Luis Enrico Lopez, Diane Kathryn Cruz, Jan Christian Blaise Cruz, Charibeth Cheng

Abstract: Question generation (QG) is a natural language generation task where a model is trained to ask questions corresponding to some input text. Most recent approaches frame QG as a sequence-to-sequence problem and rely on additional features and mechanisms to increase performance; however, these often increase model complexity, and can rely on auxiliary data unavailable in practical use. A single Transformer-based unidirectional language model leveraging transfer learning can be used to produce high quality questions while disposing of additional task-specific complexity. Our QG model, finetuned from GPT-2 Small, outperforms several paragraph-level QG baselines on the SQuAD dataset by 0.95 METEOR points. Human evaluators rated questions as easy to answer, relevant to their context paragraph, and corresponding well to natural human speech. Also introduced is a new set of baseline scores on the RACE dataset, which has not previously been used for QG tasks. Further experimentation with varying model capacities and datasets with non-identification type questions is recommended in order to further verify the robustness of pretrained Transformer-based LMs as question generators.

Submitted 13 August, 2021; v1 submitted 3 May, 2020; originally announced May 2020.

Comments: To appear in PRICAI 2021. Formerly titled "Transformer-based End-to-End Question Generation."

arXiv:2003.00762 [pdf, other]

Flashlight CNN Image Denoising

Authors: Pham Huu Thanh Binh, Cristóvão Cruz, Karen Egiazarian

Abstract: This paper proposes a learning-based denoising method called FlashLight CNN (FLCNN) that implements a deep neural network for image denoising. The proposed approach is based on deep residual networks and inception networks and it is able to leverage many more parameters than residual networks alone for denoising grayscale images corrupted by additive white Gaussian noise (AWGN). FlashLight CNN demonstrates state of the art performance when compared quantitatively and visually with the current state of the art image denoising methods.

Submitted 2 July, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

arXiv:1911.01279 [pdf]

Automated Smart Wick System-Based Microfarm Using Internet of Things

Authors: R. Jorda, Jr., C. Alcabasa, A. Buhay, E. C. Dela Cruz, J. P. Mendoza, A. Tolentino, L. K. Tolentino, E. Fernandez, A. Thio-ac, J. Velasco, N. Arago

Abstract: This paper presents a study conducted to allow urban farmers to remotely monitor their farm through the design and development of an Internet of Things-based (IoT) microfarm prototype which utilized a wick system as its planting method. The system involves the detection of three environmental parameters, namely light intensity, soil moisture and temperature, through the use of respective sensors which were connected to the Arduino microcontroller, the sensor node of the system. Irregularities in the aforementioned parameters were neutralized through the use of parameter regulators such as LED growlight strips, water pump and air cooler. The data collected by these sensors were gathered by the Arduino microcontroller and were sent to the Web database through the IoT gateway which was the Raspberry Pi computer chip. These data were also sent to an Android unit installed with the Microfarm Companion application which was capable of monitoring and controlling the environmental parameters observed in the microfarm. The application allows the user to view the current value of the parameter involved and to choose whether to control the parameter regulators automatically or manually. The microfarm system runs autonomously which reduces the labor required to produce healthy plants and crops. Mustard greens samples were used in testing the system. After a month of monitoring the height of the samples, it was observed that the average height of the samples is about 0.23 cm taller than the standard height.
The proponents have also tested the system functionality by evaluating the sensor data log that provides the values gathered by the sensors and the turn-on times of the parameter regulators. From these data, it can be observed that whenever the values obtained by the sensors fall outside the threshold range, the parameter regulators turn on, indicating that the system is working properly.

Submitted 30 October, 2019; originally announced November 2019.

Journal ref: Lecture Notes on Research and Innovation in Computer Engineering and Computer Sciences, 2019

arXiv:1910.09295 [pdf, other]

doi 10.13140/RG.2.2.23028.40322

Localization of Fake News Detection via Multitask Transfer Learning

Authors: Jan Christian Blaise Cruz, Julianne Agatha Tan, Charibeth Cheng

Abstract: The use of the internet as a fast medium of spreading fake news reinforces the need for computational tools that combat it. Techniques that train fake news classifiers exist, but they all assume an abundance of resources including large labeled datasets and expert-curated corpora, which low-resource languages may not have. In this work, we make two main contributions: First, we alleviate resource scarcity by constructing the first expertly-curated benchmark dataset for fake news detection in Filipino, which we call "Fake News Filipino." Second, we benchmark Transfer Learning (TL) techniques and show that they can be used to train robust fake news classifiers from little data, achieving 91% accuracy on our fake news dataset, reducing the error by 14% compared to established few-shot baselines. Furthermore, lifting ideas from multitask learning, we show that augmenting transformer-based transfer techniques with auxiliary language modeling losses improves their performance by adapting to writing style. Using this, we improve TL performance by 4-6%, achieving an accuracy of 96% on our best model. Lastly, we show that our method generalizes well to different types of news articles, including political news, entertainment news, and opinion articles.

Submitted 15 May, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

Comments: Published in the LREC 2020 Proceedings. Models and data available at https://github.com/jcblaisecruz02/Tagalog-fake-news

Journal ref: In Proceedings of The 12th Language Resources and Evaluation Conference, pp.2589-2597 (2020)

arXiv:1907.07286 [pdf, other]

Vertex arboricity of cographs

Authors: Sebastián González Hermosillo de la Maza, Pavol Hell, César Hernández Cruz, Seyyed Aliasghar Hosseini, Payam Valadkhan

Abstract: Arboricity is a graph parameter akin to chromatic number, in that it seeks to partition the vertices into the smallest number of sparse subgraphs. Where for the chromatic number we are partitioning the vertices into independent sets, for the arboricity we want to partition the vertices into cycle-free subsets (i.e., forests). Arboricity is NP-hard in general, and our focus is on the arboricity of cographs. For arboricity two, we obtain the complete list of minimal cograph obstructions. These minimal obstructions do generalize to higher arboricities; however, we no longer have a complete list, and in fact, the number of minimal cograph obstructions grows exponentially with arboricity. We obtain bounds on their size and the height of their cotrees. More generally, we consider the following common generalization of colouring and partition into forests: given non-negative integers $p$ and $q$, we ask if a given cograph $G$ admits a vertex partition into $p$ forests and $q$ independent sets. We give a polynomial-time dynamic programming algorithm for this problem. In fact, the algorithm solves a more general problem which also includes several other problems such as finding a maximum $q$-colourable subgraph, maximum subgraph of arboricity-$p$, minimum vertex feedback set and minimum $q$ of a $q$-colourable vertex feedback set.

Submitted 16 July, 2019; originally announced July 2019.

Comments: 14 pages, 1 figure

MSC Class: 05C70; 05C75

arXiv:1907.00409 [pdf, other]

doi 10.13140/RG.2.2.23028.40322

Evaluating Language Model Finetuning Techniques for Low-resource Languages

Authors: Jan Christian Blaise Cruz, Charibeth Cheng

Abstract: Unlike mainstream languages (such as English and French), low-resource languages often suffer from a lack of expert-annotated corpora and benchmark resources that make it hard to apply state-of-the-art techniques directly. In this paper, we alleviate this scarcity problem for the low-resourced Filipino language in two ways. First, we introduce a new benchmark language modeling dataset in Filipino which we call WikiText-TL-39. Second, we show that language model finetuning techniques such as BERT and ULMFiT can be used to consistently train robust classifiers in low-resource settings, experiencing at most a 0.0782 increase in validation error when the number of training examples is decreased from 10K to 1K while finetuning using a privately-held sentiment dataset.

Submitted 30 June, 2019; originally announced July 2019.

Comments: Pretrained models and datasets available at https://github.com/jcblaisecruz02/Tagalog-BERT

arXiv:1803.02112 [pdf, other]

doi 10.1109/LSP.2018.2850222

Nonlocality-Reinforced Convolutional Neural Networks for Image Denoising

Authors: Cristóvão Cruz, Alessandro Foi, Vladimir Katkovnik, Karen Egiazarian

Abstract: We introduce a paradigm for nonlocal sparsity reinforced deep convolutional neural network denoising. It is a combination of a local multiscale denoising by a convolutional neural network (CNN) based denoiser and a nonlocal denoising based on a nonlocal filter (NLF) exploiting the mutual similarities between groups of patches. CNN models are leveraged with noise levels that progressively decrease at every iteration of our framework, while their output is regularized by a nonlocal prior implicit within the NLF. Unlike complicated neural networks that embed the nonlocality prior within the layers of the network, our framework is modular, it uses standard pre-trained CNNs together with standard nonlocal filters. An instance of the proposed framework, called NN3D, is evaluated over large grayscale image datasets showing state-of-the-art performance.

Submitted 21 June, 2018; v1 submitted 6 March, 2018; originally announced March 2018.

Comments: Accepted for publication in IEEE SPL

arXiv:1704.04126 [pdf, other]

doi 10.1109/TIP.2017.2779265

Single Image Super-Resolution based on Wiener Filter in Similarity Domain

Authors: Cristóvão Cruz, Rakesh Mehta, Vladimir Katkovnik, Karen Egiazarian

Abstract: Single image super resolution (SISR) is an ill-posed problem aiming at estimating a plausible high resolution (HR) image from a single low resolution (LR) image. Current state-of-the-art SISR methods are patch-based. They use either external data or internal self-similarity to learn a prior for an HR image. External data based methods utilize a large number of patches from the training data, while self-similarity based approaches leverage one or more similar patches from the input image. In this paper we propose a self-similarity based approach that is able to use large groups of similar patches extracted from the input image to solve the SISR problem. We introduce a novel prior leading to collaborative filtering of patch groups in 1D similarity domain and couple it with an iterative back-projection framework. The performance of the proposed algorithm is evaluated on a number of SISR benchmark datasets. Without using any external data, the proposed approach outperforms the current non-CNN based methods on the tested datasets for various scaling factors. On certain datasets, the gain is over 1 dB, when compared to the recent method A+. For high sampling rate (x4) the proposed method performs similarly to very recent state-of-the-art deep convolutional network based approaches.

Submitted 29 November, 2017; v1 submitted 13 April, 2017; originally announced April 2017.

Comments: Paper accepted for publication on IEEE Transactions on Image Processing

arXiv:1412.0854 [pdf, other]

Semantic HMC for Big Data Analysis

Authors: Thomas Hassan, Rafael Peixoto, Christophe Cruz, Aurélie Bertaux, Nuno Silva

Abstract: Analyzing Big Data can help corporations to improve their efficiency. In this work we present a new vision to derive Value from Big Data using a Semantic Hierarchical Multi-label Classification called Semantic HMC based in a non-supervised Ontology learning process. We also propose a Semantic HMC process, using scalable Machine-Learning techniques and Rule-based reasoning.

Submitted 2 December, 2014; originally announced December 2014.

arXiv:1301.5349 [pdf]

Toward the Automatic Generation of a Semantic VRML Model from Unorganized 3D Point Clouds

Authors: Helmi Ben Hmida, Christophe Cruz, Christophe Nicolle, Frank Boochs

Abstract: This paper presents our experience regarding the creation of a 3D semantic facility model out of unorganized 3D point clouds. Thus, a knowledge-based detection approach of objects using the OWL ontology language is presented. This knowledge is used to define SWRL detection rules. In addition, the combination of 3D processing built-ins and topological built-ins in SWRL rules aims at combining geometrical analysis of 3D point clouds and specialist's knowledge. This combination allows more flexible and intelligent detection and the annotation of objects contained in 3D point clouds. The created WiDOP prototype takes a set of 3D point clouds as input, and produces an indexed scene of colored objects visualized within VRML language as output. The context of the study is the detection of railway objects materialized within the Deutsche Bahn scene such as signals, technical cupboards, electric poles, etc. Therefore, the resulting enriched and populated domain ontology, that contains the annotations of objects in the point clouds, is used to feed a GIS system.

Submitted 21 January, 2013; originally announced January 2013.

Comments: arXiv admin note: substantial text overlap with arXiv:1301.4991, arXiv:1301.4783

Journal ref: The Fifth International Conference on Advances in Semantic Processing, Lisbon : Portugal (2011)

arXiv:1301.4992 [pdf]

From 9-IM Topological Operators to Qualitative Spatial Relations using 3D Selective Nef Complexes and Logic Rules for bodies

Authors: Helmi Ben Hmida, Christophe Cruz, Frank Boochs, Christophe Nicolle

Abstract: This paper presents a method to compute automatically topological relations using SWRL rules. The calculation of these rules is based on the definition of a Selective Nef Complexes Nef Polyhedra structure generated from standard Polyhedron. The Selective Nef Complexes is a data model providing a set of binary Boolean operators such as Union, Difference, Intersection and Symmetric difference, and unary operators such as Interior, Closure and Boundary. In this work, these operators are used to compute topological relations between objects defined by the constraints of the 9 Intersection Model (9-IM) from Egenhofer. With the help of these constraints, we defined a procedure to compute the topological relations on Nef polyhedra. These topological relationships are Disjoint, Meets, Contains, Inside, Covers, CoveredBy, Equals and Overlaps, and defined in a top-level ontology with a specific semantic definition on relation such as Transitive, Symmetric, Asymmetric, Functional, Reflexive, and Irreflexive. The results of the computation of topological relationships are stored in an OWL-DL ontology allowing after what to infer on these new relationships between objects. In addition, logic rules based on the Semantic Web Rule Language allows the definition of logic programs that define which topological relationships have to be computed on which kind of objects with specific attributes. For instance, a "Building" that overlaps a "Railway" is a "RailStation".

Submitted 21 January, 2013; originally announced January 2013.

Comments: arXiv admin note: substantial text overlap with arXiv:1301.4780

Journal ref: International Conference on Knowledge Engineering and Ontology Development, Barcelone : Spain (2012)

arXiv:1301.4991 [pdf]

Knowledge Base Approach for 3D Objects Detection in Point Clouds Using 3D Processing and Specialists Knowledge

Authors: Helmi Ben Hmida, Christophe Cruz, Frank Boochs, Christophe Nicolle

Abstract: This paper presents a knowledge-based detection of objects approach using the OWL ontology language, the Semantic Web Rule Language, and 3D processing built-ins aiming at combining geometrical analysis of 3D point clouds and specialist's knowledge. Here, we share our experience regarding the creation of 3D semantic facility model out of unorganized 3D point clouds. Thus, a knowledge-based detection approach of objects using the OWL ontology language is presented. This knowledge is used to define SWRL detection rules. In addition, the combination of 3D processing built-ins and topological Built-Ins in SWRL rules allows a more flexible and intelligent detection, and the annotation of objects contained in 3D point clouds. The created WiDOP prototype takes a set of 3D point clouds as input, and produces as output a populated ontology corresponding to an indexed scene visualized within VRML language. The context of the study is the detection of railway objects materialized within the Deutsche Bahn scene such as signals, technical cupboards, electric poles, etc. Thus, the resulting enriched and populated ontology, that contains the annotations of objects in the point clouds, is used to feed a GIS system or an IFC file for architecture purposes.

Submitted 21 January, 2013; originally announced January 2013.

Comments: ISSN: 1942-2679. arXiv admin note: text overlap with arXiv:1301.4783

Journal ref: International Journal On Advances in Intelligent Systems 5, 1 et 2 (2012) 1-14

arXiv:1301.4848 [pdf]

doi 10.1109/SSD.2011.5993558

Integration of knowledge to support automatic object reconstruction from images and 3D data

Authors: Frank Boochs, Andreas Marbs, Hung Truong, Helmi Ben Hmida, Ashish Karmacharya, Christophe Cruz, Adlane Habed, Yvon Voisin, Christophe Nicolle

Abstract: Object reconstruction is an important task in many fields of application as it allows to generate digital representations of our physical world used as base for analysis, planning, construction, visualization or other aims. A reconstruction itself normally is based on reliable data (images, 3D point clouds for example) expressing the object in his complete extent. This data then has to be compiled and analyzed in order to extract all necessary geometrical elements, which represent the object and form a digital copy of it. Traditional strategies are largely based on manual interaction and interpretation, because with increasing complexity of objects human understanding is inevitable to achieve acceptable and reliable results. But human interaction is time consuming and expensive, why many researches has already been invested to use algorithmic support, what allows to speed up the process and to reduce manual work load. Presently most of such supporting algorithms are data-driven and concentrate on specific features of the objects, being accessible to numerical models. By means of these models, which normally will represent geometrical (flatness, roughness, for example) or physical features (color, texture), the data is classified and analyzed. This is successful for objects with low complexity, but gets to its limits with increasing complexness of objects. Then purely numerical strategies are not able to sufficiently model the reality.
Therefore, the intention of our approach is to take human cognitive strategy as an example, and to simulate extraction processes based on available human defined knowledge for the objects of interest. Such processes will introduce a semantic structure for the objects and guide the algorithms used to detect and recognize objects, which will yield a higher effectiveness. Hence, our research proposes an approach using knowledge to guide the algorithms in 3D point cloud and image processing.

Submitted 21 January, 2013; originally announced January 2013.

Journal ref: Systems, Signals and Devices (SSD), 2011 8th International Multi-Conference on, Chemnitz : Germany (2011)

arXiv:1301.4783 [pdf]

From 3D Point Clouds To Semantic Objects An Ontology-Based Detection Approach

Authors: Helmi Ben Hmida, Christophe Cruz, Frank Boochs, Christophe Nicolle

Abstract: This paper presents a knowledge-based detection of objects approach using the OWL ontology language, the Semantic Web Rule Language, and 3D processing built-ins aiming at combining geometrical analysis of 3D point clouds and specialist's knowledge. This combination allows the detection and the annotation of objects contained in point clouds. The context of the study is the detection of railway objects such as signals, technical cupboards, electric poles, etc. Thus, the resulting enriched and populated ontology, that contains the annotations of objects in the point clouds, is used to feed a GIS systems or an IFC file for architecture purposes.

Submitted 21 January, 2013; originally announced January 2013.

Journal ref: International Conference on Knowledge Engineering and Ontology Development, Paris : France (2011)

arXiv:1301.4781 [pdf]

Ontology-based Recommender System of Economic Articles

Authors: David Werner, Christophe Cruz, Christophe Nicolle

Abstract: Decision makers need economical information to drive their decisions. The Company Actualis SARL is specialized in the production and distribution of a press review about French regional economic actors. This economic review represents for a client a prospecting tool on partners and competitors. To reduce the overload of useless information, the company is moving towards a customized review for each customer. Three issues appear to achieve this goal. First, how to identify the elements in the text in order to extract objects that match with the recommendation's criteria presented? Second, How to define the structure of these objects, relationships and articles in order to provide a source of knowledge usable by the extraction process to produce new knowledge from articles? The latter issue is the feedback on customer experience to identify the quality of distributed information in real-time and to improve the relevance of the recommendations. This paper presents a new type of recommendation based on the semantic description of both articles and user profile.

Submitted 21 January, 2013; originally announced January 2013.

Journal ref: 8th International Conference on Web Information Systems and Technologies, Porto : Portugal (2013)

arXiv:1301.4780 [pdf]

doi 10.1109/CSAE.2012.6272992

From Quantitative Spatial Operator to Qualitative Spatial Relation Using Constructive Solid Geometry, Logic Rules and Optimized 9-IM Model, A Semantic Based Approach

Authors: Helmi Ben Hmida, Christophe Cruz, Frank Boochs, Christophe Nicolle

Abstract: The Constructive Solid Geometry (CSG) is a data model providing a set of binary Boolean operators such as Union, Difference and Intersection. In this work, these operators are used to compute topological relations between objects defined by the constraints of the nine Intersection Model (9-IM) from Egenhofer. With the help of these constraints, we define a procedure to compute the topological relations on CSG objects. These topological relations are Disjoint, Contains, Inside, Covers, CoveredBy, Equals and Overlaps, and are defined in a top-level ontology with a specific semantic definition on relation such as Transitive, Symmetric, Asymmetric, Functional, Reflexive, and Irreflexive. The results of topological relations computation are stored in the ontology allowing after what to infer on these topological relationships. In addition, logic rules based on the Semantic Web Language allows the definition of logic programs that define which topological relationships have to be computed on which kind of objects. For instance, a "Building" that overlaps a "Railway" is a "RailStation".

Submitted 21 January, 2013; originally announced January 2013.

Journal ref: IEEE International Conference on Computer Science and Automation Engineering (CSAE), Zhangjiajie : China (2012)

arXiv:1208.1750 [pdf]

Guidelines for a Dynamic Ontology - Integrating Tools of Evolution and Versioning in Ontology

Authors: Perrine Pittet, Christophe Nicolle, Christophe Cruz

Abstract: Ontologies are built on systems that conceptually evolve over time. In addition, techniques and languages for building ontologies evolve too. This has led to numerous studies in the field of ontology versioning and ontology evolution. This paper presents a new way to manage the lifecycle of an ontology incorporating both versioning tools and evolution process. This solution, called VersionGraph, is integrated in the source ontology since its creation in order to make it possible to evolve and to be versioned. Change management is strongly related to the model in which the ontology is represented. Therefore, we focus on the OWL language in order to take into account the impact of the changes on the logical consistency of the ontology like specified in OWL DL.

Submitted 8 August, 2012; originally announced August 2012.

Journal ref: KMIS 2011 - International Conference on Knowledge Management and Information Sharing is part of 3rd International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management., Paris : France (2011)

Showing 1–43 of 43 results for author: Cruz, C