Search | arXiv e-print repository

arXiv:2405.07338 [pdf, other]

Explainable Convolutional Neural Networks for Retinal Fundus Classification and Cutting-Edge Segmentation Models for Retinal Blood Vessels from Fundus Images

Authors: Fatema Tuj Johora Faria, Mukaffi Bin Moin, Pronay Debnath, Asif Iftekher Fahim, Faisal Muhammad Shah

Abstract: Our research focuses on the critical field of early diagnosis of disease by examining retinal blood vessels in fundus images. While automatic segmentation of retinal blood vessels holds promise for early detection, accurate analysis remains challenging due to the limitations of existing methods, which often lack discrimination power and are susceptible to influences from pathological regions. Our research in fundus image analysis advances deep learning-based classification using eight pre-trained CNN models. To enhance interpretability, we utilize Explainable AI techniques such as Grad-CAM, Grad-CAM++, Score-CAM, Faster Score-CAM, and Layer CAM. These techniques illuminate the decision-making processes of the models, fostering transparency and trust in their predictions. Expanding our exploration, we investigate ten models, including TransUNet with ResNet backbones, Attention U-Net with DenseNet and ResNet backbones, and Swin-UNET. Incorporating diverse architectures such as ResNet50V2, ResNet101V2, ResNet152V2, and DenseNet121 among others, this comprehensive study deepens our insights into attention mechanisms for enhanced fundus image analysis. Among the evaluated models for fundus image classification, ResNet101 emerged with the highest accuracy, achieving an impressive 94.17%. On the other end of the spectrum, EfficientNetB0 exhibited the lowest accuracy among the models, achieving a score of 88.33%. Furthermore, in the domain of fundus image segmentation, Swin-Unet demonstrated a Mean Pixel Accuracy of 86.19%, showcasing its effectiveness in accurately delineating regions of interest within fundus images. Conversely, Attention U-Net with DenseNet201 backbone exhibited the lowest Mean Pixel Accuracy among the evaluated models, achieving a score of 75.87%.
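
The abstract above reports Mean Pixel Accuracy for segmentation but does not define it. The sketch below shows one common reading of the metric, assumed here rather than taken from the paper: compute the fraction of correctly labelled pixels separately for each ground-truth class, then average over the classes present. The function name `mean_pixel_accuracy` and the toy masks are hypothetical.

```python
from collections import defaultdict


def mean_pixel_accuracy(true_mask, pred_mask):
    """Average of per-class pixel accuracies (assumed definition).

    Both masks are 2-D lists of integer class labels with the same shape.
    Classes that never occur in the ground truth are ignored.
    """
    correct = defaultdict(int)  # correctly predicted pixels per true class
    total = defaultdict(int)    # ground-truth pixels per class
    for true_row, pred_row in zip(true_mask, pred_mask):
        for t, p in zip(true_row, pred_row):
            total[t] += 1
            if t == p:
                correct[t] += 1
    if not total:
        raise ValueError("masks must contain at least one pixel")
    per_class = [correct[c] / total[c] for c in total]
    return sum(per_class) / len(per_class)


# Toy example: 0 = background, 1 = vessel (hypothetical labels).
true_mask = [[0, 0, 1, 1],
             [0, 0, 0, 1]]
pred_mask = [[0, 0, 1, 0],
             [0, 1, 0, 1]]
mpa = mean_pixel_accuracy(true_mask, pred_mask)
```

Averaging per class rather than over all pixels keeps a thin structure such as a vessel from being swamped by the much larger background class; whether the paper averages this way is not stated in the abstract.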

Submitted 12 May, 2024; originally announced May 2024.

arXiv:2405.07332 [pdf, other]

PotatoGANs: Utilizing Generative Adversarial Networks, Instance Segmentation, and Explainable AI for Enhanced Potato Disease Identification and Classification

Authors: Mohammad Shafiul Alam, Fatema Tuj Johora Faria, Mukaffi Bin Moin, Ahmed Al Wase, Md. Rabius Sani, Khan Md Hasib

Abstract: Numerous applications have resulted from the automation of agricultural disease segmentation using deep learning techniques. However, when applied to new conditions, these applications frequently face the difficulty of overfitting, resulting in lower segmentation performance. In the context of potato farming, where diseases have a large influence on yields, it is critical for the agricultural economy to quickly and properly identify these diseases. Traditional data augmentation approaches, such as rotation, flip, and translation, have limitations and frequently fail to provide strong generalization results. To address these issues, our research employs a novel approach termed PotatoGANs. In this novel data augmentation approach, two types of Generative Adversarial Networks (GANs) are utilized to generate synthetic potato disease images from healthy potato images. This approach not only expands the dataset but also adds variety, which helps to enhance model generalization. Using the Inception score as a measure, our experiments show the better quality and realism of the images created by PotatoGANs, emphasizing their capacity to resemble real disease images closely. The CycleGAN model outperforms the Pix2Pix GAN model in terms of image quality, as evidenced by its higher Inception scores (IS): CycleGAN achieves 1.2001 and 1.0900 for black scurf and common scab, respectively. This synthetic data can significantly improve the training of large neural networks. It also reduces data collection costs while enhancing data diversity and generalization capabilities. Our work improves interpretability by combining three gradient-based Explainable AI algorithms (GradCAM, GradCAM++, and ScoreCAM) with three distinct CNN architectures (DenseNet169, Resnet152 V2, InceptionResNet V2) for potato disease classification.

Submitted 12 May, 2024; originally announced May 2024.

arXiv:2405.04610 [pdf, other]

Exploring Explainable AI Techniques for Improved Interpretability in Lung and Colon Cancer Classification

Authors: Mukaffi Bin Moin, Fatema Tuj Johora Faria, Swarnajit Saha, Busra Kamal Rafa, Mohammad Shafiul Alam

Abstract: Lung and colon cancer are serious worldwide health challenges that require early and precise identification to reduce mortality risks. However, diagnosis, which is mostly dependent on histopathologists' competence, presents difficulties and hazards when expertise is insufficient. While diagnostic methods like imaging and blood markers contribute to early detection, histopathology remains the gold standard, although time-consuming and vulnerable to inter-observer mistakes. Limited access to high-end technology further limits patients' ability to receive immediate medical care and diagnosis. Recent advances in deep learning have generated interest in its application to medical imaging analysis, specifically the use of histopathological images to diagnose lung and colon cancer. The goal of this investigation is to use and adapt existing pre-trained CNN-based models, such as Xception, DenseNet201, ResNet101, InceptionV3, DenseNet121, DenseNet169, ResNet152, and InceptionResNetV2, to enhance classification through better augmentation strategies. The results show tremendous progress, with all eight models reaching impressive accuracy ranging from 97% to 99%. Furthermore, attention visualization techniques such as GradCAM, GradCAM++, ScoreCAM, Faster Score-CAM, and LayerCAM, as well as Vanilla Saliency and SmoothGrad, are used to provide insights into the models' classification decisions, thereby improving interpretability and understanding of malignant and benign image classification.

Submitted 14 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: Accepted in 4th International Conference on Computing and Communication Networks (ICCCNet-2024)

arXiv:2405.02937 [pdf, other]

Unraveling the Dominance of Large Language Models Over Transformer Models for Bangla Natural Language Inference: A Comprehensive Study

Authors: Fatema Tuj Johora Faria, Mukaffi Bin Moin, Asif Iftekher Fahim, Pronay Debnath, Faisal Muhammad Shah

Abstract: Natural Language Inference (NLI) is a cornerstone of Natural Language Processing (NLP), providing insights into the entailment relationships between text pairings. It is a critical component of Natural Language Understanding (NLU), demonstrating the ability to extract information from spoken or written interactions. NLI is mainly concerned with determining the entailment relationship between two statements, known as the premise and hypothesis. When the premise logically implies the hypothesis, the pair is labeled "entailment". If the hypothesis contradicts the premise, the pair receives the "contradiction" label. When there is insufficient evidence to establish a connection, the pair is described as "neutral". Despite the success of Large Language Models (LLMs) in various tasks, their effectiveness in NLI remains constrained by issues like low-resource domain accuracy, model overconfidence, and difficulty in capturing human judgment disagreements. This study addresses the underexplored area of evaluating LLMs in low-resourced languages such as Bengali. Through a comprehensive evaluation, we assess the performance of prominent LLMs and state-of-the-art (SOTA) models in Bengali NLP tasks, focusing on natural language inference. Utilizing the XNLI dataset, we conduct zero-shot and few-shot evaluations, comparing LLMs like GPT-3.5 Turbo and Gemini 1.5 Pro with models such as BanglaBERT, Bangla BERT Base, DistilBERT, mBERT, and sahajBERT. Our findings reveal that while LLMs can achieve comparable or superior performance to fine-tuned SOTA models in few-shot scenarios, further research is necessary to enhance our understanding of LLMs in languages with modest resources like Bengali. This study underscores the importance of continued efforts in exploring LLM capabilities across diverse linguistic contexts.

Submitted 7 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

Comments: Accepted in 4th International Conference on Computing and Communication Networks (ICCCNet-2024)

arXiv:2403.12936 [pdf]

Automatic Information Extraction From Employment Tribunal Judgements Using Large Language Models

Authors: Joana Ribeiro de Faria, Huiyuan Xie, Felix Steffek

Abstract: Court transcripts and judgments are rich repositories of legal knowledge, detailing the intricacies of cases and the rationale behind judicial decisions. The extraction of key information from these documents provides a concise overview of a case, crucial for both legal experts and the public. With the advent of large language models (LLMs), automatic information extraction has become increasingly feasible and efficient. This paper presents a comprehensive study on the application of GPT-4, a large language model, for automatic information extraction from UK Employment Tribunal (UKET) cases. We meticulously evaluated GPT-4's performance in extracting critical information with a manual verification process to ensure the accuracy and relevance of the extracted data. Our research is structured around two primary extraction tasks: the first involves a general extraction of eight key aspects that hold significance for both legal specialists and the general public, including the facts of the case, the claims made, references to legal statutes, references to precedents, general case outcomes and corresponding labels, detailed order and remedies and reasons for the decision. The second task is more focused, aimed at analysing three of those extracted features, namely facts, claims and outcomes, in order to facilitate the development of a tool capable of predicting the outcome of employment law disputes. Through our analysis, we demonstrate that LLMs like GPT-4 can obtain high accuracy in legal information extraction, highlighting the potential of LLMs in revolutionising the way legal information is processed and utilised, offering significant implications for legal research and practice.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2311.11142 [pdf, other]

Vashantor: A Large-scale Multilingual Benchmark Dataset for Automated Translation of Bangla Regional Dialects to Bangla Language

Authors: Fatema Tuj Johora Faria, Mukaffi Bin Moin, Ahmed Al Wase, Mehidi Ahmmed, Md. Rabius Sani, Tashreef Muhammad

Abstract: The Bangla linguistic variety is a fascinating mix of regional dialects that adds to the cultural diversity of the Bangla-speaking community. Despite extensive study into translating Bangla to English, English to Bangla, and Banglish to Bangla in the past, there has been a noticeable gap in translating Bangla regional dialects into standard Bangla. In this study, we set out to fill this gap by creating a collection of 32,500 sentences, encompassing Bangla, Banglish, and English, representing five regional Bangla dialects. Our aim is to translate these regional dialects into standard Bangla and detect regions accurately. To achieve this, we proposed models known as mT5 and BanglaT5 for translating regional dialects into standard Bangla. Additionally, we employed mBERT and Bangla-bert-base to determine the specific regions from where these dialects originated. Our experimental results showed the highest BLEU score of 69.06 for Mymensingh regional dialects and the lowest BLEU score of 36.75 for Chittagong regional dialects. We also observed the lowest average word error rate of 0.1548 for Mymensingh regional dialects and the highest of 0.3385 for Chittagong regional dialects. For region detection, we achieved an accuracy of 85.86% for Bangla-bert-base and 84.36% for mBERT. This is the first large-scale investigation of Bangla regional dialects to Bangla machine translation. We believe our findings will not only pave the way for future work on Bangla regional dialects to Bangla machine translation, but will also be useful in solving similar language-related challenges in low-resource language conditions.
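
The word error rates quoted in the abstract above are per-dialect averages. As a rough illustration of how a single sentence-level WER is usually computed (word-level edit distance divided by the reference length), here is a minimal sketch. The function name `word_error_rate`, whitespace tokenisation, and the sample sentences are assumptions for illustration and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Tokenisation is plain whitespace splitting (an assumption; real
    evaluations may normalise text differently).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("b" -> "x") and one deletion ("d") over four reference words.
wer = word_error_rate("a b c d", "a x c")
```

A lower value is better, and the score can exceed 1.0 when the hypothesis contains many extra words.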

Submitted 18 November, 2023; originally announced November 2023.

arXiv:2301.03224 [pdf, other]

Case studies of development of verified programs with Dafny for accessibility assessment

Authors: João Pascoal Faria, Rui Abreu

Abstract: Formal verification techniques aim at formally proving the correctness of a computer program with respect to a formal specification, but the expertise and effort required for applying formal specification and verification techniques and scalability issues have limited their practical application. In recent years, the tremendous progress with SAT and SMT solvers enabled the construction of a new generation of tools that promise to make formal verification more accessible for software engineers, by automating most if not all of the verification process. The Dafny system is a prominent example of that trend. However, little evidence exists yet about its accessibility. To help fill this gap, we conducted a set of 10 case studies of developing verified implementations in Dafny of some real-world algorithms and data structures, to determine its accessibility for software engineers. We found that, on average, the amount of code written for specification and verification purposes is of the same order of magnitude as the traditional code written for implementation and testing purposes (ratio of 1.14) -- an "overhead" that certainly pays off for high-integrity software. The performance of the Dafny verifier was impressive, with 2.4 proof obligations generated per line of code written, and 24 ms spent per proof obligation generated and verified, on average. However, we also found that the manual work needed in writing auxiliary verification code may be significant and difficult to predict and master. Hence, further automation and systematization of verification tasks are possible directions for future advances in the field.

Submitted 9 January, 2023; originally announced January 2023.

Comments: Pre-print and extended version, including source code, of our paper accepted in FSEN 2023 - 10th IPM International Conference on Fundamentals of Software Engineering

arXiv:2206.01087 [pdf, other]

Enriching a Fashion Knowledge Graph from Product Textual Descriptions

Authors: João Barroca, Abhishek Shivkumar, Beatriz Quintino Ferreira, Evgeny Sherkhonov, João Faria

Abstract: Knowledge Graphs offer a very useful and powerful structure for representing information; consequently, they have been adopted as the backbone for many applications in e-commerce scenarios. In this paper, we describe an application of existing techniques for enriching the large-scale Fashion Knowledge Graph (FKG) that we build at Farfetch. In particular, we apply techniques for named entity recognition (NER) and entity linking (EL) in order to extract and link rich metadata from product textual descriptions to entities in the FKG. Having a complete and enriched FKG as an e-commerce backbone can have a highly valuable impact on downstream applications such as search and recommendations. However, enriching a Knowledge Graph in the fashion domain has its own challenges. Data representation is different from a more generic KG, like Wikidata and Yago, as entities (e.g. product attributes) are too specific to the domain, and long textual descriptions are not readily available. Data itself is also scarce, as labelling datasets to train supervised models is a very laborious task. Moreover, fashion products display a high variability and require an intricate ontology of attributes to link to. We use a transfer learning based approach to train an NER module on a small amount of manually labeled data, followed by an EL module that links the previously identified named entities to the appropriate entities within the FKG. Experiments using a pre-trained model show that it is possible to achieve 89.75% accuracy in NER even with a small manually labeled dataset. Moreover, the EL module, despite relying on simple rule-based or ML models (due to lack of training data), is able to link relevant attributes to products, thus automatically enriching the FKG.

Submitted 2 June, 2022; originally announced June 2022.

Comments: Presented at the International Workshop on Knowledge Graph Generation from Text (ESWC 2022)

arXiv:2202.05347 [pdf, other]

Development and Validation of an AI-Driven Model for the La Rance Tidal Barrage: A Generalisable Case Study

Authors: Túlio Marcondes Moreira, Jackson Geraldo de Faria Jr, Pedro O. S. Vaz-de-Melo, Gilberto Medeiros-Ribeiro

Abstract: In this work, an AI-Driven (autonomous) model representation of the La Rance tidal barrage was developed using novel parametrisation and Deep Reinforcement Learning (DRL) techniques. Our model results were validated with experimental measurements, yielding the first Tidal Range Structure (TRS) model validated against a constructed tidal barrage and made available to academics. In order to properly model La Rance, parametrisation methodologies were developed for simulating (i) turbines (in pumping and power generation modes), (ii) transition ramp functions (for opening and closing hydraulic structures) and (iii) equivalent lagoon wetted area. Furthermore, an updated DRL method was implemented for optimising the operation of the hydraulic structures that compose La Rance. The achieved objective of this work was to verify the capabilities of an AI-Driven TRS model to appropriately predict (i) turbine power and (ii) lagoon water level variations. In addition, the observed operational strategy and yearly energy output of our AI-Driven model appeared to be comparable with those reported for the La Rance tidal barrage. The outcomes of this work (developed methodologies and DRL implementations) are generalisable and can be applied to other TRS projects. Furthermore, this work provided insights which allow for more realistic simulation of TRS operation, enabled through our AI-Driven model.

Submitted 10 February, 2022; originally announced February 2022.

Comments: 30 pages, 22 figures and 6 tables

arXiv:2106.10360 [pdf, other]

Prediction-Free, Real-Time Flexible Control of Tidal Lagoons through Proximal Policy Optimisation: A Case Study for the Swansea Lagoon

Authors: Túlio Marcondes Moreira, Jackson Geraldo de Faria Jr, Pedro O. S. Vaz de Melo, Luiz Chaimowicz, Gilberto Medeiros-Ribeiro

Abstract: Tidal Range Structures (TRS) have been considered for large-scale electricity generation for their potential ability to produce reasonably predictable energy without the emission of greenhouse gases. Once the main forcing components for driving the tides have deterministic dynamics, the available energy in a given TRS has been estimated, through analytical and numerical optimisation routines, as a mostly predictable event. This constraint imposes state-of-art flexible operation methods to rely on tidal predictions to infer best operational strategies for TRS, with the additional cost of requiring to run optimisation routines for every new tide. In this paper, a Deep Reinforcement Learning approach (Proximal Policy Optimisation through Unity ML-Agents) is introduced to perform automatic operation of TRS. For validation, the performance of the proposed method is compared with six different operation optimisation approaches devised from the literature, utilising the Swansea Bay Tidal Lagoon as a case study. We show that our approach is successful in maximising energy generation through an optimised operational policy of turbines and sluices, yielding competitive results with state-of-art optimisation strategies, with the clear advantages of requiring training once and performing real-time automatic control of TRS with measured ocean data only.

Submitted 23 January, 2022; v1 submitted 18 June, 2021; originally announced June 2021.

Comments: 35 pages, 13 figures and 11 tables

arXiv:2011.07910 [pdf]

In Search of Outstanding Research Advances: Prototyping the creation of an open dataset of "editorial highlights"

Authors: Alexis-Michel Mugabushaka, Jasmin Sadat, Jorge Costa Dantas Faria

Abstract: A long-standing research question in bibliometrics is how one identifies publications which represent major advances in their fields, making high impact in their own and other areas. In this context, the term "Breakthrough" is often used, and commonly used approaches rely on citation links between publications, implicitly positing that peers who use or build upon previously published results collectively inform about their standing in terms of advancing the research frontiers. Here we argue that the "Breakthrough" concept is rooted in the Kuhnian model of scientific revolution which has been both conceptually and empirically challenged. A more fruitful approach is to consider various ways in which authoritative actors in the scholarly communication system signal the importance of research results. We bring to discussions different "recognition channels" and pilot the creation of an open dataset of editorial highlights from regular lists of notable research advances. The dataset covers the last ten years and includes: the "discoveries of the year" from Science magazine and La Recherche and weekly editorial highlights from Nature ("research highlights") and Science ("editor's choice"). The final dataset includes 230 entries in the "discoveries of the years" (with over 720 references) and about 9,000 weekly highlights (with over 8,000 references).

Submitted 16 November, 2020; originally announced November 2020.

arXiv:2004.04616 [pdf, other]

doi 10.1145/3377812.3382142

DCO Analyzer: Local Controllability and Observability Analysis and Enforcement of Distributed Test Scenarios

Authors: Bruno Lima, João Pascoal Faria

Abstract: To ensure interoperability and the correct behavior of heterogeneous distributed systems in key scenarios, it is important to conduct automated integration tests, based on distributed test components (called local testers) that are deployed close to the system components to simulate inputs from the environment and monitor the interactions with the environment and other system components. We say that a distributed test scenario is locally controllable and locally observable if test inputs can be decided locally and conformance errors can be detected locally by the local testers, without the need for exchanging coordination messages between the test components during test execution (which may reduce the responsiveness and fault detection capability of the test harness). DCO Analyzer is the first tool that checks if distributed test scenarios specified by means of UML sequence diagrams exhibit those properties, and automatically determines a minimum number of coordination messages to enforce them.

Submitted 9 April, 2020; originally announced April 2020.

arXiv:1903.12553 [pdf]

doi 10.13140/RG.2.2.34042.95685

A survey of blockchain frameworks and applications

Authors: Bruno Tavares, Filipe Figueiredo Correia, André Restivo, João Pascoal Faria, Ademar Aguiar

Abstract: The applications of the blockchain technology are still being discovered. When a new potential disruptive technology emerges, there is a tendency to try to solve every problem with that technology. However, it is still necessary to determine what approach is the best for each type of application. To find how distributed ledgers solve existing problems, this study looks for blockchain frameworks in the academic world. Identifying the existing frameworks can demonstrate where the interest in the technology exists and where it can be missing. This study encountered several blockchain frameworks in development. However, there are few references to operational needs, testing, and deployment of the technology. With the widespread use of the technology, either integrating with pre-existing solutions, replacing legacy systems, or new implementations, the need for testing, deploying, exploration, and maintenance is expected to intensify.

Submitted 24 March, 2019; originally announced March 2019.

arXiv:1806.09511 [pdf, other]

A Hierarchical Deep Learning Natural Language Parser for Fashion

Authors: José Marcelino, João Faria, Luís Baía, Ricardo Gamelas Sousa

Abstract: This work presents a hierarchical deep learning natural language parser for fashion. Our proposal intends not only to recognize fashion-domain entities but also to expose syntactic and morphologic insights. We leverage the usage of an architecture of specialist models, each one for a different task (from parsing to entity recognition). Such architecture renders a hierarchical model able to capture the nuances of the fashion language. The natural language parser is able to deal with textual ambiguities which are left unresolved by our currently existing solution. Our empirical results establish a robust baseline, which justifies the use of hierarchical architectures of deep learning models while opening new research avenues to explore.

Submitted 25 June, 2018; originally announced June 2018.

Comments: In Proceedings of KDD 2018 (KDD Workshop on AI for Fashion)

arXiv:1806.09445 [pdf, other]

A Unified Model with Structured Output for Fashion Images Classification

Authors: Beatriz Quintino Ferreira, Luís Baía, João Faria, Ricardo Gamelas Sousa

Abstract: A picture is worth a thousand words. Albeit a cliché, for the fashion industry, an image of a clothing piece allows one to perceive its category (e.g., dress), sub-category (e.g., day dress) and properties (e.g., white colour with floral patterns). The seasonal nature of the fashion industry creates a highly dynamic and creative domain with ever more data, making it impractical to manually describe a large set of images (of products). In this paper, we explore the concept of visual recognition for fashion images through an end-to-end architecture embedding the hierarchical nature of the annotations directly into the model. Towards that goal, and inspired by the work of [7], we have modified and adapted the original architecture proposal. Namely, we have removed the message passing layer symmetry to cope with the Farfetch category tree, added extra layers for hierarchy level specificity, and moved the message passing layer into an enriched latent space. We compare the proposed unified architecture against state-of-the-art models and demonstrate the performance advantage of our model for structured multi-level categorization on a dataset of about 350k fashion product images.

Submitted 25 June, 2018; originally announced June 2018.

Comments: Accepted in KDD 2018's AI for Fashion workshop

Showing 1–15 of 15 results for author: Faria, J