Search | arXiv e-print repository

Cross-Attention Fusion of Visual and Geometric Features for Large Vocabulary Arabic Lipreading

Authors: Samar Daou, Ahmed Rekik, Achraf Ben-Hamadou, Abdelaziz Kallel

Abstract: Lipreading involves using visual data to recognize spoken words by analyzing the movements of the lips and surrounding area. It is a hot research topic with many potential applications, such as human-machine interaction and enhancing audio speech recognition. Recent deep-learning based works aim to integrate visual features extracted from the mouth region with landmark points on the lip contours.… ▽ More Lipreading involves using visual data to recognize spoken words by analyzing the movements of the lips and surrounding area. It is a hot research topic with many potential applications, such as human-machine interaction and enhancing audio speech recognition. Recent deep-learning based works aim to integrate visual features extracted from the mouth region with landmark points on the lip contours. However, employing a simple combination method such as concatenation may not be the most effective approach to get the optimal feature vector. To address this challenge, firstly, we propose a cross-attention fusion-based approach for large lexicon Arabic vocabulary to predict spoken words in videos. Our method leverages the power of cross-attention networks to efficiently integrate visual and geometric features computed on the mouth region. Secondly, we introduce the first large-scale Lip Reading in the Wild for Arabic (LRW-AR) dataset containing 20,000 videos for 100-word classes, uttered by 36 speakers. The experimental results obtained on LRW-AR and ArabicVisual databases showed the effectiveness and robustness of the proposed approach in recognizing Arabic words. Our work provides insights into the feasibility and effectiveness of applying lipreading techniques to the Arabic language, opening doors for further research in this field. Link to the project page: https://crns-smartvision.github.io/lrwar △ Less

Submitted 18 February, 2024; originally announced February 2024.

Comments: Under review, 24 pages

arXiv:2308.16139 [pdf, other]

MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision

Authors: Jianning Li, Zongwei Zhou, Jiancheng Yang, Antonio Pepe, Christina Gsaxner, Gijs Luijten, Chongyu Qu, Tiezheng Zhang, Xiaoxi Chen, Wenxuan Li, Marek Wodzinski, Paul Friedrich, Kangxian Xie, Yuan **, Narmada Ambigapathy, Enrico Nasca, Naida Solak, Gian Marco Melito, Viet Duc Vu, Afaque R. Memon, Christopher Schlachta, Sandrine De Ribaupierre, Rajnikant Patel, Roy Eagleson, Xiaojun Chen , et al. (132 additional authors not shown)

Abstract: Prior to the deep learning era, shape was commonly used to describe the objects. Nowadays, state-of-the-art (SOTA) algorithms in medical imaging are predominantly diverging from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is seen from numerous shape-related publications in premier vision conferences as well as the growing popularity of Shape… ▽ More Prior to the deep learning era, shape was commonly used to describe the objects. Nowadays, state-of-the-art (SOTA) algorithms in medical imaging are predominantly diverging from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is seen from numerous shape-related publications in premier vision conferences as well as the growing popularity of ShapeNet (about 51,300 models) and Princeton ModelNet (127,915 models). For the medical domain, we present a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D models of surgical instrument, called MedShapeNet, created to facilitate the translation of data-driven vision algorithms to medical applications and to adapt SOTA vision algorithms to medical problems. As a unique feature, we directly model the majority of shapes on the imaging data of real patients. As of today, MedShapeNet includes 23 dataset with more than 100,000 shapes that are paired with annotations (ground truth). Our data is freely accessible via a web interface and a Python application programming interface (API) and can be used for discriminative, reconstructive, and variational benchmarks as well as various applications in virtual, augmented, or mixed reality, and 3D printing. Exemplary, we present use cases in the fields of classification of brain tumors, facial and skull reconstructions, multi-class anatomy completion, education, and 3D printing. In future, we will extend the data and improve the interfaces. The project pages are: https://medshapenet.ikim.nrw/ and https://github.com/Jianningli/medshapenet-feedback △ Less

Submitted 12 December, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: 16 pages

MSC Class: 68T01

arXiv:2307.06162 [pdf, other]

Deep Generative Models for Physiological Signals: A Systematic Literature Review

Authors: Nour Neifar, Afef Mdhaffar, Achraf Ben-Hamadou, Mohamed Jmaiel

Abstract: In this paper, we present a systematic literature review on deep generative models for physiological signals, particularly electrocardiogram, electroencephalogram, photoplethysmogram and electromyogram. Compared to the existing review papers, we present the first review that summarizes the recent state-of-the-art deep generative models. By analysing the state-of-the-art research related to deep ge… ▽ More In this paper, we present a systematic literature review on deep generative models for physiological signals, particularly electrocardiogram, electroencephalogram, photoplethysmogram and electromyogram. Compared to the existing review papers, we present the first review that summarizes the recent state-of-the-art deep generative models. By analysing the state-of-the-art research related to deep generative models along with their main applications and challenges, this review contributes to the overall understanding of these models applied to physiological signals. Additionally, by highlighting the employed evaluation protocol and the most used physiological databases, this review facilitates the assessment and benchmarking of deep generative models. △ Less

Submitted 12 July, 2023; originally announced July 2023.

Comments: paper under review, 34 pages

arXiv:2306.11141 [pdf, other]

Graph Self-Supervised Learning for Endoscopic Image Matching

Authors: Manel Farhat, Achraf Ben-Hamadou

Abstract: Accurate feature matching and correspondence in endoscopic images play a crucial role in various clinical applications, including patient follow-up and rapid anomaly localization through panoramic image generation. However, develo** robust and accurate feature matching techniques faces challenges due to the lack of discriminative texture and significant variability between patients. To address t… ▽ More Accurate feature matching and correspondence in endoscopic images play a crucial role in various clinical applications, including patient follow-up and rapid anomaly localization through panoramic image generation. However, develo** robust and accurate feature matching techniques faces challenges due to the lack of discriminative texture and significant variability between patients. To address these limitations, we propose a novel self-supervised approach that combines Convolutional Neural Networks for capturing local visual appearance and attention-based Graph Neural Networks for modeling spatial relationships between key-points. Our approach is trained in a fully self-supervised scheme without the need for labeled data. Our approach outperforms state-of-the-art handcrafted and deep learning-based methods, demonstrating exceptional performance in terms of precision rate (1) and matching score (99.3%). We also provide code and materials related to this work, which can be accessed at https://github.com/abenhamadou/graph-self-supervised-learning-for-endoscopic-image-matching. △ Less

Submitted 19 June, 2023; originally announced June 2023.

Comments: 35 pages, under review

arXiv:2306.01875 [pdf, other]

DiffECG: A Versatile Probabilistic Diffusion Model for ECG Signals Synthesis

Authors: Nour Neifar, Achraf Ben-Hamadou, Afef Mdhaffar, Mohamed Jmaiel

Abstract: Within cardiovascular disease detection using deep learning applied to ECG signals, the complexities of handling physiological signals have sparked growing interest in leveraging deep generative models for effective data augmentation. In this paper, we introduce a novel versatile approach based on denoising diffusion probabilistic models for ECG synthesis, addressing three scenarios: (i) heartbeat… ▽ More Within cardiovascular disease detection using deep learning applied to ECG signals, the complexities of handling physiological signals have sparked growing interest in leveraging deep generative models for effective data augmentation. In this paper, we introduce a novel versatile approach based on denoising diffusion probabilistic models for ECG synthesis, addressing three scenarios: (i) heartbeat generation, (ii) partial signal imputation, and (iii) full heartbeat forecasting. Our approach presents the first generalized conditional approach for ECG synthesis, and our experimental results demonstrate its effectiveness for various ECG-related tasks. Moreover, we show that our approach outperforms other state-of-the-art ECG generative models and can enhance the performance of state-of-the-art classifiers. △ Less

Submitted 3 May, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: Accepted in IEEE SERA 2024 conference

arXiv:2305.18277 [pdf, other]

3DTeethSeg'22: 3D Teeth Scan Segmentation and Labeling Challenge

Authors: Achraf Ben-Hamadou, Oussama Smaoui, Ahmed Rekik, Sergi Pujades, Edmond Boyer, Hoyeon Lim, Minchang Kim, Minkyung Lee, Minyoung Chung, Yeong-Gil Shin, Mathieu Leclercq, Lucia Cevidanes, Juan Carlos Prieto, Shaojie Zhuang, Guangshun Wei, Zhiming Cui, Yuanfeng Zhou, Tudor Dascalu, Bulat Ibragimov, Tae-Hoon Yong, Hong-Gi Ahn, Wan Kim, Jae-Hwan Han, Byungsun Choi, Niels van Nistelrooij , et al. (7 additional authors not shown)

Abstract: Teeth localization, segmentation, and labeling from intra-oral 3D scans are essential tasks in modern dentistry to enhance dental diagnostics, treatment planning, and population-based studies on oral health. However, develo** automated algorithms for teeth analysis presents significant challenges due to variations in dental anatomy, imaging protocols, and limited availability of publicly accessi… ▽ More Teeth localization, segmentation, and labeling from intra-oral 3D scans are essential tasks in modern dentistry to enhance dental diagnostics, treatment planning, and population-based studies on oral health. However, develo** automated algorithms for teeth analysis presents significant challenges due to variations in dental anatomy, imaging protocols, and limited availability of publicly accessible data. To address these challenges, the 3DTeethSeg'22 challenge was organized in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in 2022, with a call for algorithms tackling teeth localization, segmentation, and labeling from intraoral 3D scans. A dataset comprising a total of 1800 scans from 900 patients was prepared, and each tooth was individually annotated by a human-machine hybrid algorithm. A total of 6 algorithms were evaluated on this dataset. In this study, we present the evaluation results of the 3DTeethSeg'22 challenge. The 3DTeethSeg'22 challenge code can be accessed at: https://github.com/abenhamadou/3DTeethSeg22_challenge △ Less

Submitted 29 May, 2023; originally announced May 2023.

Comments: 29 pages, MICCAI 2022 Singapore, Satellite Event, Challenge

arXiv:2211.02626 [pdf, other]

Leveraging Statistical Shape Priors in GAN-based ECG Synthesis

Authors: Nour Neifar, Achraf Ben-Hamadou, Afef Mdhaffar, Mohamed Jmaiel, Bernd Freisleben

Abstract: Electrocardiogram (ECG) data collection during emergency situations is challenging, making ECG data generation an efficient solution for dealing with highly imbalanced ECG training datasets. In this paper, we propose a novel approach for ECG signal generation using Generative Adversarial Networks (GANs) and statistical ECG data modeling. Our approach leverages prior knowledge about ECG dynamics to… ▽ More Electrocardiogram (ECG) data collection during emergency situations is challenging, making ECG data generation an efficient solution for dealing with highly imbalanced ECG training datasets. In this paper, we propose a novel approach for ECG signal generation using Generative Adversarial Networks (GANs) and statistical ECG data modeling. Our approach leverages prior knowledge about ECG dynamics to synthesize realistic signals, addressing the complex dynamics of ECG signals. To validate our approach, we conducted experiments using ECG signals from the MIT-BIH arrhythmia database. Our results demonstrate that our approach, which models temporal and amplitude variations of ECG signals as 2-D shapes, generates more realistic signals compared to state-of-the-art GAN based generation baselines. Our proposed approach has significant implications for improving the quality of ECG training datasets, which can ultimately lead to better performance of ECG classification algorithms. This research contributes to the development of more efficient and accurate methods for ECG analysis, which can aid in the diagnosis and treatment of cardiac diseases. △ Less

Submitted 3 June, 2023; v1 submitted 22 October, 2022; originally announced November 2022.

Comments: 6 figures, 31 pages, under review

arXiv:2210.06094 [pdf, ps, other]

Teeth3DS: a benchmark for teeth segmentation and labeling from intra-oral 3D scans

Authors: Achraf Ben-Hamadou, Oussama Smaoui, Houda Chaabouni-Chouayakh, Ahmed Rekik, Sergi Pujades, Edmond Boyer, Julien Strippoli, Aurélien Thollot, Hugo Setbon, Cyril Trosset, Edouard Ladroit

Abstract: Teeth segmentation and labeling are critical components of Computer-Aided Dentistry (CAD) systems. Indeed, before any orthodontic or prosthetic treatment planning, a CAD system needs to first accurately segment and label each instance of teeth visible in the 3D dental scan, this is to avoid time-consuming manual adjustments by the dentist. Nevertheless, develo** such an automated and accurate de… ▽ More Teeth segmentation and labeling are critical components of Computer-Aided Dentistry (CAD) systems. Indeed, before any orthodontic or prosthetic treatment planning, a CAD system needs to first accurately segment and label each instance of teeth visible in the 3D dental scan, this is to avoid time-consuming manual adjustments by the dentist. Nevertheless, develo** such an automated and accurate dental segmentation and labeling tool is very challenging, especially given the lack of publicly available datasets or benchmarks. This article introduces the first public benchmark, named Teeth3DS, which has been created in the frame of the 3DTeethSeg 2022 MICCAI challenge to boost the research field and inspire the 3D vision research community to work on intra-oral 3D scans analysis such as teeth identification, segmentation, labeling, 3D modeling and 3D reconstruction. Teeth3DS is made of 1800 intra-oral scans (23999 annotated teeth) collected from 900 patients covering the upper and lower jaws separately, acquired and validated by orthodontists/dental surgeons with more than 5 years of professional experience. △ Less

Submitted 12 October, 2022; originally announced October 2022.

Comments: 8 pages, 5 figures, 1 tables

arXiv:2208.11424 [pdf, other]

Self-Supervised Endoscopic Image Key-Points Matching

Authors: Manel Farhat, Houda Chaabouni-Chouayakh, Achraf Ben-Hamadou

Abstract: Feature matching and finding correspondences between endoscopic images is a key step in many clinical applications such as patient follow-up and generation of panoramic image from clinical sequences for fast anomalies localization. Nonetheless, due to the high texture variability present in endoscopic images, the development of robust and accurate feature matching becomes a challenging task. Recen… ▽ More Feature matching and finding correspondences between endoscopic images is a key step in many clinical applications such as patient follow-up and generation of panoramic image from clinical sequences for fast anomalies localization. Nonetheless, due to the high texture variability present in endoscopic images, the development of robust and accurate feature matching becomes a challenging task. Recently, deep learning techniques which deliver learned features extracted via convolutional neural networks (CNNs) have gained traction in a wide range of computer vision tasks. However, they all follow a supervised learning scheme where a large amount of annotated data is required to reach good performances, which is generally not always available for medical data databases. To overcome this limitation related to labeled data scarcity, the self-supervised learning paradigm has recently shown great success in a number of applications. This paper proposes a novel self-supervised approach for endoscopic image matching based on deep learning techniques. When compared to standard hand-crafted local feature descriptors, our method outperformed them in terms of precision and recall. Furthermore, our self-supervised descriptor provides a competitive performance in comparison to a selection of state-of-the-art deep learning based supervised methods in terms of precision and matching score. △ Less

Submitted 24 August, 2022; originally announced August 2022.

Comments: 35 pages

arXiv:1812.06829 [pdf, ps, other]

Discriminant Patch Representation for RGB-D Face Recognition Using Convolutional Neural Networks

Authors: Nesrine Grati, Achraf Ben-Hamadou, Mohamed Hammami

Abstract: This paper focuses on designing data-driven models to learn a discriminant representation space for face recognition using RGB-D data. Unlike hand-crafted representations, learned models can extract and organize the discriminant information from the data, and can automatically adapt to build new compute vision applications faster. We proposed an effective way to train Convolutional Neural Networks… ▽ More This paper focuses on designing data-driven models to learn a discriminant representation space for face recognition using RGB-D data. Unlike hand-crafted representations, learned models can extract and organize the discriminant information from the data, and can automatically adapt to build new compute vision applications faster. We proposed an effective way to train Convolutional Neural Networks to learn face patch discriminant features. The proposed solution was tested and validated on state-of-the-art RGB-D datasets and showed competitive and promising results relatively to standard hand-crafted feature extractors. △ Less

Submitted 17 December, 2018; originally announced December 2018.

Comments: Preprint accepted for publication in Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - 2019

arXiv:1607.04773 [pdf, other]

doi 10.1007/s13319-016-0095-6

Construction of extended 3D field of views of the internal bladder wall surface: a proof of concept

Authors: Achraf Ben-Hamadou, Christian Daul, Charles Soussen

Abstract: 3D extended field of views (FOVs) of the internal bladder wall facilitate lesion diagnosis, patient follow-up and treatment traceability. In this paper, we propose a 3D image mosaicing algorithm guided by 2D cystoscopic video-image registration for obtaining textured FOV mosaics. In this feasibility study, the registration makes use of data from a 3D cystoscope prototype providing, in addition to… ▽ More 3D extended field of views (FOVs) of the internal bladder wall facilitate lesion diagnosis, patient follow-up and treatment traceability. In this paper, we propose a 3D image mosaicing algorithm guided by 2D cystoscopic video-image registration for obtaining textured FOV mosaics. In this feasibility study, the registration makes use of data from a 3D cystoscope prototype providing, in addition to each small FOV image, some 3D points located on the surface. This proof of concept shows that textured surfaces can be constructed with minimally modified cystoscopes. The potential of the method is demonstrated on numerical and real phantoms reproducing various surface shapes. Pig and human bladder textures are superimposed on phantoms with known shape and dimensions. These data allow for quantitative assessment of the 3D mosaicing algorithm based on the registration of images simulating bladder textures. △ Less

Submitted 16 July, 2016; originally announced July 2016.

Comments: 23 pages, Springer 3D Research, 2016

arXiv:1504.07901 [pdf, ps, other]

doi 10.1117/12.831772

Comparative study of image registration techniques for bladder video-endoscopy

Authors: Achraf Ben-Hamadou, Charles Soussen, Walter Blondel, Christian Daul, Didier Wolf

Abstract: Bladder cancer is widely spread in the world. Many adequate diagnosis techniques exist. Video-endoscopy remains the standard clinical procedure for visual exploration of the bladder internal surface. However, video-endoscopy presents the limit that the imaged area for each image is about nearly 1cm2. And, lesions are, typically, spread over several images. The aim of this contribution is to assess… ▽ More Bladder cancer is widely spread in the world. Many adequate diagnosis techniques exist. Video-endoscopy remains the standard clinical procedure for visual exploration of the bladder internal surface. However, video-endoscopy presents the limit that the imaged area for each image is about nearly 1cm2. And, lesions are, typically, spread over several images. The aim of this contribution is to assess the performance of two mosaicing algorithms leading to the construction of panoramic maps (one unique image) of bladder walls. The quantitative comparison study is performed on a set of real endoscopic exam data and on simulated data relative to bladder phantom. △ Less

Submitted 29 April, 2015; originally announced April 2015.

Comments: 7 pages, 5 figures

Journal ref: Novel Optical Instrumentation for Biomedical Applications, 737118 (10 July 2009)

Showing 1–12 of 12 results for author: Ben-Hamadou, A