Search | arXiv e-print repository

Exploring the Limitations of Detecting Machine-Generated Text

Authors: Jad Doughman, Osama Mohammed Afzal, Hawau Olamide Toyin, Shady Shehata, Preslav Nakov, Zeerak Talat

Abstract: Recent improvements in the quality of the generations by large language models have spurred research into identifying machine-generated text. Systems proposed for the task often achieve high performance. However, humans and machines can produce text in different styles and in different domains, and it remains unclear whether machine generated-text detection models favour particular styles or domai… ▽ More Recent improvements in the quality of the generations by large language models have spurred research into identifying machine-generated text. Systems proposed for the task often achieve high performance. However, humans and machines can produce text in different styles and in different domains, and it remains unclear whether machine generated-text detection models favour particular styles or domains. In this paper, we critically examine the classification performance for detecting machine-generated text by evaluating on texts with varying writing styles. We find that classifiers are highly sensitive to stylistic changes and differences in text complexity, and in some cases degrade entirely to random classifiers. We further find that detection systems are particularly susceptible to misclassify easy-to-read texts while they have high performance for complex texts. △ Less

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.09933 [pdf, other]

What Does it Take to Generalize SER Model Across Datasets? A Comprehensive Benchmark

Authors: Adham Ibrahim, Shady Shehata, A**kya Kulkarni, Mukhtar Mohamed, Muhammad Abdul-Mageed

Abstract: Speech emotion recognition (SER) is essential for enhancing human-computer interaction in speech-based applications. Despite improvements in specific emotional datasets, there is still a research gap in SER's capability to generalize across real-world situations. In this paper, we investigate approaches to generalize the SER system across different emotion datasets. In particular, we incorporate 1… ▽ More Speech emotion recognition (SER) is essential for enhancing human-computer interaction in speech-based applications. Despite improvements in specific emotional datasets, there is still a research gap in SER's capability to generalize across real-world situations. In this paper, we investigate approaches to generalize the SER system across different emotion datasets. In particular, we incorporate 11 emotional speech datasets and illustrate a comprehensive benchmark on the SER task. We also address the challenge of imbalanced data distribution using over-sampling methods when combining SER datasets for training. Furthermore, we explore various evaluation protocols for adeptness in the generalization of SER. Building on this, we explore the potential of Whisper for SER, emphasizing the importance of thorough evaluation. Our approach is designed to advance SER technology by integrating speaker-independent methods. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: ACCEPTED AT INTERSPEECH 2024, GREECE

arXiv:2405.19525 [pdf, other]

Lifelong Learning Using a Dynamically Growing Tree of Sub-networks for Domain Generalization in Video Object Segmentation

Authors: Islam Osman, Mohamed S. Shehata

Abstract: Current state-of-the-art video object segmentation models have achieved great success using supervised learning with massive labeled training datasets. However, these models are trained using a single source domain and evaluated using videos sampled from the same source domain. When these models are evaluated using videos sampled from a different target domain, their performance degrades significa… ▽ More Current state-of-the-art video object segmentation models have achieved great success using supervised learning with massive labeled training datasets. However, these models are trained using a single source domain and evaluated using videos sampled from the same source domain. When these models are evaluated using videos sampled from a different target domain, their performance degrades significantly due to poor domain generalization, i.e., their inability to learn from multi-domain sources simultaneously using traditional supervised learning. In this paper, We propose a dynamically growing tree of sub-networks (DGT) to learn effectively from multi-domain sources. DGT uses a novel lifelong learning technique that allows the model to continuously and effectively learn from new domains without forgetting the previously learned domains. Hence, the model can generalize to out-of-domain videos. The proposed work is evaluated using single-source in-domain (traditional video object segmentation), multi-source in-domain, and multi-source out-of-domain video object segmentation. The results of DGT show a single source in-domain performance gain of 0.2% and 3.5% on the DAVIS16 and DAVIS17 datasets, respectively. However, when DGT is evaluated using in-domain multi-sources, the results show superior performance compared to state-of-the-art video object segmentation and other lifelong learning techniques with an average performance increase in the F-score of 6.9% with minimal catastrophic forgetting. Finally, in the out-of-domain experiment, the performance of DGT is 2.7% and 4% better than state-of-the-art in 1 and 5-shots, respectively. △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.10444 [pdf, other]

A Novel Bounding Box Regression Method for Single Object Tracking

Authors: Omar Abdelaziz, Mohamed Sami Shehata

Abstract: Locating an object in a sequence of frames, given its appearance in the first frame of the sequence, is a hard problem that involves many stages. Usually, state-of-the-art methods focus on bringing novel ideas in the visual encoding or relational modelling phases. However, in this work, we show that bounding box regression from learned joint search and template features is of high importance as we… ▽ More Locating an object in a sequence of frames, given its appearance in the first frame of the sequence, is a hard problem that involves many stages. Usually, state-of-the-art methods focus on bringing novel ideas in the visual encoding or relational modelling phases. However, in this work, we show that bounding box regression from learned joint search and template features is of high importance as well. While previous methods relied heavily on well-learned features representing interactions between search and template, we hypothesize that the receptive field of the input convolutional bounding box network plays an important role in accurately determining the object location. To this end, we introduce two novel bounding box regression networks: inception and deformable. Experiments and ablation studies show that our inception module installed on the recent ODTrack outperforms the latter on three benchmarks: the GOT-10k, the UAV123 and the OTB2015. △ Less

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2404.18473 [pdf]

On Rings of MAL'CEV-NEUMANN Series

Authors: Mohammad. H. Fahmy, Refaat. M. Salem, Shaimaa. Sh. Shehata

Abstract: In this paper, we investigate the conditions for the Mal'cev-Neumann series ring Λ = R((G;σ;τ)) to be left fusible and an SA-ring. Also, we show that: if G is a quasitotally ordered group and U a Σ-compatible semiprime ideal of R, then R((G;σ;τ)) is a Σ(U((G; σ; τ)))-zip ring if and only if R is a Σ(U )-zip ring. In this paper, we investigate the conditions for the Mal'cev-Neumann series ring Λ = R((G;σ;τ)) to be left fusible and an SA-ring. Also, we show that: if G is a quasitotally ordered group and U a Σ-compatible semiprime ideal of R, then R((G;σ;τ)) is a Σ(U((G; σ; τ)))-zip ring if and only if R is a Σ(U )-zip ring. △ Less

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2402.12840 [pdf, other]

ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic

Authors: Fajri Koto, Haonan Li, Sara Shatnawi, Jad Doughman, Abdelrahman Boda Sadallah, Aisha Alraeesi, Khalid Almubarak, Zaid Alyafeai, Neha Sengupta, Shady Shehata, Nizar Habash, Preslav Nakov, Timothy Baldwin

Abstract: The focus of language model evaluation has transitioned towards reasoning and knowledge-intensive tasks, driven by advancements in pretraining large models. While state-of-the-art models are partially trained on large Arabic texts, evaluating their performance in Arabic remains challenging due to the limited availability of relevant datasets. To bridge this gap, we present ArabicMMLU, the first mu… ▽ More The focus of language model evaluation has transitioned towards reasoning and knowledge-intensive tasks, driven by advancements in pretraining large models. While state-of-the-art models are partially trained on large Arabic texts, evaluating their performance in Arabic remains challenging due to the limited availability of relevant datasets. To bridge this gap, we present ArabicMMLU, the first multi-task language understanding benchmark for Arabic language, sourced from school exams across diverse educational levels in different countries spanning North Africa, the Levant, and the Gulf regions. Our data comprises 40 tasks and 14,575 multiple-choice questions in Modern Standard Arabic (MSA), and is carefully constructed by collaborating with native speakers in the region. Our comprehensive evaluations of 35 models reveal substantial room for improvement, particularly among the best open-source models. Notably, BLOOMZ, mT0, LLama2, and Falcon struggle to achieve a score of 50%, while even the top-performing Arabic-centric model only achieves a score of 62.3%. △ Less

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2401.13802 [pdf, other]

Investigating the Efficacy of Large Language Models for Code Clone Detection

Authors: Mohamad Khajezade, Jie JW Wu, Fatemeh Hendijani Fard, Gema Rodríguez-Pérez, Mohamed Sami Shehata

Abstract: Large Language Models (LLMs) have demonstrated remarkable success in various natural language processing and software engineering tasks, such as code generation. The LLMs are mainly utilized in the prompt-based zero/few-shot paradigm to guide the model in accomplishing the task. GPT-based models are one of the popular ones studied for tasks such as code comment generation or test generation. These… ▽ More Large Language Models (LLMs) have demonstrated remarkable success in various natural language processing and software engineering tasks, such as code generation. The LLMs are mainly utilized in the prompt-based zero/few-shot paradigm to guide the model in accomplishing the task. GPT-based models are one of the popular ones studied for tasks such as code comment generation or test generation. These tasks are `generative' tasks. However, there is limited research on the usage of LLMs for `non-generative' tasks such as classification using the prompt-based paradigm. In this preliminary exploratory study, we investigated the applicability of LLMs for Code Clone Detection (CCD), a non-generative task. By building a mono-lingual and cross-lingual CCD dataset derived from CodeNet, we first investigated two different prompts using ChatGPT to detect Type-4 code clones in Java-Java and Java-Ruby pairs in a zero-shot setting. We then conducted an analysis to understand the strengths and weaknesses of ChatGPT in CCD. ChatGPT surpasses the baselines in cross-language CCD attaining an F1-score of 0.877 and achieves comparable performance to fully fine-tuned models for mono-lingual CCD, with an F1-score of 0.878. Also, the prompt and the difficulty level of the problems has an impact on the performance of ChatGPT. Finally we provide insights and future directions based on our initial analysis △ Less

Submitted 30 January, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

arXiv:2309.11627 [pdf, other]

GenLayNeRF: Generalizable Layered Representations with 3D Model Alignment for Multi-Human View Synthesis

Authors: Youssef Abdelkareem, Shady Shehata, Fakhri Karray

Abstract: Novel view synthesis (NVS) of multi-human scenes imposes challenges due to the complex inter-human occlusions. Layered representations handle the complexities by dividing the scene into multi-layered radiance fields, however, they are mainly constrained to per-scene optimization making them inefficient. Generalizable human view synthesis methods combine the pre-fitted 3D human meshes with image fe… ▽ More Novel view synthesis (NVS) of multi-human scenes imposes challenges due to the complex inter-human occlusions. Layered representations handle the complexities by dividing the scene into multi-layered radiance fields, however, they are mainly constrained to per-scene optimization making them inefficient. Generalizable human view synthesis methods combine the pre-fitted 3D human meshes with image features to reach generalization, yet they are mainly designed to operate on single-human scenes. Another drawback is the reliance on multi-step optimization techniques for parametric pre-fitting of the 3D body models that suffer from misalignment with the images in sparse view settings causing hallucinations in synthesized views. In this work, we propose, GenLayNeRF, a generalizable layered scene representation for free-viewpoint rendering of multiple human subjects which requires no per-scene optimization and very sparse views as input. We divide the scene into multi-human layers anchored by the 3D body meshes. We then ensure pixel-level alignment of the body models with the input views through a novel end-to-end trainable module that carries out iterative parametric correction coupled with multi-view feature fusion to produce aligned 3D models. For NVS, we extract point-wise image-aligned and human-anchored features which are correlated and fused using self-attention and cross-attention modules. We augment low-level RGB values into the features with an attention-based RGB fusion module. To evaluate our approach, we construct two multi-human view synthesis datasets; DeepMultiSyn and ZJU-MultiHuman. The results indicate that our proposed approach outperforms generalizable and non-human per-scene NeRF methods while performing at par with layered per-scene methods without test time optimization. △ Less

Submitted 20 September, 2023; originally announced September 2023.

Comments: Accepted to GCPR 2023

arXiv:2309.09510 [pdf, ps, other]

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

Authors: Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-yi Lee

Abstract: Text language models have shown remarkable zero-shot capability in generalizing to unseen tasks when provided with well-formulated instructions. However, existing studies in speech processing primarily focus on limited or specific tasks. Moreover, the lack of standardized benchmarks hinders a fair comparison across different approaches. Thus, we present Dynamic-SUPERB, a benchmark designed for bui… ▽ More Text language models have shown remarkable zero-shot capability in generalizing to unseen tasks when provided with well-formulated instructions. However, existing studies in speech processing primarily focus on limited or specific tasks. Moreover, the lack of standardized benchmarks hinders a fair comparison across different approaches. Thus, we present Dynamic-SUPERB, a benchmark designed for building universal speech models capable of leveraging instruction tuning to perform multiple tasks in a zero-shot fashion. To achieve comprehensive coverage of diverse speech tasks and harness instruction tuning, we invite the community to collaborate and contribute, facilitating the dynamic growth of the benchmark. To initiate, Dynamic-SUPERB features 55 evaluation instances by combining 33 tasks and 22 datasets. This spans a broad spectrum of dimensions, providing a comprehensive platform for evaluation. Additionally, we propose several approaches to establish benchmark baselines. These include the utilization of speech models, text language models, and the multimodal encoder. Evaluation results indicate that while these baselines perform reasonably on seen tasks, they struggle with unseen ones. We release all materials to the public and welcome researchers to collaborate on the project, advancing technologies in the field together. △ Less

Submitted 22 March, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: To appear in the proceedings of ICASSP 2024

arXiv:2306.04368 [pdf, other]

Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation

Authors: Massa Baali, Ibrahim Almakky, Shady Shehata, Fakhri Karray

Abstract: Despite major advancements in Automatic Speech Recognition (ASR), the state-of-the-art ASR systems struggle to deal with impaired speech even with high-resource languages. In Arabic, this challenge gets amplified, with added complexities in collecting data from dysarthric speakers. In this paper, we aim to improve the performance of Arabic dysarthric automatic speech recognition through a multi-st… ▽ More Despite major advancements in Automatic Speech Recognition (ASR), the state-of-the-art ASR systems struggle to deal with impaired speech even with high-resource languages. In Arabic, this challenge gets amplified, with added complexities in collecting data from dysarthric speakers. In this paper, we aim to improve the performance of Arabic dysarthric automatic speech recognition through a multi-stage augmentation approach. To this effect, we first propose a signal-based approach to generate dysarthric Arabic speech from healthy Arabic speech by modifying its speed and tempo. We also propose a second stage Parallel Wave Generative (PWG) adversarial model that is trained on an English dysarthric dataset to capture language-independant dysarthric speech patterns and further augment the signal-adjusted speech samples. Furthermore, we propose a fine-tuning and text-correction strategies for Arabic Conformer at different dysarthric speech severity levels. Our fine-tuned Conformer achieved 18% Word Error Rate (WER) and 17.2% Character Error Rate (CER) on synthetically generated dysarthric speech from the Arabic commonvoice speech dataset. This shows significant WER improvement of 81.8% compared to the baseline model trained solely on healthy data. We perform further validation on real English dysarthric speech showing a WER improvement of 124% compared to the baseline trained only on healthy English LJSpeech dataset. △ Less

Submitted 7 June, 2023; originally announced June 2023.

Comments: Accepted to Interspeech 2023

arXiv:2305.14534 [pdf, other]

Detecting Propaganda Techniques in Code-Switched Social Media Text

Authors: Muhammad Umar Salman, Asif Hanif, Shady Shehata, Preslav Nakov

Abstract: Propaganda is a form of communication intended to influence the opinions and the mindset of the public to promote a particular agenda. With the rise of social media, propaganda has spread rapidly, leading to the need for automatic propaganda detection systems. Most work on propaganda detection has focused on high-resource languages, such as English, and little effort has been made to detect propag… ▽ More Propaganda is a form of communication intended to influence the opinions and the mindset of the public to promote a particular agenda. With the rise of social media, propaganda has spread rapidly, leading to the need for automatic propaganda detection systems. Most work on propaganda detection has focused on high-resource languages, such as English, and little effort has been made to detect propaganda for low-resource languages. Yet, it is common to find a mix of multiple languages in social media communication, a phenomenon known as code-switching. Code-switching combines different languages within the same text, which poses a challenge for automatic systems. With this in mind, here we propose the novel task of detecting propaganda techniques in code-switched text. To support this task, we create a corpus of 1,030 texts code-switching between English and Roman Urdu, annotated with 20 propaganda techniques, which we make publicly available. We perform a number of experiments contrasting different experimental setups, and we find that it is important to model the multilinguality directly (rather than using translation) as well as to use the right fine-tuning strategy. The code and the dataset are publicly available at https://github.com/mbzuai-nlp/propaganda-codeswitched-text △ Less

Submitted 15 March, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2303.01736 [pdf]

Multi-Plane Neural Radiance Fields for Novel View Synthesis

Authors: Youssef Abdelkareem, Shady Shehata, Fakhri Karray

Abstract: Novel view synthesis is a long-standing problem that revolves around rendering frames of scenes from novel camera viewpoints. Volumetric approaches provide a solution for modeling occlusions through the explicit 3D representation of the camera frustum. Multi-plane Images (MPI) are volumetric methods that represent the scene using front-parallel planes at distinct depths but suffer from depth discr… ▽ More Novel view synthesis is a long-standing problem that revolves around rendering frames of scenes from novel camera viewpoints. Volumetric approaches provide a solution for modeling occlusions through the explicit 3D representation of the camera frustum. Multi-plane Images (MPI) are volumetric methods that represent the scene using front-parallel planes at distinct depths but suffer from depth discretization leading to a 2.D scene representation. Another line of approach relies on implicit 3D scene representations. Neural Radiance Fields (NeRF) utilize neural networks for encapsulating the continuous 3D scene structure within the network weights achieving photorealistic synthesis results, however, methods are constrained to per-scene optimization settings which are inefficient in practice. Multi-plane Neural Radiance Fields (MINE) open the door for combining implicit and explicit scene representations. It enables continuous 3D scene representations, especially in the depth dimension, while utilizing the input image features to avoid per-scene optimization. The main drawback of the current literature work in this domain is being constrained to single-view input, limiting the synthesis ability to narrow viewpoint ranges. In this work, we thoroughly examine the performance, generalization, and efficiency of single-view multi-plane neural radiance fields. In addition, we propose a new multiplane NeRF architecture that accepts multiple views to improve the synthesis results and expand the viewing range. Features from the input source frames are effectively fused through a proposed attention-aware fusion module to highlight important information from different viewpoints. Experiments show the effectiveness of attention-based fusion and the promising outcomes of our proposed method when compared to multi-view NeRF and MPI techniques. △ Less

Submitted 3 March, 2023; originally announced March 2023.

Comments: ICDIPV 2023

arXiv:2204.07501 [pdf, other]

Evaluating few shot and Contrastive learning Methods for Code Clone Detection

Authors: Mohamad Khajezade, Fatemeh Hendijani Fard, Mohamed S. Shehata

Abstract: Context: Code Clone Detection (CCD) is a software engineering task that is used for plagiarism detection, code search, and code comprehension. Recently, deep learning-based models have achieved an F1 score (a metric used to assess classifiers) of $\sim$95\% on the CodeXGLUE benchmark. These models require many training data, mainly fine-tuned on Java or C++ datasets. However, no previous study eva… ▽ More Context: Code Clone Detection (CCD) is a software engineering task that is used for plagiarism detection, code search, and code comprehension. Recently, deep learning-based models have achieved an F1 score (a metric used to assess classifiers) of $\sim$95\% on the CodeXGLUE benchmark. These models require many training data, mainly fine-tuned on Java or C++ datasets. However, no previous study evaluates the generalizability of these models where a limited amount of annotated data is available. Objective: The main objective of this research is to assess the ability of the CCD models as well as few shot learning algorithms for unseen programming problems and new languages (i.e., the model is not trained on these problems/languages). Method: We assess the generalizability of the state of the art models for CCD in few shot settings (i.e., only a few samples are available for fine-tuning) by setting three scenarios: i) unseen problems, ii) unseen languages, iii) combination of new languages and new problems. We choose three datasets of BigCloneBench, POJ-104, and CodeNet and Java, C++, and Ruby languages. Then, we employ Model Agnostic Meta-learning (MAML), where the model learns a meta-learner capable of extracting transferable knowledge from the train set; so that the model can be fine-tuned using a few samples. Finally, we combine contrastive learning with MAML to further study whether it can improve the results of MAML. △ Less

Submitted 9 November, 2023; v1 submitted 15 April, 2022; originally announced April 2022.

arXiv:2107.13093 [pdf, other]

doi 10.1038/s41598-022-06718-2

Automated Human Cell Classification in Sparse Datasets using Few-Shot Learning

Authors: Reece Walsh, Mohamed H. Abdelpakey, Mohamed S. Shehata, Mostafa M. Mohamed

Abstract: Classifying and analyzing human cells is a lengthy procedure, often involving a trained professional. In an attempt to expedite this process, an active area of research involves automating cell classification through use of deep learning-based techniques. In practice, a large amount of data is required to accurately train these deep learning models. However, due to the sparse human cell datasets c… ▽ More Classifying and analyzing human cells is a lengthy procedure, often involving a trained professional. In an attempt to expedite this process, an active area of research involves automating cell classification through use of deep learning-based techniques. In practice, a large amount of data is required to accurately train these deep learning models. However, due to the sparse human cell datasets currently available, the performance of these models is typically low. This study investigates the feasibility of using few-shot learning-based techniques to mitigate the data requirements for accurate training. The study is comprised of three parts: First, current state-of-the-art few-shot learning techniques are evaluated on human cell classification. The selected techniques are trained on a non-medical dataset and then tested on two out-of-domain, human cell datasets. The results indicate that, overall, the test accuracy of state-of-the-art techniques decreased by at least 30% when transitioning from a non-medical dataset to a medical dataset. Second, this study evaluates the potential benefits, if any, to varying the backbone architecture and training schemes in current state-of-the-art few-shot learning techniques when used in human cell classification. Even with these variations, the overall test accuracy decreased from 88.66% on non-medical datasets to 44.13% at best on the medical datasets. Third, this study presents future directions for using few-shot learning in human cell classification. In general, few-shot learning in its current state performs poorly on human cell classification. The study proves that attempts to modify existing network architectures are not effective and concludes that future research effort should be focused on improving robustness towards out-of-domain testing using optimization-based or self-supervised few-shot learning techniques. △ Less

Submitted 11 March, 2022; v1 submitted 27 July, 2021; originally announced July 2021.

Comments: 12 pages, 4 figures

Journal ref: Scientific Reports 12.1 (2022): 1-11

arXiv:2107.03924 [pdf]

Smart Healthcare in the Age of AI: Recent Advances, Challenges, and Future Prospects

Authors: Mahmoud Nasr, MD. Milon Islam, Shady Shehata, Fakhri Karray, Yuri Quintana

Abstract: The significant increase in the number of individuals with chronic ailments (including the elderly and disabled) has dictated an urgent need for an innovative model for healthcare systems. The evolved model will be more personalized and less reliant on traditional brick-and-mortar healthcare institutions such as hospitals, nursing homes, and long-term healthcare centers. The smart healthcare syste… ▽ More The significant increase in the number of individuals with chronic ailments (including the elderly and disabled) has dictated an urgent need for an innovative model for healthcare systems. The evolved model will be more personalized and less reliant on traditional brick-and-mortar healthcare institutions such as hospitals, nursing homes, and long-term healthcare centers. The smart healthcare system is a topic of recently growing interest and has become increasingly required due to major developments in modern technologies, especially in artificial intelligence (AI) and machine learning (ML). This paper is aimed to discuss the current state-of-the-art smart healthcare systems highlighting major areas like wearable and smartphone devices for health monitoring, machine learning for disease diagnosis, and the assistive frameworks, including social robots developed for the ambient assisted living environment. Additionally, the paper demonstrates software integration architectures that are very significant to create smart healthcare systems, integrating seamlessly the benefit of data analytics and other tools of AI. The explained developed systems focus on several facets: the contribution of each developed framework, the detailed working procedure, the performance as outcomes, and the comparative merits and limitations. The current research challenges with potential future directions are addressed to highlight the drawbacks of existing systems and the possible methods to introduce novel frameworks, respectively. This review aims at providing comprehensive insights into the recent developments of smart healthcare systems to equip experts to contribute to the field. △ Less

Submitted 24 June, 2021; originally announced July 2021.

arXiv:2005.08669 [pdf]

Translating the Concept of Goal Setting into Practice -- What 'Else' does it Require than a Goal Setting Tool?

Authors: Gábor Kismihók, Catherine Zhao, Michaéla C. Schippers, Stefan T. Mol, Scott Harrison, Shady Shehata

Abstract: This conceptual paper reviews the current status of goal setting in the area of technology enhanced learning and education. Besides a brief literature review, three current projects on goal setting are discussed. The paper shows that the main barriers for goal setting applications in education are not related to the technology, the available data or analytical methods, but rather the human factor.… ▽ More This conceptual paper reviews the current status of goal setting in the area of technology enhanced learning and education. Besides a brief literature review, three current projects on goal setting are discussed. The paper shows that the main barriers for goal setting applications in education are not related to the technology, the available data or analytical methods, but rather the human factor. The most important bottlenecks are the lack of students goal setting skills and abilities, and the current curriculum design, which, especially in the observed higher education institutions, provides little support for goal setting interventions. △ Less

Submitted 18 May, 2020; originally announced May 2020.

Comments: This paper has been accepted to be published in the proceedings of CSEDU 2020 by SciTePress

arXiv:2004.12058 [pdf, other]

NullSpaceNet: Nullspace Convoluional Neural Network with Differentiable Loss Function

Authors: Mohamed H. Abdelpakey, Mohamed S. Shehata

Abstract: We propose NullSpaceNet, a novel network that maps from the pixel level input to a joint-nullspace (as opposed to the traditional feature space), where the newly learned joint-nullspace features have clearer interpretation and are more separable. NullSpaceNet ensures that all inputs from the same class are collapsed into one point in this new joint-nullspace, and the different classes are collapse… ▽ More We propose NullSpaceNet, a novel network that maps from the pixel level input to a joint-nullspace (as opposed to the traditional feature space), where the newly learned joint-nullspace features have clearer interpretation and are more separable. NullSpaceNet ensures that all inputs from the same class are collapsed into one point in this new joint-nullspace, and the different classes are collapsed into different points with high separation margins. Moreover, a novel differentiable loss function is proposed that has a closed-form solution with no free-parameters. NullSpaceNet exhibits superior performance when tested against VGG16 with fully-connected layer over 4 different datasets, with accuracy gain of up to 4.55%, a reduction in learnable parameters from 135M to 19M, and reduction in inference time of 99% in favor of NullSpaceNet. This means that NullSpaceNet needs less than 1% of the time it takes a traditional CNN to classify a batch of images with better accuracy. △ Less

Submitted 25 April, 2020; originally announced April 2020.

Comments: 17 pages

arXiv:1908.07905 [pdf, other]

DomainSiam: Domain-Aware Siamese Network for Visual Object Tracking

Authors: Mohamed H. Abdelpakey, Mohamed S. Shehata

Abstract: Visual object tracking is a fundamental task in the field of computer vision. Recently, Siamese trackers have achieved state-of-the-art performance on recent benchmarks. However, Siamese trackers do not fully utilize semantic and objectness information from pre-trained networks that have been trained on the image classification task. Furthermore, the pre-trained Siamese architecture is sparsely ac… ▽ More Visual object tracking is a fundamental task in the field of computer vision. Recently, Siamese trackers have achieved state-of-the-art performance on recent benchmarks. However, Siamese trackers do not fully utilize semantic and objectness information from pre-trained networks that have been trained on the image classification task. Furthermore, the pre-trained Siamese architecture is sparsely activated by the category label which leads to unnecessary calculations and overfitting. In this paper, we propose to learn a Domain-Aware, that is fully utilizing semantic and objectness information while producing a class-agnostic using a ridge regression network. Moreover, to reduce the sparsity problem, we solve the ridge regression problem with a differentiable weighted-dynamic loss function. Our tracker, dubbed DomainSiam, improves the feature learning in the training phase and generalization capability to other domains. Extensive experiments are performed on five tracking benchmarks including OTB2013 and OTB2015 for a validation set; as well as the VOT2017, VOT2018, LaSOT, TrackingNet, and GOT10k for a testing set. DomainSiam achieves state-of-the-art performance on these benchmarks while running at 53 FPS. △ Less

Submitted 21 August, 2019; originally announced August 2019.

Comments: 13 pages

Journal ref: 14th International Symposium on Visual Computing (ISVC2019)

arXiv:1809.02714 [pdf, other]

DensSiam: End-to-End Densely-Siamese Network with Self-Attention Model for Object Tracking

Authors: Mohamed H. Abdelpakey, Mohamed S. Shehata, Mostafa M. Mohamed

Abstract: Convolutional Siamese neural networks have been recently used to track objects using deep features. Siamese architecture can achieve real time speed, however it is still difficult to find a Siamese architecture that maintains the generalization capability, high accuracy and speed while decreasing the number of shared parameters especially when it is very deep. Furthermore, a conventional Siamese a… ▽ More Convolutional Siamese neural networks have been recently used to track objects using deep features. Siamese architecture can achieve real time speed, however it is still difficult to find a Siamese architecture that maintains the generalization capability, high accuracy and speed while decreasing the number of shared parameters especially when it is very deep. Furthermore, a conventional Siamese architecture usually processes one local neighborhood at a time, which makes the appearance model local and non-robust to appearance changes. To overcome these two problems, this paper proposes DensSiam, a novel convolutional Siamese architecture, which uses the concept of dense layers and connects each dense layer to all layers in a feed-forward fashion with a similarity-learning function. DensSiam also includes a Self-Attention mechanism to force the network to pay more attention to the non-local features during offline training. Extensive experiments are performed on four tracking benchmarks: OTB2013 and OTB2015 for validation set; and VOT2015, VOT2016 and VOT2017 for testing set. The obtained results show that DensSiam achieves superior results on these benchmarks compared to other current state-of-the-art methods. △ Less

Submitted 7 September, 2018; originally announced September 2018.

Comments: 11 pages, 3 figures, Accepted by ISVC18

arXiv:1602.03012 [pdf, other]

EndoNet: A Deep Architecture for Recognition Tasks on Laparoscopic Videos

Authors: Andru P. Twinanda, Sherif Shehata, Didier Mutter, Jacques Marescaux, Michel de Mathelin, Nicolas Padoy

Abstract: Surgical workflow recognition has numerous potential medical applications, such as the automatic indexing of surgical video databases and the optimization of real-time operating room scheduling, among others. As a result, phase recognition has been studied in the context of several kinds of surgeries, such as cataract, neurological, and laparoscopic surgeries. In the literature, two types of featu… ▽ More Surgical workflow recognition has numerous potential medical applications, such as the automatic indexing of surgical video databases and the optimization of real-time operating room scheduling, among others. As a result, phase recognition has been studied in the context of several kinds of surgeries, such as cataract, neurological, and laparoscopic surgeries. In the literature, two types of features are typically used to perform this task: visual features and tool usage signals. However, the visual features used are mostly handcrafted. Furthermore, the tool usage signals are usually collected via a manual annotation process or by using additional equipment. In this paper, we propose a novel method for phase recognition that uses a convolutional neural network (CNN) to automatically learn features from cholecystectomy videos and that relies uniquely on visual information. In previous studies, it has been shown that the tool signals can provide valuable information in performing the phase recognition task. Thus, we present a novel CNN architecture, called EndoNet, that is designed to carry out the phase recognition and tool presence detection tasks in a multi-task manner. To the best of our knowledge, this is the first work proposing to use a CNN for multiple recognition tasks on laparoscopic videos. Extensive experimental comparisons to other methods show that EndoNet yields state-of-the-art results for both tasks. △ Less

Submitted 23 May, 2016; v1 submitted 9 February, 2016; originally announced February 2016.

Comments: Video: https://www.youtube.com/watch?v=6v0NWrFOUUM

arXiv:0710.0518 [pdf]

doi 10.1021/nl070874k

Self-directed growth of AlGaAs core-shell nanowires for visible light applications

Authors: C. Chen, S. Shehata, C. Fradin, R. LaPierre, C. Couteau, G. Weihs

Abstract: Al(0.37)Ga(0.63)As nanowires (NWs) were grown in a molecular beam epitaxy system on GaAs(111)B substrates. Micro-photoluminescence measurements and energy dispersive X-ray spectroscopy indicated a core-shell structure and Al composition gradient along the NW axis, producing a potential minimum for carrier confinement. The core-shell structure formed during the growth as a consequence of the diff… ▽ More Al(0.37)Ga(0.63)As nanowires (NWs) were grown in a molecular beam epitaxy system on GaAs(111)B substrates. Micro-photoluminescence measurements and energy dispersive X-ray spectroscopy indicated a core-shell structure and Al composition gradient along the NW axis, producing a potential minimum for carrier confinement. The core-shell structure formed during the growth as a consequence of the different Al and Ga adatom diffusion lengths. △ Less

Submitted 2 October, 2007; originally announced October 2007.

Comments: 20 pages, 7 figures

Journal ref: Nano Lett.; (Letter); 2007; 7(9); 2584-2589

Showing 1–21 of 21 results for author: Shehata, S