Search | arXiv e-print repository

arXiv:2406.19526 [pdf, other]

TocBERT: Medical Document Structure Extraction Using Bidirectional Transformers

Authors: Majd Saleh, Sarra Baghdadi, Stéphane Paquelet

Abstract: Text segmentation holds paramount importance in the field of Natural Language Processing (NLP). It plays an important role in several NLP downstream tasks like information retrieval and document summarization. In this work, we propose a new solution, namely TocBERT, for segmenting texts using bidirectional transformers. TocBERT represents a supervised solution trained on the detection of titles and sub-titles from their semantic representations. This task was formulated as a named entity recognition (NER) problem. The solution has been applied on a medical text segmentation use-case where the Bio-ClinicalBERT model is fine-tuned to segment discharge summaries of the MIMIC-III dataset. The performance of TocBERT has been evaluated on a human-labeled ground truth corpus of 250 notes. It achieved an F1-score of 84.6% when evaluated on a linear text segmentation problem and 72.8% on a hierarchical text segmentation problem. It outperformed a carefully designed rule-based solution, particularly in distinguishing titles from subtitles.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 6 pages, 6 figures

arXiv:2405.01728 [pdf, other]

Explainability Guided Adversarial Evasion Attacks on Malware Detectors

Authors: Kshitiz Aryal, Maanak Gupta, Mahmoud Abdelsalam, Moustafa Saleh

Abstract: As the focus on security of Artificial Intelligence (AI) is becoming paramount, research on crafting and inserting optimal adversarial perturbations has become increasingly critical. In the malware domain, this adversarial sample generation relies heavily on the accuracy and placement of crafted perturbation with the goal of evading a trained classifier. This work focuses on applying explainability techniques to enhance the adversarial evasion attack on a machine-learning-based Windows PE malware detector. The explainable tool identifies the regions of PE malware files that have the most significant impact on the decision-making process of a given malware detector, and therefore, the same regions can be leveraged to inject the adversarial perturbation for maximum efficiency. Profiling all the PE malware file regions based on their impact on the malware detector's decision enables the derivation of an efficient strategy for identifying the optimal location for perturbation injection. The strategy should incorporate the region's significance in influencing the malware detector's decision and the sensitivity of the PE malware file's integrity towards modifying that region. To assess the utility of explainable AI in crafting an adversarial sample of Windows PE malware, we utilize the DeepExplainer module of SHAP for determining the contribution of each region of PE malware to its detection by a CNN-based malware detector, MalConv.
Furthermore, we analyzed the significance of SHAP values at a more granular level by subdividing each section of Windows PE into small subsections. We then performed an adversarial evasion attack on the subsections based on the corresponding SHAP values of the byte sequences.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.18550 [pdf, other]

IncidentResponseGPT: Generating Traffic Incident Response Plans with Generative Artificial Intelligence

Authors: Artur Grigorev, Adriana-Simona Mihaita, Khaled Saleh, Yuming Ou

Abstract: Traffic congestion due to road incidents poses a significant challenge in urban environments, leading to increased pollution, economic losses, and traffic congestion. Efficiently managing these incidents is imperative for mitigating their adverse effects; however, the complexity of urban traffic systems and the variety of potential incidents represent a considerable obstacle. This paper introduces IncidentResponseGPT, an innovative solution designed to assist traffic management authorities by providing rapid, informed, and adaptable traffic incident response plans. By integrating a Generative AI platform with real-time traffic incident reports and operational guidelines, our system aims to streamline the decision-making process in responding to traffic incidents. The research addresses the critical challenges involved in deploying AI in traffic management, including overcoming the complexity of urban traffic networks, ensuring real-time decision-making capabilities, aligning with local laws and regulations, and securing public acceptance for AI-driven systems. Through a combination of text analysis of accident reports, validation of AI recommendations through traffic simulation, and implementation of transparent and validated AI systems, IncidentResponseGPT offers a promising approach to optimizing traffic flow and reducing congestion in the face of traffic incidents.
The relevance of this work extends to traffic management authorities, emergency response teams, and municipal bodies, all integral stakeholders in urban traffic control and incident management. By proposing a novel solution to the identified challenges, this research aims to develop a framework that not only facilitates faster resolution of traffic incidents but also minimizes their overall impact on urban traffic systems.

Submitted 29 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.07668 [pdf, other]

Shape Completion in the Dark: Completing Vertebrae Morphology from 3D Ultrasound

Authors: Miruna-Alexandra Gafencu, Yordanka Velikova, Mahdi Saleh, Tamas Ungi, Nassir Navab, Thomas Wendler, Mohammad Farid Azampour

Abstract: Purpose: Ultrasound (US) imaging, while advantageous for its radiation-free nature, is challenging to interpret due to only partially visible organs and a lack of complete 3D information. While performing US-based diagnosis or investigation, medical professionals therefore create a mental map of the 3D anatomy. In this work, we aim to replicate this process and enhance the visual representation of anatomical structures. Methods: We introduce a point-cloud-based probabilistic DL method to complete occluded anatomical structures through 3D shape completion and choose US-based spine examinations as our application. To enable training, we generate synthetic 3D representations of partially occluded spinal views by mimicking US physics and accounting for inherent artifacts. Results: The proposed model performs consistently on synthetic and patient data, with mean and median differences of 2.02 and 0.03 in CD, respectively. Our ablation study demonstrates the importance of US physics-based data generation, reflected in the large mean and median difference of 11.8 CD and 9.55 CD, respectively. Additionally, we demonstrate that anatomic landmarks, such as the spinous process (with reconstruction CD of 4.73) and the facet joints (mean distance to GT of 4.96 mm) are preserved in the 3D completion. Conclusion: Our work establishes the feasibility of 3D shape completion for lumbar vertebrae, ensuring the preservation of level-wise characteristics and successful generalization from synthetic to real data.
The incorporation of US physics contributes to more accurate patient data completions. Notably, our method preserves essential anatomic landmarks and reconstructs crucial injection sites at their correct locations. The generated data and source code will be made publicly available (https://github.com/miruna20/Shape-Completion-in-the-Dark).

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.06428 [pdf, other]

Intra-Section Code Cave Injection for Adversarial Evasion Attacks on Windows PE Malware File

Authors: Kshitiz Aryal, Maanak Gupta, Mahmoud Abdelsalam, Moustafa Saleh

Abstract: Windows malware is predominantly available in cyberspace and is a prime target for deliberate adversarial evasion attacks. Although researchers have investigated the adversarial malware attack problem, a multitude of important questions remain unanswered, including (a) Are the existing techniques to inject adversarial perturbations in Windows Portable Executable (PE) malware files effective enough for evasion purposes?; (b) Does the attack process preserve the original behavior of malware?; (c) Are there unexplored approaches/locations that can be used to carry out adversarial evasion attacks on Windows PE malware?; and (d) What are the optimal locations and sizes of adversarial perturbations required to evade an ML-based malware detector without significant structural change in the PE file? To answer some of these questions, this work proposes a novel approach that injects a code cave within the section (i.e., intra-section) of Windows PE malware files to make space for adversarial perturbations. In addition, a code loader is also injected inside the PE file, which reverts adversarial malware to its original form during the execution, preserving the malware's functionality and executability. To understand the effectiveness of our approach, we injected adversarial perturbations inside the .text, .data and .rdata sections, generated using the gradient descent and Fast Gradient Sign Method (FGSM), to target the two popular CNN-based malware detectors, MalConv and MalConv2.
Our experiments yielded notable results, achieving a 92.31% evasion rate with gradient descent and 96.26% with FGSM against MalConv, compared to the 16.17% evasion rate for append attacks. Similarly, when targeting MalConv2, our approach achieved a remarkable maximum evasion rate of 97.93% with gradient descent and 94.34% with FGSM, significantly surpassing the 4.01% evasion rate observed with append attacks.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k).
Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.03466 [pdf, other]

Physics-Encoded Graph Neural Networks for Deformation Prediction under Contact

Authors: Mahdi Saleh, Michael Sommersperger, Nassir Navab, Federico Tombari

Abstract: In robotics, it's crucial to understand object deformation during tactile interactions. A precise understanding of deformation can elevate robotic simulations and have broad implications across different industries. We introduce a method using Physics-Encoded Graph Neural Networks (GNNs) for such predictions. Similar to robotic grasping and manipulation scenarios, we focus on modeling the dynamics between a rigid mesh contacting a deformable mesh under external forces. Our approach represents both the soft body and the rigid body within graph structures, where nodes hold the physical states of the meshes. We also incorporate cross-attention mechanisms to capture the interplay between the objects. By jointly learning geometry and physics, our model reconstructs consistent and detailed deformations. We've made our code and dataset public to advance research in robotic simulation and grasping.

Submitted 5 February, 2024; originally announced February 2024.

Comments: Accepted at 2024 IEEE International Conference on Robotics and Automation (ICRA2024)

arXiv:2402.01878 [pdf, other]

LiPO: Listwise Preference Optimization through Learning-to-Rank

Authors: Tianqi Liu, Zhen Qin, Junru Wu, Jiaming Shen, Misha Khalman, Rishabh Joshi, Yao Zhao, Mohammad Saleh, Simon Baumgartner, Jialu Liu, Peter J. Liu, Xuanhui Wang

Abstract: Aligning language models (LMs) with curated human feedback is critical to control their behaviors in real-world applications. Several recent policy optimization methods, such as DPO and SLiC, serve as promising alternatives to the traditional Reinforcement Learning from Human Feedback (RLHF) approach. In practice, human feedback often comes in a format of a ranked list over multiple responses to amortize the cost of reading prompt. Multiple responses can also be ranked by reward models or AI feedback. There lacks such a thorough study on directly fitting upon a list of responses. In this work, we formulate the LM alignment as a listwise ranking problem and describe the LiPO framework, where the policy can potentially learn more effectively from a ranked list of plausible responses given the prompt. This view draws an explicit connection to Learning-to-Rank (LTR), where most existing preference optimization work can be mapped to existing ranking objectives. Following this connection, we provide an examination of ranking objectives that are not well studied for LM alignment with DPO and SLiC as special cases when list size is two. In particular, we highlight a specific method, LiPO-λ, which leverages a state-of-the-art listwise ranking objective and weights each preference pair in a more advanced manner. We show that LiPO-λ can outperform DPO variants and SLiC by a clear margin on several preference alignment tasks with both curated and real rankwise preference data.

Submitted 22 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.03797 [pdf]

Anatomy of Neural Language Models

Authors: Majd Saleh, Stéphane Paquelet

Abstract: The fields of generative AI and transfer learning have experienced remarkable advancements in recent years especially in the domain of Natural Language Processing (NLP). Transformers have been at the heart of these advancements where the cutting-edge transformer-based Language Models (LMs) have led to new state-of-the-art results in a wide spectrum of applications. While the number of research works involving neural LMs is exponentially increasing, their vast majority are high-level and far from self-contained. Consequently, a deep understanding of the literature in this area is a tough task especially in the absence of a unified mathematical framework explaining the main types of neural LMs. We address the aforementioned problem in this tutorial where the objective is to explain neural LMs in a detailed, simplified and unambiguous mathematical framework accompanied by clear graphical illustrations. Concrete examples on widely used models like BERT and GPT2 are explored. Finally, since transformers pretrained on language-modeling-like tasks have been widely adopted in computer vision and time series applications, we briefly explore some examples of such solutions in order to enable readers to understand how transformers work in the aforementioned domains and compare this use with the original one in NLP.

Submitted 27 February, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

Comments: 36 Pages; 25 Figures; some typos and notation errors are corrected in this version

arXiv:2312.13326 [pdf, other]

doi 10.1364/OE.506519

Conditional Recurrent Neural Networks for broad applications in nonlinear optics

Authors: Simone Lauria, Mohammed F. Saleh

Abstract: We present a novel implementation of conditional Long Short-Term Memory Recurrent Neural Networks that successfully predict the spectral evolution of a pulse in nonlinear periodically-poled waveguides. The developed networks offer large flexibility by allowing the propagation of optical pulses with ranges of energies and temporal widths in waveguides with different poling periods. The results show very high agreement with the traditional numerical models. Moreover, we are able to use a single network to calculate both the real and imaginary parts of the pulse complex envelope, allowing for successfully retrieving the pulse temporal and spectral evolution using the same network.

Submitted 20 December, 2023; originally announced December 2023.

Journal ref: Optics Express Vol. 32, Issue 4, pp. 5582-5591 (2024)

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2311.02222 [pdf, other]

Lessons learned while developing the Serenity-S1 ATCA card

Authors: T. Mehner, L. E. Ardila-Perez, M. Balzer, G. Fedi, M. Fuchs, A. Howard, G. Iles, M. Loutit, S. Mansbridge, F. Palla, D. Parker, M. Pesaresi, A. Rose, M. Saleh, O. Sander, M. Schleicher, C. Strohman, D. Tcherniakhovski, T. Williams, J. Zhao

Abstract: The Serenity-S1 is a Xilinx Virtex Ultrascale+ based Advanced Telecommunications Computing Architecture (ATCA) processing blade that has been optimised for production. It incorporates many developments from the Serenity-A and Serenity-Z prototype cards and, where possible, adopts solutions being used across CERN. It also uses many new parts because commonly used parts have disappeared from the market during the semiconductor crisis, with only some returning. Improvements to simplify manufacture, the performance of new components, some of the more difficult aspects of procurement, the performance of production-grade Samtec 25 Gb/s optical firefly parts, and issues with the rack cooling infrastructure are discussed.

Submitted 14 December, 2023; v1 submitted 3 November, 2023; originally announced November 2023.

Comments: 5 pages, 4 figures, TWEPP 2023

arXiv:2310.07264 [pdf, other]

Classification of Dysarthria based on the Levels of Severity. A Systematic Review

Authors: Afnan Al-Ali, Somaya Al-Maadeed, Moutaz Saleh, Rani Chinnappa Naidu, Zachariah C Alex, Prakash Ramachandran, Rajeev Khoodeeram, Rajesh Kumar M

Abstract: Dysarthria is a neurological speech disorder that can significantly impact affected individuals' communication abilities and overall quality of life. The accurate and objective classification of dysarthria and the determination of its severity are crucial for effective therapeutic intervention. While traditional assessments by speech-language pathologists (SLPs) are common, they are often subjective, time-consuming, and can vary between practitioners. Emerging machine learning-based models have shown the potential to provide a more objective dysarthria assessment, enhancing diagnostic accuracy and reliability. This systematic review aims to comprehensively analyze current methodologies for classifying dysarthria based on severity levels. Specifically, this review will focus on determining the most effective set and type of features that can be used for automatic patient classification and evaluating the best AI techniques for this purpose. We will systematically review the literature on the automatic classification of dysarthria severity levels. Sources of information will include electronic databases and grey literature. Selection criteria will be established based on relevance to the research questions. Data extraction will include methodologies used, the type of features extracted for classification, and AI techniques employed. The findings of this systematic review will contribute to the current understanding of dysarthria classification, inform future research, and support the development of improved diagnostic tools.
The implications of these findings could be significant in advancing patient care and improving therapeutic outcomes for individuals affected by dysarthria.

Submitted 11 October, 2023; originally announced October 2023.

Comments: no comments

arXiv:2309.06657 [pdf, other]

Statistical Rejection Sampling Improves Preference Optimization

Authors: Tianqi Liu, Yao Zhao, Rishabh Joshi, Misha Khalman, Mohammad Saleh, Peter J. Liu, Jialu Liu

Abstract: Improving the alignment of language models with human preferences remains an active research challenge. Previous approaches have primarily utilized Reinforcement Learning from Human Feedback (RLHF) via online RL methods such as Proximal Policy Optimization (PPO). Recently, offline methods such as Sequence Likelihood Calibration (SLiC) and Direct Preference Optimization (DPO) have emerged as attractive alternatives, offering improvements in stability and scalability while maintaining competitive performance. SLiC refines its loss function using sequence pairs sampled from a supervised fine-tuned (SFT) policy, while DPO directly optimizes language models based on preference data, foregoing the need for a separate reward model. However, the maximum likelihood estimator (MLE) of the target optimal policy requires labeled preference pairs sampled from that policy. DPO's lack of a reward model constrains its ability to sample preference pairs from the optimal policy, and SLiC is restricted to sampling preference pairs only from the SFT policy. To address these limitations, we introduce a novel approach called Statistical Rejection Sampling Optimization (RSO) that aims to source preference data from the target optimal policy using rejection sampling, enabling a more accurate estimation of the optimal policy. We also propose a unified framework that enhances the loss functions used in both SLiC and DPO from a preference modeling standpoint.
Through extensive experiments across three diverse tasks, we demonstrate that RSO consistently outperforms both SLiC and DPO on evaluations from both Large Language Model (LLM) and human raters.

Submitted 23 January, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

Comments: Accepted in ICLR 2024

arXiv:2309.02965 [pdf, other]

Dynamic Hyperbolic Attention Network for Fine Hand-object Reconstruction

Authors: Zhiying Leng, Shun-Cheng Wu, Mahdi Saleh, Antonio Montanaro, Hao Yu, Yin Wang, Nassir Navab, Xiaohui Liang, Federico Tombari

Abstract: Reconstructing both objects and hands in 3D from a single RGB image is complex. Existing methods rely on manually defined hand-object constraints in Euclidean space, leading to suboptimal feature learning. Compared with Euclidean space, hyperbolic space better preserves the geometric properties of meshes thanks to its exponentially-growing space distance, which amplifies the differences between the features based on similarity. In this work, we propose the first precise hand-object reconstruction method in hyperbolic space, namely Dynamic Hyperbolic Attention Network (DHANet), which leverages intrinsic properties of hyperbolic space to learn representative features. Our method, which projects mesh and image features into a unified hyperbolic space, includes two modules, i.e., dynamic hyperbolic graph convolution and image-attention hyperbolic graph convolution. With these two modules, our method learns mesh features with rich geometry-image multi-modal information and models better hand-object interaction. Our method provides a promising alternative for fine hand-object reconstruction in hyperbolic space. Extensive experiments on three public datasets demonstrate that our method outperforms most state-of-the-art methods.

Submitted 6 September, 2023; originally announced September 2023.

Comments: Accepted by ICCV 2023

ACM Class: I.4.5

arXiv:2309.00372 [pdf, other]

On the Localization of Ultrasound Image Slices within Point Distribution Models

Authors: Lennart Bastian, Vincent Bürgin, Ha Young Kim, Alexander Baumann, Benjamin Busam, Mahdi Saleh, Nassir Navab

Abstract: Thyroid disorders are most commonly diagnosed using high-resolution Ultrasound (US). Longitudinal nodule tracking is a pivotal diagnostic protocol for monitoring changes in pathological thyroid morphology. This task, however, imposes a substantial cognitive load on clinicians due to the inherent challenge of maintaining a mental 3D reconstruction of the organ. We thus present a framework for automated US image slice localization within a 3D shape representation to ease how such sonographic diagnoses are carried out. Our proposed method learns a common latent embedding space between US image patches and the 3D surface of an individual's thyroid shape, or a statistical aggregation in the form of a statistical shape model (SSM), via contrastive metric learning. Using cross-modality registration and Procrustes analysis, we leverage features from our model to register US slices to a 3D mesh representation of the thyroid shape. We demonstrate that our multi-modal registration framework can localize images on the 3D surface topology of a patient-specific organ and the mean shape of an SSM. Experimental results indicate slice positions can be predicted within an average of 1.2 mm of the ground-truth slice location on the patient-specific 3D anatomy and 4.6 mm on the SSM, exemplifying its usefulness for slice localization during sonographic acquisitions. Code is publicly available: https://github.com/vuenc/slice-to-shape

Submitted 1 September, 2023; originally announced September 2023.

Comments: ShapeMI Workshop @ MICCAI 2023; 12 pages 2 figures

arXiv:2305.10425 [pdf, other]

SLiC-HF: Sequence Likelihood Calibration with Human Feedback

Authors: Yao Zhao, Rishabh Joshi, Tianqi Liu, Misha Khalman, Mohammad Saleh, Peter J. Liu

Abstract: Learning from human feedback has been shown to be effective at aligning language models with human preferences. Past work has often relied on Reinforcement Learning from Human Feedback (RLHF), which optimizes the language model using reward scores assigned from a reward model trained on human preference data. In this work we show how the recently introduced Sequence Likelihood Calibration (SLiC) can also be used to effectively learn from human preferences (SLiC-HF). Furthermore, we demonstrate this can be done with human feedback data collected for a different model, similar to off-policy, offline RL data. Automatic and human evaluation experiments on the TL;DR summarization task show that SLiC-HF significantly improves supervised fine-tuning baselines. Furthermore, SLiC-HF presents a competitive alternative to the PPO RLHF implementation used in past work while being much simpler to implement, easier to tune and more computationally efficient in practice.

Submitted 17 May, 2023; originally announced May 2023.

arXiv:2304.14736 [pdf, other]

Differentiable Sensor Layouts for End-to-End Learning of Task-Specific Camera Parameters

Authors: Hendrik Sommerhoff, Shashank Agnihotri, Mohamed Saleh, Michael Moeller, Margret Keuper, Andreas Kolb

Abstract: The success of deep learning is frequently described as the ability to train all parameters of a network on a specific application in an end-to-end fashion. Yet, several design choices on the camera level, including the pixel layout of the sensor, are considered as pre-defined and fixed, and high resolution, regular pixel layouts are considered to be the most generic ones in computer vision and graphics, treating all regions of an image as equally important. While several works have considered non-uniform, e.g., hexagonal or foveated, pixel layouts in hardware and image processing, the layout has not been integrated into the end-to-end learning paradigm so far. In this work, we present the first truly end-to-end trained imaging pipeline that optimizes the size and distribution of pixels on the imaging sensor jointly with the parameters of a given neural network on a specific task. We derive an analytic, differentiable approach for the sensor layout parameterization that allows for task-specific, local varying pixel resolutions. We present two pixel layout parameterization functions: rectangular and curvilinear grid shapes that retain a regular topology. We provide a drop-in module that approximates sensor simulation given existing high-resolution images to directly connect our method with existing deep learning models. We show that network predictions benefit from learnable pixel layouts for two different downstream tasks, classification and semantic segmentation.

Submitted 28 April, 2023; originally announced April 2023.

arXiv:2304.07515 [pdf, other]

S3M: Scalable Statistical Shape Modeling through Unsupervised Correspondences

Authors: Lennart Bastian, Alexander Baumann, Emily Hoppe, Vincent Bürgin, Ha Young Kim, Mahdi Saleh, Benjamin Busam, Nassir Navab

Abstract: Statistical shape models (SSMs) are an established way to represent the anatomy of a population with various clinically relevant applications. However, they typically require domain expertise, and labor-intensive landmark annotations to construct. We address these shortcomings by proposing an unsupervised method that leverages deep geometric features and functional correspondences to simultaneously learn local and global shape structures across population anatomies. Our pipeline significantly improves unsupervised correspondence estimation for SSMs compared to baseline methods, even on highly irregular surface topologies. We demonstrate this for two different anatomical structures: the thyroid and a multi-chamber heart dataset. Furthermore, our method is robust enough to learn from noisy neural network predictions, potentially enabling scaling SSMs to larger patient populations without manual segmentation annotation.

Submitted 24 July, 2023; v1 submitted 15 April, 2023; originally announced April 2023.

Comments: Accepted at MICCAI 2023. 13 pages, 6 figures

arXiv:2303.08231 [pdf, other]

Rotation-Invariant Transformer for Point Cloud Matching

Authors: Hao Yu, Zheng Qin, Ji Hou, Mahdi Saleh, Dongsheng Li, Benjamin Busam, Slobodan Ilic

Abstract: The intrinsic rotation invariance lies at the core of matching point clouds with handcrafted descriptors. However, it is widely despised by recent deep matchers that obtain the rotation invariance extrinsically via data augmentation. As the finite number of augmented rotations can never span the continuous SO(3) space, these methods usually show instability when facing rotations that are rarely seen. To this end, we introduce RoITr, a Rotation-Invariant Transformer to cope with the pose variations in the point cloud matching task. We contribute both on the local and global levels. Starting from the local level, we introduce an attention mechanism embedded with Point Pair Feature (PPF)-based coordinates to describe the pose-invariant geometry, upon which a novel attention-based encoder-decoder architecture is constructed. We further propose a global transformer with rotation-invariant cross-frame spatial awareness learned by the self-attention mechanism, which significantly improves the feature distinctiveness and makes the model robust with respect to the low overlap. Experiments are conducted on both the rigid and non-rigid public benchmarks, where RoITr outperforms all the state-of-the-art models by a considerable margin in the low-overlapping scenarios. Especially when the rotations are enlarged on the challenging 3DLoMatch benchmark, RoITr surpasses the existing methods by at least 13 and 5 percentage points in terms of Inlier Ratio and Registration Recall, respectively.

Submitted 27 March, 2024; v1 submitted 14 March, 2023; originally announced March 2023.

Comments: Accepted to CVPR 2023

arXiv:2212.09928 [pdf, other]

Improving the Robustness of Summarization Models by Detecting and Removing Input Noise

Authors: Kundan Krishna, Yao Zhao, Jie Ren, Balaji Lakshminarayanan, Jiaming Luo, Mohammad Saleh, Peter J. Liu

Abstract: The evaluation of abstractive summarization models typically uses test data that is identically distributed as training data. In real-world practice, documents to be summarized may contain input noise caused by text extraction artifacts or data pipeline bugs. The robustness of model performance under distribution shift caused by such noise is relatively under-studied. We present a large empirical study quantifying the sometimes severe loss in performance (up to 12 ROUGE-1 points) from different types of input noise for a range of datasets and model sizes. We then propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any extra training, auxiliary models, or even prior knowledge of the type of noise. Our proposed approach effectively mitigates the loss in performance, recovering a large fraction of the performance drop, sometimes as large as 11 ROUGE-1 points.

Submitted 4 December, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: EMNLP Findings 2023 Camera Ready

arXiv:2210.00045 [pdf, other]

Calibrating Sequence likelihood Improves Conditional Language Generation

Authors: Yao Zhao, Misha Khalman, Rishabh Joshi, Shashi Narayan, Mohammad Saleh, Peter J. Liu

Abstract: Conditional language models are predominantly trained with maximum likelihood estimation (MLE), giving probability mass to sparsely observed target sequences. While MLE trained models assign high probability to plausible sequences given the context, the model probabilities often do not accurately rank-order generated sequences by quality. This has been empirically observed in beam search decoding as output quality degrading with large beam sizes, and decoding strategies benefiting from heuristics such as length normalization and repetition-blocking. In this work, we introduce sequence likelihood calibration (SLiC) where the likelihood of model generated sequences are calibrated to better align with reference sequences in the model's latent space. With SLiC, decoding heuristics become unnecessary and decoding candidates' quality significantly improves regardless of the decoding method. Furthermore, SLiC shows no sign of diminishing returns with model scale, and presents alternative ways to improve quality with limited training and inference budgets. With SLiC, we exceed or match SOTA results on a wide range of generation tasks spanning abstractive summarization, question generation, abstractive question answering and data-to-text generation, even with modest-sized models.

Submitted 30 September, 2022; originally announced October 2022.

arXiv:2209.15558 [pdf, other]

Out-of-Distribution Detection and Selective Generation for Conditional Language Models

Authors: Jie Ren, Jiaming Luo, Yao Zhao, Kundan Krishna, Mohammad Saleh, Balaji Lakshminarayanan, Peter J. Liu

Abstract: Machine learning algorithms typically assume independent and identically distributed samples in training and at test time. Much work has shown that high-performing ML classifiers can degrade significantly and provide overly-confident, wrong classification predictions, particularly for out-of-distribution (OOD) inputs. Conditional language models (CLMs) are predominantly trained to classify the next token in an output sequence, and may suffer even worse degradation on OOD inputs as the prediction is done auto-regressively over many steps. Furthermore, the space of potential low-quality outputs is larger as arbitrary text can be generated and it is important to know when to trust the generated output. We present a highly accurate and lightweight OOD detection method for CLMs, and demonstrate its effectiveness on abstractive summarization and translation. We also show how our method can be used under the common and realistic setting of distribution shift for selective generation (analogous to selective prediction for classification) of high-quality outputs, while automatically abstaining from low-quality ones, enabling safer deployment of generative language models.

Submitted 7 March, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

Comments: Published in ICLR 2023

arXiv:2209.13252 [pdf, other]

RIGA: Rotation-Invariant and Globally-Aware Descriptors for Point Cloud Registration

Authors: Hao Yu, Ji Hou, Zheng Qin, Mahdi Saleh, Ivan Shugurov, Kai Wang, Benjamin Busam, Slobodan Ilic

Abstract: Successful point cloud registration relies on accurate correspondences established upon powerful descriptors. However, existing neural descriptors either leverage a rotation-variant backbone whose performance declines under large rotations, or encode local geometry that is less distinctive. To address this issue, we introduce RIGA to learn descriptors that are Rotation-Invariant by design and Globally-Aware. From the Point Pair Features (PPFs) of sparse local regions, rotation-invariant local geometry is encoded into geometric descriptors. Global awareness of 3D structures and geometric context is subsequently incorporated, both in a rotation-invariant fashion. More specifically, 3D structures of the whole frame are first represented by our global PPF signatures, from which structural descriptors are learned to help geometric descriptors sense the 3D world beyond local regions. Geometric context from the whole scene is then globally aggregated into descriptors. Finally, the description of sparse regions is interpolated to dense point descriptors, from which correspondences are extracted for registration. To validate our approach, we conduct extensive experiments on both object- and scene-level data. With large rotations, RIGA surpasses the state-of-the-art methods by a margin of 8° in terms of the Relative Rotation Error on ModelNet40 and improves the Feature Matching Recall by at least 5 percentage points on 3DLoMatch.

Submitted 27 September, 2022; originally announced September 2022.

arXiv:2208.07717 [pdf, ps, other]

A new fractional model in Caputo sense for studying the dynamics of COVID-19 spread in France

Authors: Mahmoud H. A. Saleh, Tarek M. Abed-Elhameed

Abstract: The COVID-19 pandemic has rapidly spread around the world and burdened public health in almost all countries, including France. After the spread of SARS-CoV-2, France recorded many deaths in total. In this paper, we develop models with integer and fractional orders to investigate the dynamics of COVID-19 transmission in French hospitals and intensive care units (ICUs). Moreover, this paper aims to explore the impact of precautionary measures on the total infected cases in hospitals and ICUs of COVID-19 for the whole of France by using available actual data.

Submitted 11 August, 2022; originally announced August 2022.

Comments: 20 pages, 7 figures

MSC Class: 92Bxx (Primary)

arXiv:2208.04564 [pdf, other]

Statistical Properties of the log-cosh Loss Function Used in Machine Learning

Authors: Resve A. Saleh, A. K. Md. Ehsanes Saleh

Abstract: This paper analyzes a popular loss function used in machine learning called the log-cosh loss function. A number of papers have been published using this loss function but, to date, no statistical analysis has been presented in the literature. In this paper, we present the distribution function from which the log-cosh loss arises. We compare it to a similar distribution, called the Cauchy distribution, and carry out various statistical procedures that characterize its properties. In particular, we examine its associated pdf, cdf, likelihood function and Fisher information. Side-by-side we consider the Cauchy and Cosh distributions as well as the MLE of the location parameter with asymptotic bias, asymptotic variance, and confidence intervals. We also provide a comparison of robust estimators from several other loss functions, including the Huber loss function and the rank dispersion function. Further, we examine the use of the log-cosh function for quantile regression. In particular, we identify a quantile distribution function from which a maximum likelihood estimator for quantile regression can be derived. Finally, we compare a quantile M-estimator based on log-cosh with robust monotonicity against another approach to quantile regression based on convolutional smoothing.

Submitted 15 March, 2024; v1 submitted 9 August, 2022; originally announced August 2022.

Comments: 10 pages, 17 figures

arXiv:2208.00524 [pdf, other]

CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning

Authors: Mahdi Saleh, Yige Wang, Nassir Navab, Benjamin Busam, Federico Tombari

Abstract: Processing 3D data efficiently has always been a challenge. Spatial operations on large-scale point clouds, stored as sparse data, require extra cost. Attracted by the success of transformers, researchers are using multi-head attention for vision tasks. However, attention calculations in transformers come with quadratic complexity in the number of inputs and miss spatial intuition on sets like point clouds. We redesign set transformers in this work and incorporate them into a hierarchical framework for shape classification and part and scene segmentation. We propose our local attention unit, which captures features in a spatial neighborhood. We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration. Finally, to mitigate the non-heterogeneity of point clouds, we propose an efficient Multi-Scale Tokenization (MST), which extracts scale-invariant tokens for attention operations. The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods while requiring significantly fewer computations. Our proposed architecture predicts segmentation labels with around half the latency and parameter count of the previous most efficient method with comparable performance. The code is available at https://github.com/YigeWang-WHU/CloudAttention.

Submitted 31 July, 2022; originally announced August 2022.

arXiv:2203.09418 [pdf, other]

ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation

Authors: Yongzhi Su, Mahdi Saleh, Torben Fetzer, Jason Rambach, Nassir Navab, Benjamin Busam, Didier Stricker, Federico Tombari

Abstract: Establishing correspondences from image to 3D has been a key task of 6DoF object pose estimation for a long time. To predict pose more accurately, deeply learned dense maps replaced sparse templates. Dense methods also improved pose estimation in the presence of occlusion. More recently researchers have shown improvements by learning object fragments as segmentation. In this work, we present a discrete descriptor, which can represent the object surface densely. By incorporating a hierarchical binary grouping, we can encode the object surface very efficiently. Moreover, we propose a coarse to fine training strategy, which enables fine-grained correspondence prediction. Finally, by matching predicted codes with object surface and using a PnP solver, we estimate the 6DoF pose. Results on the public LM-O and YCB-V datasets show major improvement over the state of the art w.r.t. ADD(-S) metric, even surpassing RGB-D based methods in some cases.

Submitted 29 March, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

Comments: CVPR2022 camera ready

arXiv:2202.01537 [pdf, other]

Bending Graphs: Hierarchical Shape Matching using Gated Optimal Transport

Authors: Mahdi Saleh, Shun-Cheng Wu, Luca Cosmo, Nassir Navab, Benjamin Busam, Federico Tombari

Abstract: Shape matching has been a long-studied problem for the computer graphics and vision community. The objective is to predict a dense correspondence between meshes that have a certain degree of deformation. Existing methods either consider the local description of sampled points or discover correspondences based on global shape information. In this work, we investigate a hierarchical learning design, to which we incorporate local patch-level information and global shape-level structures. This flexible representation enables correspondence prediction and provides rich features for the matching stage. Finally, we propose a novel optimal transport solver by recurrently updating features on non-confident nodes to learn globally consistent correspondences between the shapes. Our results on publicly available datasets suggest robust performance in presence of severe deformations without the need for extensive training or refinement.

Submitted 3 February, 2022; originally announced February 2022.

arXiv:2201.03673 [pdf]

doi 10.1063/5.0073502

Alloyed β-(AlxGa1-x)2O3 bulk Czochralski single β-(Al0.1Ga0.9)2O3 and polycrystals β-(Al0.33Ga0.66)2O3, β-(Al0.5Ga0.5)2O3), and property trends

Authors: Jani Jesenovec, Benjamin L. Dutton, Nicholas Stone-Weiss, Adrian Chmielewski, Muad Saleh, Carl Peterson, Nasim Alem, Sriram Krishnamoorthy, John S. McCloy

Abstract: In this work, bulk Czochralski-grown single crystals of 10 mol. % Al2O3 alloyed β-Ga2O3 - monoclinic 10% AGO or β-(Al0.1Ga0.9)2O3 - are obtained, which show +0.20 eV increase in the bandgap compared with unintentionally doped β-Ga2O3. Further, growths of 33% AGO - β-(Al0.33Ga0.67)2O3 - and 50% AGO - β-(Al0.5Ga0.5)2O3 or β-AlGaO3 - produce polycrystalline single-phase monoclinic material (β-AGO). All three compositions are investigated by x-ray diffraction, Raman spectroscopy, optical absorption, and 27Al nuclear magnetic resonance (NMR). By investigating single phase β-AGO over a large range of Al2O3 concentrations (10 - 50 mol. %), broad trends in the lattice parameter, vibrational modes, optical bandgap, and crystallographic site preference are determined. All lattice parameters show a linear trend with Al incorporation. According to NMR, aluminum incorporates on both crystallographic sites of β-Ga2O3, with a slight preference for the octahedral (GaII) site, which becomes more disordered with increasing Al. Single crystals of 10% AGO were also characterized by x-ray rocking curve, transmission electron microscopy, purity (glow discharge mass spectroscopy and x-ray fluorescence), optical transmission (200 nm - 20 um wavelengths), and resistivity. These measurements suggest that electrical compensation by impurity acceptor doping is not the likely explanation for high resistivity, but rather the shift of a hydrogen level from a shallow donor to a deep acceptor due to Al alloying. .. Cont. This article may be downloaded for personal use only. Any other use requires prior permission of the author and AIP Publishing. This article appeared in Journal of Applied Physics 131 155702.

Submitted 25 April, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

arXiv:2111.13796 [pdf, ps, other]

doi 10.1103/PhysRevA.105.043511

Mixing second and third-order nonlinear interactions in nanophotonic lithium-niobate waveguides

Authors: Simone Lauria, Mohammed F. Saleh

Abstract: In this paper, we have investigated the interplay between the second and third-order nonlinearities in lithium-niobate waveguides with strong waveguide dispersion using uniform and linearly-chirped poling patterns at input powers in the pico-joule range. We have implemented the accurate unidirectional pulse propagation model to take into account all the possible nonlinear interactions inside these structures. In particular, the poling period has been designed to quasi-phase-match single and multiple sum- and difference-frequency generation processes. We have shown how the poling period can be used as an additional degree of freedom to tailor the output spectra of chip-based nonlinear waveguides in an unprecedented way.

Submitted 22 April, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

Journal ref: Phys. Rev. A 105, 043511 (2022)

arXiv:2111.04805 [pdf, other]

Solution to the Non-Monotonicity and Crossing Problems in Quantile Regression

Authors: Resve A. Saleh, A. K. Md. Ehsanes Saleh

Abstract: This paper proposes a new method to address the long-standing problem of lack of monotonicity in estimation of the conditional and structural quantile function, also known as quantile crossing problem. Quantile regression is a very powerful tool in data science in general and econometrics in particular. Unfortunately, the crossing problem has been confounding researchers and practitioners alike for over 4 decades. Numerous attempts have been made to find a simple and general solution. This paper describes a unique and elegant solution to the problem based on a flexible check function that is easy to understand and implement in R and Python, while greatly reducing or even eliminating the crossing problem entirely. It will be very important in all areas where quantile regression is routinely used and may also find application in robust regression, especially in the context of machine learning. From this perspective, we also utilize the flexible check function to provide insights into the root causes of the crossing problem.

Submitted 24 November, 2021; v1 submitted 8 November, 2021; originally announced November 2021.

Comments: 8 pages, 14 figures, IEEE conference format
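
For orientation, the check function that ordinary quantile regression minimises can be sketched as follows. This is the standard textbook (pinball) form, not the flexible check function proposed in the abstract above, whose definition is not given in this listing; the helper names `check_loss`, `total_loss`, and `best_constant_quantile` are illustrative assumptions.

```python
def check_loss(u, tau):
    """Standard check (pinball) function: rho_tau(u) = u * (tau - 1[u < 0])."""
    indicator = 1.0 if u < 0 else 0.0
    return u * (tau - indicator)


def total_loss(q, ys, tau):
    """Summed check loss of the residuals y - q for a constant prediction q."""
    return sum(check_loss(y - q, tau) for y in ys)


def best_constant_quantile(ys, tau):
    """Return the data point that minimises the summed check loss.

    For a constant model the minimiser is attained at one of the
    observations, so searching the sample itself is sufficient.
    """
    return min(sorted(ys), key=lambda q: total_loss(q, ys, tau))


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5]
    print(best_constant_quantile(sample, 0.5))
    print(best_constant_quantile(sample, 0.9))
```

Fitting each quantile level separately with this loss places no ordering constraint between levels, which is why fitted quantile curves can cross.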

arXiv:2110.14076 [pdf, other]

CoFiNet: Reliable Coarse-to-fine Correspondences for Robust Point Cloud Registration

Authors: Hao Yu, Fu Li, Mahdi Saleh, Benjamin Busam, Slobodan Ilic

Abstract: We study the problem of extracting correspondences between a pair of point clouds for registration. For correspondence retrieval, existing works benefit from matching sparse keypoints detected from dense points but usually struggle to guarantee their repeatability. To address this issue, we present CoFiNet - Coarse-to-Fine Network which extracts hierarchical correspondences from coarse to fine without keypoint detection. On a coarse scale and guided by a weighting scheme, our model firstly learns to match down-sampled nodes whose vicinity points share more overlap, which significantly shrinks the search space of a consecutive stage. On a finer scale, node proposals are consecutively expanded to patches that consist of groups of points together with associated descriptors. Point correspondences are then refined from the overlap areas of corresponding patches, by a density-adaptive matching module capable of dealing with varying point density. Extensive evaluation of CoFiNet on both indoor and outdoor standard benchmarks shows our superiority over existing methods. Especially on 3DLoMatch where point clouds share less overlap, CoFiNet significantly outperforms state-of-the-art approaches by at least 5% on Registration Recall, with at most two-thirds of their parameters.

Submitted 26 October, 2021; originally announced October 2021.

Comments: Accepted to NeurIPS 2021

arXiv:2108.05576 [pdf]

Common Investigation Process Model for Internet of Things Forensics

Authors: Muhammed Ahmed Saleh, Siti Hajar Othman, Arafat Al-Dhaqm, Mahmoud Ahmad Al-Khasawneh

Abstract: Internet of Things Forensics (IoTFs) is a new discipline in digital forensics science used in the detection, acquisition, preservation, rebuilding, analysis, and presentation of evidence from IoT environments. The IoTFs discipline still suffers from several issues and challenges that have been documented in the recent past. For example, heterogeneity of IoT infrastructures has been a key challenge. The heterogeneity of the IoT infrastructures makes IoTFs very complex and ambiguous among various forensic domains. This paper aims to propose a common investigation process for IoTFs using the metamodeling method, called the Common Investigation Process Model (CIPM) for IoTFs. The proposed CIPM consists of four common investigation processes: i) preparation process, ii) collection process, iii) analysis process and iv) final report process. The proposed CIPM can assist IoTFs users to facilitate, manage, and organize the investigation tasks.

Submitted 12 August, 2021; originally announced August 2021.

Comments: 6 pages, 5 figures, 76 references

arXiv:2107.14501 [pdf, ps, other]

Narrow and broadband single-photon sources using customised-tapered waveguides

Authors: Harrison R. Greenwood, Mohammed F. Saleh

Abstract: In this paper, we present a thorough investigation of a spontaneous parametric four-wave mixing process in third-order nonlinear waveguides with various continuous tapering patterns. It has been previously shown that these devices can quasi-phase-match the four-wave-mixing process and enhance its conversion efficiency by orders of magnitude. By altering the tapering profile curve we found that these devices can enable single-photon sources with either narrow or broadband spectral widths at on-demand frequencies. Using our model, we were also able to identify the waveguide length at which the single-photon spectral purity is maximised.

Submitted 30 July, 2021; originally announced July 2021.

Comments: 8 pages, 7 figures

arXiv:2102.09681 [pdf, other]

WebRED: Effective Pretraining And Finetuning For Relation Extraction On The Web

Authors: Robert Ormandi, Mohammad Saleh, Erin Winter, Vinay Rao

Abstract: Relation extraction is used to populate knowledge bases that are important to many applications. Prior datasets used to train relation extraction models either suffer from noisy labels due to distant supervision, are limited to certain domains or are too small to train high-capacity models. This constrains downstream applications of relation extraction. We therefore introduce: WebRED (Web Relation Extraction Dataset), a strongly-supervised human annotated dataset for extracting relationships from a variety of text found on the World Wide Web, consisting of ~110K examples. We also describe the methods we used to collect ~200M examples as pre-training data for this task. We show that combining pre-training on a large weakly supervised dataset with fine-tuning on a small strongly-supervised dataset leads to better relation extraction performance. We provide baselines for this new dataset and present a case for the importance of human annotation in improving the performance of relation extraction from text found on the web.

Submitted 18 February, 2021; originally announced February 2021.

arXiv:2012.12958 [pdf]

Privacy Preservation for Wireless Sensor Networks in Healthcare: State of the Art, and Open Research Challenges

Authors: Yasmine N. M. Saleh, Claude C. Chibelushi, Ayman A. Abdel-Hamid, Abdel-Hamid Soliman

Abstract: The advent of miniature biosensors has generated numerous opportunities for deploying wireless sensor networks in healthcare. However, an important barrier is that acceptance by healthcare stakeholders is influenced by the effectiveness of privacy safeguards for personal and intimate information which is collected and transmitted over the air, within and beyond these networks. In particular, these networks are progressing beyond traditional sensors, towards also using multimedia sensors, which raise further privacy concerns. Paradoxically, less research has addressed privacy protection, compared to security. Nevertheless, privacy protection has gradually evolved from being assumed an implicit by-product of security measures, and it is maturing into a research concern in its own right. However, further technical and socio-technical advances are needed.
As a contribution towards galvanising further research, the hallmarks of this paper include: (i) a literature survey explicitly anchored on privacy preservation, underpinned by untangling privacy goals from security goals, to avoid mixing privacy and security concerns, as is often the case in other papers; (ii) a critical survey of privacy preservation services for wireless sensor networks in healthcare, including threat analysis and assessment methodologies; it also offers classification trees for the multifaceted challenge of privacy protection in healthcare, and for privacy threats, attacks and countermeasures; (iii) a discussion of technical advances complemented by reflection over the implications of regulatory frameworks; (iv) a discussion of open research challenges, leading onto offers of directions for future research towards unlocking the door onto privacy protection which is appropriate for healthcare in the twenty-first century.

Submitted 23 December, 2020; originally announced December 2020.

Comments: 42 pages, 15 figures and 4 tables

arXiv:2012.09518 [pdf, other]

doi 10.1088/1748-0221/16/03/p03022

The upgrade of the ALICE TPC with GEMs and continuous readout

Authors: J. Adolfsson, M. Ahmed, S. Aiola, J. Alme, T. Alt, W. Amend, F. Anastasopoulos, C. Andrei, M. Angelsmark, V. Anguelov, A. Anjam, H. Appelshäuser, V. Aprodu, O. Arnold, M. Arslandok, D. Baitinger, M. Ball, G. G. Barnaföldi, E. Bartsch, P. Becht, R. Bellwied, A. Berdnikova, M. Berger, N. Bialas, P. Bialas , et al. (210 additional authors not shown)

Abstract: The upgrade of the ALICE TPC will allow the experiment to cope with the high interaction rates foreseen for the forthcoming Run 3 and Run 4 at the CERN LHC. In this article, we describe the design of new readout chambers and front-end electronics, which are driven by the goals of the experiment. Gas Electron Multiplier (GEM) detectors arranged in stacks containing four GEMs each, and continuous readout electronics based on the SAMPA chip, an ALICE development, are replacing the previous elements. The construction of these new elements, together with their associated quality control procedures, is explained in detail. Finally, the readout chamber and front-end electronics cards replacement, together with the commissioning of the detector prior to installation in the experimental cavern, are presented. After a nine-year period of R&D, construction, and assembly, the upgrade of the TPC was completed in 2020.

Submitted 25 March, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

Comments: 88 pages, 60 figures

Journal ref: JINST 16 (2021) P03022

arXiv:2012.01398 [pdf, ps, other]

doi 10.1364/OL.421649

Ultra-broadband supercontinuum generation in gas-filled photonic-crystal fibers: The epsilon-near-zero regime

Authors: Mohammed F. Saleh, Fabio Biancalana

Abstract: In this Letter, we show theoretically that the nonlinear photoionisation process of a noble gas inside a hollow-core photonic crystal fibre can be exploited in obtaining broadband supercontinuum generation via pumping close to the mid-infrared regime. The interplay between the Kerr and photoionisation nonlinearities is strongly enhanced in this regime. Photoionisation continuously modifies the medium dispersion, in which the refractive index starts to significantly decrease and approach the epsilon-near-zero regime. Subsequently, the self-phase modulation induced by the Kerr effect is boosted because of the accompanying slow-light effect. As a result of this interplay, an output spectrum that comprises broadband light with multiple dispersive-wave emission is obtained.

Submitted 2 December, 2020; originally announced December 2020.

Comments: 5 pages, 5 figures

Journal ref: Optics Letters Vol. 46, Issue 8, pp. 1959-1962 (2021)

arXiv:2011.03657 [pdf]

doi 10.1063/5.0029442

Defect states and their electric field-enhanced electron thermal emission in heavily Zr-doped beta-Ga2O3 crystals

Authors: Rujun Sun, Yu Kee Ooi, Arkka Bhattacharyya, Muad Saleh, Sriram Krishnamoorthy, Kelvin G. Lynn, Michael A. Scarpulla

Abstract: Performing deep level transient spectroscopy (DLTS) on Schottky diodes, we investigated defect levels below the conduction band minima (Ec) in Czochralski (CZ) grown unintentionally-doped (UID) and vertical gradient freeze (VGF)-grown Zr-doped beta-Ga2O3 crystals. In UID crystals with an electron concentration of 10^17 cm-3, we observe levels at 0.18 eV and 0.46 eV in addition to the previously reported 0.86 (E2) and 1.03 eV (E3) levels. For 10^18 cm-3 Zr-doped Ga2O3, signatures at 0.30 eV (E15) and 0.71 eV (E16) are present. For the highest Zr doping of 5*10^18 cm-3, we observe only one signature at 0.59 eV. Electric field-enhanced emission rates are demonstrated via increasing the reverse bias during measurement. The 0.86 eV signature in the UID sample displays phonon-assisted tunneling enhanced thermal emission and is consistent with the widely reported E2 (FeGa) defect. The 0.71 eV (E16) signature in the lower-Zr-doped crystal also exhibits phonon-assisted tunneling emission enhancement. Taking into account that the high doping in the Zr-doped diodes also increases the electric field, we propose that the 0.59 eV signature in the highest Zr-doped sample likely corresponds to the 0.71 eV signature in lower-doped samples. Our analysis highlights the importance of testing for and reporting on field-enhanced emission, especially the electric field present during DLTS and other characterization experiments on beta-Ga2O3, along with the standard emission energy, cross-section, and lambda-corrected trap density.
This is important because of the intended use of beta-Ga2O3 in high-field devices and the many orders of magnitude of possible doping.

Submitted 6 November, 2020; originally announced November 2020.

Comments: 18 pages, 3 figures

arXiv:2010.09079 [pdf, other]

Graphite: GRAPH-Induced feaTure Extraction for Point Cloud Registration

Authors: Mahdi Saleh, Shervin Dehghani, Benjamin Busam, Nassir Navab, Federico Tombari

Abstract: 3D point clouds are a rich source of information that enjoy growing popularity in the vision community. However, due to the sparsity of their representation, learning models based on large point clouds is still a challenge. In this work, we introduce Graphite, a GRAPH-Induced feaTure Extraction pipeline, a simple yet powerful feature transform and keypoint detector. Graphite enables intensive down-sampling of point clouds with keypoint detection accompanied by a descriptor. We construct a generic graph-based learning scheme to describe point cloud regions and extract salient points. To this end, we take advantage of 6D pose information and metric learning to learn robust descriptions and keypoints across different scans. We reformulate the 3D keypoint pipeline with graph neural networks, which allow efficient processing of the point set while boosting its descriptive power, which ultimately results in more accurate 3D registrations. We demonstrate our lightweight descriptor on common 3D descriptor matching and point cloud registration benchmarks and achieve comparable results with the state of the art. Describing 100 patches of a point cloud and detecting their keypoints takes only ~0.018 seconds with our proposed network.

Submitted 18 October, 2020; originally announced October 2020.

arXiv:2010.03483 [pdf, other]

doi 10.5121/csit.2020.101101

Evaluating the impact of different types of crossover and selection methods on the convergence of 0/1 Knapsack using Genetic Algorithm

Authors: Waleed Bin Owais, Iyad W. J. Alkhazendar, Dr. Mohammad Saleh

Abstract: Genetic Algorithm is an evolutionary algorithm and a metaheuristic that was introduced to overcome the failure of gradient-based methods in solving optimization and search problems. The purpose of this paper is to evaluate the impact on the convergence of Genetic Algorithm vis-a-vis 0/1 knapsack. By keeping the number of generations and the initial population fixed, different crossover methods like one-point crossover and two-point crossover were evaluated and juxtaposed with each other. In addition to this, the impact of different selection methods like rank selection, roulette wheel and tournament selection was evaluated and compared. Our results indicate that the convergence rate of the combination of one-point crossover with tournament selection, with respect to the 0/1 knapsack problem that we considered, is the highest and thereby most efficient in solving 0/1 knapsack.

Submitted 7 October, 2020; originally announced October 2020.

Comments: 7th International Conference on Computer Science, Engineering and Information Technology (CSEIT 2020) September 26 ~ 27, 2020, Copenhagen, Denmark
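
The combination named in the abstract above, one-point crossover with tournament selection on 0/1 knapsack, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' setup: the function names, the population size, tournament size, mutation rate, the zero-fitness penalty for overweight solutions, and the toy instance.

```python
import random


def fitness(bits, values, weights, capacity):
    """Total value of the selected items, or 0 if the weight limit is exceeded."""
    weight = sum(w for b, w in zip(bits, weights) if b)
    if weight > capacity:
        return 0
    return sum(v for b, v in zip(bits, values) if b)


def tournament(pop, scores, k, rng):
    """Tournament selection: pick k random individuals, return the fittest."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def one_point_crossover(a, b, rng):
    """Swap the tails of two parents at one random cut point."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def run_ga(values, weights, capacity, pop_size=30, generations=50,
           mutation_rate=0.02, seed=0):
    """Run a small genetic algorithm; requires at least two items."""
    rng = random.Random(seed)  # fixed seed keeps the initial population fixed
    n = len(values)

    def score(c):
        return fitness(c, values, weights, capacity)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        scores = [score(c) for c in pop]
        nxt = []
        while len(nxt) < pop_size:
            p1 = tournament(pop, scores, 3, rng)
            p2 = tournament(pop, scores, 3, rng)
            for child in one_point_crossover(p1, p2, rng):
                # Bit-flip mutation applied independently to each gene.
                nxt.append([1 - g if rng.random() < mutation_rate else g
                            for g in child])
        pop = nxt[:pop_size]
        cand = max(pop, key=score)
        if score(cand) > score(best):
            best = cand
    return best, score(best)


if __name__ == "__main__":
    vals, wts = [10, 5, 15, 7, 6], [1, 5, 3, 4, 2]
    print(run_ga(vals, wts, capacity=8))
```

Swapping `one_point_crossover` or `tournament` for another operator while holding the seed and generation count fixed gives the kind of like-for-like comparison the abstract describes.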

arXiv:2007.06387 [pdf, ps, other]

The unified version of mixing maps between non-void sets

Authors: Salam Adel Al-Bayati, Akram Al-Sabbagh, Manaf Adnan Saleh Saleh

Abstract: The nonlinear concepts of mixed summable families and maps for the spaces that only non-void sets are developed. Several characterizations of the corresponding concepts are achieved and the proof for a general Pietsch Domination-type theorem is established. Furthermore, this work has presented plenty of composition and inclusion results between different classes of mappings in the abstract settings. Finally, a generalized notation of mixing maps and their characteristics are extended to a more general setting.

Submitted 11 July, 2021; v1 submitted 13 July, 2020; originally announced July 2020.

MSC Class: Primary 47Jxx; 47Hxx; Secondary 54Cxx; 03E75

arXiv:2006.10213 [pdf, other]

SEAL: Segment-wise Extractive-Abstractive Long-form Text Summarization

Authors: Yao Zhao, Mohammad Saleh, Peter J. Liu

Abstract: Most prior work in the sequence-to-sequence paradigm focused on datasets with input sequence lengths in the hundreds of tokens due to the computational constraints of common RNN and Transformer architectures. In this paper, we study long-form abstractive text summarization, a sequence-to-sequence setting with input sequence lengths up to 100,000 tokens and output sequence lengths up to 768 tokens. We propose SEAL, a Transformer-based model, featuring a new encoder-decoder attention that dynamically extracts/selects input snippets to sparsely attend to for each output segment. Using only the original documents and summaries, we derive proxy labels that provide weak supervision for extractive layers simultaneously with regular supervision from abstractive summaries. The SEAL model achieves state-of-the-art results on existing long-form summarization tasks, and outperforms strong baseline models on a new dataset/task we introduce, Search2Wiki, with much longer input text. Since content selection is explicit in the SEAL model, a desirable side effect is that the selection can be inspected for enhanced interpretability.

Submitted 17 June, 2020; originally announced June 2020.

arXiv:2004.03675 [pdf, other]

Spatio-temporal Learning from Longitudinal Data for Multiple Sclerosis Lesion Segmentation

Authors: Stefan Denner, Ashkan Khakzar, Moiz Sajid, Mahdi Saleh, Ziga Spiclin, Seong Tae Kim, Nassir Navab

Abstract: Segmentation of Multiple Sclerosis (MS) lesions in longitudinal brain MR scans is performed for monitoring the progression of MS lesions. We hypothesize that the spatio-temporal cues in longitudinal data can aid the segmentation algorithm. Therefore, we propose a multi-task learning approach by defining an auxiliary self-supervised task of deformable registration between two time-points to guide the neural network toward learning from spatio-temporal changes. We show the efficacy of our method on a clinical dataset comprised of 70 patients with one follow-up study for each patient. Our results show that spatio-temporal information in longitudinal data is a beneficial cue for improving segmentation. We improve the result of current state-of-the-art by 2.6% in terms of overall score (p<0.05). Code is publicly available.

Submitted 26 September, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

Comments: Accepted at BrainLes Workshop in MICCAI2020

arXiv:2001.11187 [pdf]

doi 10.1088/1361-6641/ab75a6

Degenerate doping in β-Ga2O3 Single Crystals through Hf-doping

Authors: Muad Saleh, Joel B. Varley, Jani Jesenovec, Arkka Bhattacharyya, Sriram Krishnamoorthy, Santosh Swain, Kelvin Lynn

Abstract: N-type conductivity of β-Ga2O3 grown from the melt is typically achieved using Sn and Si. In this paper, we experimentally and computationally investigate Hf doping of β-Ga2O3 single crystals using UV-Vis-NIR absorption and Hall Effect measurements and hybrid functional calculations. Unintentionally-doped and Hf-doped samples with a nominal concentration of 0.5at% were grown from the melt using vertical gradient freeze (VGF) and Czochralski method in mixed Ar+O2 atmosphere. We demonstrate Hf dopants, predicted to incorporate on the octahedral GaII site as a shallow donor, achieve degenerate doping in β-Ga2O3 with a measured electron concentration 2 x 10^19 cm^-3, mobility 80-65 cm^2/Vs, and resistivity down to 5 mOhm-cm in our samples. The concentration of Hf was measured to be 1.3 x 10^19 atoms/cm^3 using glow discharge mass spectroscopy (GDMS) on doped samples, confirming Hf to be the cause of n-type conductivity (electron concentration ~2 x 10^19 cm^-3).

Submitted 30 January, 2020; originally announced January 2020.

arXiv:2001.07326 [pdf, other]

doi 10.1109/JEDS.2020.2974260

Schottky Barrier Height Engineering In $β$-Ga$_2$O$_3$ Using SiO$_2$ Interlayer Dielectric

Authors: Arkka Bhattacharyya, Praneeth Ranga, Muad Saleh, Saurav Roy, Michael A. Scarpulla, Kelvin G. Lynn, Sriram Krishnamoorthy

Abstract: This paper reports on the modulation of Schottky barrier heights (SBH) on three different orientations of $β$-Ga$_2$O$_3$ by insertion of an ultra-thin SiO$_2$ dielectric interlayer at the metal-semiconductor junction, which can potentially lower the Fermi-level pinning (FLP) effect due to metal-induced gap states (MIGS). Pt and Ni metal-semiconductor (MS) and metal-interlayer-semiconductor (MIS) Schottky barrier diodes were fabricated on bulk n-type doped $β$-Ga$_2$O$_3$ single crystal substrates along the (010), (-201) and (100) orientations and were characterized by room temperature current-voltage (I-V) and capacitance-voltage (C-V) measurements. Pt MIS diodes exhibited 0.53 eV and 0.37 eV increment in SBH along the (010) and (-201) orientations respectively as compared to their respective MS counterparts. The highest SBH of 1.81 eV was achieved on the (010)-oriented MIS SBD using Pt metal. The MIS SBDs on (100)-oriented substrates exhibited a dramatic increment ($>$1.5$\times$) in SBH as well as reduction in reverse leakage current. The use of thin dielectric interlayers can be an efficient experimental method to modulate SBH of metal/Ga$_2$O$_3$ junctions.

Submitted 20 January, 2020; originally announced January 2020.

Comments: 8 pages, 10 figures, 2 tables

arXiv:1912.08777 [pdf, other]

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Authors: Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu

Abstract: Recent work pre-training Transformers with self-supervised objectives on large text corpora has shown great success when fine-tuned on downstream NLP tasks including text summarization. However, pre-training objectives tailored for abstractive text summarization have not been explored. Furthermore there is a lack of systematic evaluation across diverse domains. In this work, we propose pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective. In PEGASUS, important sentences are removed/masked from an input document and are generated together as one output sequence from the remaining sentences, similar to an extractive summary. We evaluated our best PEGASUS model on 12 downstream summarization tasks spanning news, science, stories, instructions, emails, patents, and legislative bills. Experiments demonstrate it achieves state-of-the-art performance on all 12 downstream datasets measured by ROUGE scores. Our model also shows surprising performance on low-resource summarization, surpassing previous state-of-the-art results on 6 datasets with only 1000 examples. Finally we validated our results using human evaluation and show that our model summaries achieve human performance on multiple datasets.

Submitted 10 July, 2020; v1 submitted 18 December, 2019; originally announced December 2019.

Comments: Added results from mixed+stochastic model, test-set overlapping analysis; Code link added; Accepted for ICML 2020. arXiv admin note: text overlap with arXiv:1605.06560, arXiv:1205.2395, arXiv:0902.4351, arXiv:1610.09932, arXiv:nucl-ex/0512029 by other authors

arXiv:1908.09168 [pdf, other]

A Novel Method to Generate Key-Dependent S-Boxes with Identical Algebraic Properties

Authors: Ahmad Y. Al-Dweik, Iqtadar Hussain, Moutaz S. Saleh, M. T. Mustafa

Abstract: The s-box plays the vital role of creating confusion between the ciphertext and secret key in any cryptosystem, and is the only nonlinear component in many block ciphers. Dynamic s-boxes, as compared to static, improve entropy of the system, hence leading to better resistance against linear and differential attacks. It was shown in [2] that while incorporating dynamic s-boxes in cryptosystems is sufficiently secure, they do not keep non-linearity invariant. This work provides an algorithmic scheme to generate key-dependent dynamic $n\times n$ clone s-boxes having the same algebraic properties, namely bijection, nonlinearity, the strict avalanche criterion (SAC), and the output bits independence criterion (BIC), as the initial seed s-box. The method is based on group action of symmetric group $S_n$ and a subgroup $S_{2^n}$ respectively on columns and rows of Boolean functions ($GF(2^n)\to GF(2)$) of s-box. Invariance of the bijection, nonlinearity, SAC, and BIC for the generated clone copies is proved. As illustration, examples are provided for $n=8$ and $n=4$ along with comparison of the algebraic properties of the clone and initial seed s-box. The proposed method is an extension of [3,4,5,6] which involved group action of $S_8$ only on columns of Boolean functions ($GF(2^8)\to GF(2)$) of s-box. For $n=4$, we have used an initial $4\times 4$ s-box constructed by Carlisle Adams and Stafford Tavares [7] to generate $(4!)^2$ clone copies.
For $n=8$, it can be seen [3,4,5,6] that the number of clone copies that can be constructed by permuting the columns is $8!$. For each column permutation, the proposed method enables the generation of $8!$ clone copies by permuting the rows.

Submitted 3 May, 2021; v1 submitted 24 August, 2019; originally announced August 2019.

arXiv:1905.13322 [pdf, other]

doi 10.1145/3292500.3330955

Assessing The Factual Accuracy of Generated Text

Authors: Ben Goodrich, Vinay Rao, Mohammad Saleh, Peter J Liu

Abstract: We propose a model-based metric to estimate the factual accuracy of generated text that is complementary to typical scoring schemes like ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BLEU (Bilingual Evaluation Understudy). We introduce and release a new large-scale dataset based on Wikipedia and Wikidata to train relation classifiers and end-to-end fact extraction models. The end-to-end models are shown to be able to extract complete sets of facts from datasets with full pages of text. We then analyse multiple models that estimate factual accuracy on a Wikipedia text summarization task, and show their efficacy compared to ROUGE and other model-free variants by conducting a human evaluation study.

Submitted 25 May, 2021; v1 submitted 30 May, 2019; originally announced May 2019.

Journal ref: The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '19), August 4--8, 2019, Anchorage, AK, USA

Showing 1–50 of 97 results for author: Saleh, M