Search | arXiv e-print repository

OCT-SelfNet: A Self-Supervised Framework with Multi-Modal Datasets for Generalized and Robust Retinal Disease Detection

Authors: Fatema-E Jannat, Sina Gholami, Minhaj Nur Alam, Hamed Tabkhi

Abstract: Despite the revolutionary impact of AI and the development of locally trained algorithms, achieving widespread generalized learning from multi-modal data in medical AI remains a significant challenge. This gap hinders the practical deployment of scalable medical AI solutions. Addressing this challenge, our research contributes a self-supervised robust machine learning framework, OCT-SelfNet, for detecting eye diseases using optical coherence tomography (OCT) images. In this work, various data sets from various institutions are combined, enabling a more comprehensive range of representation. Our method addresses the issue using a two-phase training approach that combines self-supervised pretraining and supervised fine-tuning with a masked autoencoder based on the SwinV2 backbone, providing a solution for real-world clinical deployment. Extensive experiments on three datasets with different encoder backbones, low data settings, unseen data settings, and the effect of augmentation show that our method outperforms the baseline model, ResNet-50, by consistently attaining AUC-ROC performance surpassing 77% across all tests, whereas the baseline model exceeds 54%. Moreover, in terms of the AUC-PR metric, our proposed method exceeded 42%, showcasing a substantial increase of at least 10% in performance compared to the baseline, which exceeded only 33%. This contributes to our understanding of our approach's potential and emphasizes its usefulness in clinical settings.

Submitted 22 January, 2024; originally announced January 2024.

Comments: 12 pages, 7 figures, 6 tables

arXiv:2310.07830 [pdf, other]

Does Synthetic Data Make Large Language Models More Efficient?

Authors: Sia Gholami, Marwan Omar

Abstract: Natural Language Processing (NLP) has undergone transformative changes with the advent of deep learning methodologies. One challenge persistently confronting researchers is the scarcity of high-quality, annotated datasets that drive these models. This paper explores the nuances of synthetic data generation in NLP, with a focal point on template-based question generation. By assessing its advantages, including data augmentation potential and the introduction of structured variety, we juxtapose these benefits against inherent limitations, such as the risk of overfitting and the constraints posed by pre-defined templates. Drawing from empirical evaluations, we demonstrate the impact of template-based synthetic data on the performance of modern transformer models. We conclude by emphasizing the delicate balance required between synthetic and real-world data, and the future trajectories of integrating synthetic data in model training pipelines. The findings aim to guide NLP practitioners in harnessing synthetic data's potential, ensuring optimal model performance in diverse applications.

Submitted 11 October, 2023; originally announced October 2023.
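
The abstract above centers on template-based question generation. As a rough illustration of the general idea only (not the authors' pipeline), the sketch below fills fixed question templates with entries from a small fact table to produce synthetic question-answer pairs. The templates, the fact table, and the function name `generate_qa_pairs` are all hypothetical and invented for this example.

```python
import random

# Hypothetical templates and facts, invented purely for illustration.
TEMPLATES = [
    "What is the capital of {country}?",
    "Which city is the capital of {country}?",
]
FACTS = {"France": "Paris", "Japan": "Tokyo"}


def generate_qa_pairs(facts, templates, seed=0):
    """Fill every template with every fact to get (question, answer) pairs."""
    pairs = [
        (template.format(country=country), capital)
        for country, capital in facts.items()
        for template in templates
    ]
    # Shuffle with a fixed seed so the synthetic set is reproducible.
    random.Random(seed).shuffle(pairs)
    return pairs


pairs = generate_qa_pairs(FACTS, TEMPLATES)
for question, answer in pairs:
    print(question, "->", answer)
```

Because every question comes from a handful of fixed patterns, a sketch like this also makes the limitation named in the abstract visible: the variety of the synthetic data is bounded by the templates themselves.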

arXiv:2310.04573 [pdf, other]

Can pruning make Large Language Models more efficient?

Authors: Sia Gholami, Marwan Omar

Abstract: Transformer models have revolutionized natural language processing with their unparalleled ability to grasp complex contextual relationships. However, the vast number of parameters in these models has raised concerns regarding computational efficiency, environmental impact, and deployability on resource-limited platforms. To address these challenges, this paper investigates the application of weight pruning, a strategic reduction of model parameters based on their significance, as an optimization strategy for Transformer architectures. Through extensive experimentation, we explore various pruning methodologies, highlighting their impact on model performance, size, and computational demands. Our findings suggest that with judicious selection of pruning hyperparameters, significant reductions in model size are attainable without considerable compromise on performance. Moreover, when coupled with post-pruning fine-tuning strategies, some pruned models even exhibit enhanced generalization capabilities. This work seeks to bridge the gap between model efficiency and performance, paving the way for more scalable and environmentally responsible deep learning applications.

Submitted 6 October, 2023; originally announced October 2023.
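
The abstract above describes weight pruning as removing parameters based on their significance. One common way to make that concrete is magnitude pruning, where the weights with the smallest absolute values are set to zero. The sketch below assumes that criterion; the abstract does not say which pruning methods the paper evaluates, and the function name `magnitude_prune` and the sample weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Assumption: absolute value stands in for a weight's "significance".
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered from least to most significant weight.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for index in order[:num_to_prune]:
        pruned[index] = 0.0
    return pruned


weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)  # [0.8, 0.0, 0.3, -0.9, 0.0, 0.0]
```

Here `sparsity` plays the role of a pruning hyperparameter: raising it shrinks the effective model further but removes progressively larger weights.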

arXiv:2310.02421 [pdf, ps, other]

Can a student Large Language Model perform as well as it's teacher?

Authors: Sia Gholami, Marwan Omar

Abstract: The burgeoning complexity of contemporary deep learning models, while achieving unparalleled accuracy, has inadvertently introduced deployment challenges in resource-constrained environments. Knowledge distillation, a technique aiming to transfer knowledge from a high-capacity "teacher" model to a streamlined "student" model, emerges as a promising solution to this dilemma. This paper provides a comprehensive overview of the knowledge distillation paradigm, emphasizing its foundational principles such as the utility of soft labels and the significance of temperature scaling. Through meticulous examination, we elucidate the critical determinants of successful distillation, including the architecture of the student model, the caliber of the teacher, and the delicate balance of hyperparameters. While acknowledging its profound advantages, we also delve into the complexities and challenges inherent in the process. Our exploration underscores knowledge distillation's potential as a pivotal technique in optimizing the trade-off between model performance and deployment efficiency.

Submitted 3 October, 2023; originally announced October 2023.
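
The abstract above highlights soft labels and temperature scaling. The sketch below shows one widely used formulation of a distillation loss: a temperature-softened KL term between teacher and student distributions, blended with ordinary cross-entropy on the hard label. It is an assumption that this matches the setup discussed in the paper; the function names, the weighting parameter `alpha`, and the example logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by `temperature` (higher means softer)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-label KL term with hard-label cross-entropy."""
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened distributions.
    kl = sum(t * math.log(t / s)
             for t, s in zip(teacher_soft, student_soft) if t > 0)
    # Cross-entropy of the student (temperature 1) against the true label.
    ce = -math.log(softmax(student_logits)[label])
    # The temperature**2 factor keeps the soft term's scale comparable.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


teacher = [4.0, 1.0, 0.2]
student = [2.5, 1.2, 0.3]
print(softmax(teacher, temperature=1.0))
print(softmax(teacher, temperature=4.0))
print(distillation_loss(student, teacher, label=0))
```

Comparing the two printed distributions shows what the temperature does: at a higher temperature the teacher's probabilities flatten, so the relative ranking of the non-target classes carries more weight in the soft term.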

arXiv:2309.06589 [pdf, ps, other]

Do Generative Large Language Models need billions of parameters?

Authors: Sia Gholami, Marwan Omar

Abstract: This paper presents novel systems and methodologies for the development of efficient large language models (LLMs). It explores the trade-offs between model size, performance, and computational resources, with the aim of maximizing the efficiency of these AI systems. The research explores novel methods that allow different parts of the model to share parameters, reducing the total number of unique parameters required. This approach ensures that the model remains compact without sacrificing its ability to learn and represent complex language structures. This study provides valuable insights and tools for creating more efficient and effective LLMs, contributing to a more sustainable and accessible future for AI language modeling.

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2212.02450 [pdf, other]

Framework for 2D Ad placements in LinearTV

Authors: Divya Bhargavi, Karan Sindwani, Sia Gholami

Abstract: Virtual product placement (VPP) is the advertising technique of digitally placing a branded object into the scene of a movie or TV show. This type of advertising provides the ability for brands to reach consumers without interrupting the viewing experience with a commercial break, as the products are seen in the background or as props. Despite this being a billion-dollar industry, ad rendering is currently executed at the post-production stage, manually, either with the help of VFX artists or through semi-automated solutions. In this paper, we demonstrate a fully automated framework to digitally place 2-D ads in linear TV cooking shows captured using a single-view camera with small camera movements. Without access to full video or production camera configuration, this framework performs the following tasks: (i) identifying empty space for 2-D ad placement, (ii) kitchen scene understanding, (iii) occlusion handling, (iv) ambient lighting, and (v) ad tracking.

Submitted 5 December, 2022; originally announced December 2022.

arXiv:2208.09921 [pdf]

Alexa, Predict My Flight Delay

Authors: Sia Gholami, Saba Khashe

Abstract: Airlines are critical today for carrying people and commodities on time. Any delay in the schedule of these planes can potentially disrupt the business and trade of thousands of employees at any given time. Therefore, precise flight delay prediction is beneficial for the aviation industry and passenger travel. Recent research has focused on using artificial intelligence algorithms to predict the possibility of flight delays. Earlier prediction algorithms were designed for a specific air route or airfield. Many present flight delay prediction algorithms rely on tiny samples and are challenging to understand, allowing almost no room for machine learning implementation. This research study develops a flight delay prediction system by analyzing data from domestic flights inside the United States of America. The proposed models learn about the factors that cause flight delays and cancellations and the link between departure and arrival delays.

Submitted 21 August, 2022; originally announced August 2022.

arXiv:2203.00734 [pdf, other]

Knock, knock. Who's there? -- Identifying football player jersey numbers with synthetic data

Authors: Divya Bhargavi, Erika Pelaez Coyotl, Sia Gholami

Abstract: Automatic player identification is an essential and complex task in sports video analysis. Different strategies have been devised over the years, but identification based on jersey numbers is one of the most common approaches given its versatility and relative simplicity. However, automatic detection of jersey numbers is still challenging due to changing camera angles, low video resolution, small object size in wide-range shots, and transient changes in the player's posture and movement. In this paper, we present a novel approach for jersey number identification in a small, highly imbalanced dataset from the Seattle Seahawks practice videos. Our results indicate that simple models can achieve an acceptable performance on the jersey number detection task and that synthetic data can improve the performance dramatically (accuracy increase of ~9% overall, ~18% on low-frequency numbers), making our approach achieve state-of-the-art results.

Submitted 4 April, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

arXiv:2111.11520 [pdf, other]

Zero-Shot Open-Book Question Answering

Authors: Sia Gholami, Mehdi Noori

Abstract: Open-book question answering is a subset of question answering tasks where the system aims to find answers in a given set of documents (open-book) and common knowledge about a topic. This article proposes a solution for answering natural language questions from a corpus of Amazon Web Services (AWS) technical documents with no domain-specific labeled data (zero-shot). These questions can have yes-no-none answers, short answers, long answers, or any combination of the above. This solution comprises a two-step architecture in which a retriever finds the right document and an extractor finds the answers in the retrieved document. We are introducing a new test dataset for open-book QA based on real customer questions on AWS technical documentation. After experimenting with several information retrieval systems and extractor models based on extractive language models, the solution attempts to find the yes-no-none answers and text answers in the same pass. The model is trained on the Stanford Question Answering Dataset, SQuAD (Rajpurkar et al., 2016), and Natural Questions (Kwiatkowski et al., 2019) datasets. We were able to achieve 49% F1 and 39% exact match score (EM) end-to-end with no domain-specific training.

Submitted 22 November, 2021; originally announced November 2021.

arXiv:2105.09809 [pdf, other]

Quantitative Physical Ergonomics Assessment of Teleoperation Interfaces

Authors: Soheil Gholami, Marta Lorenzini, Elena De Momi, Arash Ajoudani

Abstract: Human factors and ergonomics are the essential constituents of teleoperation interfaces, which can significantly affect the human operator's performance. Thus, a quantitative evaluation of these elements and the ability to establish reliable comparison bases for different teleoperation interfaces are key to selecting the most suitable one for a particular application. However, most works on teleoperation have so far focused on the stability analysis and the transparency improvement of these systems, and do not cover the important usability aspects. In this work, we propose a foundation to build a general framework for the analysis of human factors and ergonomics in employing diverse teleoperation interfaces. The proposed framework will go beyond the traditional subjective analyses of usability by complementing them with online measurements of the human body configurations. As a result, multiple quantitative metrics such as joints' usage, range of motion comfort, center of mass divergence, and posture comfort are introduced. To demonstrate the potential of the proposed framework, two different teleoperation interfaces are considered, and real-world experiments with eleven participants performing a simulated industrial remote pick-and-place task are conducted. The quantitative results of this analysis are provided and compared with subjective questionnaires, illustrating the effectiveness of the proposed framework.

Submitted 20 May, 2021; originally announced May 2021.

Comments: 10 pages, 9 figures, submitted to IEEE Transactions on Human-Machine Systems

arXiv:2104.11757 [pdf, ps, other]

Becoming Good at AI for Good

Authors: Meghana Kshirsagar, Caleb Robinson, Siyu Yang, Shahrzad Gholami, Ivan Klyuzhin, Sumit Mukherjee, Md Nasir, Anthony Ortiz, Felipe Oviedo, Darren Tanner, Anusua Trivedi, Yixi Xu, Ming Zhong, Bistra Dilkina, Rahul Dodhia, Juan M. Lavista Ferres

Abstract: AI for good (AI4G) projects involve developing and applying artificial intelligence (AI) based solutions to further goals in areas such as sustainability, health, humanitarian aid, and social justice. Developing and deploying such solutions must be done in collaboration with partners who are experts in the domain in question and who already have experience in making progress towards such goals. Based on our experiences, we detail the different aspects of this type of collaboration broken down into four high-level categories: communication, data, modeling, and impact, and distill eleven takeaways to guide such projects in the future. We briefly describe two case studies to illustrate how some of these takeaways were applied in practice during our past collaborations.

Submitted 3 May, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

Comments: Accepted to AIES-2021

arXiv:1903.06669 [pdf, other]

Stay Ahead of Poachers: Illegal Wildlife Poaching Prediction and Patrol Planning Under Uncertainty with Field Test Evaluations

Authors: Lily Xu, Shahrzad Gholami, Sara Mc Carthy, Bistra Dilkina, Andrew Plumptre, Milind Tambe, Rohit Singh, Mustapha Nsubuga, Joshua Mabonga, Margaret Driciru, Fred Wanyama, Aggrey Rwetsiba, Tom Okello, Eric Enyel

Abstract: Illegal wildlife poaching threatens ecosystems and drives endangered species toward extinction. However, efforts for wildlife protection are constrained by the limited resources of law enforcement agencies. To help combat poaching, the Protection Assistant for Wildlife Security (PAWS) is a machine learning pipeline that has been developed as a data-driven approach to identify areas at high risk of poaching throughout protected areas and compute optimal patrol routes. In this paper, we take an end-to-end approach to the data-to-deployment pipeline for anti-poaching. In doing so, we address challenges including extreme class imbalance (up to 1:200), bias, and uncertainty in wildlife poaching data to enhance PAWS, and we apply our methodology to three national parks with diverse characteristics. (i) We use Gaussian processes to quantify predictive uncertainty, which we exploit to improve robustness of our prescribed patrols and increase detection of snares by an average of 30%. We evaluate our approach on real-world historical poaching data from Murchison Falls and Queen Elizabeth National Parks in Uganda and, for the first time, Srepok Wildlife Sanctuary in Cambodia. (ii) We present the results of large-scale field tests conducted in Murchison Falls and Srepok Wildlife Sanctuary which confirm that the predictive power of PAWS extends promisingly to multiple parks. This paper is part of an effort to expand PAWS to 800 parks around the world through integration with SMART conservation software.

Submitted 5 November, 2019; v1 submitted 8 March, 2019; originally announced March 2019.

Comments: 12 pages, 11 figures. Short paper published in ICDE 2020

arXiv:1312.6157 [pdf, other]

Distinction between features extracted using deep belief networks

Authors: Mohammad Pezeshki, Sajjad Gholami, Ahmad Nickabadi

Abstract: Data representation is an important pre-processing step in many machine learning algorithms. There are a number of methods used for this task such as Deep Belief Networks (DBNs) and Discrete Fourier Transforms (DFTs). Since some of the features extracted using automated feature extraction methods may not always be related to a specific machine learning task, in this paper we propose two methods in order to make a distinction between extracted features based on their relevancy to the task. We applied these two methods to a Deep Belief Network trained for a face recognition task.

Submitted 2 January, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

Comments: 4 pages, 4 figures, ICLR 2014 workshop track

Showing 1–13 of 13 results for author: Gholami, S