Search | arXiv e-print repository

AdaEmbed: Semi-supervised Domain Adaptation in the Embedding Space

Authors: Ali Mottaghi, Mohammad Abdullah Jamal, Serena Yeung, Omid Mohareri

Abstract: Semi-supervised domain adaptation (SSDA) presents a critical hurdle in computer vision, especially given the frequent scarcity of labeled data in real-world settings. This scarcity often causes foundation models, trained on extensive datasets, to underperform when applied to new domains. AdaEmbed, our newly proposed methodology for SSDA, offers a promising solution to these challenges. Leveraging the potential of unlabeled data, AdaEmbed facilitates the transfer of knowledge from a labeled source domain to an unlabeled target domain by learning a shared embedding space. By generating accurate and uniform pseudo-labels based on the established embedding space, the model overcomes the limitations of conventional SSDA, thus enhancing performance significantly. Our method's effectiveness is validated through extensive experiments on benchmark datasets such as DomainNet, Office-Home, and VisDA-C, where AdaEmbed consistently outperforms all the baselines, setting a new state of the art for SSDA. With its straightforward implementation and high data efficiency, AdaEmbed stands out as a robust and pragmatic solution for real-world scenarios where labeled data is scarce. To foster further research and application in this area, we are sharing the codebase of our unified framework for semi-supervised domain adaptation.

Submitted 22 January, 2024; originally announced January 2024.
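To make "pseudo-labels based on the embedding space" concrete, here is a minimal sketch of one generic way to do it: nearest-prototype pseudo-labeling with cosine similarity. This is an assumption for illustration, not AdaEmbed's actual algorithm. The function names (`class_prototypes`, `pseudo_label`) and the `threshold=0.8` cutoff are hypothetical, and the sketch does not attempt the class balancing ("uniform" pseudo-labels) that the abstract mentions.

```python
import math
from collections import defaultdict

def normalize(v):
    """Scale a vector to unit L2 norm (returns a copy unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def class_prototypes(embeddings, labels):
    """Hypothetical helper: average the L2-normalized labeled embeddings per class."""
    sums = {}
    counts = defaultdict(int)
    for emb, y in zip(embeddings, labels):
        e = normalize(emb)
        if y not in sums:
            sums[y] = [0.0] * len(e)
        sums[y] = [s + x for s, x in zip(sums[y], e)]
        counts[y] += 1
    # Re-normalize each class mean so cosine similarity is a plain dot product.
    return {y: normalize([s / counts[y] for s in sums[y]]) for y in sums}

def pseudo_label(unlabeled, prototypes, threshold=0.8):
    """Hypothetical helper: assign each unlabeled embedding to its most similar
    prototype; return None when the best cosine similarity is below `threshold`."""
    results = []
    for emb in unlabeled:
        e = normalize(emb)
        best_class, best_sim = None, -1.0
        for y, p in prototypes.items():
            sim = sum(a * b for a, b in zip(e, p))
            if sim > best_sim:
                best_class, best_sim = y, sim
        results.append(best_class if best_sim >= threshold else None)
    return results
```

With labeled source embeddings `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]` labeled `["cat", "cat", "dog"]`, the target points `[1.0, 0.05]` and `[0.1, 1.0]` receive "cat" and "dog", while the ambiguous `[1.0, 1.0]` stays unlabeled because no prototype is similar enough. Leaving low-similarity points unlabeled is the design choice that trades coverage for pseudo-label accuracy.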

arXiv:2207.03083

Adaptation of Surgical Activity Recognition Models Across Operating Rooms

Authors: Ali Mottaghi, Aidean Sharghi, Serena Yeung, Omid Mohareri

Abstract: Automatic surgical activity recognition enables more intelligent surgical devices and a more efficient workflow. Integrating such technology into new operating rooms has the potential to improve care delivery to patients and decrease costs. Recent works have achieved promising performance on surgical activity recognition; however, the lack of generalizability of these models is one of the critical barriers to the wide-scale adoption of this technology. In this work, we study the generalizability of surgical activity recognition models across operating rooms. We propose a new domain adaptation method to improve the performance of the surgical activity recognition model in a new operating room for which we only have unlabeled videos. Our approach generates pseudo-labels for the unlabeled video clips it is confident about and trains the model on augmented versions of those clips. We extend our method to a semi-supervised domain adaptation setting where a small portion of the target domain is also labeled. In our experiments, our proposed method consistently outperforms the baselines on a dataset of more than 480 long surgical videos collected from two operating rooms.

Submitted 7 July, 2022; originally announced July 2022.

Comments: MICCAI 2022
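The confidence-based selection step described above can be sketched as follows, assuming the model outputs one vector of class logits per clip and that "confident" means the top softmax probability reaches a fixed threshold. The `0.9` threshold and the function names are hypothetical, and the augmentation and retraining steps are not shown.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_confident(clip_logits, threshold=0.9):
    """Hypothetical helper: return (clip_index, predicted_class) pairs for clips
    whose top softmax probability reaches `threshold`; other clips stay unlabeled."""
    selected = []
    for i, logits in enumerate(clip_logits):
        probs = softmax(logits)
        conf = max(probs)
        if conf >= threshold:
            selected.append((i, probs.index(conf)))
    return selected
```

For example, `select_confident([[4.0, 0.0, 0.0], [1.0, 1.0, 0.9]])` keeps only the first clip, pseudo-labeled as class 0; the second clip's prediction is too uncertain to trust. The selected clips would then be augmented and added to the training set.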

arXiv:2205.02805

An Empirical Study on Activity Recognition in Long Surgical Videos

Authors: Zhuohong He, Ali Mottaghi, Aidean Sharghi, Muhammad Abdullah Jamal, Omid Mohareri

Abstract: Activity recognition in surgical videos is a key research area for developing next-generation devices and workflow monitoring systems. Since surgeries are long processes with highly variable lengths, deep learning models used for surgical videos often consist of a two-stage setup using a backbone and a temporal sequence model. In this paper, we investigate many state-of-the-art backbones and temporal models to find architectures that yield the strongest performance for surgical activity recognition. We first benchmark the models' performance on a large-scale activity recognition dataset containing over 800 surgery videos captured in multiple clinical operating rooms. We further evaluate the models on two smaller public datasets, Cholec80 and Cataract-101, containing only 80 and 101 videos, respectively. We empirically found that the Swin-Transformer+BiGRU temporal model yielded strong performance on both datasets. Finally, we investigate the adaptability of the model to new domains by fine-tuning models to a new hospital and experimenting with a recent unsupervised domain adaptation approach.

Submitted 6 September, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

Comments: 9 pages, excluding references

arXiv:2011.06874

Medical symptom recognition from patient text: An active learning approach for long-tailed multilabel distributions

Authors: Ali Mottaghi, Prathusha K Sarma, Xavier Amatriain, Serena Yeung, Anitha Kannan

Abstract: We study the problem of medical symptom recognition from patient text, for the purpose of gathering pertinent information from the patient (known as history-taking). A typical patient text is often descriptive of the symptoms the patient is experiencing, and a single instance of such a text can be "labeled" with multiple symptoms. This makes learning a medical symptom recognizer challenging on account of i) the limited availability of large volumes of annotated data and ii) the large, unknown universe of multiple symptoms that a single text can map to. Furthermore, patient text is often characterized by a long tail in the data (i.e., some labels/symptoms occur more frequently than others, e.g., "fever" vs. "hematochezia"). In this paper, we introduce an active learning method that leverages the underlying structure of a continually refined, learned latent space to select the most informative examples to label. This selection progressively increases the learned model's coverage of the universe of symptoms, despite the long tail in the data distribution.

Submitted 28 March, 2021; v1 submitted 12 November, 2020; originally announced November 2020.
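As a rough illustration of selecting examples that increase coverage of a learned latent space, here is a sketch of greedy k-center (core-set style) selection. This is a generic technique swapped in for illustration, not the paper's method: it ignores the multilabel, long-tailed setting and assumes fixed embeddings given as plain lists of floats. `k_center_greedy` and its parameters are hypothetical names.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_center_greedy(embeddings, labeled_idx, budget):
    """Hypothetical helper: pick up to `budget` unlabeled indices, each time taking
    the point farthest from its nearest labeled or already-chosen point."""
    labeled = set(labeled_idx)
    # Distance from every point to its nearest labeled point (inf if none yet).
    min_dist = [min((euclidean(e, embeddings[j]) for j in labeled), default=math.inf)
                for e in embeddings]
    chosen = []
    for _ in range(budget):
        candidates = [i for i in range(len(embeddings))
                      if i not in labeled and i not in chosen]
        if not candidates:
            break
        far = max(candidates, key=lambda i: min_dist[i])
        chosen.append(far)
        # The newly chosen point now "covers" everything close to it.
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], euclidean(e, embeddings[far]))
    return chosen
```

With points `[[0, 0], [0.1, 0], [5, 5], [10, 0]]`, point 0 already labeled, and a budget of 2, the sketch picks index 3 and then index 2, skipping the near-duplicate of the labeled point. Favoring far-away regions is what drives coverage, which is one way rare, long-tail examples can surface during selection.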

arXiv:1912.09720

Adversarial Representation Active Learning

Authors: Ali Mottaghi, Serena Yeung

Abstract: Active learning aims to develop label-efficient algorithms by querying the most informative samples to be labeled by an oracle. The design of efficient training methods that require fewer labels is an important research direction that allows more effective use of computational and human resources for labeling and training deep neural networks. In this work, we demonstrate how recent advances in deep generative models can be used to outperform the state of the art, achieving the highest classification accuracy with as few labels as possible. Unlike previous approaches, ours uses not only labeled images to train the classifier but also unlabeled and generated images to co-train the whole model. Our experiments show that the proposed method significantly outperforms existing approaches in active learning on a wide range of datasets (MNIST, CIFAR-10, SVHN, CelebA, and ImageNet).

Submitted 20 December, 2019; originally announced December 2019.
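The notion of "querying the most informative samples" can be illustrated with plain entropy-based uncertainty sampling, a standard active learning baseline. This is not the generative, adversarial approach described above. The sketch assumes the classifier already produces a probability distribution for each pool sample, and `query_most_uncertain` is a hypothetical name.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def query_most_uncertain(pool_probs, k):
    """Hypothetical helper: indices of the k pool samples with the highest
    predictive entropy, i.e. the ones to send to the oracle for labeling."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]), reverse=True)
    return ranked[:k]
```

For a pool with predictions `[[0.98, 0.02], [0.5, 0.5], [0.7, 0.3]]`, asking for two labels returns indices 1 and 2. The confidently classified sample is skipped because labeling it would teach the model little.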

arXiv:1704.02216

OBTAIN: Real-Time Beat Tracking in Audio Signals

Authors: Ali Mottaghi, Kayhan Behdin, Ashkan Esmaeili, Mohammadreza Heydari, Farokh Marvasti

Abstract: In this paper, we design a system to perform real-time beat tracking of audio signals. We use the Onset Strength Signal (OSS) to detect onsets and estimate tempos. Then, we form the Cumulative Beat Strength Signal (CBSS) from the OSS and the estimated tempos. Next, we perform peak detection by extracting the periodic sequence of beats among all CBSS peaks. Simulations show that our proposed algorithm, Online Beat TrAckINg (OBTAIN), outperforms state-of-the-art results in terms of prediction accuracy while maintaining comparable and practical computational complexity. The real-time performance can be followed visually, as illustrated in the simulations.

Submitted 27 October, 2017; v1 submitted 7 April, 2017; originally announced April 2017.
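The OSS → CBSS → peak-detection chain can be sketched under stated assumptions. This sketch uses one common CBSS formulation from the real-time beat-tracking literature, which may differ from OBTAIN's exact weighting:

CBSS[m] = (1 − α)·OSS[m] + α·max_v W(v)·CBSS[m + v]

Here the lag v runs over roughly −2τ to −τ/2 frames, τ is the beat period in frames, and W(v) is a log-Gaussian weight centered on v = −τ. The values α = 0.9 and η = 5 are assumptions, and `beat_peaks` is a plain local-maximum finder that does not enforce the periodic beat sequence the abstract describes.

```python
import math

def cbss(oss, tau, alpha=0.9, eta=5.0):
    """Cumulative beat strength (assumed formulation): blend each onset-strength
    value with the best log-Gaussian-weighted CBSS value about one beat earlier."""
    out = []
    for m, o in enumerate(oss):
        best = 0.0
        # Search lags v from -2*tau up to about -tau/2 (in frames).
        for v in range(-2 * tau, -max(1, tau // 2) + 1):
            if m + v < 0:
                continue
            # Weight is 1 at v = -tau and decays for lags far from one beat period.
            w = math.exp(-0.5 * (eta * math.log(-v / tau)) ** 2)
            best = max(best, w * out[m + v])
        out.append((1 - alpha) * o + alpha * best)
    return out

def beat_peaks(signal):
    """Indices of local maxima (strictly greater than both neighbours)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > signal[i - 1] and signal[i] > signal[i + 1]]
```

For a synthetic OSS with an onset every 4 frames over 16 frames and τ = 4, the CBSS builds up at each onset that lands one beat after the previous one, and `beat_peaks` reports beats at frames 4, 8, and 12. The recursion makes each beat estimate depend on the beat history rather than on a single onset.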

Showing 1–6 of 6 results for author: Mottaghi, A