-
Blackbox Adaptation for Medical Image Segmentation
Authors:
Jay N. Paranjape,
Shameema Sikder,
S. Swaroop Vedula,
Vishal M. Patel
Abstract:
In recent years, various large foundation models have been proposed for image segmentation. These models are often trained on large amounts of data corresponding to general computer vision tasks. Hence, these models do not perform well on medical data. There have been some attempts in the literature to perform parameter-efficient finetuning of such foundation models for medical image segmentation. However, these approaches assume that all the parameters of the model are available for adaptation. But, in many cases, these models are released as APIs or blackboxes, with no or limited access to the model parameters and data. In addition, finetuning methods also require a significant amount of compute, which may not be available for the downstream task. At the same time, medical data can't be shared with third-party agents for finetuning due to privacy reasons. To tackle these challenges, we pioneer a blackbox adaptation technique for prompted medical image segmentation, called BAPS. BAPS has two components: (i) an Image-Prompt decoder (IP decoder) module that generates visual prompts given an image and a prompt, and (ii) a Zero Order Optimization (ZOO) method, called SPSA-GC, that is used to update the IP decoder without the need for backpropagating through the foundation model. Thus, our method does not require any knowledge about the foundation model's weights or gradients. We test BAPS on four different modalities and show that our method can improve the original model's performance by around 4%.
Submitted 17 May, 2024;
originally announced May 2024.
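As a rough illustration of the zero-order idea mentioned in the abstract above, the sketch below estimates a gradient from only two loss evaluations using plain SPSA (simultaneous perturbation stochastic approximation). It is not the paper's SPSA-GC variant and does not model the IP decoder; `toy_loss`, `spsa_gradient`, `spsa_minimize`, and all step sizes are hypothetical names and values chosen for this example.

```python
import random


def spsa_gradient(loss, theta, c=0.01, rng=None):
    """Two-sided SPSA estimate of the gradient of `loss` at `theta`."""
    rng = rng or random.Random(0)
    # Rademacher perturbation: each coordinate is +1 or -1 with equal probability.
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    # Only two loss evaluations are needed, whatever the dimension of theta.
    diff = (loss(plus) - loss(minus)) / (2.0 * c)
    # For +/-1 entries, dividing by delta_i equals multiplying by delta_i.
    return [diff * d for d in delta]


def spsa_minimize(loss, theta, steps=500, lr=0.05, c=0.01, seed=0):
    """Plain gradient descent driven by SPSA estimates (no backpropagation)."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        grad = spsa_gradient(loss, theta, c=c, rng=rng)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def toy_loss(theta):
    """Hypothetical stand-in for a black-box loss; its minimum is at all 3s."""
    return sum((t - 3.0) ** 2 for t in theta)


result = spsa_minimize(toy_loss, [0.0, 0.0, 0.0])
```

The loss is only ever called as a function, which is the property that makes this family of methods usable when the model behind the loss is exposed as an API.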
-
Constraining the clustering and 21-cm signature of radio galaxies at cosmic dawn
Authors:
Sudipta Sikder,
Rennan Barkana,
Anastasia Fialkov
Abstract:
The efficiency of radio emission is an important unknown parameter of early galaxies at cosmic dawn, as models with high efficiency have been shown to modify the cosmological 21-cm signal substantially, deepening the absorption trough and boosting the 21-cm power spectrum. Such models have been previously directly constrained by the overall extragalactic radio background as observed by ARCADE-2 and LWA-1. In this work, we constrain the clustering of high redshift radio sources by utilizing the observed upper limits on arcminute-scale anisotropy from the VLA at 4.9~GHz and ATCA at 8.7~GHz. Using a semi-numerical simulation of a plausible astrophysical model for illustration, we show that the clustering constraints on the radio efficiency are much stronger than those from the overall background intensity, by a factor that varies from 12 at redshift 7 to 30 at redshift 22. As a result, the predicted maximum depth of the global 21-cm signal is lowered by a factor of 5 (to 1700~mK), and the maximum 21-cm power spectrum peak at cosmic dawn is lowered by a factor of 24 (to $2\times 10^5$~mK$^2$). We conclude that the observed clustering is the strongest current direct constraint on such models, but strong early radio emission from galaxies remains viable for producing a strongly enhanced 21-cm signal from cosmic dawn.
Submitted 11 January, 2024;
originally announced January 2024.
-
Constraining the properties of Population III galaxies with multi-wavelength observations
Authors:
S. Pochinda,
T. Gessey-Jones,
H. T. J. Bevins,
A. Fialkov,
S. Heimersheim,
I. Abril-Cabezas,
E. de Lera Acedo,
S. Singh,
S. Sikder,
R. Barkana
Abstract:
The early Universe, spanning 400,000 to 400 million years after the Big Bang ($z\approx1100-11$), has been left largely unexplored as the light from luminous objects is too faint to be observed directly. While new experiments are pushing the redshift limit of direct observations, measurements in the low-frequency radio band promise to probe early star and black hole formation via observations of the hydrogen 21-cm line. In this work we explore synergies between 21-cm data from the HERA and SARAS 3 experiments and observations of the unresolved radio and X-ray backgrounds using multi-wavelength Bayesian analysis. We use the combined data set to constrain properties of Population II and Population III stars as well as early X-ray and radio sources. The joint fit reveals a 68 percentile disfavouring of Population III star formation efficiencies $\gtrsim5.7\%$. We also show how the 21-cm and the X-ray background data synergistically constrain opposite ends of the X-ray efficiency prior distribution to produce a peak in the 1D posterior of the X-ray luminosity per star formation rate. We find (at 68\% confidence) that early galaxies were likely 0.3 to 318 times as X-ray efficient as present-day starburst galaxies. We also show that the functional posteriors from our joint fit rule out global 21-cm signals deeper than $\lesssim-203\ \mathrm{mK}$ and power spectrum amplitudes at $k=0.34\ h\mathrm{Mpc^{-1}}$ greater than $Δ_{21}^2 \gtrsim 946\ \mathrm{mK}^2$ with $3σ$ confidence.
Submitted 1 May, 2024; v1 submitted 13 December, 2023;
originally announced December 2023.
-
Bengali Document Layout Analysis -- A YOLOV8 Based Ensembling Approach
Authors:
Nazmus Sakib Ahmed,
Saad Sakib Noor,
Ashraful Islam Shanto Sikder,
Abhijit Paul
Abstract:
This paper focuses on enhancing Bengali Document Layout Analysis (DLA) using the YOLOv8 model and innovative post-processing techniques. We tackle challenges unique to the complex Bengali script by employing data augmentation for model robustness. After meticulous validation set evaluation, we fine-tune our approach on the complete dataset, leading to a two-stage prediction strategy for accurate element segmentation. Our ensemble model, combined with post-processing, outperforms individual base architectures, addressing issues identified in the BaDLAD dataset. By leveraging this approach, we aim to advance Bengali document analysis, contributing to improved OCR and document comprehension. BaDLAD serves as a foundational resource for this endeavor, aiding future research in the field. Furthermore, our experiments provided key insights for incorporating new strategies into the established solution.
Submitted 29 April, 2024; v1 submitted 2 September, 2023;
originally announced September 2023.
-
AMDNet23: A combined deep Contour-based Convolutional Neural Network and Long Short Term Memory system to diagnose Age-related Macular Degeneration
Authors:
Md. Aiyub Ali,
Md. Shakhawat Hossain,
Md. Kawar Hossain,
Subhadra Soumi Sikder,
Sharun Akter Khushbu,
Mirajul Islam
Abstract:
In light of the expanding population, an automated framework for disease detection can assist doctors in the diagnosis of ocular diseases, yield accurate, stable, and rapid outcomes, and improve the success rate of early detection. This work first enhances the quality of fundus images by employing an adaptive contrast enhancement algorithm (CLAHE) and gamma correction. In the preprocessing stage, CLAHE elevates the local contrast of the fundus image and gamma correction increases the intensity of relevant features. This study builds AMDNet23, a deep learning system that combines a convolutional neural network (CNN) and long short-term memory (LSTM) to automatically detect age-related macular degeneration (AMD) from fundus images. In this mechanism, the CNN is used to extract features and the LSTM is used to detect the disease from the extracted features. The dataset was collected from multiple sources and then subjected to quality assessment techniques; the resulting 2000 experimental fundus images are distributed equally across four distinct classes. The proposed hybrid deep AMDNet23 model detects AMD ocular disease, achieving an accuracy of 96.50%, specificity of 99.32%, sensitivity of 96.5%, and F1-score of 96.49%. The system achieves state-of-the-art results on fundus imagery datasets for diagnosing AMD ocular disease, and the findings demonstrate the potential of our method.
Submitted 30 August, 2023;
originally announced August 2023.
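The gamma-correction step named in the abstract above can be sketched with a simple lookup table. This is a generic illustration of the standard mapping v -> 255 * (v / 255) ** gamma on 8-bit intensities, not the authors' preprocessing code; the function names, the `patch` values, and the gamma values are assumptions made for the example, and CLAHE is not shown because it would need an image-processing library.

```python
def gamma_lut(gamma):
    """Lookup table mapping each 8-bit intensity v to 255 * (v / 255) ** gamma."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [round(255.0 * (v / 255.0) ** gamma) for v in range(256)]


def apply_gamma(image, gamma):
    """Apply gamma correction to a grayscale image given as rows of 0..255 ints."""
    lut = gamma_lut(gamma)
    return [[lut[p] for p in row] for row in image]


# Hypothetical 2x3 grayscale patch standing in for part of a fundus image.
patch = [[0, 64, 128], [192, 230, 255]]
brighter = apply_gamma(patch, 0.5)  # gamma < 1 lifts dark and mid tones
darker = apply_gamma(patch, 2.0)    # gamma > 1 suppresses them
```

Black and white are fixed points of the mapping, so only the intermediate intensities move.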
-
Cross-Dataset Adaptation for Instrument Classification in Cataract Surgery Videos
Authors:
Jay N. Paranjape,
Shameema Sikder,
Vishal M. Patel,
S. Swaroop Vedula
Abstract:
Surgical tool presence detection is an important part of the intra-operative and post-operative analysis of a surgery. State-of-the-art models, which perform this task well on a particular dataset, however, perform poorly when tested on another dataset. This occurs due to a significant domain shift between the datasets resulting from the use of different tools, sensors, data resolution etc. In this paper, we highlight this domain shift in the commonly performed cataract surgery and propose a novel end-to-end Unsupervised Domain Adaptation (UDA) method called the Barlow Adaptor that addresses the problem of distribution shift without requiring any labels from another domain. In addition, we introduce a novel loss called the Barlow Feature Alignment Loss (BFAL) which aligns features across different domains while reducing redundancy and the need for higher batch sizes, thus improving cross-dataset performance. The use of BFAL is a novel approach to address the challenge of domain shift in cataract surgery data. Extensive experiments are conducted on two cataract surgery datasets and it is shown that the proposed method outperforms the state-of-the-art UDA methods by 6%. The code can be found at https://github.com/JayParanjape/Barlow-Adaptor
Submitted 31 July, 2023;
originally announced August 2023.
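The abstract above does not spell out BFAL. As background only, the sketch below implements the original Barlow Twins redundancy-reduction objective, which the name appears to allude to: the cross-correlation matrix between two batches of features is pushed toward the identity. Treat the link to BFAL as an assumption; `barlow_twins_loss`, `standardize`, the weight `lam`, and the toy feature batches are all hypothetical and are not taken from the paper or its repository.

```python
import math


def standardize(batch):
    """Standardize each feature dimension over the batch (zero mean, unit variance)."""
    n, dims = len(batch), len(batch[0])
    out = [[0.0] * dims for _ in range(n)]
    for j in range(dims):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        # Fall back to 1.0 for a constant column to avoid division by zero.
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n) or 1.0
        for i in range(n):
            out[i][j] = (col[i] - mean) / std
    return out


def barlow_twins_loss(feats_a, feats_b, lam=0.005):
    """Sum of (1 - C_ii)^2 plus lam * sum of C_ij^2 for i != j."""
    za, zb = standardize(feats_a), standardize(feats_b)
    n, dims = len(za), len(za[0])
    loss = 0.0
    for i in range(dims):
        for j in range(dims):
            # Cross-correlation between dimension i of A and dimension j of B.
            c_ij = sum(za[k][i] * zb[k][j] for k in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2   # invariance term
            else:
                loss += lam * c_ij ** 2     # redundancy-reduction term
    return loss


# Toy batches: four samples, two decorrelated feature dimensions.
source = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
flipped = [[-a, -b] for a, b in source]
aligned_loss = barlow_twins_loss(source, source)
misaligned_loss = barlow_twins_loss(source, flipped)
```

Identical, decorrelated features give a loss of zero, while sign-flipped features are penalized on the diagonal.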
-
AdaptiveSAM: Towards Efficient Tuning of SAM for Surgical Scene Segmentation
Authors:
Jay N. Paranjape,
Nithin Gopalakrishnan Nair,
Shameema Sikder,
S. Swaroop Vedula,
Vishal M. Patel
Abstract:
Segmentation is a fundamental problem in surgical scene analysis using artificial intelligence. However, the inherent data scarcity in this domain makes it challenging to adapt traditional segmentation techniques for this task. To tackle this issue, current research employs pretrained models and finetunes them on the given data. Even so, these require training deep networks with millions of parameters every time new data becomes available. A recently published foundation model, Segment-Anything (SAM), generalizes well to a large variety of natural images, hence tackling this challenge to a reasonable extent. However, SAM does not generalize well to the medical domain as is without utilizing a large amount of compute resources for fine-tuning and using task-specific prompts. Moreover, these prompts are in the form of bounding-boxes or foreground/background points that need to be annotated explicitly for every image, making this solution increasingly tedious with higher data size. In this work, we propose AdaptiveSAM - an adaptive modification of SAM that can adjust to new datasets quickly and efficiently, while enabling text-prompted segmentation. For finetuning AdaptiveSAM, we propose an approach called bias-tuning that requires a significantly smaller number of trainable parameters than SAM (less than 2\%). At the same time, AdaptiveSAM requires negligible expert intervention since it uses free-form text as prompt and can segment the object of interest with just the label name as prompt. Our experiments show that AdaptiveSAM outperforms current state-of-the-art methods on various medical imaging datasets including surgery, ultrasound and X-ray. Code is available at https://github.com/JayParanjape/biastuning
Submitted 7 August, 2023;
originally announced August 2023.
-
GLSFormer: Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos
Authors:
Nisarg A. Shah,
Shameema Sikder,
S. Swaroop Vedula,
Vishal M. Patel
Abstract:
Automated surgical step recognition is an important task that can significantly improve patient safety and decision-making during surgeries. Existing state-of-the-art methods for surgical step recognition either rely on separate, multi-stage modeling of spatial and temporal information or operate on short-range temporal resolution when learned jointly. However, the benefits of joint modeling of spatio-temporal features and long-range information are not taken into account. In this paper, we propose a vision transformer-based approach to jointly learn spatio-temporal features directly from a sequence of frame-level patches. Our method incorporates a gated-temporal attention mechanism that intelligently combines short-term and long-term spatio-temporal feature representations. We extensively evaluate our approach on two cataract surgery video datasets, namely Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods. These results validate the suitability of our proposed approach for automated surgical step recognition. Our code is released at: https://github.com/nisargshah1999/GLSFormer
Submitted 20 July, 2023;
originally announced July 2023.
-
Strong 21-cm fluctuations and anisotropy due to the line-of-sight effect of radio galaxies at cosmic dawn
Authors:
Sudipta Sikder,
Rennan Barkana,
Anastasia Fialkov,
Itamar Reis
Abstract:
The reported detection of the global 21-cm signal by the EDGES collaboration is significantly stronger than standard astrophysical predictions. One possible explanation is an early radio excess above the cosmic microwave background. Such a radio background could have been produced by high redshift galaxies, if they were especially efficient in producing low-frequency synchrotron radiation. We have previously studied the effects of such an inhomogeneous radio background on the 21-cm signal; however, we made a simplifying assumption of isotropy of the background seen by each hydrogen cloud. Here we perform a complete calculation that accounts for the fact that the 21-cm absorption occurs along the line of sight, and is therefore sensitive to radio sources lying behind each absorbing cloud. We find that the complete calculation strongly enhances the 21-cm power spectrum during cosmic dawn, by up to two orders of magnitude; on the other hand, the effect on the global 21-cm signal is only at the $5\%$ level. In addition to making the high-redshift 21-cm fluctuations potentially more easily observable, the line of sight radio effect induces a new anisotropy in the 21-cm power spectrum. While these effects are particularly large for the case of an extremely-enhanced radio efficiency, they make it more feasible to detect even a moderately-enhanced radio efficiency in early galaxies. This is especially relevant since the EDGES signal has been contested by the SARAS experiment.
Submitted 11 January, 2023;
originally announced January 2023.
-
Video-based assessment of intraoperative surgical skill
Authors:
Sanchit Hira,
Digvijay Singh,
Tae Soo Kim,
Shobhit Gupta,
Gregory Hager,
Shameema Sikder,
S. Swaroop Vedula
Abstract:
Purpose: The objective of this investigation is to provide a comprehensive analysis of state-of-the-art methods for video-based assessment of surgical skill in the operating room. Methods: Using a data set of 99 videos of capsulorhexis, a critical step in cataract surgery, we evaluate feature based methods previously developed for surgical skill assessment mostly under benchtop settings. In addition, we present and validate two deep learning methods that directly assess skill using RGB videos. In the first method, we predict instrument tips as keypoints, and learn surgical skill using temporal convolutional neural networks. In the second method, we propose a novel architecture for surgical skill assessment that includes a frame-wise encoder (2D convolutional neural network) followed by a temporal model (recurrent neural network), both of which are augmented by visual attention mechanisms. We report the area under the receiver operating characteristic curve, sensitivity, specificity, and predictive values with each method through 5-fold cross-validation. Results: For the task of binary skill classification (expert vs. novice), deep neural network based methods exhibit higher AUC than the classical spatiotemporal interest point based methods. The neural network approach using attention mechanisms also showed high sensitivity and specificity. Conclusion: Deep learning methods are necessary for video-based assessment of surgical skill in the operating room. Our findings of internal validity of a network using attention mechanisms to assess skill directly using RGB videos should be evaluated for external validity in other data sets.
Submitted 12 May, 2022;
originally announced May 2022.
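For readers unfamiliar with the metrics listed in the abstract above, the sketch below computes sensitivity, specificity, and predictive values from binary labels. It is a generic illustration with made-up labels (1 = expert, 0 = novice is an assumed encoding), not the study's evaluation code; AUC and the 5-fold cross-validation loop are not shown.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV for 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


# Hypothetical labels and predictions for ten videos.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
```

In a cross-validated study these values would typically be computed per fold and then summarized across folds.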
-
Emulation of the Cosmic Dawn 21-cm Power Spectrum and Classification of Excess Radio Models Using an Artificial Neural Network
Authors:
Sudipta Sikder,
Rennan Barkana,
Itamar Reis,
Anastasia Fialkov
Abstract:
The cosmic 21-cm line of hydrogen is expected to be measured in detail by the next generation of radio telescopes. The enormous dataset from future 21-cm surveys will revolutionize our understanding of early cosmic times. We present a machine learning approach based on an Artificial Neural Network that uses emulation in order to uncover the astrophysics in the epoch of reionization and cosmic dawn. Using a seven-parameter astrophysical model that covers a very wide range of possible 21-cm signals, over the redshift range 6 to 30 and wavenumber range $0.05$ to $1 \ \rm{Mpc}^{-1}$ we emulate the 21-cm power spectrum with a typical accuracy of $10 - 20\%$. As a realistic example, we train an emulator using the power spectrum with an optimistic noise model of the Square Kilometre Array (SKA). Fitting to mock SKA data results in a typical measurement accuracy of $2.8\%$ in the optical depth to the cosmic microwave background, $34\%$ in the star-formation efficiency of galactic halos, and a factor of 9.6 in the X-ray efficiency of galactic halos. Also, with our modeling we reconstruct the true 21-cm power spectrum from the mock SKA data with a typical accuracy of $15 - 30\%$. In addition to standard astrophysical models, we consider two exotic possibilities of strong excess radio backgrounds at high redshifts. We use a neural network to identify the type of radio background present in the 21-cm power spectrum, with an accuracy of $87\%$ for mock SKA data.
Submitted 10 January, 2024; v1 submitted 20 January, 2022;
originally announced January 2022.
-
HERA Phase I Limits on the Cosmic 21-cm Signal: Constraints on Astrophysics and Cosmology During the Epoch of Reionization
Authors:
The HERA Collaboration,
Zara Abdurashidova,
James E. Aguirre,
Paul Alexander,
Zaki Ali,
Yanga Balfour,
Rennan Barkana,
Adam Beardsley,
Gianni Bernardi,
Tashalee Billings,
Judd Bowman,
Richard Bradley,
Phillip Bull,
Jacob Burba,
Steven Carey,
Christopher Carilli,
Carina Cheng,
David DeBoer,
Matthew Dexter,
Eloy de Lera Acedo,
Joshua Dillon,
John Ely,
Aaron Ewall-Wice,
Nicolas Fagnoni,
Anastasia Fialkov
, et al. (59 additional authors not shown)
Abstract:
Recently, the Hydrogen Epoch of Reionization Array (HERA) collaboration has produced the experiment's first upper limits on the power spectrum of 21-cm fluctuations at z~8 and 10. Here, we use several independent theoretical models to infer constraints on the intergalactic medium (IGM) and galaxies during the epoch of reionization (EoR) from these limits. We find that the IGM must have been heated above the adiabatic cooling threshold by z~8, independent of uncertainties about the IGM ionization state and the nature of the radio background. Combining HERA limits with galaxy and EoR observations constrains the spin temperature of the z~8 neutral IGM to 27 K < T_S < 630 K (2.3 K < T_S < 640 K) at 68% (95%) confidence. They therefore also place a lower bound on X-ray heating, a previously unconstrained aspect of early galaxies. For example, if the CMB dominates the z~8 radio background, the new HERA limits imply that the first galaxies produced X-rays more efficiently than local ones (with soft band X-ray luminosities per star formation rate constrained to L_X/SFR = { 10^40.2, 10^41.9 } erg/s/(M_sun/yr) at 68% confidence), consistent with expectations of X-ray binaries in low-metallicity environments. The z~10 limits require even earlier heating if dark-matter interactions (e.g., through millicharges) cool down the hydrogen gas. Using a model in which an extra radio background is produced by galaxies, we rule out (at 95% confidence) the combination of high radio and low X-ray luminosities of L_{r,ν}/SFR > 3.9 x 10^24 W/Hz/(M_sun/yr) and L_X/SFR<10^40 erg/s/(M_sun/yr). The new HERA upper limits neither support nor disfavor a cosmological interpretation of the recent EDGES detection. The analysis framework described here provides a foundation for the interpretation of future HERA results.
Submitted 20 December, 2022; v1 submitted 16 August, 2021;
originally announced August 2021.