Search | arXiv e-print repository

SurgicalGPT: End-to-End Language-Vision GPT for Visual Question Answering in Surgery

Authors: Lalithkumar Seenivasan, Mobarakol Islam, Gokul Kannan, Hongliang Ren

Abstract: Advances in GPT-based large language models (LLMs) are revolutionizing natural language processing, exponentially increasing its use across various domains. Incorporating uni-directional attention, these autoregressive LLMs can generate long and coherent paragraphs. However, for visual question answering (VQA) tasks that require both vision and language processing, models with bi-directional attention or models employing fusion techniques are often employed to capture the context of multiple modalities all at once. As GPT does not natively process vision tokens, to exploit the advancements in GPT models for VQA in robotic surgery, we design an end-to-end trainable Language-Vision GPT (LV-GPT) model that expands the GPT2 model to include vision input (image). The proposed LV-GPT incorporates a feature extractor (vision tokenizer) and vision token embedding (token type and pose). Given the limitations of unidirectional attention in GPT models and their ability to generate coherent long paragraphs, we carefully sequence the word tokens before vision tokens, mimicking the human thought process of understanding the question to infer an answer from an image. Quantitatively, we prove that the LV-GPT model outperforms other state-of-the-art VQA models on two publicly available surgical-VQA datasets (based on endoscopic vision challenge robotic scene segmentation 2018 and CholecTriplet2021) and on our newly annotated dataset (based on the holistic surgical scene dataset). We further annotate all three datasets to include question-type annotations to allow sub-type analysis. Furthermore, we extensively study and present the effects of token sequencing, token type and pose embedding for vision tokens in the LV-GPT model.

Submitted 22 July, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: The manuscript has been accepted at MICCAI 2023. Code is available at: https://github.com/lalithjets/SurgicalGPT

arXiv:2203.02142 [pdf]

Benchmarking tunnel and encryption methodologies in cloud environments

Authors: Pravein Govindan Kannan, Brent Salisbury, Palanivel Kodeswaran, Sayandeep Sen

Abstract: The recent past has seen the adoption of multi-cloud deployments by enterprises due to availability, features, and regulatory requirements. A typical deployment involves parts of an application/workloads running inside a private cloud with the other parts spread across multiple on-prem/public clouds. Typical cluster-to-cluster networking in such deployments involves the establishment of site-to-site encrypted tunnels to connect the workloads. In this report, we benchmark the performance of various tunneling and encryption technologies to provide directions on their use in multi-cloud deployments. Based on the various experiments conducted on three different testbeds, we present quantifiable data which can be leveraged by operators and cloud providers tasked with design and development decisions of multi-cloud network connectivity and orchestration.

Submitted 4 March, 2022; originally announced March 2022.

arXiv:2011.05250 [pdf]

doi 10.1021/acs.jpca.0c10983

Rydberg States of H$_3$ and HeH as Potential Coolants for Primordial Star Formation

Authors: Gokul Kannan, Jeremy R. Chien, Anthony J. Benjamin, Niranjan Bhatia, Richard J. Saykally

Abstract: Current theory and measurements establish the age of the universe as ca. 13.8 billion years. For the first several hundred million years of its existence, it was a dark, opaque void. After that, the hydrogen atoms comprising most of the "ordinary" matter began to condense and ionize, eventually forming the first stars that would illuminate the sky. Details of how these "primordial" stars formed have been widely debated, but remain elusive. A central issue in this process is the mechanism by which the primordial gas (mainly hydrogen and helium atoms) collected via the action of dark matter cools and further accretes to fusion densities. Current models invoke collisional excitation of H$_2$ molecular rotations and subsequent radiative rotational transitions allowed by the weak molecular quadrupole moment. In this article, we review the salient considerations, and present some new ideas, based on recent spectroscopic observations of neutral H$_3$ Rydberg electronic state emission in the mid-infrared.

Submitted 3 November, 2020; originally announced November 2020.

arXiv:2007.13538 [pdf]

A Novel adaptive optimization of Dual-Tree Complex Wavelet Transform for Medical Image Fusion

Authors: T. Deepika, G. Karpaga Kannan

Abstract: In recent years, many research achievements have been made in the medical image fusion field. Fusion is basically the extraction of the best of the inputs and conveying it to the output. Medical image fusion means that information from images of several modalities is combined to form one image that expresses this information. The aim of image fusion is to integrate complementary and redundant information. In this paper, a multimodal image fusion algorithm based on the dual-tree complex wavelet transform (DT-CWT) and adaptive particle swarm optimization (APSO) is proposed. Fusion is achieved through the formation of a fused pyramid using the DT-CWT coefficients from the decomposed pyramids of the source images. The coefficients are fused by the weighted average method based on pixels, and the weights are estimated by the APSO to gain optimal fused images. The fused image is obtained through the conventional inverse dual-tree complex wavelet transform reconstruction process. Experimental results show that the proposed method based on the adaptive particle swarm optimization algorithm is remarkably better than the method based on particle swarm optimization. The resulting fused images are compared visually and through benchmarks such as Entropy (E), Peak Signal to Noise Ratio (PSNR), Root Mean Square Error (RMSE), Standard Deviation (SD) and Structure Similarity Index Metric (SSIM) computations.

Submitted 22 July, 2020; originally announced July 2020.

Comments: Conference on Computing Communication and Signal Processing. arXiv admin note: text overlap with arXiv:2007.11488

arXiv:1601.00141 [pdf]

doi 10.1007/s11192-016-1877-6

Detecting the historical roots of tribology research: a bibliometric analysis

Authors: Bakthavachalam Elango, Lutz Bornmann, Govindaraju Kannan

Abstract: In this study, the historical roots of tribology are investigated using a newly developed scientometric method called Referenced Publication Years Spectroscopy (RPYS). The study is based on cited references in tribology research publications. The Science Citation Index Expanded is used as the data source. The results show that RPYS has the potential to identify the important publications: most of the publications which have been identified in this study as highly cited (referenced) publications are landmark publications in the field of tribology.

Submitted 1 February, 2016; v1 submitted 2 January, 2016; originally announced January 2016.

Comments: Accepted for publication in Scientometrics

Showing 1–5 of 5 results for author: Kannan, G