Search | arXiv e-print repository

Instrument-tissue Interaction Detection Framework for Surgical Video Understanding

Authors: Wenjun Lin, Yan Hu, Huazhu Fu, Mingming Yang, Chin-Boon Chng, Ryo Kawasaki, Cheekong Chui, Jiang Liu

Abstract: Instrument-tissue interaction detection task, which helps understand surgical activities, is vital for constructing computer-assisted surgery systems but with many challenges. Firstly, most models represent instrument-tissue interaction in a coarse-grained way which only focuses on classification and lacks the ability to automatically detect instruments and tissues. Secondly, existing works do not… ▽ More Instrument-tissue interaction detection task, which helps understand surgical activities, is vital for constructing computer-assisted surgery systems but with many challenges. Firstly, most models represent instrument-tissue interaction in a coarse-grained way which only focuses on classification and lacks the ability to automatically detect instruments and tissues. Secondly, existing works do not fully consider relations between intra- and inter-frame of instruments and tissues. In the paper, we propose to represent instrument-tissue interaction as <instrument class, instrument bounding box, tissue class, tissue bounding box, action class> quintuple and present an Instrument-Tissue Interaction Detection Network (ITIDNet) to detect the quintuple for surgery videos understanding. Specifically, we propose a Snippet Consecutive Feature (SCF) Layer to enhance features by modeling relationships of proposals in the current frame using global context information in the video snippet. We also propose a Spatial Corresponding Attention (SCA) Layer to incorporate features of proposals between adjacent frames through spatial encoding. To reason relationships between instruments and tissues, a Temporal Graph (TG) Layer is proposed with intra-frame connections to exploit relationships between instruments and tissues in the same frame and inter-frame connections to model the temporal information for the same instance. For evaluation, we build a cataract surgery video (PhacoQ) dataset and a cholecystectomy surgery video (CholecQ) dataset. Experimental results demonstrate the promising performance of our model, which outperforms other state-of-the-art models on both datasets. △ Less

Submitted 30 March, 2024; originally announced April 2024.

arXiv:2308.14298 [pdf, other]

Direct initial orbit determination

Authors: Chee-Kheng Chng, Trent Jansen-Sturgeon, Timothy Payne, Tat-Jun Chin

Abstract: Initial orbit determination (IOD) is an important early step in the processing chain that makes sense of and reconciles the multiple optical observations of a resident space object. IOD methods generally operate on line-of-sight (LOS) vectors extracted from images of the object, hence the LOS vectors can be seen as discrete point samples of the raw optical measurements. Typically, the number of LO… ▽ More Initial orbit determination (IOD) is an important early step in the processing chain that makes sense of and reconciles the multiple optical observations of a resident space object. IOD methods generally operate on line-of-sight (LOS) vectors extracted from images of the object, hence the LOS vectors can be seen as discrete point samples of the raw optical measurements. Typically, the number of LOS vectors used by an IOD method is much smaller than the available measurements (\ie, the set of pixel intensity values), hence current IOD methods arguably under-utilize the rich information present in the data. In this paper, we propose a \emph{direct} IOD method called D-IOD that fits the orbital parameters directly on the observed streak images, without requiring LOS extraction. Since it does not utilize LOS vectors, D-IOD avoids potential inaccuracies or errors due to an imperfect LOS extraction step. Two innovations underpin our novel orbit-fitting paradigm: first, we introduce a novel non-linear least-squares objective function that computes the loss between the candidate-orbit-generated streak images and the observed streak images. Second, the objective function is minimized with a gradient descent approach that is embedded in our proposed optimization strategies designed for streak images. We demonstrate the effectiveness of D-IOD on a variety of simulated scenarios and challenging real streak images. △ Less

Submitted 28 August, 2023; originally announced August 2023.

Comments: 28 pages, 17 figures, Submitted to Advances in Space Research

arXiv:2210.00429 [pdf, other]

doi 10.1109/TAES.2023.3279353

ROSIA: Rotation-Search-Based Star Identification Algorithm

Authors: Chee-Kheng Chng, Alvaro Parra Bustos, Benjamin McCarthy, Tat-Jun Chin

Abstract: This paper presents a rotation-search-based approach for addressing the star identification (Star-ID) problem. The proposed algorithm, ROSIA, is a heuristics-free algorithm that seeks the optimal rotation that maximally aligns the input and catalog stars in their respective coordinates. ROSIA searches the rotation space systematically with the Branch-and-Bound (BnB) method. Crucially affecting the… ▽ More This paper presents a rotation-search-based approach for addressing the star identification (Star-ID) problem. The proposed algorithm, ROSIA, is a heuristics-free algorithm that seeks the optimal rotation that maximally aligns the input and catalog stars in their respective coordinates. ROSIA searches the rotation space systematically with the Branch-and-Bound (BnB) method. Crucially affecting the runtime feasibility of ROSIA is the upper bound function that prioritizes the search space. In this paper, we make a theoretical contribution by proposing a tight (provable) upper bound function that enables a 400x speed-up compared to an existing formulation. Coupling the bounding function with an efficient evaluation scheme that leverages stereographic projection and the R-tree data structure, ROSIA achieves feasible operational speed on embedded processors with state-of-the-art performances under different sources of noise. The source code of ROSIA is available at https://github.com/ckchng/ROSIA. △ Less

Submitted 28 August, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

Comments: 21 pages, 16 figures, Accepted to IEEE Transactions on Aerospace and Electronic Systems

arXiv:2103.13111 [pdf, other]

MIcro-Surgical Anastomose Workflow recognition challenge report

Authors: Arnaud Huaulmé, Duygu Sarikaya, Kévin Le Mut, Fabien Despinoy, Yonghao Long, Qi Dou, Chin-Boon Chng, Wenjun Lin, Satoshi Kondo, Laura Bravo-Sánchez, Pablo Arbeláez, Wolfgang Reiter, Manoru Mitsuishi, Kanako Harada, Pierre Jannin

Abstract: The "MIcro-Surgical Anastomose Workflow recognition on training sessions" (MISAW) challenge provided a data set of 27 sequences of micro-surgical anastomosis on artificial blood vessels. This data set was composed of videos, kinematics, and workflow annotations described at three different granularity levels: phase, step, and activity. The participants were given the option to use kinematic data a… ▽ More The "MIcro-Surgical Anastomose Workflow recognition on training sessions" (MISAW) challenge provided a data set of 27 sequences of micro-surgical anastomosis on artificial blood vessels. This data set was composed of videos, kinematics, and workflow annotations described at three different granularity levels: phase, step, and activity. The participants were given the option to use kinematic data and videos to develop workflow recognition models. Four tasks were proposed to the participants: three of them were related to the recognition of surgical workflow at three different granularity levels, while the last one addressed the recognition of all granularity levels in the same model. One ranking was made for each task. We used the average application-dependent balanced accuracy (AD-Accuracy) as the evaluation metric. This takes unbalanced classes into account and it is more clinically relevant than a frame-by-frame score. Six teams, including a non-competing team, participated in at least one task. All models employed deep learning models, such as CNN or RNN. The best models achieved more than 95% AD-Accuracy for phase recognition, 80% for step recognition, 60% for activity recognition, and 75% for all granularity levels. For high levels of granularity (i.e., phases and steps), the best models had a recognition rate that may be sufficient for applications such as prediction of remaining surgical time or resource management. However, for activities, the recognition rate was still low for applications that can be employed clinically. The MISAW data set is publicly available to encourage further research in surgical workflow recognition. It can be found at www.synapse.org/MISAW △ Less

Submitted 24 March, 2021; originally announced March 2021.

Comments: MICCAI2020 challenge report, 36 pages including 15 for supplementary material (complet results for each participating teams), 17 figures

arXiv:2010.01872 [pdf, other]

Monocular Rotational Odometry with Incremental Rotation Averaging and Loop Closure

Authors: Chee-Kheng Chng, Alvaro Parra, Tat-Jun Chin, Yasir Latif

Abstract: Estimating absolute camera orientations is essential for attitude estimation tasks. An established approach is to first carry out visual odometry (VO) or visual SLAM (V-SLAM), and retrieve the camera orientations (3 DOF) from the camera poses (6 DOF) estimated by VO or V-SLAM. One drawback of this approach, besides the redundancy in estimating full 6 DOF camera poses, is the dependency on estimati… ▽ More Estimating absolute camera orientations is essential for attitude estimation tasks. An established approach is to first carry out visual odometry (VO) or visual SLAM (V-SLAM), and retrieve the camera orientations (3 DOF) from the camera poses (6 DOF) estimated by VO or V-SLAM. One drawback of this approach, besides the redundancy in estimating full 6 DOF camera poses, is the dependency on estimating a map (3D scene points) jointly with the 6 DOF poses due to the basic constraint on structure-and-motion. To simplify the task of absolute orientation estimation, we formulate the monocular rotational odometry problem and devise a fast algorithm to accurately estimate camera orientations with 2D-2D feature matches alone. Underpinning our system is a new incremental rotation averaging method for fast and constant time iterative updating. Furthermore, our system maintains a view-graph that 1) allows solving loop closure to remove camera orientation drift, and 2) can be used to warm start a V-SLAM system. We conduct extensive quantitative experiments on real-world datasets to demonstrate the accuracy of our incremental camera orientation solver. Finally, we showcase the benefit of our algorithm to V-SLAM: 1) solving the known rotation problem to estimate the trajectory of the camera and the surrounding map, and 2)enabling V-SLAM systems to track pure rotational motions. △ Less

Submitted 5 October, 2020; originally announced October 2020.

Comments: Accepted to DICTA 2020

arXiv:1909.07741 [pdf, other]

ICDAR 2019 Competition on Large-scale Street View Text with Partial Labeling -- RRC-LSVT

Authors: Yipeng Sun, Zihan Ni, Chee-Kheng Chng, Yuliang Liu, Canjie Luo, Chun Chet Ng, Junyu Han, Errui Ding, **gtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen **

Abstract: Robust text reading from street view images provides valuable information for various applications. Performance improvement of existing methods in such a challenging scenario heavily relies on the amount of fully annotated training data, which is costly and in-efficient to obtain. To scale up the amount of training data while kee** the labeling procedure cost-effective, this competition introduc… ▽ More Robust text reading from street view images provides valuable information for various applications. Performance improvement of existing methods in such a challenging scenario heavily relies on the amount of fully annotated training data, which is costly and in-efficient to obtain. To scale up the amount of training data while kee** the labeling procedure cost-effective, this competition introduces a new challenge on Large-scale Street View Text with Partial Labeling (LSVT), providing 50, 000 and 400, 000 images in full and weak annotations, respectively. This competition aims to explore the abilities of state-of-the-art methods to detect and recognize text instances from large-scale street view images, closing the gap between research benchmarks and real applications. During the competition period, a total of 41 teams participated in the two proposed tasks with 132 valid submissions, i.e., text detection and end-to-end text spotting. This paper includes dataset descriptions, task definitions, evaluation protocols and results summaries of the ICDAR 2019-LSVT challenge. △ Less

Submitted 17 September, 2019; originally announced September 2019.

Comments: ICDAR 2019 Robust Reading Challenge in IAPR International Conference on Document Analysis and Recognition (ICDAR)

arXiv:1909.07145 [pdf, other]

ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT)

Authors: Chee-Kheng Chng, Yuliang Liu, Yipeng Sun, Chun Chet Ng, Canjie Luo, Zihan Ni, ChuanMing Fang, Shuaitao Zhang, Junyu Han, Errui Ding, **gtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen **

Abstract: This paper reports the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT) that consists of three major challenges: i) scene text detection, ii) scene text recognition, and iii) scene text spotting. A total of 78 submissions from 46 unique teams/individuals were received for this competition. The top performing score of each challenge is as follows: i) T1 - 82.65%, ii) T2.1 - 74.… ▽ More This paper reports the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT) that consists of three major challenges: i) scene text detection, ii) scene text recognition, and iii) scene text spotting. A total of 78 submissions from 46 unique teams/individuals were received for this competition. The top performing score of each challenge is as follows: i) T1 - 82.65%, ii) T2.1 - 74.3%, iii) T2.2 - 85.32%, iv) T3.1 - 53.86%, and v) T3.2 - 54.91%. Apart from the results, this paper also details the ArT dataset, tasks description, evaluation metrics and participants methods. The dataset, the evaluation kit as well as the results are publicly available at https://rrc.cvc.uab.es/?ch=14 △ Less

Submitted 16 September, 2019; originally announced September 2019.

Comments: Technical report of ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT) Competition

arXiv:1710.10400 [pdf, other]

Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition

Authors: Chee Kheng Chng, Chee Seng Chan

Abstract: Text in curve orientation, despite being one of the common text orientations in real world environment, has close to zero existence in well received scene text datasets such as ICDAR2013 and MSRA-TD500. The main motivation of Total-Text is to fill this gap and facilitate a new research direction for the scene text community. On top of the conventional horizontal and multi-oriented texts, it featur… ▽ More Text in curve orientation, despite being one of the common text orientations in real world environment, has close to zero existence in well received scene text datasets such as ICDAR2013 and MSRA-TD500. The main motivation of Total-Text is to fill this gap and facilitate a new research direction for the scene text community. On top of the conventional horizontal and multi-oriented texts, it features curved-oriented text. Total-Text is highly diversified in orientations, more than half of its images have a combination of more than two orientations. Recently, a new breed of solutions that casted text detection as a segmentation problem has demonstrated their effectiveness against multi-oriented text. In order to evaluate its robustness against curved text, we fine-tuned DeconvNet and benchmark it on Total-Text. Total-Text with its annotation is available at https://github.com/cs-chan/Total-Text-Dataset △ Less

Submitted 28 October, 2017; originally announced October 2017.

Comments: Accepted as Oral presentation in ICDAR2017 (Extended version, 13 pages 17 figures). We introduce a new scene text dataset namely as Total-Text, which is more comprehensive than the existing scene text datasets as it consists of 1555 natural images with more than 3 different text orientations, one of a kind

Showing 1–8 of 8 results for author: Chng, C