Aim of the PhD Project:
- Robust, real-time detection and tracking of surgical tools
- Learning from combined small-scale annotated and large-scale non-annotated datasets of robotic surgery video footage
- Advancing the state of the art in combining self-supervision, weak supervision, and semi-supervision for surgical vision tasks
- Designing and validating stereo-vision-based learning paradigms
Project Description / Background:
This project seeks to advance the state of the art in AI-based surgical tool detection and tracking by designing novel semi-supervised and weakly-supervised approaches able to achieve robust and real-time performance.
Automatic detection and tracking of surgical tools from laparoscopic surgery videos is bound to play a key role in enabling surgery with autonomous features and in providing advanced surgical assistance to the clinical team. Knowing how many instruments are present, and where, enables a wealth of applications such as: placing informative overlays on the screen; performing augmented reality without occluding instruments; surgical workflow analysis; visual servoing; surgical task automation; etc.
When the tools and vision system (endoscope) are robotically manipulated, one could expect the kinematic information of the robot to provide accurate tool positioning information. In practice, the large number of joints, cables and other sources of slack and hysteresis in the robots makes the translation of kinematic information into calibrated positioning data largely unrepeatable and prone to error. An alternative is to use the endoscope itself, which is already present, as a sensor. One of the first methods devised to help in detecting surgical tools from videos was the placement of fiducial markers on the instruments. If the markers are remarkably different from the observed tissue, the detection task can likely be solved through very simple processing. Yet, adding fiducials to surgical instruments has been strongly rejected by medical device manufacturers as it presents diverse and important disadvantages (e.g. sterilisation, positioning, occlusion by blood, etc.).
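To make concrete why fiducial detection reduces to "very simple processing", the following is a minimal, hypothetical sketch: a toy frame is scanned for pixels matching an assumed saturated-green marker colour, and a bounding box is returned. The image representation, colour thresholds and marker colour are all illustrative assumptions, not part of any real surgical pipeline (which would operate on video frames, typically in HSV space, with noise handling).

```python
# Toy sketch (hypothetical values): locating a bright fiducial marker
# by per-pixel colour thresholding over a small RGB grid.

def detect_fiducial(image, is_marker):
    """Return the bounding box (min_row, min_col, max_row, max_col)
    of pixels classified as marker, or None if no pixel matches."""
    hits = [(r, c)
            for r, row in enumerate(image)
            for c, px in enumerate(row)
            if is_marker(px)]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))

def is_green(px):
    # Assumed marker colour: saturated green against reddish tissue.
    r, g, b = px
    return g > 180 and r < 100 and b < 100

tissue = (150, 60, 60)   # reddish background pixel
marker = (40, 220, 50)   # green fiducial pixel
frame = [[tissue] * 5 for _ in range(5)]
frame[1][2] = marker
frame[2][2] = marker

print(detect_fiducial(frame, is_green))  # (1, 2, 2, 2)
```

The fragility of exactly this kind of thresholding under blood occlusion and lighting changes is part of why marker-free, learning-based detection is pursued instead.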
Training deep convolutional networks on manually annotated datasets is a promising approach to instrument detection and tracking in surgical scenes. Yet, producing manual annotations of surgical videos is tedious, time-consuming and non-scalable, given the evolving nature of surgical technology and the expertise required for the annotation task. No large datasets comparable to industry standards in the computer vision community are available for these tasks. The lack of large-scale annotated datasets will remain a rate-limiting factor in achieving the robustness and accuracy required to use surgical detection and tracking in patient-critical tasks such as autonomous surgery. As in the autonomous driving research field, automating the creation of realistic synthetic training datasets by exploiting advanced simulation may help improve the performance of deep learning approaches, but a gap is likely to remain if only small amounts of real data are exploited alongside simulation.
This project aims to leverage weakly-supervised, self-supervised and semi-supervised learning to exploit large amounts of non-annotated real data combined with small-scale manually annotated datasets, thereby keeping the required annotation effort low. Although each of these approaches has been developed in the literature, and some have been evaluated on surgical videos, the challenging open research questions addressed in this proposal relate to optimally combining them in a methodologically sound and scalable framework.
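One common semi-supervised ingredient the project could build on is pseudo-labelling: a model trained on the small labelled set assigns labels to the unlabelled pool, and only confident predictions are kept for retraining. The sketch below is a deliberately simplified illustration, not the project's method: the 1-D threshold "model" and the confidence margin are toy assumptions standing in for a deep detector and its score.

```python
# Toy sketch of pseudo-labelling (assumed 1-D data, not real surgical
# features): fit on labelled points, label the unlabelled pool, keep
# only confident pseudo-labels, then retrain on the enlarged set.

def fit_threshold(samples):
    """Fit a 1-D classifier: threshold midway between class means."""
    neg = [x for x, y in samples if y == 0]
    pos = [x for x, y in samples if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2

def pseudo_label(threshold, unlabelled, margin=1.0):
    """Keep only points whose distance to the boundary exceeds margin."""
    return [(x, int(x > threshold))
            for x in unlabelled if abs(x - threshold) > margin]

labelled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
unlabelled = [0.5, 4.9, 5.1, 9.5]          # 4.9 and 5.1 are ambiguous

t0 = fit_threshold(labelled)               # 5.0
confident = pseudo_label(t0, unlabelled)   # [(0.5, 0), (9.5, 1)]
t1 = fit_threshold(labelled + confident)   # retrain on enlarged set
print(t0, confident, t1)
```

In the surgical setting, the open question the proposal targets is precisely how to combine such semi-supervised signals with self-supervised and weakly-supervised ones in a single sound framework, rather than applying any one of them in isolation.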
This project is open to candidates with a strong background in machine learning or computer vision, a solid mathematical understanding and a keen interest in advancing autonomous surgery.
Figure 1: Representative sample images of robotic surgery (left) and state-of-the-art instrument segmentation results (right). True positive (white), true negative (black), false positive (magenta), and false negative (green).