Aims of the PhD Project:
- Learning surgical vision models with multi-task learning that outperform models trained on each task independently
- Learning surgical vision models with multi-task learning and limited supervision that perform close to multi-task models trained with adequate supervision
- Advancing the state of the art in combining depth and optical flow estimation, surgical instrument detection and anatomy recognition, as well as surgical action recognition.
- Evaluating and validating endoscopic vision-based learning paradigms
Multi-task learning is common in deep learning. For closely related tasks such as detection and segmentation, or detection and counting, it has already been achieved by using the supervision of one task to support the other. There is clear evidence that adding an auxiliary task can improve the main task, yet it is unclear how much both tasks benefit from such combinations, especially when they are not strongly correlated. For this reason, multiple tasks are currently still processed independently in most cases. Another reason is the difficulty of scaling joint learning to many tasks, in terms of both network optimization and practical implementation. Tackling this requires careful design of how multiple tasks are combined, and novel learning paradigms are also expected.
This project is situated in the domain of endoscopic image processing. We aim to develop a machine learning model with general visual intelligence capabilities for robotic surgery, covering depth and optical flow estimation, surgical instrument detection and anatomy recognition, as well as surgical action recognition. Depth and optical flow estimation, together with anatomy recognition, are key requirements for developing autonomous robotic control schemes that are cognizant of the surgical scene. Automatic detection and tracking of surgical instruments in laparoscopic surgery videos also plays an important role in providing advanced surgical assistance to the clinical team, given the uncertainties associated with the kinematic chains of surgical robots and the potential presence of tools not directly manipulated by the robot. Knowing how many instruments are present and where they are has applications such as placing informative overlays on the screen, performing augmented reality without occluding instruments, visual servoing, and surgical task automation. Surgical action recognition is also critical for advancing autonomous robotic assistance during the procedure and for automated auditing.