The integration of robotic surgical assistants (RSAs) into operating rooms offers substantial benefits for surgeons and patient outcomes alike. Currently teleoperated by trained surgeons at a console, these surgical robot platforms provide augmented dexterity that has the potential to streamline surgical workflows and alleviate surgeon workloads. Exploring visual behavior cloning for next-generation surgical assistants could further enhance the capabilities and efficiency of robotic-assisted surgeries.
This post introduces two template frameworks for robotic surgical assistance: Surgical First Interactive Autonomy Assistant (SuFIA) and Surgical First Interactive Autonomy Assistant – Behavior Cloning (SuFIA-BC). SuFIA uses natural language guidance and large language models (LLMs) for high-level planning and control of surgical robots, while SuFIA-BC enhances the dexterity and precision of robotic surgical assistants through behavior cloning (BC) techniques. These frameworks build on recent advances in LLMs and BC and tune them to excel at the unique challenges of surgical scenes.
This research aims to accelerate the development of surgical robotic assistants, with the eventual goal of alleviating surgeon fatigue, enhancing patient safety, and democratizing access to high-quality healthcare. SuFIA and SuFIA-BC advance this field by demonstrating their capabilities across various surgical subtasks in simulated and physical settings. Moreover, the photorealistic assets introduced in this work enable the broader research community to explore surgical robotics—a field that has traditionally faced significant barriers to entry due to limited data accessibility, the high costs of expert demonstrations, and the expensive hardware required.
This research enhances the ORBIT-Surgical framework to create a photorealistic training environment for surgical robots, featuring anatomically accurate models and high-fidelity rendering using NVIDIA Omniverse. ORBIT-Surgical is an open simulation framework for learning surgical augmented dexterity. It is based on NVIDIA Isaac Lab, a modular robot learning framework built on NVIDIA Isaac Sim; Isaac Lab provides support for a variety of reinforcement learning and imitation learning libraries.
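To give a sense of how an ORBIT-Surgical task environment is stood up in practice, the sketch below follows Isaac Lab's gymnasium-style workflow. The import paths, the task ID, and the configuration helper shown here are assumptions that vary across Isaac Lab and ORBIT-Surgical releases, so treat this as an illustration rather than the exact API.

```python
# Minimal sketch of creating an ORBIT-Surgical task environment through
# Isaac Lab's gymnasium-style workflow. Import paths, the task ID, and the
# parse_env_cfg signature differ across releases; treat them as assumptions.
import argparse

from isaaclab.app import AppLauncher  # older releases: omni.isaac.lab.app / omni.isaac.orbit.app

# Isaac Sim must be launched before any simulation-dependent modules are imported.
parser = argparse.ArgumentParser()
AppLauncher.add_app_launcher_args(parser)
args = parser.parse_args()
simulation_app = AppLauncher(args).app

import gymnasium as gym
import torch

from isaaclab_tasks.utils import parse_env_cfg  # assumed helper for loading a registered task config

TASK = "Isaac-Lift-Needle-PSM-v0"  # illustrative ID for the needle lift subtask

env_cfg = parse_env_cfg(TASK, num_envs=1)
env = gym.make(TASK, cfg=env_cfg)

obs, _ = env.reset()
for _ in range(100):
    # Zero actions simply step the simulation; a trained BC policy would act here.
    actions = torch.zeros(env.action_space.shape, device=env.unwrapped.device)
    obs, reward, terminated, truncated, info = env.step(actions)

env.close()
simulation_app.close()
```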
Surgical digital twins
Figure 1 shows a surgical digital twin workflow that illustrates the full pipeline for creating photorealistic anatomical models, from raw CT volume data to final Universal Scene Description (OpenUSD) assets in Omniverse. The process includes organ segmentation, mesh conversion, mesh cleaning and refinement, and photorealistic texturing, culminating in the assembly of all textured organs into a unified OpenUSD file.
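As a concrete illustration of that final assembly step, the sketch below uses the standard OpenUSD (pxr) Python API to reference individually textured organ files into one unified stage. The file names and prim paths are placeholders, not the actual assets shipped with ORBIT-Surgical.

```python
# Minimal sketch of assembling textured organ meshes into a unified OpenUSD
# stage with the standard pxr Python API. File names and prim paths are
# illustrative placeholders.
from pxr import Usd, UsdGeom

# Hypothetical per-organ USD files produced by the segmentation -> mesh ->
# cleaning -> texturing stages of the pipeline.
organ_files = {
    "Liver": "organs/liver.usd",
    "Stomach": "organs/stomach.usd",
    "Gallbladder": "organs/gallbladder.usd",
}

stage = Usd.Stage.CreateNew("anatomy.usd")
root = UsdGeom.Xform.Define(stage, "/Anatomy")
stage.SetDefaultPrim(root.GetPrim())

for name, path in organ_files.items():
    # Each organ is referenced (not copied), so its mesh and textures stay in
    # the source file and can be refined independently.
    organ_prim = stage.DefinePrim(f"/Anatomy/{name}", "Xform")
    organ_prim.GetReferences().AddReference(path)

stage.GetRootLayer().Save()
```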
The resulting digital twin simulator generates high-quality synthetic data crucial for training and evaluating behavior cloning models in complex surgical tasks. The study investigates various visual observation modalities, including RGB images from single-camera and multicamera setups and point cloud representations derived from single-camera depth data.
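For the point cloud modality, observations can be derived from a depth image by standard pinhole back-projection. The sketch below shows the idea with NumPy; the camera intrinsics and image size are placeholder values rather than those of the simulated camera.

```python
# Minimal sketch of deriving a point cloud observation from a single-camera
# depth image via pinhole back-projection. Intrinsics (fx, fy, cx, cy) are
# placeholders; real values come from the simulated camera.
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map in meters to an (N, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Drop invalid pixels (zero or non-finite depth).
    valid = np.isfinite(points[:, 2]) & (points[:, 2] > 0)
    return points[valid]

# Synthetic depth frame standing in for a rendered one.
depth = np.full((480, 640), 0.5, dtype=np.float32)
cloud = depth_to_point_cloud(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(cloud.shape)  # (307200, 3)
```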
Policy learning and expert demonstrations with teleoperation
The experimental framework includes five fundamental surgical subtasks designed for evaluation: tissue retraction, needle lift, needle handover, suture pad threading, and block transfer. To learn more and view task videos, see SuFIA-BC: Generating High Quality Demonstration Data for Visuomotor Policy Learning in Surgical Subtasks.
Results indicate that while simpler tasks yield comparable performance across models, complex tasks reveal significant differences in encoder effectiveness. Point cloud-based models generally excel in spatially defined tasks such as needle lift and needle handover, while RGB-based models perform better where color cues are necessary for semantic understanding.
The number of expert demonstrations was varied to determine the trained models’ sample efficiency. In this experiment, the models demonstrated varying success rates depending on the number of training demonstrations, with common failure modes emerging when fewer demonstrations were used. These findings emphasize the need for architectures with greater sample efficiency and underscore the value of the introduced framework, where data collection is significantly more accessible than in the real world. Furthermore, generalization capabilities were assessed using different needle instances, with multicamera RGB models showing better adaptability than point cloud-based models.
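One way to run such a sample-efficiency sweep is to train the same policy on progressively smaller subsets of the collected demonstrations. The sketch below assumes a robomimic-style HDF5 layout with one group per demonstration and a hypothetical train_policy entry point; it is not the exact ORBIT-Surgical tooling.

```python
# Minimal sketch of a sample-efficiency sweep: train identical policies on
# progressively smaller subsets of the demonstration dataset. The HDF5 layout
# ("data/demo_*") and train_policy() are assumptions for illustration.
import h5py
import numpy as np

def subsample_demos(dataset_path, num_demos, seed=0):
    """Return a random subset of demonstration keys from an HDF5 file."""
    with h5py.File(dataset_path, "r") as f:
        demo_keys = sorted(f["data"].keys())  # e.g. ["demo_0", "demo_1", ...]
    rng = np.random.default_rng(seed)
    return list(rng.choice(demo_keys, size=num_demos, replace=False))

for n in (25, 50, 100, 200):
    keys = subsample_demos("needle_lift_demos.hdf5", num_demos=n)
    # train_policy is a stand-in for the behavior cloning training entry point.
    # success_rate = train_policy("needle_lift_demos.hdf5", demo_keys=keys)
    print(f"training with {n} demonstrations: {len(keys)} episodes selected")
```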
Robustness to changes in camera perspective was also evaluated: point cloud models proved markedly more resilient to viewpoint shifts than RGB-based models, highlighting their potential for practical deployment in surgical settings.
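A simple way to probe this kind of robustness is to jitter the camera pose around its nominal value before each evaluation rollout. The sketch below illustrates the idea; the perturbation range and the set_camera_pose hook are assumptions, not part of the released framework.

```python
# Minimal sketch of a viewpoint-robustness evaluation: perturb the camera eye
# position around its nominal pose before rolling out a trained policy.
# The offset range and the set_camera_pose() hook are assumptions.
import numpy as np

def perturbed_camera_pose(eye, target, max_offset_m=0.05, seed=None):
    """Jitter the camera eye position while keeping it aimed at the target."""
    rng = np.random.default_rng(seed)
    offset = rng.uniform(-max_offset_m, max_offset_m, size=3)
    return np.asarray(eye) + offset, np.asarray(target)

nominal_eye, nominal_target = (0.3, 0.0, 0.25), (0.0, 0.0, 0.05)
for trial in range(10):
    eye, target = perturbed_camera_pose(nominal_eye, nominal_target, seed=trial)
    # env.unwrapped.set_camera_pose(eye, target)  # hypothetical environment hook
    # success = rollout(policy, env)
    print(f"trial {trial}: camera eye at {np.round(eye, 3)}")
```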
Summary
Explore this groundbreaking technology by accessing the open-source assets linked in this article. Visit ORBIT-Surgical on GitHub to access video demonstrations used for training policies, along with photorealistic human organ models. By leveraging these resources, you can advance surgical robotics research, experiment with different learning approaches, and develop innovative solutions for complex surgical procedures. We encourage the community to build upon this foundation, share insights, and collaborate to enhance robotic-assisted surgery.