Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery

Karimi, Zohre; Ho, Shing-Hei; Thach, Bao; Kuntz, Alan; Brown, Daniel S.

arXiv:2404.07185 (cs)

[Submitted on 10 Apr 2024 (v1), last revised 16 Apr 2024 (this version, v2)]

Abstract: Automating robotic surgery via learning from demonstration (LfD) techniques is extremely challenging because surgical tasks often involve sequential decision-making with complex interactions between physical objects and have a low tolerance for mistakes. Prior works assume that all demonstrations are fully observable and optimal, which might not be practical in the real world. This paper introduces a sample-efficient method that learns a robust reward function from a limited number of ranked suboptimal demonstrations consisting of partial-view point cloud observations. The method then learns a policy by optimizing the learned reward function using reinforcement learning (RL). We show that using a learned reward function to obtain a policy is more robust than pure imitation learning. We apply our approach to a physical surgical electrocautery task and demonstrate that our method can perform well even when the provided demonstrations are suboptimal and the observations are high-dimensional point clouds. Code and videos available here: this https URL
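
The abstract does not state which loss is used to learn the reward from ranked demonstrations. As a rough illustration only, the sketch below assumes a Bradley-Terry style pairwise ranking loss, a common choice for this kind of problem, and a linear reward over hand-made per-step features. The feature meanings ("progress", "damage"), the toy trajectories, and all function names are hypothetical; a real system would replace the features with a learned encoding of the point cloud observations and would follow reward learning with an RL step, which is not shown.

```python
import math


def traj_return(w, traj):
    """Sum of linear per-step rewards w . phi(s) over one trajectory."""
    return sum(sum(wi * fi for wi, fi in zip(w, step)) for step in traj)


def feature_sum(traj):
    """Per-dimension sum of step features; this is d(return)/dw for a linear reward."""
    return [sum(step[k] for step in traj) for k in range(len(traj[0]))]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ranking_loss(w, worse, better):
    """-log P(better preferred over worse) under a Bradley-Terry model."""
    diff = traj_return(w, worse) - traj_return(w, better)
    return max(diff, 0.0) + math.log1p(math.exp(-abs(diff)))  # stable softplus


def fit_reward(ranked, dim, lr=0.1, epochs=200):
    """Gradient descent on the mean pairwise loss; `ranked` is sorted worst to best."""
    w = [0.0] * dim
    pairs = [(ranked[i], ranked[j])
             for i in range(len(ranked)) for j in range(i + 1, len(ranked))]
    for _ in range(epochs):
        grad = [0.0] * dim
        for worse, better in pairs:
            diff = traj_return(w, worse) - traj_return(w, better)
            scale = sigmoid(diff)  # derivative of softplus(diff)
            fw, fb = feature_sum(worse), feature_sum(better)
            for k in range(dim):
                grad[k] += scale * (fw[k] - fb[k])
        w = [wk - lr * gk / len(pairs) for wk, gk in zip(w, grad)]
    return w


# Hypothetical toy data: each step is [progress, damage], ranked worst to best.
ranked = [
    [[0.1, 0.9], [0.2, 0.8]],
    [[0.5, 0.5], [0.5, 0.4]],
    [[0.9, 0.1], [1.0, 0.0]],
]
w = fit_reward(ranked, dim=2)
returns = [traj_return(w, t) for t in ranked]
```

On this toy data the learned weights favor the first feature and penalize the second, so the predicted returns follow the given ranking. None of the demonstrations needs to be optimal for this to work, since only the relative ordering is used.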

Comments:	In proceedings of the International Symposium on Medical Robotics (ISMR) 2024. Equal contribution from two first authors
Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2404.07185 [cs.RO]
	(or arXiv:2404.07185v2 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2404.07185

Submission history

From: Zohre Karimi
[v1] Wed, 10 Apr 2024 17:40:27 UTC (9,632 KB)
[v2] Tue, 16 Apr 2024 00:23:03 UTC (7,629 KB)
