Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations

Zhao, Tianxiang; Yu, Wenchao; Wang, Suhang; Wang, Lu; Zhang, Xiang; Chen, Yuncong; Liu, Yanchi; Cheng, Wei; Chen, Haifeng

doi:10.1145/3580305.3599506

arXiv:2306.07919 (cs)

[Submitted on 13 Jun 2023]

Abstract: Imitation learning has achieved great success in many sequential decision-making tasks, in which a neural agent is learned by imitating collected human demonstrations. However, existing algorithms typically require a large number of high-quality demonstrations, which are difficult and expensive to collect, so in practice a trade-off must be made between demonstration quality and quantity. To address this problem, we consider imitation from sub-optimal demonstrations, given both a small clean demonstration set and a large noisy set. Some pioneering works have been proposed, but they suffer from limitations such as assuming that a demonstration has the same optimality at every time step and offering no interpretation of the knowledge learned from the noisy set. To address these limitations, we propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills. Concretely, the method consists of a high-level controller that discovers skills and a skill-conditioned module that captures action-taking policies. It is trained with a two-phase pipeline: skills are first discovered using all demonstrations, and the controller is then adapted to the clean set only. A mutual-information-based regularization and a dynamic sub-demonstration optimality estimator are designed to promote disentanglement in the skill space. Extensive experiments on two gym environments and a real-world healthcare dataset demonstrate the superiority of the method in learning from sub-optimal demonstrations, and an examination of the learned skills shows its improved interpretability.
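The two-phase pipeline described in the abstract can be illustrated with a deliberately small toy sketch. It is not the authors' implementation: the abstract does not specify the neural controller, the skill-conditioned module, the mutual-information regularizer, or the optimality estimator, so this sketch swaps in 1-D k-means over segment mean actions for skill discovery and a simple frequency count over the clean set for controller adaptation. All function names, the scalar state/action format, and the sample data are hypothetical.

```python
from collections import Counter


def split_segments(demo, seg_len):
    """Cut one demonstration (a list of (state, action) pairs) into sub-demonstrations."""
    return [demo[i:i + seg_len] for i in range(0, len(demo), seg_len)]


def mean_action(segment):
    """Summarize a sub-demonstration by its average action (toy feature, an assumption)."""
    return sum(action for _state, action in segment) / len(segment)


def nearest_skill(value, prototypes):
    """Toy controller: index of the skill prototype closest to the given value."""
    return min(range(len(prototypes)), key=lambda k: abs(prototypes[k] - value))


def discover_skills(demos, seg_len, num_skills, iters=10):
    """Phase 1 (stand-in): 1-D k-means over segments drawn from ALL demonstrations."""
    values = [mean_action(s) for d in demos for s in split_segments(d, seg_len)]
    lo, hi = min(values), max(values)
    # Deterministic, evenly spaced initialization between the extremes.
    prototypes = [lo + (hi - lo) * k / max(num_skills - 1, 1) for k in range(num_skills)]
    for _ in range(iters):
        groups = [[] for _ in prototypes]
        for v in values:
            groups[nearest_skill(v, prototypes)].append(v)
        # Keep the old prototype if a skill received no segments.
        prototypes = [sum(g) / len(g) if g else p for g, p in zip(groups, prototypes)]
    return prototypes


def adapt_controller(clean_demos, seg_len, prototypes):
    """Phase 2 (stand-in): estimate how often each skill is used in the CLEAN set only."""
    counts = Counter(
        nearest_skill(mean_action(s), prototypes)
        for d in clean_demos
        for s in split_segments(d, seg_len)
    )
    total = sum(counts.values())
    return {k: counts[k] / total for k in range(len(prototypes))}


# Hypothetical data: each noisy demonstration mixes good segments (action near 1)
# with poor segments (action near -1), so quality varies within a demonstration.
clean = [[(0.0, 1.0), (1.0, 0.9), (2.0, 1.1), (3.0, 1.0)]]
noisy = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, -1.0), (3.0, -0.9)],
    [(0.0, -1.1), (1.0, -1.0), (2.0, 0.9), (3.0, 1.1)],
]

prototypes = discover_skills(clean + noisy, seg_len=2, num_skills=2)
skill_prior = adapt_controller(clean, seg_len=2, prototypes=prototypes)
best_skill = max(skill_prior, key=skill_prior.get)
```

In this toy setting, the good segments of the noisy demonstrations still contribute to the prototype of the skill that the clean set favors, which is the intuition behind evaluating demonstrations at the sub-demonstration level rather than assigning one quality label to a whole trajectory.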

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2306.07919 [cs.LG]
	(or arXiv:2306.07919v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.07919
Journal reference:	Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '23), August 6--10, 2023, Long Beach, CA, USA
Related DOI:	https://doi.org/10.1145/3580305.3599506

Submission history

From: Tianxiang Zhao
[v1] Tue, 13 Jun 2023 17:24:37 UTC (3,534 KB)
