Skip to main content

Showing 1–1 of 1 results for author: Lukosuite, K

Searching in archive cs. Search in all archives.
.
  1. arXiv:2212.08073  [pdf, other

    cs.CL cs.AI

    Constitutional AI: Harmlessness from AI Feedback

    Authors: Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamile Lukosuite , et al. (26 additional authors not shown)

    Abstract: As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a supe… ▽ More

    Submitted 15 December, 2022; originally announced December 2022.