Showing 1–3 of 3 results for author: Justen, L

  1. arXiv:2403.03218  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

    Authors: Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, et al. (32 additional authors not shown)

    Abstract: The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing furthe…

    Submitted 15 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: See the project page at https://wmdp.ai

  2. arXiv:2310.18233  [pdf]

    cs.AI

    Will releasing the weights of future large language models grant widespread access to pandemic agents?

    Authors: Anjali Gopal, Nathan Helm-Burger, Lennart Justen, Emily H. Soice, Tiffany Tzeng, Geetha Jeyapragasan, Simon Grimm, Benjamin Mueller, Kevin M. Esvelt

    Abstract: Large language models can benefit research and human understanding by providing tutorials that draw on expertise from many different fields. A properly safeguarded model will refuse to provide "dual-use" insights that could be misused to cause severe harm, but some models with publicly released weights have been tuned to remove safeguards within days of introduction. Here we investigated whether c…

    Submitted 1 November, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: Updates in response to online feedback: emphasized the focus on risks from future rather than current models; explained the reasoning behind, and minimal effects of, fine-tuning on virology papers; elaborated on how easier access to synthesized information can reduce barriers to entry; clarified policy recommendations regarding what is necessary but not sufficient; corrected a citation link

  3. arXiv:2207.04003  [pdf, other]

    cs.CL

    No Time Like the Present: Effects of Language Change on Automated Comment Moderation

    Authors: Lennart Justen, Kilian Müller, Marco Niemann, Jörg Becker

    Abstract: The spread of online hate has become a significant problem for newspapers that host comment sections. As a result, there is growing interest in using machine learning and natural language processing for (semi-) automated abusive language detection to avoid manual comment moderation costs or having to shut down comment sections altogether. However, much of the past work on abusive language detectio…

    Submitted 8 July, 2022; originally announced July 2022.

    Comments: Published in proceedings of the 2022 IEEE 24th Conference on Business Informatics (CBI), Amsterdam, Netherlands. 17 pages, 4 figures

    Journal ref: In 2022 IEEE 24th Conference on Business Informatics, 40-50. Amsterdam, Netherlands