Showing 1–3 of 3 results for author: Das, V

Searching in archive cs.
  1. arXiv:2405.02522  [pdf]

    cs.HC cs.AI cs.CY cs.SI

    New contexts, old heuristics: How young people in India and the US trust online content in the age of generative AI

    Authors: Rachel Xu, Nhu Le, Rebekah Park, Laura Murray, Vishnupriya Das, Devika Kumar, Beth Goldberg

    Abstract: We conducted an in-person ethnography in India and the US to investigate how young people (18-24) trusted online content, with a focus on generative AI (GenAI). We had four key findings about how young people use GenAI and determine what to trust online. First, when online, we found participants fluidly shifted between mindsets and emotional states, which we term "information modes." Second, these…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: 14 pages

  2. arXiv:2312.00267  [pdf, other]

    cs.LG cs.AI stat.ML

    Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration

    Authors: Viraj Mehta, Vikramjeet Das, Ojash Neopane, Yijia Dai, Ilija Bogunovic, Jeff Schneider, Willie Neiswanger

    Abstract: Preference-based feedback is important for many applications in reinforcement learning where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback (RLHF) on large language models. For many applications of RLHF, the cost of acquiring the human feedback can be substantial. In this work, we take advantage of the fact that…

    Submitted 30 November, 2023; originally announced December 2023.

  3. arXiv:2307.11288  [pdf, other]

    cs.LG cs.AI stat.ML

    Kernelized Offline Contextual Dueling Bandits

    Authors: Viraj Mehta, Ojash Neopane, Vikramjeet Das, Sen Lin, Jeff Schneider, Willie Neiswanger

    Abstract: Preference-based feedback is important for many applications where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback on large language models. For many of these applications, the cost of acquiring the human feedback can be substantial or even prohibitive. In this work, we take advantage of the fact that often the a…

    Submitted 20 July, 2023; originally announced July 2023.