-
Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation
Authors:
Se-eun Yoon,
Zhankui He,
Jessica Maria Echterhoff,
Julian McAuley
Abstract:
Synthetic users are cost-effective proxies for real users in the evaluation of conversational recommender systems. Large language models show promise in simulating human-like behavior, raising the question of their ability to represent a diverse population of users. We introduce a new protocol to measure the degree to which language models can accurately emulate human behavior in conversational recommendation. This protocol comprises five tasks, each designed to evaluate a key property that a synthetic user should exhibit: choosing which items to talk about, expressing binary preferences, expressing open-ended preferences, requesting recommendations, and giving feedback. Through evaluation of baseline simulators, we demonstrate that these tasks effectively reveal deviations of language models from human behavior, and offer insights on how to reduce the deviations with model selection and prompting strategies.
Submitted 25 March, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
Cognitive Bias in High-Stakes Decision-Making with LLMs
Authors:
Jessica Echterhoff,
Yao Liu,
Abeer Alessa,
Julian McAuley,
Zexue He
Abstract:
Large language models (LLMs) offer significant potential as tools to support an expanding range of decision-making tasks. However, given their training on human (created) data, LLMs can both inherit societal biases against protected groups and be subject to cognitive bias. Such human-like bias can impede fair and explainable decisions made with LLM assistance. Our work introduces BiasBuster, a framework designed to uncover, evaluate, and mitigate cognitive bias in LLMs, particularly in high-stakes decision-making tasks. Inspired by prior research in psychology and cognitive sciences, we develop a dataset containing 16,800 prompts to evaluate different cognitive biases (e.g., prompt-induced, sequential, inherent). We test various bias mitigation strategies and propose a novel method that uses LLMs to debias their own prompts. Our analysis provides a comprehensive picture of the presence and effects of cognitive bias across different commercial and open-source models. We demonstrate that our self-help debiasing effectively mitigates cognitive bias without having to manually craft examples for each bias type.
Submitted 24 February, 2024;
originally announced March 2024.
-
Driving through the Concept Gridlock: Unraveling Explainability Bottlenecks in Automated Driving
Authors:
Jessica Echterhoff,
An Yan,
Kyungtae Han,
Amr Abdelraouf,
Rohit Gupta,
Julian McAuley
Abstract:
Concept bottleneck models have been successfully used for explainable machine learning by encoding information within the model with a set of human-defined concepts. In the context of human-assisted or autonomous driving, explainability models can help user acceptance and understanding of decisions made by the autonomous vehicle, which can be used to rationalize and explain driver or vehicle behavior. We propose a new approach using concept bottlenecks as visual features for control command predictions and explanations of user and vehicle behavior. We learn a human-understandable concept layer that we use to explain sequential driving scenes while learning vehicle control commands. This approach can then be used to determine whether a change in a preferred gap or steering commands from a human (or autonomous vehicle) is caused by an external stimulus or by a change in preferences. We achieve competitive performance to latent visual features while gaining interpretability within our model setup.
Submitted 26 October, 2023; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Should you make your decisions on a WhIM? Data-Driven Decision making using a What-If Machine for Evaluation of Hypothetical Scenarios
Authors:
Jessica Maria Echterhoff,
Bhaskar Sen,
Yifei Ren,
Nikhil Gopal
Abstract:
What-if analysis can be used as a process in data-driven decision making to inspect the behavior of a complex system under some given hypothesis. We propose a What-If Machine that creates hypothetical realities by resampling the data distribution and comparing it to an alternate baseline to measure the impact on a target metric. Our What-If Machine both provides a method to confirm or reject practitioners' manually developed intuitions and automatically gives high-impact insights on a target metric. This can support data-informed decision making by using historical data to infer future possibilities. Our method is not bound by a specific use-case and can be used on any tabular data. Compared to previous work, our work enables real-time analysis and gives insights into areas with high impact on the target metric automatically, moving beyond human intuitions to provide data-driven insights.
Submitted 29 September, 2023;
originally announced September 2023.
-
SpecTracle: Wearable Facial Motion Tracking from Unobtrusive Peripheral Cameras
Authors:
Yinan Xuan,
Varun Viswanath,
Sunny Chu,
Owen Bartolf,
Jessica Echterhoff,
Edward Wang
Abstract:
Facial motion tracking in head-mounted displays (HMD) has the potential to enable immersive "face-to-face" interaction in a virtual environment. However, current works on facial tracking are not suitable for unobtrusive augmented reality (AR) glasses or do not have the ability to track arbitrary facial movements. In this work, we demonstrate a novel system called SpecTracle that tracks a user's facial motions using two wide-angle cameras mounted right next to the visor of a HoloLens. By avoiding the use of cameras extended in front of the face, our system greatly improves the feasibility of integrating full-face tracking into a low-profile form factor. We also demonstrate that a neural network-based model processing the wide-angle cameras can run in real-time at 24 frames per second (fps) on a mobile GPU and track independent facial movement for different parts of the face with a user-independent model. Using a short personalized calibration, the system improves its tracking performance by 42.3% compared to the user-independent model.
Submitted 14 August, 2023;
originally announced August 2023.
-
Comparing Apples to Apples: Generating Aspect-Aware Comparative Sentences from User Reviews
Authors:
Jessica Echterhoff,
An Yan,
Julian McAuley
Abstract:
It is time-consuming to find the best product among many similar alternatives. Comparative sentences can help to contrast one item with others in a way that highlights the important features of an item that stand out. Given reviews of one or multiple items and relevant item features, we generate comparative review sentences to help users find the best fit. Specifically, our model consists of three successive components in a transformer: (i) an item encoding module to encode an item for comparison, (ii) a comparison generation module that generates comparative sentences in an autoregressive manner, and (iii) a novel decoding method for user personalization. We show that our pipeline generates fluent and diverse comparative sentences. We run experiments on the relevance and fidelity of our generated sentences in a human evaluation study and find that our algorithm creates comparative review sentences that are relevant and truthful.
Submitted 23 July, 2023; v1 submitted 5 July, 2023;
originally announced July 2023.
-
PAR: Personal Activity Radius Camera View for Contextual Sensing
Authors:
Jessica Maria Echterhoff,
Edward J. Wang
Abstract:
Contextual sensing using wearable cameras has seen a variety of different camera angles proposed to capture a wide gamut of different visual scenes. In this paper, we propose a new camera view that aims to capture the same visual information as many of the camera positions and orientations combined from a single camera view point. The camera, mounted on the corner of a glasses frame, points downwards towards the floor, a field-of-view we named Personal Activity Radius (PAR). The PAR field-of-view captures the visual information around a wearer's personal bubble, including items they interact with, their body motion, their surrounding environment, etc. In our evaluation, we tested the PAR view's interpretability by human labelers in two different activity tracking scenarios: food related behaviors and exercise tracking. Human labelers achieved an overall high level of precision: 91% in identifying body motions in exercise tracking and 96% in identifying eating/drinking motions. Item interaction identification reached a precision of 86% for labeling grocery categories. We give a high-level overview of the device setup and the contextual views we were able to capture with the device. We see that the camera's wide angle captures different activities such as driving, shopping, gym exercises, walking, and eating, and can observe the specific interaction item of the user as well as the immediate contextual surrounding.
Submitted 17 August, 2020;
originally announced August 2020.