
Trust in Shared Automated Vehicles: Study on Two Mobility Platforms

Authors: Shashank Mehrotra, Jacob G Hunter, Matthew Konishi, Kumar Akash, Zhaobo Zheng, Teruhisa Misu, Anil Kumar, Tahira Reid, Neera Jain

Abstract: The ever-increasing adoption of shared transportation modalities across the United States has the potential to fundamentally change the preferences and usage of different mobilities. It also raises several challenges with respect to the design and development of automated mobilities that can enable a large population to take advantage of this emergent technology. One such challenge is the lack of understanding of how trust in one automated mobility may impact trust in another. Without this understanding, it is difficult for researchers to determine whether future mobility solutions will have acceptance within different population groups. This study focuses on identifying the differences in trust across different mobility and how trust evolves across their use for participants who preferred an aggressive driving style. A dual mobility simulator study was designed in which 48 participants experienced two different automated mobilities (car and sidewalk). The results found that participants showed increasing levels of trust when they transitioned from the car to the sidewalk mobility. In comparison, participants showed decreasing levels of trust when they transitioned from the sidewalk to the car mobility. The findings from the study help inform and identify how people can develop trust in future mobility platforms and could inform the design of interventions that may help improve the trust and acceptance of future mobility.

Submitted 16 March, 2023; originally announced March 2023.

Comments: https://trid.trb.org/view/2117834

Journal ref: Transportation Research Board 102nd Annual Meeting, Washington DC, United States, 1-12 Jan 2023, No. TRBAM-23-04456. 2023

arXiv:2211.12112 [pdf, other]

Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark

Authors: Vitali Petsiuk, Alexander E. Siemenn, Saisamrit Surbehera, Zad Chin, Keith Tyser, Gregory Hunter, Arvind Raghavan, Yann Hicke, Bryan A. Plummer, Ori Kerret, Tonio Buonassisi, Kate Saenko, Armando Solar-Lezama, Iddo Drori

Abstract: We provide a new multi-task benchmark for evaluating text-to-image models. We perform a human evaluation comparing the most common open-source (Stable Diffusion) and commercial (DALL-E 2) models. Twenty computer science AI graduate students evaluated the two models, on three tasks, at three difficulty levels, across ten prompts each, providing 3,600 ratings. Text-to-image generation has seen rapid progress to the point that many recent models have demonstrated their ability to create realistic high-resolution images for various prompts. However, current text-to-image methods and the broader body of research in vision-language understanding still struggle with intricate text prompts that contain many objects with multiple attributes and relationships. We introduce a new text-to-image benchmark that contains a suite of thirty-two tasks over multiple applications that capture a model's ability to handle different features of a text prompt. For example, asking a model to generate a varying number of the same object to measure its ability to count or providing a text prompt with several objects that each have a different attribute to identify its ability to match objects and attributes correctly. Rather than subjectively evaluating text-to-image results on a set of prompts, our new multi-task benchmark consists of challenge tasks at three difficulty levels (easy, medium, and hard) and human ratings for each generated image.

Submitted 22 November, 2022; originally announced November 2022.

Comments: NeurIPS 2022 Workshop on Human Evaluation of Generative Models (HEGM)

arXiv:2209.10640 [pdf]

doi 10.1177/1071181322661311

The Interaction Gap: A Step Toward Understanding Trust in Autonomous Vehicles Between Encounters

Authors: Jacob G. Hunter, Matthew Konishi, Neera Jain, Kumar Akash, Xingwei Wu, Teruhisa Misu, Tahira Reid

Abstract: Shared autonomous vehicles (SAVs) will be introduced in greater numbers over the coming decade. Due to rapid advances in shared mobility and the slower development of fully autonomous vehicles (AVs), SAVs will likely be deployed before privately-owned AVs. Moreover, existing shared mobility services are transitioning their vehicle fleets toward those with increasingly higher levels of driving automation. Consequently, people who use shared vehicles on an "as needed" basis will have infrequent interactions with automated driving, thereby experiencing interaction gaps. Using human trust data of 25 participants, we show that interaction gaps can affect human trust in automated driving. Participants engaged in a simulator study consisting of two interactions separated by a one-week interaction gap. A moderate, inverse correlation was found between the change in trust during the initial interaction and the interaction gap, suggesting people "forget" some of their gained trust or distrust in automation during an interaction gap.

Submitted 21 September, 2022; originally announced September 2022.

Comments: 5 pages, 3 figures

Journal ref: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 2022, 66(1), 147-151

arXiv:2206.05442 [pdf, ps, other]

From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams

Authors: Iddo Drori, Sarah J. Zhang, Reece Shuttleworth, Sarah Zhang, Keith Tyser, Zad Chin, Pedro Lantigua, Saisamrit Surbehera, Gregory Hunter, Derek Austin, Leonard Tang, Yann Hicke, Sage Simhon, Sathwik Karnik, Darnell Granberry, Madeleine Udell

Abstract: A final exam in machine learning at a top institution such as MIT, Harvard, or Cornell typically takes faculty days to write, and students hours to solve. We demonstrate that large language models pass machine learning finals at a human level, on finals available online after the models were trained, and automatically generate new human-quality final exam questions in seconds. Previous work has developed program synthesis and few-shot learning methods to solve university-level problem set questions in mathematics and STEM courses. In this work, we develop and compare methods that solve final exams, which differ from problem sets in several ways: the questions are longer, have multiple parts, are more complicated, and span a broader set of topics. We curate a dataset and benchmark of questions from machine learning final exams available online and code for answering these questions and generating new questions. We show how to generate new questions from other questions and course notes. For reproducibility and future research on this final exam benchmark, we use automatic checkers for multiple-choice, numeric, and questions with expression answers. We perform ablation studies comparing zero-shot learning with few-shot learning and chain-of-thought prompting using GPT-3, OPT, Codex, and ChatGPT across machine learning topics and find that few-shot learning methods perform best.
We highlight the transformative potential of language models to streamline the writing and solution of large-scale assessments, significantly reducing the workload from human days to mere machine seconds. Our results suggest that rather than banning large language models such as ChatGPT in class, instructors should teach students to harness them by asking students meta-questions about correctness, completeness, and originality of the responses generated, encouraging critical thinking in academic studies.

Submitted 28 June, 2023; v1 submitted 11 June, 2022; originally announced June 2022.

Comments: 9 pages

arXiv:2205.13430 [pdf, other]

GNOLL: Efficient Software for Real-World Dice Notation and Extensions

Authors: Ian Frederick Vigogne Goodbody Hunter

Abstract: GNOLL ("GNOLL's Not *OLL") is a software library for dice notation. Unlike previous papers, GNOLL's dice notation syntax is focused on parsing a language that tabletop role-players and board gamers are already used to for specifying dice rolls in many popular software applications. Existing implementations of such a syntax are either incomplete, fragile, or proprietary, meaning that anyone hoping to use such syntax in their application likely needs to write their own solution. GNOLL is an open-source project using the compilation tool 'YACC' and lexical tool 'LEX' which can be integrated into many applications with relative ease. This paper explores GNOLL's extended dice notation syntax and its competitive performance.

Submitted 4 July, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

Comments: 11 pages, 12 figures, Under Review for JCDCG^3 '22

arXiv:2205.04586 [pdf, other]

Towards Optimal VPU Compiler Cost Modeling by using Neural Networks to Infer Hardware Performances

Authors: Ian Frederick Vigogne Goodbody Hunter, Alessandro Palla, Sebastian Eusebiu Nagy, Richard Richmond, Kyle McAdoo

Abstract: Calculating the most efficient schedule of work in a neural network compiler is a difficult task. There are many parameters to be accounted for that can positively or adversely affect that schedule depending on their configuration - How work is shared between distributed targets, the subdivision of tensors to fit in memory, toggling the enablement of optimizations, etc. Traditionally, neural network compilers determine how to set these values by building a graph of choices and choosing the path with minimal 'cost'. These choices and their corresponding costs are usually determined by an algorithm crafted by engineers with a deep knowledge of the target platform. However, when the amount of options available to a compiler is large, it is very difficult to ensure that these models consistently produce an optimal schedule for all scenarios, whilst still completing compilation in an acceptable timeframe. This paper presents 'VPUNN' - a neural network-based cost model trained on low-level task profiling that consistently outperforms the state-of-the-art cost modeling in Intel's line of VPU processors.

Submitted 9 May, 2022; originally announced May 2022.

Comments: 9 pages, 10 figures, 2 tables, Under Review for NeurIPS 2022

Showing 1–6 of 6 results for author: Hunter, G