-
ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors
Authors:
Xiaoxuan Ma,
Stephan P. Kaufhold,
Jiajun Su,
Wentao Zhu,
Jack Terwilliger,
Andres Meza,
Yixin Zhu,
Federico Rossano,
Yizhou Wang
Abstract:
Understanding the behavior of non-human primates is crucial for improving animal welfare, modeling social behavior, and gaining insights into distinctively human and phylogenetically shared behaviors. However, the lack of datasets on non-human primate behavior hinders in-depth exploration of primate social interactions, posing challenges to research on our closest living relatives. To address these limitations, we present ChimpACT, a comprehensive dataset for quantifying the longitudinal behavior and social relations of chimpanzees within a social group. Spanning from 2015 to 2018, ChimpACT features videos of a group of over 20 chimpanzees residing at the Leipzig Zoo, Germany, with a particular focus on documenting the developmental trajectory of one young male, Azibo. ChimpACT is both comprehensive and challenging, consisting of 163 videos with a cumulative 160,500 frames, each richly annotated with detection, identification, pose estimation, and fine-grained spatiotemporal behavior labels. We benchmark representative methods of three tracks on ChimpACT: (i) tracking and identification, (ii) pose estimation, and (iii) spatiotemporal action detection of the chimpanzees. Our experiments reveal that ChimpACT offers ample opportunities for both devising new methods and adapting existing ones to solve fundamental computer vision tasks applied to chimpanzee groups, such as detection, pose estimation, and behavior analysis, ultimately deepening our comprehension of communication and sociality in non-human primates.
Submitted 25 October, 2023;
originally announced October 2023.
-
CLERA: A Unified Model for Joint Cognitive Load and Eye Region Analysis in the Wild
Authors:
Li Ding,
Jack Terwilliger,
Aishni Parab,
Meng Wang,
Lex Fridman,
Bruce Mehler,
Bryan Reimer
Abstract:
Non-intrusive, real-time analysis of the dynamics of the eye region allows us to monitor humans' visual attention allocation and estimate their mental state during the performance of real-world tasks, which can potentially benefit a wide range of human-computer interaction (HCI) applications. While commercial eye-tracking devices have been frequently employed, the difficulty of customizing these devices places unnecessary constraints on the exploration of more efficient, end-to-end models of eye dynamics. In this work, we propose CLERA, a unified model for Cognitive Load and Eye Region Analysis, which achieves precise keypoint detection and spatiotemporal tracking in a joint-learning framework. Our method demonstrates significant efficiency and outperforms prior work on tasks including cognitive load estimation, eye landmark detection, and blink estimation. We also introduce a large-scale dataset of 30k human faces with joint pupil, eye-openness, and landmark annotation, which aims to support future HCI research on human factors and eye-related analysis.
Submitted 26 June, 2023;
originally announced June 2023.
-
Dynamics of Pedestrian Crossing Decisions Based on Vehicle Trajectories in Large-Scale Simulated and Real-World Data
Authors:
Jack Terwilliger,
Michael Glazer,
Henri Schmidt,
Josh Domeyer,
Heishiro Toyoda,
Bruce Mehler,
Bryan Reimer,
Lex Fridman
Abstract:
Humans, as both pedestrians and drivers, generally navigate traffic intersections skillfully. Despite the uncertainty, danger, and the non-verbal nature of communication commonly found in these interactions, there are surprisingly few collisions considering the total number of interactions. As the role of automation technology in vehicles grows, it becomes increasingly critical to understand the relationship between pedestrian and driver behavior: how pedestrians perceive the actions of a vehicle/driver and how pedestrians make crossing decisions. The relationship between time-to-arrival (TTA) and pedestrian gap acceptance (i.e., whether a pedestrian chooses to cross within a given window of time) has been extensively investigated. However, the dynamic nature of vehicle trajectories in the context of non-verbal communication has not been systematically explored. Our work provides evidence that trajectory dynamics, such as changes in TTA, can be powerful signals in the non-verbal communication between drivers and pedestrians. Moreover, we investigate these effects in both simulated and real-world datasets, both of which are, to the best of our knowledge, larger than any previously considered in the literature.
Submitted 8 April, 2019;
originally announced April 2019.
-
Eye Contact Between Pedestrians and Drivers
Authors:
Dina AlAdawy,
Michael Glazer,
Jack Terwilliger,
Henri Schmidt,
Josh Domeyer,
Bruce Mehler,
Bryan Reimer,
Lex Fridman
Abstract:
When asked, a majority of people believe that, as pedestrians, they make eye contact with the driver of an approaching vehicle when making their crossing decisions. This work presents evidence that this widely held belief is false. We do so by showing that, in the majority of cases where conflict is possible, pedestrians begin crossing long before they are able to see the driver through the windshield. In other words, we are able to circumvent the very difficult question of whether pedestrians choose to make eye contact with drivers, by showing that whether they think they do or not, they can't. Specifically, we show that over 90% of people in representative lighting conditions cannot determine the gaze of the driver at 15m or see the driver at all at 30m. This means, for example, that given the common city speed limit of 25mph, more than 99% of pedestrians would have begun crossing before being able to see either the driver or the driver's gaze. In other words, from the perspective of the pedestrian, in most situations involving an approaching vehicle, the crossing decision is made by the pedestrian solely based on the kinematics of the vehicle without needing to determine that eye contact was made by explicitly detecting the eyes of the driver.
Submitted 8 April, 2019;
originally announced April 2019.
-
Hacking Nonverbal Communication Between Pedestrians and Vehicles in Virtual Reality
Authors:
Henri Schmidt,
Jack Terwilliger,
Dina AlAdawy,
Lex Fridman
Abstract:
We use an immersive virtual reality environment to explore the intricate social cues that underlie non-verbal communication involved in a pedestrian's crossing decision. We "hack" non-verbal communication between pedestrian and vehicle by engineering a set of 15 vehicle trajectories, some of which follow social conventions and some of which break them. By subverting social expectations of vehicle behavior, we show that pedestrians may use vehicle kinematics to infer social intentions and not merely as the state of a moving object. We investigate human behavior in this virtual world by conducting a study of 22 subjects, with each subject experiencing and responding to each of the trajectories by moving their body, legs, arms, and head in both the physical and the virtual world. Both quantitative and qualitative responses are collected and analyzed, showing that, in fact, social cues can be engineered through vehicle trajectory manipulation. In addition, we demonstrate that immersive virtual worlds that allow the pedestrian to move around freely provide a powerful way to understand both the mechanisms of human perception and the social signaling involved in pedestrian-vehicle interaction.
Submitted 1 April, 2019;
originally announced April 2019.
-
Value of Temporal Dynamics Information in Driving Scene Segmentation
Authors:
Li Ding,
Jack Terwilliger,
Rini Sherony,
Bryan Reimer,
Lex Fridman
Abstract:
Semantic scene segmentation has primarily been addressed by forming representations of single images with both supervised and unsupervised methods. The problem of semantic segmentation in dynamic scenes has recently begun to receive attention with video object segmentation approaches. What is not known is how much extra information the temporal dynamics of the visual scene carries that is complementary to the information available in the individual frames of the video. There is evidence that the human visual system can effectively perceive the scene from temporal dynamics information of the scene's changing visual characteristics without relying on the visual characteristics of individual snapshots themselves. Our work takes steps to explore whether machine perception can exhibit similar properties by combining appearance-based representations and temporal dynamics representations in a joint-learning problem that reveals the contribution of each toward successful dynamic scene segmentation. Additionally, we provide the MIT Driving Scene Segmentation dataset, which is a large-scale full driving scene segmentation dataset, densely annotated for every pixel and every one of 5,000 video frames. This dataset is intended to help further the exploration of the value of temporal dynamics information for semantic segmentation in video.
Submitted 20 March, 2019;
originally announced April 2019.
-
DeepTraffic: Crowdsourced Hyperparameter Tuning of Deep Reinforcement Learning Systems for Multi-Agent Dense Traffic Navigation
Authors:
Lex Fridman,
Jack Terwilliger,
Benedikt Jenik
Abstract:
We present a traffic simulation named DeepTraffic where the planning systems for a subset of the vehicles are handled by a neural network as part of a model-free, off-policy reinforcement learning process. The primary goal of DeepTraffic is to make the hands-on study of deep reinforcement learning accessible to thousands of students, educators, and researchers in order to inspire and fuel the exploration and evaluation of deep Q-learning network variants and hyperparameter configurations through large-scale, open competition. This paper investigates the crowd-sourced hyperparameter tuning of the policy network that resulted from the first iteration of the DeepTraffic competition where thousands of participants actively searched through the hyperparameter space.
Submitted 2 January, 2019; v1 submitted 9 January, 2018;
originally announced January 2018.
-
MIT Advanced Vehicle Technology Study: Large-Scale Naturalistic Driving Study of Driver Behavior and Interaction with Automation
Authors:
Lex Fridman,
Daniel E. Brown,
Michael Glazer,
William Angell,
Spencer Dodd,
Benedikt Jenik,
Jack Terwilliger,
Aleksandr Patsekin,
Julia Kindelsberger,
Li Ding,
Sean Seaman,
Alea Mehler,
Andrew Sipperley,
Anthony Pettinato,
Bobbie Seppelt,
Linda Angell,
Bruce Mehler,
Bryan Reimer
Abstract:
For the foreseeable future, human beings will likely remain an integral part of the driving task, monitoring the AI system as it performs anywhere from just over 0% to just under 100% of the driving. The governing objectives of the MIT Autonomous Vehicle Technology (MIT-AVT) study are to (1) undertake large-scale real-world driving data collection that includes high-definition video to fuel the development of deep learning based internal and external perception systems, (2) gain a holistic understanding of how human beings interact with vehicle automation technology by integrating video data with vehicle state data, driver characteristics, mental models, and self-reported experiences with technology, and (3) identify how technology and other factors related to automation adoption and use can be improved in ways that save lives. In pursuing these objectives, we have instrumented 23 Tesla Model S and Model X vehicles, 2 Volvo S90 vehicles, 2 Range Rover Evoque vehicles, and 2 Cadillac CT6 vehicles for both long-term (over a year per driver) and medium-term (one month per driver) naturalistic driving data collection. Furthermore, we are continually developing new methods for analysis of the massive-scale dataset collected from the instrumented vehicle fleet. The recorded data streams include IMU, GPS, CAN messages, and high-definition video streams of the driver face, the driver cabin, the forward roadway, and the instrument cluster (on select vehicles). The study is ongoing and growing. To date, we have 122 participants, 15,610 days of participation, 511,638 miles, and 7.1 billion video frames. This paper presents the design of the study, the data collection hardware, the processing of the data, and the computer vision algorithms currently being used to extract actionable knowledge from the data.
Submitted 14 August, 2019; v1 submitted 19 November, 2017;
originally announced November 2017.