Search | arXiv e-print repository

doi 10.1145/3603622

CLERA: A Unified Model for Joint Cognitive Load and Eye Region Analysis in the Wild

Authors: Li Ding, Jack Terwilliger, Aishni Parab, Meng Wang, Lex Fridman, Bruce Mehler, Bryan Reimer

Abstract: Non-intrusive, real-time analysis of the dynamics of the eye region allows us to monitor humans' visual attention allocation and estimate their mental state during the performance of real-world tasks, which can potentially benefit a wide range of human-computer interaction (HCI) applications. While commercial eye-tracking devices have been frequently employed, the difficulty of customizing these devices places unnecessary constraints on the exploration of more efficient, end-to-end models of eye dynamics. In this work, we propose CLERA, a unified model for Cognitive Load and Eye Region Analysis, which achieves precise keypoint detection and spatiotemporal tracking in a joint-learning framework. Our method demonstrates significant efficiency and outperforms prior work on tasks including cognitive load estimation, eye landmark detection, and blink estimation. We also introduce a large-scale dataset of 30k human faces with joint pupil, eye-openness, and landmark annotation, which aims to support future HCI research on human factors and eye-related analysis.

Submitted 26 June, 2023; originally announced June 2023.

Comments: ACM Transactions on Computer-Human Interaction

arXiv:1907.12929 [pdf, other]

Object as Distribution

Authors: Li Ding, Lex Fridman

Abstract: Object detection is a critical part of visual scene understanding. The representation of the object in the detection task has important implications on the efficiency and feasibility of annotation, robustness to occlusion, pose, lighting, and other visual sources of semantic uncertainty, and effectiveness in real-world applications (e.g., autonomous driving). Popular object representations include 2D and 3D bounding boxes, polygons, splines, pixels, and voxels. Each has its strengths and weaknesses. In this work, we propose a new representation of objects based on the bivariate normal distribution. This distribution-based representation has the benefit of robust detection of highly-overlapping objects and the potential for improved downstream tracking and instance segmentation tasks due to the statistical representation of object edges. We provide qualitative evaluation of this representation for the object detection task and quantitative evaluation of its use in a baseline algorithm for the instance segmentation task.

Submitted 25 July, 2019; originally announced July 2019.

Comments: NeurIPS 2019

arXiv:1904.04202 [pdf, other]

Dynamics of Pedestrian Crossing Decisions Based on Vehicle Trajectories in Large-Scale Simulated and Real-World Data

Authors: Jack Terwilliger, Michael Glazer, Henri Schmidt, Josh Domeyer, Heishiro Toyoda, Bruce Mehler, Bryan Reimer, Lex Fridman

Abstract: Humans, as both pedestrians and drivers, generally skillfully navigate traffic intersections. Despite the uncertainty, danger, and the non-verbal nature of communication commonly found in these interactions, there are surprisingly few collisions considering the total number of interactions. As the role of automation technology in vehicles grows, it becomes increasingly critical to understand the relationship between pedestrian and driver behavior: how pedestrians perceive the actions of a vehicle/driver and how pedestrians make crossing decisions. The relationship between time-to-arrival (TTA) and pedestrian gap acceptance (i.e., whether a pedestrian chooses to cross under a given window of time to cross) has been extensively investigated. However, the dynamic nature of vehicle trajectories in the context of non-verbal communication has not been systematically explored. Our work provides evidence that trajectory dynamics, such as changes in TTA, can be powerful signals in the non-verbal communication between drivers and pedestrians. Moreover, we investigate these effects in both simulated and real-world datasets, both larger than have previously been considered in literature to the best of our knowledge.

Submitted 8 April, 2019; originally announced April 2019.

Comments: Will appear in Proceedings of 2019 Driving Assessment Conference

arXiv:1904.04188 [pdf, other]

Eye Contact Between Pedestrians and Drivers

Authors: Dina AlAdawy, Michael Glazer, Jack Terwilliger, Henri Schmidt, Josh Domeyer, Bruce Mehler, Bryan Reimer, Lex Fridman

Abstract: When asked, a majority of people believe that, as pedestrians, they make eye contact with the driver of an approaching vehicle when making their crossing decisions. This work presents evidence that this widely held belief is false. We do so by showing that, in the majority of cases where conflict is possible, pedestrians begin crossing long before they are able to see the driver through the windshield. In other words, we are able to circumvent the very difficult question of whether pedestrians choose to make eye contact with drivers, by showing that whether they think they do or not, they can't. Specifically, we show that over 90% of people in representative lighting conditions cannot determine the gaze of the driver at 15m or see the driver at all at 30m. This means, for example, that given the common city speed limit of 25mph, more than 99% of pedestrians would have begun crossing before being able to see either the driver or the driver's gaze. In other words, from the perspective of the pedestrian, in most situations involving an approaching vehicle, the crossing decision is made by the pedestrian solely based on the kinematics of the vehicle without needing to determine that eye contact was made by explicitly detecting the eyes of the driver.

Submitted 8 April, 2019; originally announced April 2019.

Comments: Will appear in Proceedings of 2019 Driving Assessment Conference

arXiv:1904.01931 [pdf, other]

Hacking Nonverbal Communication Between Pedestrians and Vehicles in Virtual Reality

Authors: Henri Schmidt, Jack Terwilliger, Dina AlAdawy, Lex Fridman

Abstract: We use an immersive virtual reality environment to explore the intricate social cues that underlie non-verbal communication involved in a pedestrian's crossing decision. We "hack" non-verbal communication between pedestrian and vehicle by engineering a set of 15 vehicle trajectories, some of which follow social conventions and some that break them. By subverting social expectations of vehicle behavior we show that pedestrians may use vehicle kinematics to infer social intentions and not merely as the state of a moving object. We investigate human behavior in this virtual world by conducting a study of 22 subjects, with each subject experiencing and responding to each of the trajectories by moving their body, legs, arms, and head in both the physical and the virtual world. Both quantitative and qualitative responses are collected and analyzed, showing that, in fact, social cues can be engineered through vehicle trajectory manipulation. In addition, we demonstrate that immersive virtual worlds that allow the pedestrian to move around freely provide a powerful way to understand both the mechanisms of human perception and the social signaling involved in pedestrian-vehicle interaction.

Submitted 1 April, 2019; originally announced April 2019.

Comments: 2019 Driving Assessment Conference

arXiv:1904.00758 [pdf, other]

Value of Temporal Dynamics Information in Driving Scene Segmentation

Authors: Li Ding, Jack Terwilliger, Rini Sherony, Bryan Reimer, Lex Fridman

Abstract: Semantic scene segmentation has primarily been addressed by forming representations of single images both with supervised and unsupervised methods. The problem of semantic segmentation in dynamic scenes has recently begun to receive attention with video object segmentation approaches. What is not known is how much extra information the temporal dynamics of the visual scene carries that is complementary to the information available in the individual frames of the video. There is evidence that the human visual system can effectively perceive the scene from temporal dynamics information of the scene's changing visual characteristics without relying on the visual characteristics of individual snapshots themselves. Our work takes steps to explore whether machine perception can exhibit similar properties by combining appearance-based representations and temporal dynamics representations in a joint-learning problem that reveals the contribution of each toward successful dynamic scene segmentation. Additionally, we provide the MIT Driving Scene Segmentation dataset, which is a large-scale full driving scene segmentation dataset, densely annotated for every pixel and every one of 5,000 video frames. This dataset is intended to help further the exploration of the value of temporal dynamics information for semantic segmentation in video.

Submitted 20 March, 2019; originally announced April 2019.

arXiv:1810.01835 [pdf, other]

Human-Centered Autonomous Vehicle Systems: Principles of Effective Shared Autonomy

Authors: Lex Fridman

Abstract: Building effective, enjoyable, and safe autonomous vehicles is a lot harder than has historically been considered. The reason is that, simply put, an autonomous vehicle must interact with human beings. This interaction is not a robotics problem nor a machine learning problem nor a psychology problem nor an economics problem nor a policy problem. It is all of these problems put into one. It challenges our assumptions about the limitations of human beings at their worst and the capabilities of artificial intelligence systems at their best. This work proposes a set of principles for designing and building autonomous vehicles in a human-centered way that does not run away from the complexity of human nature but instead embraces it. We describe our development of the Human-Centered Autonomous Vehicle (HCAV) as an illustrative case study of implementing these principles in practice.

Submitted 3 October, 2018; originally announced October 2018.

arXiv:1805.02787 [pdf, other]

Designing Toward Minimalism in Vehicle HMI

Authors: Julia Kindelsberger, Lex Fridman, Michael Glazer, Bryan Reimer

Abstract: We propose that safe, beautiful, fulfilling vehicle HMI design must start from a rigorous consideration of minimalist design. Modern vehicles are changing from mechanical machines to mobile computing devices, similar to the change from landline phones to smartphones. We propose the approach of "designing toward minimalism", where we ask "why?" rather than "why not?" in choosing what information to display to the driver. We demonstrate this approach on an HMI case study of displaying vehicle speed. We first show that vehicle speed is what 87.6% of people ask for. We then show, through an online study with 1,038 subjects and 22,950 videos, that humans can estimate ego-vehicle speed very well, especially at lower speeds. Thus, despite believing that we need this information, we may not. In this way, we demonstrate a systematic approach of questioning the fundamental assumptions of what information is essential for vehicle HMI.

Submitted 7 May, 2018; originally announced May 2018.

arXiv:1803.06760 [pdf, other]

A Machine Learning Approach for Power Allocation in HetNets Considering QoS

Authors: Roohollah Amiri, Hani Mehrpouyan, Lex Fridman, Ranjan K. Mallik, Arumugam Nallanathan, David Matolak

Abstract: There is an increase in usage of smaller cells or femtocells to improve performance and coverage of next-generation heterogeneous wireless networks (HetNets). However, the interference caused by femtocells to neighboring cells is a limiting performance factor in dense HetNets. This interference is being managed via distributed resource allocation methods. However, as the density of the network increases so does the complexity of such resource allocation methods. Yet, unplanned deployment of femtocells requires an adaptable and self-organizing algorithm to make HetNets viable. As such, we propose to use a machine learning approach based on Q-learning to solve the resource allocation problem in such complex networks. By defining each base station as an agent, a cellular network is modelled as a multi-agent network. Subsequently, cooperative Q-learning can be applied as an efficient approach to manage the resources of a multi-agent network. Furthermore, the proposed approach considers the quality of service (QoS) for each user and fairness in the network. In comparison with prior work, the proposed approach can bring more than a four-fold increase in the number of supported femtocells while using cooperative Q-learning to reduce resource allocation overhead.

Submitted 18 March, 2018; originally announced March 2018.

Comments: 7 pages, 7 figures, IEEE ICC'18

arXiv:1801.02805 [pdf, other]

DeepTraffic: Crowdsourced Hyperparameter Tuning of Deep Reinforcement Learning Systems for Multi-Agent Dense Traffic Navigation

Authors: Lex Fridman, Jack Terwilliger, Benedikt Jenik

Abstract: We present a traffic simulation named DeepTraffic where the planning systems for a subset of the vehicles are handled by a neural network as part of a model-free, off-policy reinforcement learning process. The primary goal of DeepTraffic is to make the hands-on study of deep reinforcement learning accessible to thousands of students, educators, and researchers in order to inspire and fuel the exploration and evaluation of deep Q-learning network variants and hyperparameter configurations through large-scale, open competition. This paper investigates the crowd-sourced hyperparameter tuning of the policy network that resulted from the first iteration of the DeepTraffic competition where thousands of participants actively searched through the hyperparameter space.

Submitted 2 January, 2019; v1 submitted 9 January, 2018; originally announced January 2018.

Comments: Neural Information Processing Systems (NIPS 2018) Deep Reinforcement Learning Workshop

arXiv:1711.06976 [pdf, other]

doi 10.1109/ACCESS.2019.2926040

MIT Advanced Vehicle Technology Study: Large-Scale Naturalistic Driving Study of Driver Behavior and Interaction with Automation

Authors: Lex Fridman, Daniel E. Brown, Michael Glazer, William Angell, Spencer Dodd, Benedikt Jenik, Jack Terwilliger, Aleksandr Patsekin, Julia Kindelsberger, Li Ding, Sean Seaman, Alea Mehler, Andrew Sipperley, Anthony Pettinato, Bobbie Seppelt, Linda Angell, Bruce Mehler, Bryan Reimer

Abstract: For the foreseeable future, human beings will likely remain an integral part of the driving task, monitoring the AI system as it performs anywhere from just over 0% to just under 100% of the driving. The governing objectives of the MIT Autonomous Vehicle Technology (MIT-AVT) study are to (1) undertake large-scale real-world driving data collection that includes high-definition video to fuel the development of deep learning based internal and external perception systems, (2) gain a holistic understanding of how human beings interact with vehicle automation technology by integrating video data with vehicle state data, driver characteristics, mental models, and self-reported experiences with technology, and (3) identify how technology and other factors related to automation adoption and use can be improved in ways that save lives. In pursuing these objectives, we have instrumented 23 Tesla Model S and Model X vehicles, 2 Volvo S90 vehicles, 2 Range Rover Evoque, and 2 Cadillac CT6 vehicles for both long-term (over a year per driver) and medium term (one month per driver) naturalistic driving data collection. Furthermore, we are continually developing new methods for analysis of the massive-scale dataset collected from the instrumented vehicle fleet. The recorded data streams include IMU, GPS, CAN messages, and high-definition video streams of the driver face, the driver cabin, the forward roadway, and the instrument cluster (on select vehicles). The study is on-going and growing. To date, we have 122 participants, 15,610 days of participation, 511,638 miles, and 7.1 billion video frames. This paper presents the design of the study, the data collection hardware, the processing of the data, and the computer vision algorithms currently being used to extract actionable knowledge from the data.

Submitted 14 August, 2019; v1 submitted 19 November, 2017; originally announced November 2017.

Journal ref: IEEE Access, vol. 7, pp. 102021-102038, 2019

arXiv:1710.04459 [pdf, other]

Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions

Authors: Lex Fridman, Li Ding, Benedikt Jenik, Bryan Reimer

Abstract: We consider the paradigm of a black box AI system that makes life-critical decisions. We propose an "arguing machines" framework that pairs the primary AI system with a secondary one that is independently trained to perform the same task. We show that disagreement between the two systems, without any knowledge of underlying system design or operation, is sufficient to arbitrarily improve the accuracy of the overall decision pipeline given human supervision over disagreements. We demonstrate this system in two applications: (1) an illustrative example of image classification and (2) on large-scale real-world semi-autonomous driving data. For the first application, we apply this framework to image classification achieving a reduction from 8.0% to 2.8% top-5 error on ImageNet. For the second application, we apply this framework to Tesla Autopilot and demonstrate the ability to predict 90.4% of system disengagements that were labeled by human annotators as challenging and needing human supervision.

Submitted 24 September, 2018; v1 submitted 12 October, 2017; originally announced October 2017.

arXiv:1707.02698 [pdf, other]

To Walk or Not to Walk: Crowdsourced Assessment of External Vehicle-to-Pedestrian Displays

Authors: Lex Fridman, Bruce Mehler, Lei Xia, Yangyang Yang, Laura Yvonne Facusse, Bryan Reimer

Abstract: Researchers, technology reviewers, and governmental agencies have expressed concern that automation may necessitate the introduction of added displays to indicate vehicle intent in vehicle-to-pedestrian interactions. An automated online methodology for obtaining communication intent perceptions for 30 external vehicle-to-pedestrian display concepts was implemented and tested using Amazon Mechanical Turk. Data from 200 qualified participants was quickly obtained and processed. In addition to producing a useful early-stage evaluation of these specific design concepts, the test demonstrated that the methodology is scalable so that a large number of design elements or minor variations can be assessed through a series of runs even on much larger samples in a matter of hours. Using this approach, designers should be able to refine concepts both more quickly and in more depth than available development resources typically allow. Some concerns and questions about common assumptions related to the implementation of vehicle-to-pedestrian displays are posed.

Submitted 10 July, 2017; originally announced July 2017.

arXiv:1706.04568 [pdf, other]

SideEye: A Generative Neural Network Based Simulator of Human Peripheral Vision

Authors: Lex Fridman, Benedikt Jenik, Shaiyan Keshvari, Bryan Reimer, Christoph Zetzsche, Ruth Rosenholtz

Abstract: Foveal vision makes up less than 1% of the visual field. The other 99% is peripheral vision. Precisely what human beings see in the periphery is both obvious and mysterious in that we see it with our own eyes but can't visualize what we see, except in controlled lab experiments. Degradation of information in the periphery is far more complex than what might be mimicked with a radial blur. Rather, behaviorally-validated models hypothesize that peripheral vision measures a large number of local texture statistics in pooling regions that overlap and grow with eccentricity. In this work, we develop a new method for peripheral vision simulation by training a generative neural network on a behaviorally-validated full-field synthesis model. By achieving a 21,000 fold reduction in running time, our approach is the first to combine realism and speed of peripheral vision simulation to a degree that provides a whole new way to approach visual design: through peripheral visualization.

Submitted 22 October, 2017; v1 submitted 14 June, 2017; originally announced June 2017.

arXiv:1612.01035 [pdf, other]

Semi-Automated Annotation of Discrete States in Large Video Datasets

Authors: Lex Fridman, Bryan Reimer

Abstract: We propose a framework for semi-automated annotation of video frames where the video is of an object that at any point in time can be labeled as being in one of a finite number of discrete states. A Hidden Markov Model (HMM) is used to model (1) the behavior of the underlying object and (2) the noisy observation of its state through an image processing algorithm. The key insight of this approach is that the annotation of frame-by-frame video can be reduced from a problem of labeling every single image to a problem of detecting a transition between states of the underlying object being recorded on video. The performance of the framework is evaluated on a driver gaze classification dataset composed of 16,000,000 images that were fully annotated over 6,000 hours of direct manual annotation labor. On this dataset, we achieve a 13x reduction in manual annotation for an average accuracy of 99.1% and an 84x reduction for an average accuracy of 91.2%.

Submitted 3 December, 2016; originally announced December 2016.

Comments: To be presented at AAAI 2017. arXiv admin note: text overlap with arXiv:1508.04028

arXiv:1611.08754 [pdf, other]

What Can Be Predicted from Six Seconds of Driver Glances?

Authors: Lex Fridman, Heishiro Toyoda, Sean Seaman, Bobbie Seppelt, Linda Angell, Joonbum Lee, Bruce Mehler, Bryan Reimer

Abstract: We consider a large dataset of real-world, on-road driving from a 100-car naturalistic study to explore the predictive power of driver glances and, specifically, to answer the following question: what can be predicted about the state of the driver and the state of the driving environment from a 6-second sequence of macro-glances? The context-based nature of such glances allows for application of supervised learning to the problem of vision-based gaze estimation, making it robust, accurate, and reliable in messy, real-world conditions. So, it's valuable to ask whether such macro-glances can be used to infer behavioral, environmental, and demographic variables. We analyze 27 binary classification problems based on these variables. The takeaway is that glance can be used as part of a multi-sensor real-time system to predict radio-tuning, fatigue state, failure to signal, talking, and several environment variables.

Submitted 26 November, 2016; originally announced November 2016.

arXiv:1602.07324 [pdf]

doi 10.7717/peerj-cs.146

Investigating Drivers' Head and Glance Correspondence

Authors: Joonbum Lee, Mauricio Muñoz, Lex Fridman, Trent Victor, Bryan Reimer, Bruce Mehler

Abstract: The relationship between a driver's glance pattern and corresponding head rotation is highly complex due to its nonlinear dependence on the individual, task, and driving context. This study explores the ability of head pose to serve as an estimator for driver gaze by connecting head rotation data with manually coded gaze region data using both a statistical analysis approach and a predictive (i.e., machine learning) approach. For the latter, classification accuracy increased as visual angles between two glance locations increased. In other words, the greater the shift in gaze, the higher the accuracy of classification. This is an intuitive but important concept that we make explicit through our analysis. The highest accuracy achieved was 83% using the method of Hidden Markov Models (HMM) for the binary gaze classification problem of (1) the forward roadway versus (2) the center stack. Results suggest that although there are individual differences in head-glance correspondence while driving, classifier models based on head-rotation data may be robust to these differences and therefore can serve as reasonable estimators for glance location. The results suggest that driver head pose can be used as a surrogate for eye gaze in several key conditions including the identification of high-eccentricity glances. Inexpensive driver head pose tracking may be a key element in detection systems developed to mitigate driver distraction and inattention.

Submitted 23 February, 2016; originally announced February 2016.

Comments: 27 pages, 7 figures, 2 tables

Journal ref: PeerJ Computer Science 4:e146 (2018) https://doi.org/10.7717/peerj-cs.146

arXiv:1512.02425 [pdf, other]

On the joint impact of bias and power control on downlink spectral efficiency in cellular networks

Authors: Lex Fridman, Jeffrey Wildman, Steven Weber

Abstract: Cell biasing and downlink transmit power are two controls that may be used to improve the spectral efficiency of cellular networks. With cell biasing, each mobile user associates with the base station offering, say, the highest biased signal to interference plus noise ratio. Biasing affects the cell association decisions of mobile users, but not the received instantaneous downlink transmission rates. Adjusting the collection of downlink transmission powers can likewise affect the cell associations, but in contrast with biasing, it also directly affects the instantaneous rates. This paper investigates the joint use of both cell biasing and transmission power control and their (individual and joint) effects on the statistical properties of the collection of per-user spectral efficiencies. Our analytical results and numerical investigations demonstrate in some cases a significant performance improvement in the Pareto efficient frontiers of both a mean-variance and throughput-fairness tradeoff from using both bias and power controls over using either control alone.

Submitted 8 December, 2015; originally announced December 2015.

Comments: 14 pages, 9 figures, submitted on December 8, 2015 to IEEE/ACM Transactions on Networking, extension of Crowncom 2013 submission
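The association rule described in this abstract (each user picks the base station with the highest biased SINR) can be illustrated with a minimal sketch. It assumes a multiplicative bias applied to SINR in linear units; the function name, station labels, and numbers are hypothetical and are not taken from the paper.

```python
def associate(sinr_by_station, bias_by_station):
    """Return the station with the highest biased SINR.

    Assumption: the bias multiplies the SINR in linear (not dB) units.
    Both arguments are dicts keyed by a hypothetical station label.
    """
    return max(
        sinr_by_station,
        key=lambda station: bias_by_station[station] * sinr_by_station[station],
    )


# Hypothetical two-station example: the bias changes the association
# decision, but the SINR the user actually receives is left untouched.
sinr = {"macro": 4.0, "pico": 2.5}
unbiased = associate(sinr, {"macro": 1.0, "pico": 1.0})
biased = associate(sinr, {"macro": 1.0, "pico": 2.0})
```

With these illustrative values the user moves from the macro station to the pico station once the pico bias is applied, while the entries of `sinr` are unchanged, which mirrors the abstract's point that biasing alters association but not the instantaneous rate.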

arXiv:1511.07035 [pdf, other]

Detecting Road Surface Wetness from Audio: A Deep Learning Approach

Authors: Irman Abdić, Lex Fridman, Erik Marchi, Daniel E Brown, William Angell, Bryan Reimer, Björn Schuller

Abstract: We introduce a recurrent neural network architecture for automated road surface wetness detection from audio of tire-surface interaction. The robustness of our approach is evaluated on 785,826 bins of audio that span an extensive range of vehicle speeds, noises from the environment, road surface types, and pavement conditions including international roughness index (IRI) values from 25 in/mi to 1400 in/mi. The training and evaluation of the model are performed on different roads to minimize the impact of environmental and other external factors on the accuracy of the classification. We achieve an unweighted average recall (UAR) of 93.2% across all vehicle speeds including 0 mph. The classifier still works at 0 mph because the discriminating signal is present in the sound of other vehicles driving by.

Submitted 4 December, 2015; v1 submitted 22 November, 2015; originally announced November 2015.

Comments: Under review in IEEE Signal Processing Letters

arXiv:1511.03908 [pdf, other]

Learning Human Identity from Motion Patterns

Authors: Natalia Neverova, Christian Wolf, Griffin Lacey, Lex Fridman, Deepak Chandra, Brandon Barbello, Graham Taylor

Abstract: We present a large-scale study exploring the capability of temporal deep neural networks to interpret natural human kinematics and introduce the first method for active biometric authentication with mobile inertial sensors. At Google, we have created a first-of-its-kind dataset of human movements, passively collected by 1500 volunteers using their smartphones daily over several months. We (1) compare several neural architectures for efficient learning of temporal multi-modal data representations, (2) propose an optimized shift-invariant dense convolutional mechanism (DCWRNN), and (3) incorporate the discriminatively-trained dynamic features in a probabilistic generative framework taking into account temporal characteristics. Our results demonstrate that human kinematics convey important information about user identity and can serve as a valuable component of multi-modal authentication systems.

Submitted 21 April, 2016; v1 submitted 12 November, 2015; originally announced November 2015.

Comments: 10 pages, 6 figures, 2 tables

arXiv:1510.06113 [pdf, other]

doi 10.1016/j.patrec.2016.02.011

Automated Synchronization of Driving Data Using Vibration and Steering Events

Authors: Lex Fridman, Daniel E Brown, William Angell, Irman Abdić, Bryan Reimer, Hae Young Noh

Abstract: We propose a method for automated synchronization of vehicle sensors useful for the study of multi-modal driver behavior and for the design of advanced driver assistance systems. Multi-sensor decision fusion relies on synchronized data streams in (1) the offline supervised learning context and (2) the online prediction context. In practice, such data streams are often out of sync due to the absence of a real-time clock, use of multiple recording devices, or improper thread scheduling and data buffer management. Cross-correlation of accelerometer, telemetry, audio, and dense optical flow from three video sensors is used to achieve an average synchronization error of 13 milliseconds. The insight underlying the effectiveness of the proposed approach is that the described sensors capture overlapping aspects of vehicle vibrations and vehicle steering allowing the cross-correlation function to serve as a way to compute the delay shift in each sensor. Furthermore, we show the decrease in synchronization error as a function of the duration of the data stream.

Submitted 1 March, 2016; v1 submitted 20 October, 2015; originally announced October 2015.

Comments: Accepted for Publication in Elsevier Pattern Recognition Letters
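The idea of using the cross-correlation function to compute a delay shift between two sensor streams can be sketched as follows. This is a generic illustration, not the authors' pipeline: it assumes two equally sampled one-dimensional signals, a brute-force search over integer lags, and hypothetical names and toy data throughout.

```python
def estimate_delay(reference, delayed, max_lag):
    """Return the integer lag (in samples) that maximizes the
    cross-correlation between `reference` and `delayed`.

    A positive result means `delayed` lags behind `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, value in enumerate(reference):
            j = i + lag
            # Only sum over samples where the two streams overlap.
            if 0 <= j < len(delayed):
                score += value * delayed[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy data (hypothetical): the same vibration "bump" seen by two
# sensors, with the second stream shifted two samples later.
accelerometer = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
audio_energy = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]
lag = estimate_delay(accelerometer, audio_energy, max_lag=4)
```

Dividing the estimated lag by the sampling rate would convert it to seconds. Real streams would usually be mean-centered and resampled to a common rate first; those steps are omitted here for brevity.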

arXiv:1508.04028 [pdf, other]

Owl and Lizard: Patterns of Head Pose and Eye Pose in Driver Gaze Classification

Authors: Lex Fridman, Joonbum Lee, Bryan Reimer, Trent Victor

Abstract: Accurate, robust, inexpensive gaze tracking in the car can help keep a driver safe by facilitating the more effective study of how to improve (1) vehicle interfaces and (2) the design of future Advanced Driver Assistance Systems. In this paper, we estimate head pose and eye pose from monocular video using methods developed extensively in prior work and ask two new interesting questions. First, how much better can we classify driver gaze using head and eye pose versus just using head pose? Second, are there individual-specific gaze strategies that strongly correlate with how much gaze classification improves with the addition of eye pose information? We answer these questions by evaluating data drawn from an on-road study of 40 drivers. The main insight of the paper is conveyed through the analogy of an "owl" and "lizard" which describes the degree to which the eyes and the head move when shifting gaze. When the head moves a lot ("owl"), not much classification improvement is attained by estimating eye pose on top of head pose. On the other hand, when the head stays still and only the eyes move ("lizard"), classification accuracy increases significantly from adding in eye pose. We characterize how that accuracy varies between people, gaze strategies, and gaze regions.

Submitted 19 November, 2016; v1 submitted 17 August, 2015; originally announced August 2015.

Comments: Accepted for Publication in IET Computer Vision. arXiv admin note: text overlap with arXiv:1507.04760

arXiv:1507.04760 [pdf, other]

Driver Gaze Region Estimation Without Using Eye Movement

Authors: Lex Fridman, Philipp Langhans, Joonbum Lee, Bryan Reimer

Abstract: Automated estimation of the allocation of a driver's visual attention may be a critical component of future Advanced Driver Assistance Systems. In theory, vision-based tracking of the eye can provide a good estimate of gaze location. In practice, eye tracking from video is challenging because of sunglasses, eyeglass reflections, lighting conditions, occlusions, motion blur, and other factors. Estimation of head pose, on the other hand, is robust to many of these effects, but cannot provide as fine-grained a resolution in localizing the gaze. However, for the purpose of keeping the driver safe, it is sufficient to partition gaze into regions. In this effort, we propose a system that extracts facial features and classifies their spatial configuration into six regions in real-time. Our proposed method achieves an average accuracy of 91.4% at an average decision rate of 11 Hz on a dataset of 50 drivers from an on-road study.

Submitted 1 March, 2016; v1 submitted 16 July, 2015; originally announced July 2015.

Comments: Accepted for Publication in IEEE Intelligent Systems

arXiv:1503.08479 [pdf, other]

doi 10.1109/JSYST.2015.2472579

Active Authentication on Mobile Devices via Stylometry, Application Usage, Web Browsing, and GPS Location

Authors: Lex Fridman, Steven Weber, Rachel Greenstadt, Moshe Kam

Abstract: Active authentication is the problem of continuously verifying the identity of a person based on behavioral aspects of their interaction with a computing device. In this study, we collect and analyze behavioral biometrics data from 200 subjects, each using their personal Android mobile device for a period of at least 30 days. This dataset is novel in the context of active authentication due to its size, duration, number of modalities, and absence of restrictions on tracked activity. The geographical colocation of the subjects in the study is representative of a large closed-world environment such as an organization where the unauthorized user of a device is likely to be an insider threat: coming from within the organization. We consider four biometric modalities: (1) text entered via soft keyboard, (2) applications used, (3) websites visited, and (4) physical location of the device as determined from GPS (when outdoors) or WiFi (when indoors). We implement and test a classifier for each modality and organize the classifiers as a parallel binary decision fusion architecture. We are able to characterize the performance of the system with respect to intruder detection time and to quantify the contribution of each modality to the overall performance.

Submitted 29 March, 2015; originally announced March 2015.

Comments: Accepted for Publication in the IEEE Systems Journal

Showing 1–24 of 24 results for author: Fridman, L