Search | arXiv e-print repository

UnweaveNet: Unweaving Activity Stories

Authors: Will Price, Carl Vondrick, Dima Damen

Abstract: Our lives can be seen as a complex weaving of activities; we switch from one activity to another, to maximise our achievements or in reaction to demands placed upon us. Observing a video of unscripted daily activities, we parse the video into its constituent activity threads through a process we call unweaving. To accomplish this, we introduce a video representation explicitly capturing activity t… ▽ More Our lives can be seen as a complex weaving of activities; we switch from one activity to another, to maximise our achievements or in reaction to demands placed upon us. Observing a video of unscripted daily activities, we parse the video into its constituent activity threads through a process we call unweaving. To accomplish this, we introduce a video representation explicitly capturing activity threads called a thread bank, along with a neural controller capable of detecting goal changes and resuming of past activities, together forming UnweaveNet. We train and evaluate UnweaveNet on sequences from the unscripted egocentric dataset EPIC-KITCHENS. We propose and showcase the efficacy of pretraining UnweaveNet in a self-supervised manner. △ Less

Submitted 4 April, 2022; v1 submitted 19 December, 2021; originally announced December 2021.

Comments: Accepted at IEEE/CVF Computer Vision and Pattern Recognition (CVPR) 2022

arXiv:2111.09098 [pdf, other]

Unifying Heterogeneous Electronic Health Records Systems via Text-Based Code Embedding

Authors: Kyunghoon Hur, Jiyoung Lee, Jungwoo Oh, Wesley Price, Young-Hak Kim, Edward Choi

Abstract: EHR systems lack a unified code system forrepresenting medical concepts, which acts asa barrier for the deployment of deep learningmodels in large scale to multiple clinics and hos-pitals. To overcome this problem, we introduceDescription-based Embedding,DescEmb, a code-agnostic representation learning framework forEHR. DescEmb takes advantage of the flexibil-ity of neural language understanding m… ▽ More EHR systems lack a unified code system forrepresenting medical concepts, which acts asa barrier for the deployment of deep learningmodels in large scale to multiple clinics and hos-pitals. To overcome this problem, we introduceDescription-based Embedding,DescEmb, a code-agnostic representation learning framework forEHR. DescEmb takes advantage of the flexibil-ity of neural language understanding models toembed clinical events using their textual descrip-tions rather than directly map** each event toa dedicated embedding. DescEmb outperformedtraditional code-based embedding in extensiveexperiments, especially in a zero-shot transfertask (one hospital to another), and was able totrain a single unified model for heterogeneousEHR datasets. △ Less

Submitted 17 January, 2022; v1 submitted 12 November, 2021; originally announced November 2021.

Comments: Machine Learning for Health (ML4H) at NeurIPS 2021 - Extended Abstract. This is a condensed version of arXiv:2108.03625. arXiv admin note: substantial text overlap with arXiv:2108.03625

arXiv:2110.07058 [pdf, other]

Ego4D: Around the World in 3,000 Hours of Egocentric Video

Authors: Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do , et al. (60 additional authors not shown)

Abstract: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with cons… ▽ More We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, we aim to push the frontier of first-person perception. Project page: https://ego4d-data.org/ △ Less

Submitted 11 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: To appear in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. This version updates the baseline result numbers for the Hands and Objects benchmark (appendix)

arXiv:2108.03625 [pdf, other]

Unifying Heterogeneous Electronic Health Records Systems via Text-Based Code Embedding

Authors: Kyunghoon Hur, Jiyoung Lee, Jungwoo Oh, Wesley Price, Young-Hak Kim, Edward Choi

Abstract: Substantial increase in the use of Electronic Health Records (EHRs) has opened new frontiers for predictive healthcare. However, while EHR systems are nearly ubiquitous, they lack a unified code system for representing medical concepts. Heterogeneous formats of EHR present a substantial barrier for the training and deployment of state-of-the-art deep learning models at scale. To overcome this prob… ▽ More Substantial increase in the use of Electronic Health Records (EHRs) has opened new frontiers for predictive healthcare. However, while EHR systems are nearly ubiquitous, they lack a unified code system for representing medical concepts. Heterogeneous formats of EHR present a substantial barrier for the training and deployment of state-of-the-art deep learning models at scale. To overcome this problem, we introduce Description-based Embedding, DescEmb, a code-agnostic description-based representation learning framework for predictive modeling on EHR. DescEmb takes advantage of the flexibility of neural language understanding models while maintaining a neutral approach that can be combined with prior frameworks for task-specific representation learning or predictive modeling. We tested our model's capacity on various experiments including prediction tasks, transfer learning and pooled learning. DescEmb shows higher performance in overall experiments compared to code-based approach, opening the door to a text-based approach in predictive healthcare research that is not constrained by EHR structure nor special domain knowledge. △ Less

Submitted 18 March, 2022; v1 submitted 8 August, 2021; originally announced August 2021.

Comments: Accepted at CHIL 2022. Main paper + supplementary material (21 pages, 8 figures, 12 tables)

arXiv:2103.04448 [pdf, other]

doi 10.1145/3448139.3448205

Toward Semi-Automatic Misconception Discovery Using Code Embeddings

Authors: Yang Shi, Krupal Shah, Wengran Wang, Samiha Marwan, Poorvaja Penmetsa, Thomas W. Price

Abstract: Understanding students' misconceptions is important for effective teaching and assessment. However, discovering such misconceptions manually can be time-consuming and laborious. Automated misconception discovery can address these challenges by highlighting patterns in student data, which domain experts can then inspect to identify misconceptions. In this work, we present a novel method for the sem… ▽ More Understanding students' misconceptions is important for effective teaching and assessment. However, discovering such misconceptions manually can be time-consuming and laborious. Automated misconception discovery can address these challenges by highlighting patterns in student data, which domain experts can then inspect to identify misconceptions. In this work, we present a novel method for the semi-automated discovery of problem-specific misconceptions from students' program code in computing courses, using a state-of-the-art code classification model. We trained the model on a block-based programming dataset and used the learned embedding to cluster incorrect student submissions. We found these clusters correspond to specific misconceptions about the problem and would not have been easily discovered with existing approaches. We also discuss potential applications of our approach and how these misconceptions inform domain-specific insights into students' learning processes. △ Less

Submitted 7 March, 2021; originally announced March 2021.

Comments: 7 pages, 3 figures, Accepted in LAK'21

ACM Class: I.2.1; K.3.1; K.3.2

arXiv:2102.05765 [pdf, other]

doi 10.1145/3408877.3432439

Early Performance Prediction using Interpretable Patterns in Programming Process Data

Authors: Ge Gao, Samiha Marwan, Thomas W. Price

Abstract: Instructors have limited time and resources to help struggling students, and these resources should be directed to the students who most need them. To address this, researchers have constructed models that can predict students' final course performance early in a semester. However, many predictive models are limited to static and generic student features (e.g. demographics, GPA), rather than compu… ▽ More Instructors have limited time and resources to help struggling students, and these resources should be directed to the students who most need them. To address this, researchers have constructed models that can predict students' final course performance early in a semester. However, many predictive models are limited to static and generic student features (e.g. demographics, GPA), rather than computing-specific evidence that assesses a student's progress in class. Many programming environments now capture complete time-stamped records of students' actions during programming. In this work, we leverage this rich, fine-grained log data to build a model to predict student course outcomes. From the log data, we extract patterns of behaviors that are predictive of students' success using an approach called differential sequence mining. We evaluate our approach on a dataset from 106 students in a block-based, introductory programming course. The patterns extracted from our approach can predict final programming performance with 79% accuracy using only the first programming assignment, outperforming two baseline methods. In addition, we show that the patterns are interpretable and correspond to concrete, effective -- and ineffective -- novice programming behaviors. We also discuss these patterns and their implications for classroom instruction. △ Less

Submitted 10 February, 2021; originally announced February 2021.

arXiv:2011.12372 [pdf, other]

Play Fair: Frame Attributions in Video Models

Authors: Will Price, Dima Damen

Abstract: In this paper, we introduce an attribution method for explaining action recognition models. Such models fuse information from multiple frames within a video, through score aggregation or relational reasoning. We break down a model's class score into the sum of contributions from each frame, fairly. Our method adapts an axiomatic solution to fair reward distribution in cooperative games, known as t… ▽ More In this paper, we introduce an attribution method for explaining action recognition models. Such models fuse information from multiple frames within a video, through score aggregation or relational reasoning. We break down a model's class score into the sum of contributions from each frame, fairly. Our method adapts an axiomatic solution to fair reward distribution in cooperative games, known as the Shapley value, for elements in a variable-length sequence, which we call the Element Shapley Value (ESV). Critically, we propose a tractable approximation of ESV that scales linearly with the number of frames in the sequence. We employ ESV to explain two action recognition models (TRN and TSN) on the fine-grained dataset Something-Something. We offer detailed analysis of supporting/distracting frames, and the relationships of ESVs to the frame's position, class prediction, and sequence length. We compare ESV to naive baselines and two commonly used feature attribution methods: Grad-CAM and Integrated-Gradients. △ Less

Submitted 24 November, 2020; originally announced November 2020.

Comments: Code available at: https://github.com/willprice/play-fair/ and supporting website at: https://play-fair.willprice.dev/

arXiv:2006.13256 [pdf, other]

doi 10.5523/bris.2g1n6qdydwa9u22shpxqzp0t8m

Rescaling Egocentric Vision

Authors: Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Evangelos Kazakos, Jian Ma, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

Abstract: This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments, using head-mounted cameras. Compared to its previous version, EPIC-KITCHENS-100 has been annotated using a nov… ▽ More This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments, using head-mounted cameras. Compared to its previous version, EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete annotations of fine-grained actions (+128% more action segments). This collection enables new challenges such as action detection and evaluating the "test of time" - i.e. whether models trained on data collected in 2018 can generalise to new footage collected two years later. The dataset is aligned with 6 challenges: action recognition (full and weak supervision), action detection, action anticipation, cross-modal retrieval (from captions), as well as unsupervised domain adaptation for action recognition. For each challenge, we define the task, provide baselines and evaluation metrics △ Less

Submitted 17 September, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

Comments: Accepted at the International Journal of Computer Vision (IJCV). Dataset available from: http://epic-kitchens.github.io/

arXiv:2005.11416 [pdf, ps, other]

COVID-19 Contact Tracing: Eight Privacy Questions Explored

Authors: Hugh Lawson-Tancred, Henry C. W. Price, Alessandro Provetti

Abstract: We respond to a recent short paper by de Motjoye et el. on privacy issues with Covid-19 tracking. Their paper, which we discuss here, is structured around three "toy protocols" for the design of an app which can maximise the utility of contact tracing information while minimising the more general risk to privacy. On this basis, the paper proceeds to introduce eight questions against which they sho… ▽ More We respond to a recent short paper by de Motjoye et el. on privacy issues with Covid-19 tracking. Their paper, which we discuss here, is structured around three "toy protocols" for the design of an app which can maximise the utility of contact tracing information while minimising the more general risk to privacy. On this basis, the paper proceeds to introduce eight questions against which they should be assessed. The questions raised and the protocols proposed effectively amount to the creation of a game with different categories of players able to make different moves. It is therefore possible to analyse the model in terms of optimal game design. △ Less

Submitted 22 May, 2020; originally announced May 2020.

Comments: Six pages, no figures

ACM Class: K.4.1; J.3

arXiv:2005.00343 [pdf, other]

The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines

Authors: Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

Abstract: Since its introduction in 2018, EPIC-KITCHENS has attracted attention as the largest egocentric video benchmark, offering a unique viewpoint on people's interaction with objects, their attention, and even intention. In this paper, we detail how this large-scale dataset was captured by 32 participants in their native kitchen environments, and densely annotated with actions and object interactions.… ▽ More Since its introduction in 2018, EPIC-KITCHENS has attracted attention as the largest egocentric video benchmark, offering a unique viewpoint on people's interaction with objects, their attention, and even intention. In this paper, we detail how this large-scale dataset was captured by 32 participants in their native kitchen environments, and densely annotated with actions and object interactions. Our videos depict nonscripted daily activities, as recording is started every time a participant entered their kitchen. Recording took place in 4 countries by participants belonging to 10 different nationalities, resulting in highly diverse kitchen habits and cooking styles. Our dataset features 55 hours of video consisting of 11.5M frames, which we densely labelled for a total of 39.6K action segments and 454.2K object bounding boxes. Our annotation is unique in that we had the participants narrate their own videos after recording, thus reflecting true intention, and we crowd-sourced ground-truths based on these. We describe our object, action and. anticipation challenges, and evaluate several baselines over two test splits, seen and unseen kitchens. We introduce new baselines that highlight the multimodal nature of the dataset and the importance of explicit temporal modelling to discriminate fine-grained actions e.g. 'closing a tap' from 'opening' it up. △ Less

Submitted 29 April, 2020; originally announced May 2020.

Comments: Preprint for paper at IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1804.02748

arXiv:1909.09422 [pdf, other]

Retro-Actions: Learning 'Close' by Time-Reversing 'Open' Videos

Authors: Will Price, Dima Damen

Abstract: We investigate video transforms that result in class-homogeneous label-transforms. These are video transforms that consistently maintain or modify the labels of all videos in each class. We propose a general approach to discover invariant classes, whose transformed examples maintain their label; pairs of equivariant classes, whose transformed examples exchange their labels; and novel-generating cl… ▽ More We investigate video transforms that result in class-homogeneous label-transforms. These are video transforms that consistently maintain or modify the labels of all videos in each class. We propose a general approach to discover invariant classes, whose transformed examples maintain their label; pairs of equivariant classes, whose transformed examples exchange their labels; and novel-generating classes, whose transformed examples belong to a new class outside the dataset. Label transforms offer additional supervision previously unexplored in video recognition benefiting data augmentation and enabling zero-shot learning opportunities by learning a class from transformed videos of its counterpart. Amongst such video transforms, we study horizontal-flip**, time-reversal, and their composition. We highlight errors in naively using horizontal-flip** as a form of data augmentation in video. Next, we validate the realism of time-reversed videos through a human perception study where people exhibit equal preference for forward and time-reversed videos. Finally, we test our approach on two datasets, Jester and Something-Something, evaluating the three video transforms for zero-shot learning and data augmentation. Our results show that gestures such as zooming in can be learnt from zooming out in a zero-shot setting, as well as more complex actions with state transitions such as digging something out of something from burying something in something. △ Less

Submitted 20 September, 2019; originally announced September 2019.

Comments: ICCVW 2019, 8 pages, 7 figures, 6 tables. https://video-reversal.willprice.dev/

arXiv:1908.00867 [pdf, other]

An Evaluation of Action Recognition Models on EPIC-Kitchens

Authors: Will Price, Dima Damen

Abstract: We benchmark contemporary action recognition models (TSN, TRN, and TSM) on the recently introduced EPIC-Kitchens dataset and release pretrained models on GitHub (https://github.com/epic-kitchens/action-models) for others to build upon. In contrast to popular action recognition datasets like Kinetics, Something-Something, UCF101, and HMDB51, EPIC-Kitchens is shot from an egocentric perspective and… ▽ More We benchmark contemporary action recognition models (TSN, TRN, and TSM) on the recently introduced EPIC-Kitchens dataset and release pretrained models on GitHub (https://github.com/epic-kitchens/action-models) for others to build upon. In contrast to popular action recognition datasets like Kinetics, Something-Something, UCF101, and HMDB51, EPIC-Kitchens is shot from an egocentric perspective and captures daily actions in-situ. In this report, we aim to understand how well these models can tackle the challenges present in this dataset, such as its long tail class distribution, unseen environment test set, and multiple tasks (verb, noun and, action classification). We discuss the models' shortcomings and avenues for future research. △ Less

Submitted 2 August, 2019; originally announced August 2019.

Comments: 6 pages, 3 figures, 3 tables. Models released at https://github.com/epic-kitchens/action-models

arXiv:1804.02748 [pdf, other]

Scaling Egocentric Vision: The EPIC-KITCHENS Dataset

Authors: Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

Abstract: First-person vision is gaining interest as it offers a unique viewpoint on people's interaction with objects, their attention, and even intention. However, progress in this challenging domain has been relatively slow due to the lack of sufficiently large datasets. In this paper, we introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen… ▽ More First-person vision is gaining interest as it offers a unique viewpoint on people's interaction with objects, their attention, and even intention. However, progress in this challenging domain has been relatively slow due to the lack of sufficiently large datasets. In this paper, we introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments. Our videos depict nonscripted daily activities: we simply asked each participant to start recording every time they entered their kitchen. Recording took place in 4 cities (in North America and Europe) by participants belonging to 10 different nationalities, resulting in highly diverse cooking styles. Our dataset features 55 hours of video consisting of 11.5M frames, which we densely labeled for a total of 39.6K action segments and 454.3K object bounding boxes. Our annotation is unique in that we had the participants narrate their own videos (after recording), thus reflecting true intention, and we crowd-sourced ground-truths based on these. We describe our object, action and anticipation challenges, and evaluate several baselines over two test splits, seen and unseen kitchens. Dataset and Project page: http://epic-kitchens.github.io △ Less

Submitted 31 July, 2018; v1 submitted 8 April, 2018; originally announced April 2018.

Comments: European Conference on Computer Vision (ECCV) 2018 Dataset and Project page: http://epic-kitchens.github.io

arXiv:1708.06564 [pdf, other]

The Continuous Hint Factory - Providing Hints in Vast and Sparsely Populated Edit Distance Spaces

Authors: Benjamin Paaßen, Barbara Hammer, Thomas William Price, Tiffany Barnes, Sebastian Gross, Niels Pinkwart

Abstract: Intelligent tutoring systems can support students in solving multi-step tasks by providing hints regarding what to do next. However, engineering such next-step hints manually or via an expert model becomes infeasible if the space of possible states is too large. Therefore, several approaches have emerged to infer next-step hints automatically, relying on past students' data. In particular, the Hin… ▽ More Intelligent tutoring systems can support students in solving multi-step tasks by providing hints regarding what to do next. However, engineering such next-step hints manually or via an expert model becomes infeasible if the space of possible states is too large. Therefore, several approaches have emerged to infer next-step hints automatically, relying on past students' data. In particular, the Hint Factory (Barnes & Stamper, 2008) recommends edits that are most likely to guide students from their current state towards a correct solution, based on what successful students in the past have done in the same situation. Still, the Hint Factory relies on student data being available for any state a student might visit while solving the task, which is not the case for some learning tasks, such as open-ended programming tasks. In this contribution we provide a mathematical framework for edit-based hint policies and, based on this theory, propose a novel hint policy to provide edit hints in vast and sparsely populated state spaces. In particular, we extend the Hint Factory by considering data of past students in all states which are similar to the student's current state and creating hints approximating the weighted average of all these reference states. Because the space of possible weighted averages is continuous, we call this approach the Continuous Hint Factory. In our experimental evaluation, we demonstrate that the Continuous Hint Factory can predict more accurately what capable students would do compared to existing prediction schemes on two learning tasks, especially in an open-ended programming task, and that the Continuous Hint Factory is comparable to existing hint policies at reproducing tutor hints on a simple UML diagram task. △ Less

Submitted 30 June, 2018; v1 submitted 22 August, 2017; originally announced August 2017.

Journal ref: Journal of Educational Data Mining, 10 (2018) 1-35. Retrieved from https://jedm.educationaldatamining.org/index.php/JEDM/article/view/158

Showing 1–14 of 14 results for author: Price, W