Showing 51–72 of 72 results for author: Ryoo, M S

  1. arXiv:1901.02537  [pdf, other]

    cs.CV

    Collaborative Execution of Deep Neural Networks on Internet of Things Devices

    Authors: Ramyad Hadidi, Jiashen Cao, Michael S. Ryoo, Hyesoon Kim

    Abstract: With recent advancements in deep neural networks (DNNs), we are able to solve traditionally challenging problems. Since DNNs are compute intensive, consumers, to deploy a service, need to rely on expensive and scarce compute resources in the cloud. This approach, in addition to its dependability on high-quality network infrastructure and data centers, raises new privacy concerns. These challenges…

    Submitted 8 January, 2019; originally announced January 2019.

    Comments: Updated version after sysML

  2. arXiv:1811.10636  [pdf, other]

    cs.CV cs.LG cs.NE

    Evolving Space-Time Neural Architectures for Videos

    Authors: AJ Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S. Ryoo

    Abstract: We present a new method for finding video CNN architectures that capture rich spatio-temporal information in videos. Previous work, taking advantage of 3D convolutions, obtained promising results by manually designing video CNN architectures. We here develop a novel evolutionary search algorithm that automatically explores models with different types and combinations of layers to jointly learn int…

    Submitted 20 August, 2019; v1 submitted 26 November, 2018; originally announced November 2018.

    Journal ref: ICCV 2019

  3. arXiv:1810.01455  [pdf, other]

    cs.CV

    Representation Flow for Action Recognition

    Authors: AJ Piergiovanni, Michael S. Ryoo

    Abstract: In this paper, we propose a convolutional layer inspired by optical flow algorithms to learn motion representations. Our representation flow layer is a fully-differentiable layer designed to capture the `flow' of any representation channel within a convolutional neural network for action recognition. Its parameters for iterative flow optimization are learned in an end-to-end fashion together with…

    Submitted 1 August, 2019; v1 submitted 2 October, 2018; originally announced October 2018.

    Comments: CVPR 2019

    Journal ref: CVPR 2019

  4. arXiv:1806.08251  [pdf, other]

    cs.CV

    Learning Multimodal Representations for Unseen Activities

    Authors: AJ Piergiovanni, Michael S. Ryoo

    Abstract: We present a method to learn a joint multimodal representation space that enables recognition of unseen activities in videos. We first compare the effect of placing various constraints on the embedding space using paired text and video data. We also propose a method to improve the joint embedding space using an adversarial formulation, allowing it to benefit from unpaired text and video data. By u…

    Submitted 7 July, 2020; v1 submitted 21 June, 2018; originally announced June 2018.

    Journal ref: WACV 2020

  5. arXiv:1805.07813  [pdf, other]

    cs.RO cs.CV stat.ML

    Learning Real-World Robot Policies by Dreaming

    Authors: AJ Piergiovanni, Alan Wu, Michael S. Ryoo

    Abstract: Learning to control robots directly based on images is a primary challenge in robotics. However, many existing reinforcement learning approaches require iteratively obtaining millions of robot samples to learn a policy, which can take significant time. In this paper, we focus on learning a realistic world model capturing the dynamics of scene changes conditioned on robot actions. Our dreaming mode…

    Submitted 1 August, 2019; v1 submitted 20 May, 2018; originally announced May 2018.

    Journal ref: IROS 2019

  6. arXiv:1804.03247  [pdf, other]

    cs.CV

    Fine-grained Activity Recognition in Baseball Videos

    Authors: AJ Piergiovanni, Michael S. Ryoo

    Abstract: In this paper, we introduce a challenging new dataset, MLB-YouTube, designed for fine-grained activity detection. The dataset contains two settings: segmented video classification as well as activity detection in continuous videos. We experimentally compare various recognition approaches capturing temporal structure in activity videos, by classifying segmented videos and extending those approaches…

    Submitted 9 April, 2018; originally announced April 2018.

    Comments: CVPR Workshop on Computer Vision in Sports

    Journal ref: CVPR Workshop on Computer Vision in Sports 2018

  7. arXiv:1803.11556  [pdf, other]

    cs.CV cs.AI cs.CR cs.LG

    Learning to Anonymize Faces for Privacy Preserving Action Detection

    Authors: Zhongzheng Ren, Yong Jae Lee, Michael S. Ryoo

    Abstract: There is an increasing concern in computer vision devices invading users' privacy by recording unwanted videos. On the one hand, we want the camera systems to recognize important events and assist human daily lives by understanding its videos, but on the other hand we want to ensure that they do not intrude people's privacy. In this paper, we propose a new principled approach for learning a video…

    Submitted 26 July, 2018; v1 submitted 30 March, 2018; originally announced March 2018.

    Comments: ECCV'18 camera ready

  8. arXiv:1803.11217  [pdf, other]

    cs.CV

    Joint Person Segmentation and Identification in Synchronized First- and Third-person Videos

    Authors: Mingze Xu, Chenyou Fan, Yuchen Wang, Michael S Ryoo, David J Crandall

    Abstract: In a world of pervasive cameras, public spaces are often captured from multiple perspectives by cameras of different types, both fixed and mobile. An important problem is to organize these heterogeneous collections of videos by finding connections between them, such as identifying correspondences between the people appearing in the videos and the people holding or wearing the cameras. In this pape…

    Submitted 25 July, 2018; v1 submitted 29 March, 2018; originally announced March 2018.

    Comments: To appear in ECCV 2018

  9. arXiv:1803.06316  [pdf, other]

    cs.CV

    Temporal Gaussian Mixture Layer for Videos

    Authors: AJ Piergiovanni, Michael S. Ryoo

    Abstract: We introduce a new convolutional layer named the Temporal Gaussian Mixture (TGM) layer and present how it can be used to efficiently capture longer-term temporal information in continuous activity videos. The TGM layer is a temporal convolutional layer governed by a much smaller set of parameters (e.g., location/variance of Gaussians) that are fully differentiable. We present our fully convolution…

    Submitted 1 August, 2019; v1 submitted 16 March, 2018; originally announced March 2018.

    Comments: ICML 2019

  10. arXiv:1802.02138  [pdf, other]

    cs.CV cs.AR

    Musical Chair: Efficient Real-Time Recognition Using Collaborative IoT Devices

    Authors: Ramyad Hadidi, Jiashen Cao, Matthew Woodward, Michael S. Ryoo, Hyesoon Kim

    Abstract: The prevalence of Internet of things (IoT) devices and abundance of sensor data has created an increase in real-time data processing such as recognition of speech, image, and video. While currently such processes are offloaded to the computationally powerful cloud system, a localized and distributed approach is desirable because (i) it preserves the privacy of users and (ii) it omits the dependenc…

    Submitted 21 March, 2018; v1 submitted 5 February, 2018; originally announced February 2018.

  11. arXiv:1712.01938  [pdf, other]

    cs.CV

    Learning Latent Super-Events to Detect Multiple Activities in Videos

    Authors: AJ Piergiovanni, Michael S. Ryoo

    Abstract: In this paper, we introduce the concept of learning latent super-events from activity videos, and present how it benefits activity detection in continuous videos. We define a super-event as a set of multiple events occurring together in videos with a particular temporal organization; it is the opposite concept of sub-events. Real-world videos contain multiple activities and are rarely segmented (e…

    Submitted 29 March, 2018; v1 submitted 5 December, 2017; originally announced December 2017.

    Comments: CVPR 2018

    Journal ref: CVPR 2018

  12. arXiv:1708.00999  [pdf, other]

    cs.CV

    Extreme Low Resolution Activity Recognition with Multi-Siamese Embedding Learning

    Authors: Michael S. Ryoo, Kiyoon Kim, Hyun Jong Yang

    Abstract: This paper presents an approach for recognizing human activities from extreme low resolution (e.g., 16x12) videos. Extreme low resolution recognition is not only necessary for analyzing actions at a distance but also is crucial for enabling privacy-preserving recognition of human activities. We design a new two-stream multi-Siamese convolutional neural network. The idea is to explicitly capture th…

    Submitted 3 February, 2018; v1 submitted 3 August, 2017; originally announced August 2017.

    Comments: AAAI 2018

  13. arXiv:1705.07328  [pdf, other]

    cs.CV

    Forecasting Hands and Objects in Future Frames

    Authors: Chenyou Fan, Jangwon Lee, Michael S. Ryoo

    Abstract: This paper presents an approach to forecast future presence and location of human hands and objects. Given an image frame, the goal is to predict what objects will appear in the future frame (e.g., 5 seconds later) and where they will be located at, even when they are not visible in the current frame. The key idea is that (1) an intermediate representation of a convolutional object recognition mod…

    Submitted 23 August, 2018; v1 submitted 20 May, 2017; originally announced May 2017.

  14. arXiv:1704.06340  [pdf, other]

    cs.CV

    Identifying First-person Camera Wearers in Third-person Videos

    Authors: Chenyou Fan, Jangwon Lee, Mingze Xu, Krishna Kumar Singh, Yong Jae Lee, David J. Crandall, Michael S. Ryoo

    Abstract: We consider scenarios in which we wish to perform joint scene understanding, object tracking, activity recognition, and other tasks in environments in which multiple people are wearing body-worn cameras while a third-person static camera also captures the scene. To do this, we need to establish person-level correspondences across first- and third-person videos, which is challenging because the cam…

    Submitted 20 April, 2017; originally announced April 2017.

  15. arXiv:1703.01040  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Learning Robot Activities from First-Person Human Videos Using Convolutional Future Regression

    Authors: Jangwon Lee, Michael S. Ryoo

    Abstract: We design a new approach that allows robot learning of new activities from unlabeled human example videos. Given videos of humans executing the same activity from a human's viewpoint (i.e., first-person videos), our objective is to make the robot learn the temporal structure of the activity as its future regression network, and learn to transfer such model for its own motor execution. We present a…

    Submitted 24 July, 2017; v1 submitted 3 March, 2017; originally announced March 2017.

  16. arXiv:1703.00503  [pdf, other]

    cs.RO cs.AI cs.CV

    Learning Social Affordance Grammar from Videos: Transferring Human Interactions to Human-Robot Interactions

    Authors: Tianmin Shu, Xiaofeng Gao, Michael S. Ryoo, Song-Chun Zhu

    Abstract: In this paper, we present a general framework for learning social affordance grammar as a spatiotemporal AND-OR graph (ST-AOG) from RGB-D videos of human interactions, and transfer the grammar to humanoids to enable a real-time motion inference for human-robot interaction (HRI). Based on Gibbs sampling, our weakly supervised grammar learning can automatically construct a hierarchical representatio…

    Submitted 1 March, 2017; originally announced March 2017.

    Comments: The 2017 IEEE International Conference on Robotics and Automation (ICRA)

  17. arXiv:1605.08140  [pdf, other]

    cs.CV

    Learning Latent Sub-events in Activity Videos Using Temporal Attention Filters

    Authors: AJ Piergiovanni, Chenyou Fan, Michael S. Ryoo

    Abstract: In this paper, we newly introduce the concept of temporal attention filters, and describe how they can be used for human activity recognition from videos. Many high-level activities are often composed of multiple temporal parts (e.g., sub-events) with different duration/speed, and our objective is to make the model explicitly learn such temporal structure using multiple attention filters and benef…

    Submitted 26 December, 2016; v1 submitted 26 May, 2016; originally announced May 2016.

    Journal ref: AAAI 2017

  18. arXiv:1604.03692  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Learning Social Affordance for Human-Robot Interaction

    Authors: Tianmin Shu, M. S. Ryoo, Song-Chun Zhu

    Abstract: In this paper, we present an approach for robot learning of social affordance from human activity videos. We consider the problem in the context of human-robot interaction: Our approach learns structural representations of human-human (and human-object-human) interactions, describing how body-parts of each agent move with respect to each other and what spatial relations they should maintain to com…

    Submitted 20 April, 2016; v1 submitted 13 April, 2016; originally announced April 2016.

    Comments: International Joint Conference on Artificial Intelligence (IJCAI), 2016

  19. arXiv:1604.03196  [pdf, other]

    cs.CV

    Privacy-Preserving Human Activity Recognition from Extreme Low Resolution

    Authors: Michael S. Ryoo, Brandon Rothrock, Charles Fleming, Hyun Jong Yang

    Abstract: Privacy protection from surreptitious video recordings is an important societal challenge. We desire a computer vision system (e.g., a robot) that can recognize human activities and assist our daily life, yet ensure that it is not recording video that may invade our privacy. This paper presents a fundamental approach to address such contradicting objectives: human activity recognition while only u…

    Submitted 26 December, 2016; v1 submitted 11 April, 2016; originally announced April 2016.

    Journal ref: AAAI 2017

  20. arXiv:1507.02558  [pdf, other]

    cs.CV

    Multi-Type Activity Recognition in Robot-Centric Scenarios

    Authors: Ilaria Gori, J. K. Aggarwal, Larry Matthies, Michael S. Ryoo

    Abstract: Activity recognition is very useful in scenarios where robots interact with, monitor or assist humans. In the past years many types of activities -- single actions, two persons interactions or ego-centric activities, to name a few -- have been analyzed. Whereas traditional methods treat such types of activities separately, an autonomous robot should be able to detect and recognize multiple types o…

    Submitted 11 April, 2016; v1 submitted 9 July, 2015; originally announced July 2015.

    Journal ref: IEEE Robotics and Automation Letters (RA-L), 1(1):593-600, 2016

  21. arXiv:1412.6505  [pdf, other]

    cs.CV

    Pooled Motion Features for First-Person Videos

    Authors: M. S. Ryoo, Brandon Rothrock, Larry Matthies

    Abstract: In this paper, we present a new feature representation for first-person videos. In first-person video understanding (e.g., activity recognition), it is very important to capture both entire scene dynamics (i.e., egomotion) and salient local motion observed in videos. We describe a representation framework based on time series pooling, which is designed to abstract short-term/long-term changes in f…

    Submitted 6 May, 2015; v1 submitted 19 December, 2014; originally announced December 2014.

  22. arXiv:1406.5309  [pdf, other]

    cs.CV

    Early Recognition of Human Activities from First-Person Videos Using Onset Representations

    Authors: M. S. Ryoo, Thomas J. Fuchs, Lu Xia, J. K. Aggarwal, Larry Matthies

    Abstract: In this paper, we propose a methodology for early recognition of human activities from videos taken with a first-person viewpoint. Early recognition, which is also known as activity prediction, is an ability to infer an ongoing activity at its early stage. We present an algorithm to perform recognition of activities targeted at the camera from streaming videos, making the system to predict intende…

    Submitted 6 July, 2015; v1 submitted 20 June, 2014; originally announced June 2014.