Search | arXiv e-print repository

Learning to Act from Actionless Videos through Dense Correspondences

Authors: Po-Chen Ko, Jiayuan Mao, Yilun Du, Shao-Hua Sun, Joshua B. Tenenbaum

Abstract: In this work, we present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments from few video demonstrations without using any action annotations. Our method leverages images as a task-agnostic representation, encoding both the state and action information, and text as a general representation for specifying robot g… ▽ More In this work, we present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments from few video demonstrations without using any action annotations. Our method leverages images as a task-agnostic representation, encoding both the state and action information, and text as a general representation for specifying robot goals. By synthesizing videos that ``hallucinate'' robot executing actions and in combination with dense correspondences between frames, our approach can infer the closed-formed action to execute to an environment without the need of any explicit action labels. This unique capability allows us to train the policy solely based on RGB videos and deploy learned policies to various robotic tasks. We demonstrate the efficacy of our approach in learning policies on table-top manipulation and navigation tasks. Additionally, we contribute an open-source framework for efficient video modeling, enabling the training of high-fidelity policy models with four GPUs within a single day. △ Less

Submitted 12 October, 2023; originally announced October 2023.

Comments: Project page: https://flow-diffusion.github.io/

arXiv:2206.09742 [pdf]

doi 10.1109/ACCESS.2023.3296793

Develo** a Free and Open-source Automated Building Exterior Crack Inspection Software for Construction and Facility Managers

Authors: Pi Ko, Samuel A. Prieto, Borja Garcia de Soto

Abstract: Inspection of cracks is an important process for properly monitoring and maintaining a building. However, manual crack inspection is time-consuming, inconsistent, and dangerous (e.g., in tall buildings). Due to the development of open-source AI technologies, the increase in available Unmanned Aerial Vehicles (UAVs) and the availability of smartphone cameras, it has become possible to automate the… ▽ More Inspection of cracks is an important process for properly monitoring and maintaining a building. However, manual crack inspection is time-consuming, inconsistent, and dangerous (e.g., in tall buildings). Due to the development of open-source AI technologies, the increase in available Unmanned Aerial Vehicles (UAVs) and the availability of smartphone cameras, it has become possible to automate the building crack inspection process. This study presents the development of an easy-to-use, free and open-source Automated Building Exterior Crack Inspection Software (ABECIS) for construction and facility managers, using state-of-the-art segmentation algorithms to identify concrete cracks and generate a quantitative and qualitative report. ABECIS was tested using images collected from a UAV and smartphone cameras in real-world conditions and a controlled laboratory environment. From the raw output of the algorithm, the median Intersection over Unions for the test experiments is (1) 0.686 for indoor crack detection experiment in a controlled lab environment using a commercial drone, (2) 0.186 for indoor crack detection at a construction site using a smartphone and (3) 0.958 for outdoor crack detection on university campus using a commercial drone. These IoU results can be improved significantly to over 0.8 when a human operator selectively removes the false positives. In general, ABECIS performs best for outdoor drone images, and combining the algorithm predictions with human verification/intervention offers very accurate crack detection results. The software is available publicly and can be downloaded for out-of-the-box use at: https://github.com/SMART-NYUAD/ABECIS △ Less

Submitted 20 June, 2022; originally announced June 2022.

arXiv:2206.06862 [pdf, other]

doi 10.1016/j.cmpb.2023.107631

Evaluating histopathology transfer learning with ChampKit

Authors: Jakub R. Kaczmarzyk, Tahsin M. Kurc, Shahira Abousamra, Rajarsi Gupta, Joel H. Saltz, Peter K. Koo

Abstract: Histopathology remains the gold standard for diagnosis of various cancers. Recent advances in computer vision, specifically deep learning, have facilitated the analysis of histopathology images for various tasks, including immune cell detection and microsatellite instability classification. The state-of-the-art for each task often employs base architectures that have been pretrained for image clas… ▽ More Histopathology remains the gold standard for diagnosis of various cancers. Recent advances in computer vision, specifically deep learning, have facilitated the analysis of histopathology images for various tasks, including immune cell detection and microsatellite instability classification. The state-of-the-art for each task often employs base architectures that have been pretrained for image classification on ImageNet. The standard approach to develop classifiers in histopathology tends to focus narrowly on optimizing models for a single task, not considering the aspects of modeling innovations that improve generalization across tasks. Here we present ChampKit (Comprehensive Histopathology Assessment of Model Predictions toolKit): an extensible, fully reproducible benchmarking toolkit that consists of a broad collection of patch-level image classification tasks across different cancers. ChampKit enables a way to systematically document the performance impact of proposed improvements in models and methodology. ChampKit source code and data are freely accessible at https://github.com/kaczmarj/champkit . △ Less

Submitted 14 June, 2022; originally announced June 2022.

Comments: Submitted to NeurIPS 2022 Track on Datasets and Benchmarks. Source code available at https://github.com/kaczmarj/champkit

ACM Class: J.3; I.4.9; D.2.13

arXiv:2010.13981 [pdf, other]

A Members First Approach to Enabling LinkedIn's Labor Market Insights at Scale

Authors: Ryan Rogers, Adrian Rivera Cardoso, Koray Mancuhan, Akash Kaura, Nikhil Gahlawat, Neha Jain, Paul Ko, Parvez Ahammad

Abstract: We describe the privatization method used in reporting labor market insights from LinkedIn's Economic Graph, including the differentially private algorithms used to protect member's privacy. The reports show who are the top employers, as well as what are the top jobs and skills in a given country/region and industry. We hope this data will help governments and citizens track labor market trends du… ▽ More We describe the privatization method used in reporting labor market insights from LinkedIn's Economic Graph, including the differentially private algorithms used to protect member's privacy. The reports show who are the top employers, as well as what are the top jobs and skills in a given country/region and industry. We hope this data will help governments and citizens track labor market trends during the COVID-19 pandemic while also protecting the privacy of our members. △ Less

Submitted 26 October, 2020; originally announced October 2020.

arXiv:1802.03581 [pdf]

2-gram-based Phonetic Feature Generation for Convolutional Neural Network in Assessment of Trademark Similarity

Authors: Kyung Pyo Ko, Kwang Hee Lee, Mi So Jang, Gun Hong Park

Abstract: A trademark is a mark used to identify various commodities. If same or similar trademark is registered for the same or similar commodity, the purchaser of the goods may be confused. Therefore, in the process of trademark registration examination, the examiner judges whether the trademark is the same or similar to the other applied or registered trademarks. The confusion in trademarks is based on t… ▽ More A trademark is a mark used to identify various commodities. If same or similar trademark is registered for the same or similar commodity, the purchaser of the goods may be confused. Therefore, in the process of trademark registration examination, the examiner judges whether the trademark is the same or similar to the other applied or registered trademarks. The confusion in trademarks is based on the visual, phonetic or conceptual similarity of the marks. In this paper, we focus specifically on the phonetic similarity between trademarks. We propose a method to generate 2D phonetic feature for convolutional neural network in assessment of trademark similarity. This proposed algorithm is tested with 12,553 trademark phonetic similar pairs and 34,020 trademark phonetic non-similar pairs from 2010 to 2016. As a result, we have obtained approximately 92% judgment accuracy. △ Less

Submitted 10 February, 2018; originally announced February 2018.

Comments: 10 pages, 6 figures, 10 tables

Showing 1–5 of 5 results for author: Ko, P