Search | arXiv e-print repository

Dreamitate: Real-World Visuomotor Policy Learning via Video Generation

Authors: Junbang Liang, Ruoshi Liu, Ege Ozguroglu, Sruthi Sudhakar, Achal Dave, Pavel Tokmakov, Shuran Song, Carl Vondrick

Abstract: A key challenge in manipulation is learning a policy that can robustly generalize to diverse visual environments. A promising mechanism for learning robust policies is to leverage video generative models, which are pretrained on large-scale datasets of internet videos. In this paper, we propose a visuomotor policy learning framework that fine-tunes a video diffusion model on human demonstrations of a given task. At test time, we generate an example of an execution of the task conditioned on images of a novel scene, and use this synthesized execution directly to control the robot. Our key insight is that using common tools allows us to effortlessly bridge the embodiment gap between the human hand and the robot manipulator. We evaluate our approach on four tasks of increasing complexity and demonstrate that harnessing internet-scale generative models allows the learned policy to achieve a significantly higher degree of generalization than existing behavior cloning approaches.

Submitted 24 June, 2024; originally announced June 2024.

Comments: Project page: https://dreamitate.cs.columbia.edu/

arXiv:2403.09566 [pdf, other]

PaperBot: Learning to Design Real-World Tools Using Paper

Authors: Ruoshi Liu, Junbang Liang, Sruthi Sudhakar, Huy Ha, Cheng Chi, Shuran Song, Carl Vondrick

Abstract: Paper is a cheap, recyclable, and clean material that is often used to make practical tools. Traditional tool design either relies on simulation or physical analysis, which is often inaccurate and time-consuming. In this paper, we propose PaperBot, an approach that directly learns to design and use a tool in the real world using paper without human intervention. We demonstrated the effectiveness and efficiency of PaperBot on two tool design tasks: (1) learning to fold and throw paper airplanes for maximum travel distance and (2) learning to cut paper into grippers that exert maximum gripping force. We present a self-supervised learning framework that learns to perform a sequence of folding, cutting, and dynamic manipulation actions in order to optimize the design and use of a tool. We deploy our system to a real-world two-arm robotic system to solve challenging design tasks that involve aerodynamics (paper airplane) and friction (paper gripper) that are impossible to simulate accurately.

Submitted 14 March, 2024; originally announced March 2024.

Comments: Project Website: https://paperbot.cs.columbia.edu/

arXiv:2307.00033 [pdf]

Application of data engineering approaches to address challenges in microbiome data for optimal medical decision-making

Authors: Isha Thombre, Pavan Kumar Perepu, Shyam Kumar Sudhakar

Abstract: The human gut microbiota is known to contribute to numerous physiological functions of the body and is also implicated in a myriad of pathological conditions. Prolific research work in the past few decades has yielded valuable information regarding the relative taxonomic distribution of gut microbiota. Unfortunately, microbiome data suffers from class imbalance and high dimensionality issues that must be addressed. In this study, we have implemented data engineering algorithms to address the above-mentioned issues inherent to microbiome data. Four standard machine learning classifiers (logistic regression (LR), support vector machines (SVM), random forests (RF), and extreme gradient boosting (XGB) decision trees) were implemented on a previously published dataset. The issues of class imbalance and high dimensionality of the data were addressed through the synthetic minority oversampling technique (SMOTE) and principal component analysis (PCA). Our results indicate that ensemble classifiers (RF and XGB decision trees) exhibit superior classification accuracy in predicting the host phenotype. The application of PCA significantly reduced testing time while maintaining high classification accuracy. The highest classification accuracy was obtained at the species level for most classifiers. The prototype employed in the study addresses the issues inherent to microbiome datasets and could be highly beneficial for providing personalized medicine.

Submitted 11 July, 2023; v1 submitted 30 June, 2023; originally announced July 2023.

arXiv:2306.04482 [pdf, other]

ICON$^2$: Reliably Benchmarking Predictive Inequity in Object Detection

Authors: Sruthi Sudhakar, Viraj Prabhu, Olga Russakovsky, Judy Hoffman

Abstract: As computer vision systems are being increasingly deployed at scale in high-stakes applications like autonomous driving, concerns about social bias in these systems are rising. Analysis of fairness in real-world vision systems, such as object detection in driving scenes, has been limited to observing predictive inequity across attributes such as pedestrian skin tone, and lacks a consistent methodology to disentangle the role of confounding variables, e.g., does my model perform worse for a certain skin tone, or are such scenes in my dataset more challenging due to occlusion and crowds? In this work, we introduce ICON$^2$, a framework for robustly answering this question. ICON$^2$ leverages prior knowledge on the deficiencies of object detection systems to identify performance discrepancies across sub-populations, compute correlations between these potential confounders and a given sensitive attribute, and control for the most likely confounders to obtain a more reliable estimate of model bias. Using our approach, we conduct an in-depth study on the performance of object detection with respect to income from the BDD100K driving dataset, revealing useful insights.

Submitted 7 June, 2023; originally announced June 2023.

Comments: Accepted to CVPR 2023 SSAD Workshop

arXiv:2302.04358 [pdf, other]

Mitigating Bias in Visual Transformers via Targeted Alignment

Authors: Sruthi Sudhakar, Viraj Prabhu, Arvindkumar Krishnakumar, Judy Hoffman

Abstract: As transformer architectures become increasingly prevalent in computer vision, it is critical to understand their fairness implications. We perform the first study of the fairness of transformers applied to computer vision and benchmark several bias mitigation approaches from prior work. We visualize the feature space of the transformer self-attention modules and discover that a significant portion of the bias is encoded in the query matrix. With this knowledge, we propose TADeT, a targeted alignment strategy for debiasing transformers that aims to discover and remove bias primarily from query matrix features. We measure performance using Balanced Accuracy and Standard Accuracy, and fairness using Equalized Odds and Balanced Accuracy Difference. TADeT consistently leads to improved fairness over prior work on multiple attribute prediction tasks on the CelebA dataset, without compromising performance.

Submitted 8 February, 2023; originally announced February 2023.

arXiv:2110.15499 [pdf, other]

UDIS: Unsupervised Discovery of Bias in Deep Visual Recognition Models

Authors: Arvindkumar Krishnakumar, Viraj Prabhu, Sruthi Sudhakar, Judy Hoffman

Abstract: Deep learning models have been shown to learn spurious correlations from data that sometimes lead to systematic failures for certain subpopulations. Prior work has typically diagnosed this by crowdsourcing annotations for various protected attributes and measuring performance, which is both expensive to acquire and difficult to scale. In this work, we propose UDIS, an unsupervised algorithm for surfacing and analyzing such failure modes. UDIS identifies subpopulations via hierarchical clustering of dataset embeddings and surfaces systematic failure modes by visualizing low performing clusters along with their gradient-weighted class-activation maps. We show the effectiveness of UDIS in identifying failure modes in models trained for image classification on the CelebA and MSCOCO datasets.

Submitted 28 October, 2021; originally announced October 2021.

arXiv:0903.4770 [pdf]

Act of CVT and EVT In The Formation of Number-Theoretic Fractals

Authors: Pal Choudhury Pabitra, Sahoo Sudhakar, Nayak Birendra Kumar, Hassan Sk. Sarif

Abstract: In this paper we have defined two functions that have been used to construct different fractals having fractal dimensions between 1 and 2. More precisely, one of our defined functions produces fractals whose fractal dimension lies in [1.58, 2), and the other function produces fractals whose fractal dimension lies in (1, 1.58]. We also tried to calculate the amount of increment of fractal dimension in accordance with the base of the number system. In switching fractals from one base to another, the increment of fractal dimension is constant, which is 1.58; it's quite surprising!

Submitted 27 March, 2009; originally announced March 2009.

Comments: 15 pages

Showing 1–7 of 7 results for author: Sudhakar, S