Search | arXiv e-print repository

VADER: Visual Affordance Detection and Error Recovery for Multi Robot Human Collaboration

Authors: Michael Ahn, Montserrat Gonzalez Arenas, Matthew Bennice, Noah Brown, Christine Chan, Byron David, Anthony Francis, Gavin Gonzalez, Rainer Hessmer, Tomas Jackson, Nikhil J Joshi, Daniel Lam, Tsang-Wei Edward Lee, Alex Luong, Sharath Maddineni, Harsh Patel, Jodilyn Peralta, Jornell Quiambao, Diego Reyes, Rosario M Jauregui Ruano, Dorsa Sadigh, Pannag Sanketi, Leila Takayama, Pavel Vodenski, Fei Xia

Abstract: Robots today can exploit the rich world knowledge of large language models to chain simple behavioral skills into long-horizon tasks. However, robots often get interrupted during long-horizon tasks due to primitive skill failures and dynamic environments. We propose VADER, a plan, execute, detect framework with seeking help as a new skill that enables robots to recover and complete long-horizon ta… ▽ More Robots today can exploit the rich world knowledge of large language models to chain simple behavioral skills into long-horizon tasks. However, robots often get interrupted during long-horizon tasks due to primitive skill failures and dynamic environments. We propose VADER, a plan, execute, detect framework with seeking help as a new skill that enables robots to recover and complete long-horizon tasks with the help of humans or other robots. VADER leverages visual question answering (VQA) modules to detect visual affordances and recognize execution errors. It then generates prompts for a language model planner (LMP) which decides when to seek help from another robot or human to recover from errors in long-horizon task execution. We show the effectiveness of VADER with two long-horizon robotic tasks. Our pilot study showed that VADER is capable of performing complex long-horizon tasks by asking for help from another robot to clear a table. Our user study showed that VADER is capable of performing complex long-horizon tasks by asking for help from a human to clear a path. We gathered feedback from people (N=19) about the performance of the VADER performance vs. a robot that did not ask for help. https://google-vader.github.io/ △ Less

Submitted 30 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: 9 pages, 4 figures

arXiv:2404.09790 [pdf, other]

NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results

Authors: Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, Zhijuan Huang, Yajun Zou, Yuan Huang, Jiamin Lin, Bingnan Han, Xianyu Guan, Yongsheng Yu, Daoan Zhang, Xuanwu Yin, Kunlong Zuo, **hua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou , et al. (63 additional authors not shown)

Abstract: This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge i… ▽ More This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge is to obtain designs/solutions with the most advanced SR performance, with no constraints on computational resources (e.g., model size and FLOPs) or training data. The track of this challenge assesses performance with the PSNR metric on the DIV2K testing dataset. The competition attracted 199 registrants, with 20 teams submitting valid entries. This collective endeavour not only pushes the boundaries of performance in single-image SR but also offers a comprehensive overview of current trends in this field. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: NTIRE 2024 webpage: https://cvlai.net/ntire/2024. Code: https://github.com/zhengchen1999/NTIRE2024_ImageSR_x4

arXiv:2403.12943 [pdf, other]

Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers

Authors: Vidhi Jain, Maria Attarian, Nikhil J Joshi, Ayzaan Wahid, Danny Driess, Quan Vuong, Pannag R Sanketi, Pierre Sermanet, Stefan Welker, Christine Chan, Igor Gilitschenski, Yonatan Bisk, Debidatta Dwibedi

Abstract: While large-scale robotic systems typically rely on textual instructions for tasks, this work explores a different approach: can robots infer the task directly from observing humans? This shift necessitates the robot's ability to decode human intent and translate it into executable actions within its physical constraints and environment. We introduce Vid2Robot, a novel end-to-end video-based learn… ▽ More While large-scale robotic systems typically rely on textual instructions for tasks, this work explores a different approach: can robots infer the task directly from observing humans? This shift necessitates the robot's ability to decode human intent and translate it into executable actions within its physical constraints and environment. We introduce Vid2Robot, a novel end-to-end video-based learning framework for robots. Given a video demonstration of a manipulation task and current visual observations, Vid2Robot directly produces robot actions. This is achieved through a unified representation model trained on a large dataset of human video and robot trajectory. The model leverages cross-attention mechanisms to fuse prompt video features to the robot's current state and generate appropriate actions that mimic the observed task. To further improve policy performance, we propose auxiliary contrastive losses that enhance the alignment between human and robot video representations. We evaluate Vid2Robot on real-world robots, demonstrating a 20% improvement in performance compared to other video-conditioned policies when using human demonstration videos. Additionally, our model exhibits emergent capabilities, such as successfully transferring observed motions from one object to another, and long-horizon composition, thus showcasing its potential for real-world applications. Project website: vid2robot.github.io △ Less

Submitted 19 March, 2024; originally announced March 2024.

Comments: Robot learning: Imitation Learning, Robot Perception, Sensing & Vision, Gras** & Manipulation

arXiv:2311.00899 [pdf, other]

RoboVQA: Multimodal Long-Horizon Reasoning for Robotics

Authors: Pierre Sermanet, Tianli Ding, Jeffrey Zhao, Fei Xia, Debidatta Dwibedi, Keerthana Gopalakrishnan, Christine Chan, Gabriel Dulac-Arnold, Sharath Maddineni, Nikhil J Joshi, Pete Florence, Wei Han, Robert Baruch, Yao Lu, Suvir Mirchandani, Peng Xu, Pannag Sanketi, Karol Hausman, Izhak Shafran, Brian Ichter, Yuan Cao

Abstract: We present a scalable, bottom-up and intrinsically diverse data collection scheme that can be used for high-level reasoning with long and medium horizons and that has 2.2x higher throughput compared to traditional narrow top-down step-by-step collection. We collect realistic data by performing any user requests within the entirety of 3 office buildings and using multiple robot and human embodiment… ▽ More We present a scalable, bottom-up and intrinsically diverse data collection scheme that can be used for high-level reasoning with long and medium horizons and that has 2.2x higher throughput compared to traditional narrow top-down step-by-step collection. We collect realistic data by performing any user requests within the entirety of 3 office buildings and using multiple robot and human embodiments. With this data, we show that models trained on all embodiments perform better than ones trained on the robot data only, even when evaluated solely on robot episodes. We find that for a fixed collection budget it is beneficial to take advantage of cheaper human collection along with robot collection. We release a large and highly diverse (29,520 unique instructions) dataset dubbed RoboVQA containing 829,502 (video, text) pairs for robotics-focused visual question answering. We also demonstrate how evaluating real robot experiments with an intervention mechanism enables performing tasks to completion, making it deployable with human oversight even if imperfect while also providing a single performance metric. We demonstrate a single video-conditioned model named RoboVQA-VideoCoCa trained on our dataset that is capable of performing a variety of grounded high-level reasoning tasks in broad realistic settings with a cognitive intervention rate 46% lower than the zero-shot state of the art visual language model (VLM) baseline and is able to guide real robots through long-horizon tasks. The performance gap with zero-shot state-of-the-art models indicates that a lot of grounded data remains to be collected for real-world deployment, emphasizing the critical need for scalable data collection approaches. Finally, we show that video VLMs significantly outperform single-image VLMs with an average error rate reduction of 19% across all VQA tasks. Data and videos available at https://robovqa.github.io △ Less

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.14342 [pdf]

PulmoBell: Home-based Pulmonary Rehabilitation Assistive Technology for People with COPD

Authors: Yuanxiang Ma, Andreas Polydorides, Jitesh Joshi, Youngjun Cho

Abstract: Chronic Obstructive Pulmonary Disease (COPD) can be fatal and is challenging to live with due to its severe symptoms. Pulmonary rehabilitation (PR) is one of the managements means to maintain COPD in a stable status. However, implementation of PR in the UK has been challenging due to the environmental and personal barriers faced by patients, which hinder their uptake, adherence, and completion of… ▽ More Chronic Obstructive Pulmonary Disease (COPD) can be fatal and is challenging to live with due to its severe symptoms. Pulmonary rehabilitation (PR) is one of the managements means to maintain COPD in a stable status. However, implementation of PR in the UK has been challenging due to the environmental and personal barriers faced by patients, which hinder their uptake, adherence, and completion of the programmes. Moreover, increased exercise capacity following PR does not always translate into physical activity (PA) and unfortunately, can lead back to exercise capacity seen prior to PR. Current alternative solutions using telerehabilitation methods have limitations on addressing these accessibility problems, and no clear conclusion can be drawn on the efficacy of telerehabilitation in enhancing the sustainability of PR outcomes via promoting PA in patients' everyday life. In this work, the authors propose a novel design of sensor-based assistive product with the aim of facilitating PR and promoting PA maintenance in a home-based setting. Prototypes of different levels of fidelity are presented, followed by an evaluation plan for future research directions. △ Less

Submitted 22 October, 2023; originally announced October 2023.

Comments: Short Technical Report: Best student-led project in COMP0145: Research Methods and Making Skills (2022/23)

arXiv:2310.08864 [pdf, other]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, A**kya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie , et al. (267 additional authors not shown)

Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method… ▽ More Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io. △ Less

Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: Project website: https://robotics-transformer-x.github.io

arXiv:2309.07297 [pdf, other]

Multi-Modal Hybrid Learning and Sequential Training for RGB-T Saliency Detection

Authors: Guangyu Ren, Jitesh Joshi, Youngjun Cho

Abstract: RGB-T saliency detection has emerged as an important computer vision task, identifying conspicuous objects in challenging scenes such as dark environments. However, existing methods neglect the characteristics of cross-modal features and rely solely on network structures to fuse RGB and thermal features. To address this, we first propose a Multi-Modal Hybrid loss (MMHL) that comprises supervised a… ▽ More RGB-T saliency detection has emerged as an important computer vision task, identifying conspicuous objects in challenging scenes such as dark environments. However, existing methods neglect the characteristics of cross-modal features and rely solely on network structures to fuse RGB and thermal features. To address this, we first propose a Multi-Modal Hybrid loss (MMHL) that comprises supervised and self-supervised loss functions. The supervised loss component of MMHL distinctly utilizes semantic features from different modalities, while the self-supervised loss component reduces the distance between RGB and thermal features. We further consider both spatial and channel information during feature fusion and propose the Hybrid Fusion Module to effectively fuse RGB and thermal features. Lastly, instead of jointly training the network with cross-modal features, we implement a sequential training strategy which performs training only on RGB images in the first stage and then learns cross-modal features in the second stage. This training strategy improves saliency detection performance without computational overhead. Results from performance evaluation and ablation studies demonstrate the superior performance achieved by the proposed method compared with the existing state-of-the-art methods. △ Less

Submitted 13 September, 2023; originally announced September 2023.

Comments: 8 Pages main text, 3 pages supplementary information, 12 figures

arXiv:2308.02756 [pdf, other]

PhysioKit: Open-source, Low-cost Physiological Computing Toolkit for Single and Multi-user Studies

Authors: Jitesh Joshi, Katherine Wang, Youngjun Cho

Abstract: The proliferation of physiological sensors opens new opportunities to explore interactions, conduct experiments and evaluate the user experience with continuous monitoring of bodily functions. Commercial devices, however, can be costly or limit access to raw waveform data, while low-cost sensors are efforts-intensive to setup. To address these challenges, we introduce PhysioKit, an open-source, lo… ▽ More The proliferation of physiological sensors opens new opportunities to explore interactions, conduct experiments and evaluate the user experience with continuous monitoring of bodily functions. Commercial devices, however, can be costly or limit access to raw waveform data, while low-cost sensors are efforts-intensive to setup. To address these challenges, we introduce PhysioKit, an open-source, low-cost physiological computing toolkit. PhysioKit provides a one-stop pipeline consisting of (i) a sensing and data acquisition layer that can be configured in a modular manner per research needs, (ii) a software application layer that enables data acquisition, real-time visualization and machine learning (ML)-enabled signal quality assessment. This also supports basic visual biofeedback configurations and synchronized acquisition for co-located or remote multi-user settings. In a validation study with 16 participants, PhysioKit shows strong agreement with research-grade sensors on measuring heart rate and heart rate variability metrics data. Furthermore, we report usability survey results from 10 small-project teams (44 individual members in total) who used PhysioKit for 4-6 weeks, providing insights into its use cases and research benefits. Lastly, we discuss the extensibility and potential impact of the toolkit on the research community. △ Less

Submitted 12 September, 2023; v1 submitted 4 August, 2023; originally announced August 2023.

Comments: 25 pages, 8 figures, 4 tables

arXiv:2212.06817 [pdf, other]

RT-1: Robotics Transformer for Real-World Control at Scale

Authors: Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Kuang-Huei Lee, Sergey Levine, Yao Lu, Utsav Malla, Deeksha Manjunath , et al. (26 additional authors not shown)

Abstract: By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, wher… ▽ More By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse, robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer1.github.io △ Less

Submitted 11 August, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

Comments: See website at robotics-transformer1.github.io

arXiv:2209.10700 [pdf, other]

Self-adversarial Multi-scale Contrastive Learning for Semantic Segmentation of Thermal Facial Images

Authors: Jitesh Joshi, Nadia Bianchi-Berthouze, Youngjun Cho

Abstract: Segmentation of thermal facial images is a challenging task. This is because facial features often lack salience due to high-dynamic thermal range scenes and occlusion issues. Limited availability of datasets from unconstrained settings further limits the use of the state-of-the-art segmentation networks, loss functions and learning strategies which have been built and validated for RGB images. To… ▽ More Segmentation of thermal facial images is a challenging task. This is because facial features often lack salience due to high-dynamic thermal range scenes and occlusion issues. Limited availability of datasets from unconstrained settings further limits the use of the state-of-the-art segmentation networks, loss functions and learning strategies which have been built and validated for RGB images. To address the challenge, we propose Self-Adversarial Multi-scale Contrastive Learning (SAM-CL) framework as a new training strategy for thermal image segmentation. SAM-CL framework consists of a SAM-CL loss function and a thermal image augmentation (TiAug) module as a domain-specific augmentation technique. We use the Thermal-Face-Database to demonstrate effectiveness of our approach. Experiments conducted on the existing segmentation networks (UNET, Attention-UNET, DeepLabV3 and HRNetv2) evidence the consistent performance gains from the SAM-CL framework. Furthermore, we present a qualitative analysis with UBComfort and DeepBreath datasets to discuss how our proposed methods perform in handling unconstrained situations. △ Less

Submitted 7 October, 2022; v1 submitted 21 September, 2022; originally announced September 2022.

Comments: Accepted at the British Machine Vision Conference (BMVC), 2022

arXiv:2204.01691 [pdf, other]

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

Authors: Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee , et al. (20 additional authors not shown)

Abstract: Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embo… ▽ More Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide real-world grounding by means of pretrained skills, which are used to constrain the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model's "hands and eyes," while the language model supplies high-level semantic knowledge about the task. We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show the need for real-world grounding and that this approach is capable of completing long-horizon, abstract, natural language instructions on a mobile manipulator. The project's website and the video can be found at https://say-can.github.io/. △ Less

Submitted 16 August, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

Comments: See website at https://say-can.github.io/ V1. Initial Upload. V2. Added PaLM results. Added study about new capabilities (drawer manipulation, chain of thought prompting, multilingual instructions). Added an ablation study of language model size. Added an open-source version of \algname on a simulated tabletop environment. Improved readability

arXiv:2109.04194 [pdf, other]

Novel Time Domain Based Upper-Limb Prosthesis Control using Incremental Learning Approach

Authors: Sidharth Pancholi, Amit M. Joshi Deepak Joshi, Bradly S. Duerstock

Abstract: The upper limb of the body is a vital for various kind of activities for human. The complete or partial loss of the upper limb would lead to a significant impact on daily activities of the amputees. EMG carries important information of human physique which helps to decode the various functionalities of human arm. EMG signal based bionics and prosthesis have gained huge research attention over the… ▽ More The upper limb of the body is a vital for various kind of activities for human. The complete or partial loss of the upper limb would lead to a significant impact on daily activities of the amputees. EMG carries important information of human physique which helps to decode the various functionalities of human arm. EMG signal based bionics and prosthesis have gained huge research attention over the past decade. Conventional EMG-PR based prosthesis struggles to give accurate performance due to off-line training used and incapability to compensate for electrode position shift and change in arm position. This work proposes online training and incremental learning based system for upper limb prosthetic application. This system consists of ADS1298 as AFE (analog front end) and a 32 bit arm cortex-m4 processor for DSP (digital signal processing). The system has been tested for both intact and amputated subjects. Time derivative moment based features have been implemented and utilized for effective pattern classification. Initially, system have been trained for four classes using the on-line training process later on the number of classes have been incremented on user demand till eleven, and system performance has been evaluated. The system yielded a completion rate of 100% for healthy and amputated subjects when four motions have been considered. Further 94.33% and 92% completion rate have been showcased by the system when the number of classes increased to eleven for healthy and amputees respectively. The motion efficacy test is also evaluated for all the subjects. The highest efficacy rate of 91.23% and 88.64% are observed for intact and amputated subjects respectively. △ Less

Submitted 13 January, 2024; v1 submitted 25 August, 2021; originally announced September 2021.

Comments: 15 Pages, 8 Figures, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2108.04417 [pdf, other]

Privacy-Preserving Machine Learning: Methods, Challenges and Directions

Authors: Runhua Xu, Nathalie Baracaldo, James Joshi

Abstract: Machine learning (ML) is increasingly being adopted in a wide variety of application domains. Usually, a well-performing ML model relies on a large volume of training data and high-powered computational resources. Such a need for and the use of huge volumes of data raise serious privacy concerns because of the potential risks of leakage of highly privacy-sensitive information; further, the evolvin… ▽ More Machine learning (ML) is increasingly being adopted in a wide variety of application domains. Usually, a well-performing ML model relies on a large volume of training data and high-powered computational resources. Such a need for and the use of huge volumes of data raise serious privacy concerns because of the potential risks of leakage of highly privacy-sensitive information; further, the evolving regulatory environments that increasingly restrict access to and use of privacy-sensitive data add significant challenges to fully benefiting from the power of ML for data-driven applications. A trained ML model may also be vulnerable to adversarial attacks such as membership, attribute, or property inference attacks and model inversion attacks. Hence, well-designed privacy-preserving ML (PPML) solutions are critically needed for many emerging applications. Increasingly, significant research efforts from both academia and industry can be seen in PPML areas that aim toward integrating privacy-preserving techniques into ML pipeline or specific algorithms, or designing various PPML architectures. In particular, existing PPML research cross-cut ML, systems and applications design, as well as security and privacy areas; hence, there is a critical need to understand state-of-the-art research, related challenges and a research roadmap for future research in PPML area. In this paper, we systematically review and summarize existing privacy-preserving approaches and propose a Phase, Guarantee, and Utility (PGU) triad based model to understand and guide the evaluation of various PPML solutions by decomposing their privacy-preserving functionalities. We discuss the unique characteristics and challenges of PPML and outline possible research directions that leverage as well as benefit multiple research communities such as ML, distributed systems, security and privacy. △ Less

Submitted 22 September, 2021; v1 submitted 9 August, 2021; originally announced August 2021.

arXiv:2105.08587 [pdf, other]

Adaptive ABAC Policy Learning: A Reinforcement Learning Approach

Authors: Leila Karimi, Mai Abdelhakim, James Joshi

Abstract: With rapid advances in computing systems, there is an increasing demand for more effective and efficient access control (AC) approaches. Recently, Attribute Based Access Control (ABAC) approaches have been shown to be promising in fulfilling the AC needs of such emerging complex computing environments. An ABAC model grants access to a requester based on attributes of entities in a system and an au… ▽ More With rapid advances in computing systems, there is an increasing demand for more effective and efficient access control (AC) approaches. Recently, Attribute Based Access Control (ABAC) approaches have been shown to be promising in fulfilling the AC needs of such emerging complex computing environments. An ABAC model grants access to a requester based on attributes of entities in a system and an authorization policy; however, its generality and flexibility come with a higher cost. Further, increasing complexities of organizational systems and the need for federated accesses to their resources make the task of AC enforcement and management much more challenging. In this paper, we propose an adaptive ABAC policy learning approach to automate the authorization management task. We model ABAC policy learning as a reinforcement learning problem. In particular, we propose a contextual bandit system, in which an authorization engine adapts an ABAC model through a feedback control loop; it relies on interacting with users/administrators of the system to receive their feedback that assists the model in making authorization decisions. We propose four methods for initializing the learning model and a planning approach based on attribute value hierarchy to accelerate the learning process. We focus on develo** an adaptive ABAC policy learning model for a home IoT environment as a running example. We evaluate our proposed approach over real and synthetic data. We consider both complete and sparse datasets in our evaluations. Our experimental results show that the proposed approach achieves performance that is comparable to ones based on supervised learning in many scenarios and even outperforms them in several situations. △ Less

Submitted 18 May, 2021; originally announced May 2021.

arXiv:2104.11349 [pdf]

Scalable Predictive Time-Series Analysis of COVID-19: Cases and Fatalities

Authors: Shradha Shinde, Jay Joshi, Sowmya Mareedu, Yeon Pyo Kim, Jongwook Woo

Abstract: COVID 19 is an acute disease that started spreading throughout the world, beginning in December 2019. It has spread worldwide and has affected more than 7 million people, and 200 thousand people have died due to this infection as of Oct 2020. In this paper, we have forecasted the number of deaths and the confirmed cases in Los Angeles and New York of the United States using the traditional and Big… ▽ More COVID 19 is an acute disease that started spreading throughout the world, beginning in December 2019. It has spread worldwide and has affected more than 7 million people, and 200 thousand people have died due to this infection as of Oct 2020. In this paper, we have forecasted the number of deaths and the confirmed cases in Los Angeles and New York of the United States using the traditional and Big Data platforms based on the Times Series: ARIMA and ETS. We also implemented a more sophisticated time-series forecast model using Facebook Prophet API. Furthermore, we developed the classification models: Logistic Regression and Random Forest regression to show that the Weather does not affect the number of the confirmed cases. The models are built and run in legacy systems (Azure ML Studio) and Big Data systems (Oracle Cloud and Databricks). Besides, we present the accuracy of the models. △ Less

Submitted 22 April, 2021; originally announced April 2021.

Comments: 8 pages, 7 figures, 4 tables

arXiv:2103.03918 [pdf, other]

FedV: Privacy-Preserving Federated Learning over Vertically Partitioned Data

Authors: Runhua Xu, Nathalie Baracaldo, Yi Zhou, Ali Anwar, James Joshi, Heiko Ludwig

Abstract: Federated learning (FL) has been proposed to allow collaborative training of machine learning (ML) models among multiple parties where each party can keep its data private. In this paradigm, only model updates, such as model weights or gradients, are shared. Many existing approaches have focused on horizontal FL, where each party has the entire feature set and labels in the training data set. Howe… ▽ More Federated learning (FL) has been proposed to allow collaborative training of machine learning (ML) models among multiple parties where each party can keep its data private. In this paradigm, only model updates, such as model weights or gradients, are shared. Many existing approaches have focused on horizontal FL, where each party has the entire feature set and labels in the training data set. However, many real scenarios follow a vertically-partitioned FL setup, where a complete feature set is formed only when all the datasets from the parties are combined, and the labels are only available to a single party. Privacy-preserving vertical FL is challenging because complete sets of labels and features are not owned by one entity. Existing approaches for vertical FL require multiple peer-to-peer communications among parties, leading to lengthy training times, and are restricted to (approximated) linear models and just two parties. To close this gap, we propose FedV, a framework for secure gradient computation in vertical settings for several widely used ML models such as linear models, logistic regression, and support vector machines. FedV removes the need for peer-to-peer communication among parties by using functional encryption schemes; this allows FedV to achieve faster training times. It also works for larger and changing sets of parties. We empirically demonstrate the applicability for multiple types of ML models and show a reduction of 10%-70% of training time and 80% to 90% in data transfer with respect to the state-of-the-art approaches. △ Less

Submitted 16 June, 2021; v1 submitted 5 March, 2021; originally announced March 2021.

arXiv:2102.01249 [pdf, other]

doi 10.1109/TDSC.2022.3179698

Blockchain-based Transparency Framework for Privacy Preserving Third-party Services

Authors: Runhua Xu, Chao Li, James Joshi

Abstract: Increasingly, information systems rely on computational, storage, and network resources deployed in third-party facilities such as cloud centers and edge nodes. Such an approach further exacerbates cybersecurity concerns constantly raised by numerous incidents of security and privacy attacks resulting in data leakage and identity theft, among others. These have, in turn, forced the creation of str… ▽ More Increasingly, information systems rely on computational, storage, and network resources deployed in third-party facilities such as cloud centers and edge nodes. Such an approach further exacerbates cybersecurity concerns constantly raised by numerous incidents of security and privacy attacks resulting in data leakage and identity theft, among others. These have, in turn, forced the creation of stricter security and privacy-related regulations and have eroded the trust in cyberspace. In particular, security-related services and infrastructures, such as Certificate Authorities (CAs) that provide digital certificate services and Third-Party Authorities (TPAs) that provide cryptographic key services, are critical components for establishing trust in crypto-based privacy-preserving applications and services. To address such trust issues, various transparency frameworks and approaches have been recently proposed in the literature. This paper proposes TAB framework that provides transparency and trustworthiness of third-party authority and third-party facilities using blockchain techniques for emerging crypto-based privacy-preserving applications. TAB employs the Ethereum blockchain as the underlying public ledger and also includes a novel smart contract to automate accountability with an incentive mechanism that motivates users to participate in auditing, and punishes unintentional or malicious behaviors. We implement TAB and show through experimental evaluation in the Ethereum official test network, Rinkeby, that the framework is efficient. We also formally show the security guarantee provided by TAB, and analyze the privacy guarantee and trustworthiness it provides. △ Less

Submitted 3 June, 2022; v1 submitted 1 February, 2021; originally announced February 2021.

Comments: 12 pages

Journal ref: IEEE TDSC 2022

arXiv:2012.10547 [pdf, other]

NN-EMD: Efficiently Training Neural Networks using Encrypted Multi-Sourced Datasets

Authors: Runhua Xu, James Joshi, Chao Li

Abstract: Training a machine learning model over an encrypted dataset is an existing promising approach to address the privacy-preserving machine learning task, however, it is extremely challenging to efficiently train a deep neural network (DNN) model over encrypted data for two reasons: first, it requires large-scale computation over huge datasets; second, the existing solutions for computation over encry… ▽ More Training a machine learning model over an encrypted dataset is an existing promising approach to address the privacy-preserving machine learning task, however, it is extremely challenging to efficiently train a deep neural network (DNN) model over encrypted data for two reasons: first, it requires large-scale computation over huge datasets; second, the existing solutions for computation over encrypted data, such as homomorphic encryption, is inefficient. Further, for an enhanced performance of a DNN model, we also need to use huge training datasets composed of data from multiple data sources that may not have pre-established trust relationships among each other. We propose a novel framework, NN-EMD, to train DNN over multiple encrypted datasets collected from multiple sources. Toward this, we propose a set of secure computation protocols using hybrid functional encryption schemes. We evaluate our framework for performance with regards to the training time and model accuracy on the MNIST datasets. Compared to other existing frameworks, our proposed NN-EMD framework can significantly reduce the training time, while providing comparable model accuracy and privacy guarantees as well as supporting multiple data sources. Furthermore, the depth and complexity of neural networks do not affect the training time despite introducing a privacy-preserving NN-EMD setting. △ Less

Submitted 17 April, 2021; v1 submitted 18 December, 2020; originally announced December 2020.

arXiv:2011.06191 [pdf, other]

Revisiting Secure Computation Using Functional Encryption: Opportunities and Research Directions

Authors: Runhua Xu, James Joshi

Abstract: Increasing incidents of security compromises and privacy leakage have raised serious privacy concerns related to cyberspace. Such privacy concerns have been instrumental in the creation of several regulations and acts to restrict the availability and use of privacy-sensitive data. The secure computation problem, initially and formally introduced as secure two-party computation by Andrew Yao in 198… ▽ More Increasing incidents of security compromises and privacy leakage have raised serious privacy concerns related to cyberspace. Such privacy concerns have been instrumental in the creation of several regulations and acts to restrict the availability and use of privacy-sensitive data. The secure computation problem, initially and formally introduced as secure two-party computation by Andrew Yao in 1986, has been the focus of intense research in academia because of its fundamental role in building many of the existing privacy-preserving approaches. Most of the existing secure computation solutions rely on garbled-circuits and homomorphic encryption techniques to tackle secure computation issues, including efficiency and security guarantees. However, it is still challenging to adopt these secure computation approaches in emerging compute-intensive and data-intensive applications such as emerging machine learning solutions. Recently proposed functional encryption scheme has shown its promise as an underlying secure computation foundation in recent privacy-preserving machine learning approaches proposed. This paper revisits the secure computation problem using emerging and promising functional encryption techniques and presents a comprehensive study. We first briefly summarize existing conventional secure computation approaches built on garbled-circuits, oblivious transfer, and homomorphic encryption techniques. Then, we elaborate on the unique characteristics and challenges of emerging functional encryption based secure computation approaches and outline several research directions. △ Less

Submitted 7 December, 2020; v1 submitted 11 November, 2020; originally announced November 2020.

Comments: 15 pages, 2 figures, IEEE TPS 2020

arXiv:2009.13323 [pdf]

AI Progress in Skin Lesion Analysis

Authors: Philippe M. Burlina, William Paul, Phil A. Mathew, Neil J. Joshi, Alison W. Rebman, John N. Aucott

Abstract: We examine progress in the use of AI for detecting skin lesions, with particular emphasis on the erythema migrans rash of acute Lyme disease, and other lesions, such as those from conditions like herpes zoster (shingles), tinea corporis, erythema multiforme, cellulitis, insect bites, or tick bites. We discuss important challenges for these applications, in particular the problems of AI bias regard… ▽ More We examine progress in the use of AI for detecting skin lesions, with particular emphasis on the erythema migrans rash of acute Lyme disease, and other lesions, such as those from conditions like herpes zoster (shingles), tinea corporis, erythema multiforme, cellulitis, insect bites, or tick bites. We discuss important challenges for these applications, in particular the problems of AI bias regarding the lack of skin images in dark skinned individuals, being able to accurately detect, delineate, and segment lesions or regions of interest compared to normal skin in images, and low shot learning (addressing classification with a paucity of training images). Solving these problems ranges from being highly desirable requirements -- e.g. for delineation, which may be useful to disambiguate between similar types of lesions, and perform improved diagnostics -- or required, as is the case for AI de-biasing, to allow for the deployment of fair AI techniques in the clinic for skin lesion analysis. For the problem of low shot learning in particular, we report skin analysis algorithms that gracefully degrade and still perform well at low shots, when compared to baseline algorithms: when using a little as 10 training exemplars per class, the baseline DL algorithm performance significantly degrades, with accuracy of 56.41%, close to chance, whereas the best performing low shot algorithm yields an accuracy of 85.26%. △ Less

Submitted 9 October, 2020; v1 submitted 28 September, 2020; originally announced September 2020.

arXiv:2003.07270 [pdf, other]

An Automatic Attribute Based Access Control Policy Extraction from Access Logs

Authors: Leila Karimi, Maryam Aldairi, James Joshi, Mai Abdelhakim

Abstract: With the rapid advances in computing and information technologies, traditional access control models have become inadequate in terms of capturing fine-grained, and expressive security requirements of newly emerging applications. An attribute-based access control (ABAC) model provides a more flexible approach for addressing the authorization needs of complex and dynamic systems. While organizations… ▽ More With the rapid advances in computing and information technologies, traditional access control models have become inadequate in terms of capturing fine-grained, and expressive security requirements of newly emerging applications. An attribute-based access control (ABAC) model provides a more flexible approach for addressing the authorization needs of complex and dynamic systems. While organizations are interested in employing newer authorization models, migrating to such models pose as a significant challenge. Many large-scale businesses need to grant authorization to their user populations that are potentially distributed across disparate and heterogeneous computing environments. Each of these computing environments may have its own access control model. The manual development of a single policy framework for an entire organization is tedious, costly, and error-prone. In this paper, we present a methodology for automatically learning ABAC policy rules from access logs of a system to simplify the policy development process. The proposed approach employs an unsupervised learning-based algorithm for detecting patterns in access logs and extracting ABAC authorization rules from these patterns. In addition, we present two policy improvement algorithms, including rule pruning and policy refinement algorithms to generate a higher quality mined policy. Finally, we implement a prototype of the proposed approach to demonstrate its feasibility. △ Less

Submitted 30 January, 2021; v1 submitted 16 March, 2020; originally announced March 2020.

arXiv:1904.07303 [pdf, other]

CryptoNN: Training Neural Networks over Encrypted Data

Authors: Runhua Xu, James B. D. Joshi, Chao Li

Abstract: Emerging neural networks based machine learning techniques such as deep learning and its variants have shown tremendous potential in many application domains. However, they raise serious privacy concerns due to the risk of leakage of highly privacy-sensitive data when data collected from users is used to train neural network models to support predictive tasks. To tackle such serious privacy concer… ▽ More Emerging neural networks based machine learning techniques such as deep learning and its variants have shown tremendous potential in many application domains. However, they raise serious privacy concerns due to the risk of leakage of highly privacy-sensitive data when data collected from users is used to train neural network models to support predictive tasks. To tackle such serious privacy concerns, several privacy-preserving approaches have been proposed in the literature that use either secure multi-party computation (SMC) or homomorphic encryption (HE) as the underlying mechanisms. However, neither of these cryptographic approaches provides an efficient solution towards constructing a privacy-preserving machine learning model, as well as supporting both the training and inference phases. To tackle the above issue, we propose a CryptoNN framework that supports training a neural network model over encrypted data by using the emerging functional encryption scheme instead of SMC or HE. We also construct a functional encryption scheme for basic arithmetic computation to support the requirement of the proposed CryptoNN framework. We present performance evaluation and security analysis of the underlying crypto scheme and show through our experiments that CryptoNN achieves accuracy that is similar to those of the baseline neural network models on the MNIST dataset. △ Less

Submitted 26 April, 2019; v1 submitted 15 April, 2019; originally announced April 2019.

Comments: ePrint

arXiv:1808.07269 [pdf, other]

doi 10.1103/PhysRevD.99.092001

A Deep Neural Network for Pixel-Level Electromagnetic Particle Identification in the MicroBooNE Liquid Argon Time Projection Chamber

Authors: MicroBooNE collaboration, C. Adams, M. Alrashed, R. An, J. Anthony, J. Asaadi, A. Ashkenazi, M. Auger, S. Balasubramanian, B. Baller, C. Barnes, G. Barr, M. Bass, F. Bay, A. Bhat, K. Bhattacharya, M. Bishai, A. Blake, T. Bolton, L. Camilleri, D. Caratelli, I. Caro Terrazas, R. Carr, R. Castillo Fernandez, F. Cavanna , et al. (148 additional authors not shown)

Abstract: We have developed a convolutional neural network (CNN) that can make a pixel-level prediction of objects in image data recorded by a liquid argon time projection chamber (LArTPC) for the first time. We describe the network design, training techniques, and software tools developed to train this network. The goal of this work is to develop a complete deep neural network based data reconstruction cha… ▽ More We have developed a convolutional neural network (CNN) that can make a pixel-level prediction of objects in image data recorded by a liquid argon time projection chamber (LArTPC) for the first time. We describe the network design, training techniques, and software tools developed to train this network. The goal of this work is to develop a complete deep neural network based data reconstruction chain for the MicroBooNE detector. We show the first demonstration of a network's validity on real LArTPC data using MicroBooNE collection plane images. The demonstration is performed for stop** muon and a $ν_μ$ charged current neutral pion data samples. △ Less

Submitted 22 August, 2018; originally announced August 2018.

Journal ref: Phys. Rev. D 99, 092001 (2019)

arXiv:1309.6204

A Friendship Privacy Attack on Friends and 2-Distant Neighbors in Social Networks

Authors: Lei **, Xuelian Long, James Joshi

Abstract: In an undirected social graph, a friendship link involves two users and the friendship is visible in both the users' friend lists. Such a dual visibility of the friendship may raise privacy threats. This is because both users can separately control the visibility of a friendship link to other users and their privacy policies for the link may not be consistent. Even if one of them conceals the link… ▽ More In an undirected social graph, a friendship link involves two users and the friendship is visible in both the users' friend lists. Such a dual visibility of the friendship may raise privacy threats. This is because both users can separately control the visibility of a friendship link to other users and their privacy policies for the link may not be consistent. Even if one of them conceals the link from a third user, the third user may find such a friendship link from another user's friend list. In addition, as most users allow their friends to see their friend lists in most social network systems, an adversary can exploit the inconsistent policies to launch privacy attacks to identify and infer many of a targeted user's friends. In this paper, we propose, analyze and evaluate such an attack which is called Friendship Identification and Inference (FII) attack. In a FII attack scenario, we assume that an adversary can only see his friend list and the friend lists of his friends who do not hide the friend lists from him. Then, a FII attack contains two attack steps: 1) friend identification and 2) friend inference. In the friend identification step, the adversary tries to identify a target's friends based on his friend list and those of his friends. In the friend inference step, the adversary attempts to infer the target's friends by using the proposed random walk with restart approach. We present experimental results using three real social network datasets and show that FII attacks are generally efficient and effective when adversaries and targets are friends or 2-distant neighbors. We also comprehensively analyze the attack results in order to find what values of parameters and network features could promote FII attacks. Currently, most popular social network systems with an undirected friendship graph, such as Facebook, LinkedIn and Foursquare, are susceptible to FII attacks. △ Less

Submitted 1 December, 2013; v1 submitted 24 September, 2013; originally announced September 2013.

Comments: This paper has been withdrawn by the authors

arXiv:1209.0126 [pdf]

Evaluation of some Information Retrieval models for Gujarati Ad hoc Monolingual Tasks

Authors: Hardik J. Joshi, Pareek Jyoti

Abstract: This paper describes the work towards Gujarati Ad hoc Monolingual Retrieval task for widely used Information Retrieval (IR) models. We present an indexing baseline for the Gujarati Language represented by Mean Average Precision (MAP) values. Our objective is to obtain a relative picture of a better IR model for Gujarati Language. Results show that Classical IR models like Term Frequency Inverse Do… ▽ More This paper describes the work towards Gujarati Ad hoc Monolingual Retrieval task for widely used Information Retrieval (IR) models. We present an indexing baseline for the Gujarati Language represented by Mean Average Precision (MAP) values. Our objective is to obtain a relative picture of a better IR model for Gujarati Language. Results show that Classical IR models like Term Frequency Inverse Document Frequency (TF_IDF) performs better when compared to few recent probabilistic IR models. The experiments helped to identify the outperforming IR models for Gujarati Language. △ Less

Submitted 1 September, 2012; originally announced September 2012.

Comments: 6 pages, Some text in Gujarati Language

Journal ref: VNSGU Journal of Science and Technology,3,2,176-181,2012

Showing 1–25 of 25 results for author: Joshi, J