Search | arXiv e-print repository

Leveraging PointNet and PointNet++ for Lyft Point Cloud Classification Challenge

Abstract: This study investigates the application of PointNet and PointNet++ in the classification of LiDAR-generated point cloud data, a critical component for achieving fully autonomous vehicles. Utilizing a modified dataset from the Lyft 3D Object Detection Challenge, we examine the models' capabilities to handle dynamic and complex environments essential for autonomous navigation. Our analysis shows that PointNet and PointNet++ achieved accuracy rates of 79.53% and 84.24%, respectively. These results underscore the models' robustness in interpreting intricate environmental data, which is pivotal for the safety and efficiency of autonomous vehicles. Moreover, the enhanced detection accuracy, particularly in distinguishing pedestrians from other objects, highlights the potential of these models to contribute substantially to the advancement of autonomous vehicle technology.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2312.00506 [pdf]

Generative artificial intelligence enhances creativity but reduces the diversity of novel content

Authors: Anil R. Doshi, Oliver P. Hauser

Abstract: Creativity is core to being human. Generative artificial intelligence (GenAI) holds promise for humans to be more creative by offering new ideas, or less creative by anchoring on GenAI ideas. We study the causal impact of GenAI on the production of a creative output in an online experimental study where some writers could obtain ideas for a story from a GenAI platform. Access to GenAI ideas causes an increase in the writer's creativity with stories being evaluated as better written and more enjoyable, especially among less creative writers. However, GenAI-enabled stories are more similar to each other than stories by humans alone. Our results have implications for researchers, policy-makers and practitioners interested in bolstering creativity, but point to potential downstream consequences from over-reliance.

Submitted 14 March, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

arXiv:2311.10075 [pdf]

ChatGPT-3.5, ChatGPT-4, Google Bard, and Microsoft Bing to Improve Health Literacy and Communication in Pediatric Populations and Beyond

Authors: Kanhai S. Amin, Linda Mayes, Pavan Khosla, Rushabh Doshi

Abstract: Purpose: Enhanced health literacy has been linked to better health outcomes; however, few interventions have been studied. We investigate whether large language models (LLMs) can serve as a medium to improve health literacy in children and other populations. Methods: We ran 288 conditions using 26 different prompts through ChatGPT-3.5, Microsoft Bing, and Google Bard. Given constraints imposed by rate limits, we tested a subset of 150 conditions through ChatGPT-4. The primary outcome measurements were the reading grade level (RGL) and word counts of output. Results: Across all models, output for basic prompts such as "Explain" and "What is (are)" was at, or exceeded, a 10th-grade RGL. When prompts were specified to explain conditions from the 1st to 12th RGL, we found that LLMs had varying abilities to tailor responses based on RGL. ChatGPT-3.5 provided responses that ranged from the 7th-grade to college freshmen RGL while ChatGPT-4 outputted responses from the 6th-grade to the college-senior RGL. Microsoft Bing provided responses from the 9th to 11th RGL while Google Bard provided responses from the 7th to 10th RGL. Discussion: ChatGPT-3.5 and ChatGPT-4 did better in achieving lower-grade level outputs. Meanwhile, Bard and Bing tended to consistently produce an RGL that is at the high school level regardless of prompt. Additionally, Bard's hesitancy in providing certain outputs indicates a cautious approach towards health information.
LLMs demonstrate promise in enhancing health communication, but future research should verify the accuracy and effectiveness of such tools in this context. Implications: LLMs face challenges in crafting outputs below a sixth-grade reading level. However, their capability to modify outputs above this threshold provides a potential mechanism to improve health literacy and communication in a pediatric population and beyond.

Submitted 16 November, 2023; originally announced November 2023.

Comments: 15 pages, 1 Table, 3 Figures, and 3 Supplemental Figures

arXiv:2310.08864 [pdf, other]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.

Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: Project website: https://robotics-transformer-x.github.io

arXiv:2212.09902 [pdf, other]

Dexterous Manipulation from Images: Autonomous Real-World RL via Substep Guidance

Authors: Kelvin Xu, Zheyuan Hu, Ria Doshi, Aaron Rovinsky, Vikash Kumar, Abhishek Gupta, Sergey Levine

Abstract: Complex and contact-rich robotic manipulation tasks, particularly those that involve multi-fingered hands and underactuated object manipulation, present a significant challenge to any control method. Methods based on reinforcement learning offer an appealing choice for such settings, as they can enable robots to learn to delicately balance contact forces and dexterously reposition objects without strong modeling assumptions. However, running reinforcement learning on real-world dexterous manipulation systems often requires significant manual engineering. This negates the benefits of autonomous data collection and ease of use that reinforcement learning should in principle provide. In this paper, we describe a system for vision-based dexterous manipulation that provides a "programming-free" approach for users to define new tasks and enable robots with complex multi-fingered hands to learn to perform them through interaction. The core principle underlying our system is that, in a vision-based setting, users should be able to provide high-level intermediate supervision that circumvents challenges in teleoperation or kinesthetic teaching, allowing a robot not only to learn a task efficiently but also to practice autonomously.
Our system includes a framework for users to define a final task and intermediate sub-tasks with image examples, a reinforcement learning procedure that learns the task autonomously without interventions, and experimental results with a four-finger robotic hand learning multi-stage object manipulation tasks directly in the real world, without simulation, manual modeling, or reward engineering.

Submitted 19 December, 2022; originally announced December 2022.

Comments: First two authors contributed equally

arXiv:2212.09860 [pdf, other]

Predicting Ejection Fraction from Chest X-rays Using Computer Vision for Diagnosing Heart Failure

Authors: Walt Williams, Rohan Doshi, Yanran Li, Kexuan Liang

Abstract: Heart failure remains a major public health challenge with growing costs. Ejection fraction (EF) is a key metric for the diagnosis and management of heart failure; however, estimation of EF using echocardiography remains expensive for the healthcare system and subject to intra/inter operator variability. While chest x-rays (CXR) are quick, inexpensive, and require less expertise, they do not provide sufficient information to the human eye to estimate EF. This work explores the efficacy of computer vision techniques to predict reduced EF solely from CXRs. We studied a dataset of 3488 CXRs from the MIMIC CXR-jpg (MCR) dataset. Our work establishes benchmarks using multiple state-of-the-art convolutional neural network architectures. The subsequent analysis shows that increasing model sizes from 8M to 23M parameters improved classification performance without overfitting the dataset. We further show how data augmentation techniques such as CXR rotation and random cropping improve model performance by another ~5%. Finally, we conduct an error analysis using saliency maps and Grad-CAMs to better understand the failure modes of convolutional models on this task.

Submitted 19 December, 2022; originally announced December 2022.

arXiv:2211.13907 [pdf, other]

Blockchain based solution design for Energy Exchange Platform

Authors: Atharv Bhadange, Rohan Doshi, Tanmay Karmarkar, Snehal Shintre

Abstract: It is observed that users have higher requirements for fairness, transparency, and privacy of transactions of energy exchanges that occur across platforms like Indian Energy Exchange (IEX) and Power Exchange India Limited (PXIL). As a decentralized distributed accounting system, blockchain is characterized by traceability, security, credibility, and non-tampering of transactions, which can meet the needs of integrated energy and multi-energy transactions. Based on the research on the application of blockchain technology in the field of integrated energy services, this solution proposes an integrated energy trading process based on smart contracts and explores the application of blockchain technology in integrated energy services.

Submitted 25 November, 2022; originally announced November 2022.

arXiv:2107.04556 [pdf, other]

doi 10.1063/5.0062546

Deep Learning for Reduced Order Modelling and Efficient Temporal Evolution of Fluid Simulations

Authors: Pranshu Pant, Ruchit Doshi, Pranav Bahl, Amir Barati Farimani

Abstract: Reduced Order Modelling (ROM) has been widely used to create lower order, computationally inexpensive representations of higher-order dynamical systems. Using these representations, ROMs can efficiently model flow fields while using significantly fewer parameters. Conventional ROMs accomplish this by linearly projecting higher-order manifolds to lower-dimensional space using dimensionality reduction techniques such as Proper Orthogonal Decomposition (POD). In this work, we develop a novel deep learning framework DL-ROM (Deep Learning - Reduced Order Modelling) to create a neural network capable of non-linear projections to reduced order states. We then use the learned reduced state to efficiently predict future time steps of the simulation using 3D Autoencoder and 3D U-Net based architectures. Our model DL-ROM is able to create highly accurate reconstructions from the learned ROM and is thus able to efficiently predict future time steps by temporally traversing in the learned reduced state. All of this is achieved without ground truth supervision or needing to iteratively solve the expensive Navier-Stokes (NS) equations, thereby resulting in massive computational savings. To test the effectiveness and performance of our approach, we evaluate our implementation on five different Computational Fluid Dynamics (CFD) datasets using reconstruction performance and computational runtime metrics. DL-ROM can reduce the computational runtimes of iterative solvers by nearly two orders of magnitude while maintaining an acceptable error threshold.

Submitted 9 July, 2021; originally announced July 2021.

Comments: 16 pages, 11 figures

arXiv:2106.01750 [pdf, other]

doi 10.1109/TCSS.2022.3140779

Modeling Influencer Marketing Campaigns in Social Networks

Authors: Ronak Doshi, Ajay Ramesh Ranganathan, Shrisha Rao

Abstract: Social media are extensively used in today's world, and facilitate quick and easy sharing of information, which makes them a good way to advertise products. Influencers of a social media network, owing to their massive popularity, provide a huge potential customer base. However, it is not straightforward to decide which influencers should be selected for an advertizing campaign that can generate high returns with low investment. In this work, we present an agent-based model (ABM) that can simulate the dynamics of influencer advertizing campaigns in a variety of scenarios and can help to discover the best influencer marketing strategy. Our system is a probabilistic graph-based model that provides the additional advantage of incorporating real-world factors such as customers' interest in a product, customer behavior, the willingness to pay, a brand's investment cap, influencers' engagement with influence diffusion, and the nature of the product being advertized viz. luxury and non-luxury. Using customer acquisition cost and conversion ratio as a unit economic, we evaluate the performance of different kinds of influencers under a variety of circumstances that are simulated by varying the nature of the product and the customers' interest. Our results exemplify the circumstance-dependent nature of influencer marketing and provide insight into which kinds of influencers would be a better strategy under respective circumstances.
For instance, we show that as the nature of the product varies from luxury to non-luxury, the performance of celebrities declines whereas the performance of nano-influencers improves. In terms of the customers' interest, we find that the performance of nano-influencers declines with the decrease in customers' interest whereas the performance of celebrities improves.

Submitted 22 January, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

Comments: 14 pages, in IEEE Transactions on Computational Social Systems

MSC Class: 68T42; 93A16

Journal ref: IEEE Transactions on Computational Social Systems, vol. 10 (1), February 2023, pp. 322--334

arXiv:2103.08074 [pdf, other]

Pushing the Limits of Capsule Networks

Authors: Prem Nair, Rohan Doshi, Stefan Keselj

Abstract: Convolutional neural networks use pooling and other downscaling operations to maintain translational invariance for detection of features, but in their architecture they do not explicitly maintain a representation of the locations of the features relative to each other. This means they do not represent two instances of the same object in different orientations the same way, like humans do, and so training them often requires extensive data augmentation and exceedingly deep networks. A team at Google Brain recently made news with an attempt to fix this problem: Capsule Networks. While a normal CNN works with scalar outputs representing feature presence, a CapsNet works with vector outputs representing entity presence. We want to stress test CapsNet in various incremental ways to better understand their performance and expressiveness. In broad terms, the goals of our investigation are: (1) test CapsNets on datasets that are like MNIST but harder in a specific way, and (2) explore the internal embedding space and sources of error for CapsNets.

Submitted 14 March, 2021; originally announced March 2021.

arXiv:2101.04952 [pdf]

Modeling and Analysis of Unmanned Remote Guided Vehicle on Rough and Loose Snow Terrain

Authors: Abhishek D. Patange, Sharad S. Mulik, R. Jegadeeshwaran, Dhananjay R. Jadhav, Prateek J. Ghatage, Gaurav R. Doshi, Rushikesh V Raykar

Abstract: Survival in remote snow-bound areas is unsafe and risky for mankind. Many problems such as arthritis, frostbite, asthma, and starvation can arise and lead to death. The Indian Military provides transportation vehicles which are heavily built and need manpower for monitoring. Hence, compact transportation that fulfills all requirements is needed. This research aimed at the design and analysis of a mobile unmanned vehicle for transportation and for providing medical help, food, and other essentials necessary for surviving in such areas. It can also be used in military services to save the lives of soldiers with less risk. It is a typical medium-weight, high-speed vehicle which carries up to a 35 kg load and can negotiate loose snow and rough terrain with the use of a caterpillar track. The noteworthy feature of the vehicle is that it comprises spiral blades and a V-shaped snowplow to make its way through snow. Hence it will repel the snow in the outward direction for self-extraction. It also incorporates skis and hubs for changing direction and for smooth suspension. A 3D model of the vehicle is drafted in CATIA and structural analysis is carried out in ANSYS. Control system design and mechatronics integration are proposed to develop the prototype by assembling various components.

Submitted 13 January, 2021; originally announced January 2021.

Comments: 4 pages, 4 figures

arXiv:1804.04159 [pdf, other]

doi 10.1109/SPW.2018.00013

Machine Learning DDoS Detection for Consumer Internet of Things Devices

Authors: Rohan Doshi, Noah Apthorpe, Nick Feamster

Abstract: An increasing number of Internet of Things (IoT) devices are connecting to the Internet, yet many of these devices are fundamentally insecure, exposing the Internet to a variety of attacks. Botnets such as Mirai have used insecure consumer IoT devices to conduct distributed denial of service (DDoS) attacks on critical Internet infrastructure. This motivates the development of new techniques to automatically detect consumer IoT attack traffic. In this paper, we demonstrate that using IoT-specific network behaviors (e.g. limited number of endpoints and regular time intervals between packets) to inform feature selection can result in high accuracy DDoS detection in IoT network traffic with a variety of machine learning algorithms, including neural networks. These results indicate that home gateway routers or other network middleboxes could automatically detect local IoT device sources of DDoS attacks using low-cost machine learning algorithms and traffic data that is flow-based and protocol-agnostic.

Submitted 11 April, 2018; originally announced April 2018.

Comments: 7 pages, 3 figures, 3 tables, appears in the 2018 Workshop on Deep Learning and Security (DLS '18)

Showing 1–12 of 12 results for author: Doshi, R