-
SABLE: Staging Blocked Evaluation of Sparse Matrix Computations
Authors:
Pratyush Das,
Adhitha Dias,
Anxhelo Xhebraj,
Artem Pelenitsyn,
Kirshanthan Sundararajah,
Milind Kulkarni
Abstract:
Sparse Matrices found in the real world often have some structure in how the dense elements are organized. While the inspector-executor model inspects matrices for structure, its generality can overlook further specialization. We propose a system that - if the sparse matrix is stored in a blocked storage format - can generate more efficient code by constructing regular loops over these blocks. Our…
▽ More
Sparse Matrices found in the real world often have some structure in how the dense elements are organized. While the inspector-executor model inspects matrices for structure, its generality can overlook further specialization. We propose a system that - if the sparse matrix is stored in a blocked storage format - can generate more efficient code by constructing regular loops over these blocks. Our system performs a specified computation over every element of the block instead of avoiding computing any sparse element at all and achieving regularity in specialized code. The system is extensible, providing a dense block iterator for the user to express any computation over these dense blocks. We show that this approach can significantly speed up SpMV and SpMM operations over the state-of-the-art systems Partially-Strided Codelets and Sparse Register Tiling.
△ Less
Submitted 3 April, 2024;
originally announced July 2024.
-
Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models
Authors:
Nisarg Patel,
Mohith Kulkarni,
Mihir Parmar,
Aashna Budhiraja,
Mutsumi Nakamura,
Neeraj Varshney,
Chitta Baral
Abstract:
As Large Language Models (LLMs) continue to exhibit remarkable performance in natural language understanding tasks, there is a crucial need to measure their ability for human-like multi-step logical reasoning. Existing logical reasoning evaluation benchmarks often focus primarily on simplistic single-step or multi-step reasoning with a limited set of inference rules. Furthermore, the lack of datas…
▽ More
As Large Language Models (LLMs) continue to exhibit remarkable performance in natural language understanding tasks, there is a crucial need to measure their ability for human-like multi-step logical reasoning. Existing logical reasoning evaluation benchmarks often focus primarily on simplistic single-step or multi-step reasoning with a limited set of inference rules. Furthermore, the lack of datasets for evaluating non-monotonic reasoning represents a crucial gap since it aligns more closely with human-like reasoning. To address these limitations, we propose Multi-LogiEval, a comprehensive evaluation dataset encompassing multi-step logical reasoning with various inference rules and depths. Multi-LogiEval covers three logic types--propositional, first-order, and non-monotonic--consisting of more than 30 inference rules and more than 60 of their combinations with various depths. Leveraging this dataset, we conduct evaluations on a range of LLMs including GPT-4, ChatGPT, Gemini-Pro, Yi, Orca, and Mistral, employing a zero-shot chain-of-thought. Experimental results show that there is a significant drop in the performance of LLMs as the reasoning steps/depth increases (average accuracy of ~68% at depth-1 to ~43% at depth-5). We further conduct a thorough investigation of reasoning chains generated by LLMs which reveals several important findings. We believe that Multi-LogiEval facilitates future research for evaluating and enhancing the logical reasoning ability of LLMs. Data is available at https://github.com/Mihir3009/Multi-LogiEval.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
Optimizing Layout of Recursive Datatypes with Marmoset
Authors:
Vidush Singhal,
Chaitanya Koparkar,
Joseph Zullo,
Artem Pelenitsyn,
Michael Vollmer,
Mike Rainey,
Ryan Newton,
Milind Kulkarni
Abstract:
While programmers know that the low-level memory representation of data structures can have significant effects on performance, compiler support to optimize the layout of those structures is an under-explored field. Prior work has optimized the layout of individual, non-recursive structures without considering how collections of those objects in linked or recursive data structures are laid out. Th…
▽ More
While programmers know that the low-level memory representation of data structures can have significant effects on performance, compiler support to optimize the layout of those structures is an under-explored field. Prior work has optimized the layout of individual, non-recursive structures without considering how collections of those objects in linked or recursive data structures are laid out. This work introduces Marmoset, a compiler that optimizes the layouts of algebraic datatypes, with a special focus on producing highly optimized, packed data layouts where recursive structures can be traversed with minimal pointer chasing. Marmoset performs an analysis of how a recursive ADT is used across functions to choose a global layout that promotes simple, strided access for that ADT in memory. It does so by building and solving a constraint system to minimize an abstract cost model, yielding a predicted efficient layout for the ADT. Marmoset then builds on top of Gibbon, a prior compiler for packed, mostly-serial representations, to synthesize optimized ADTs. We show experimentally that Marmoset is able to choose optimal layouts across a series of microbenchmarks and case studies, outperforming both Gibbons baseline approach, as well as MLton, a Standard ML compiler that uses traditional pointer-heavy representations.
△ Less
Submitted 3 June, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
-
MAML-en-LLM: Model Agnostic Meta-Training of LLMs for Improved In-Context Learning
Authors:
Sanchit Sinha,
Yuguang Yue,
Victor Soto,
Mayank Kulkarni,
Jianhua Lu,
Aidong Zhang
Abstract:
Adapting large language models (LLMs) to unseen tasks with in-context training samples without fine-tuning remains an important research problem. To learn a robust LLM that adapts well to unseen tasks, multiple meta-training approaches have been proposed such as MetaICL and MetaICT, which involve meta-training pre-trained LLMs on a wide variety of diverse tasks. These meta-training approaches esse…
▽ More
Adapting large language models (LLMs) to unseen tasks with in-context training samples without fine-tuning remains an important research problem. To learn a robust LLM that adapts well to unseen tasks, multiple meta-training approaches have been proposed such as MetaICL and MetaICT, which involve meta-training pre-trained LLMs on a wide variety of diverse tasks. These meta-training approaches essentially perform in-context multi-task fine-tuning and evaluate on a disjointed test set of tasks. Even though they achieve impressive performance, their goal is never to compute a truly general set of parameters. In this paper, we propose MAML-en-LLM, a novel method for meta-training LLMs, which can learn truly generalizable parameters that not only perform well on disjointed tasks but also adapts to unseen tasks. We see an average increase of 2% on unseen domains in the performance while a massive 4% improvement on adaptation performance. Furthermore, we demonstrate that MAML-en-LLM outperforms baselines in settings with limited amount of training data on both seen and unseen domains by an average of 2%. Finally, we discuss the effects of type of tasks, optimizers and task complexity, an avenue barely explored in meta-training literature. Exhaustive experiments across 7 task settings along with two data settings demonstrate that models trained with MAML-en-LLM outperform SOTA meta-training approaches.
△ Less
Submitted 19 May, 2024;
originally announced May 2024.
-
Maritime Vessel Tank Inspection using Aerial Robots: Experience from the field and dataset release
Authors:
Mihir Dharmadhikari,
Nikhil Khedekar,
Paolo De Petris,
Mihir Kulkarni,
Morten Nissov,
Kostas Alexis
Abstract:
This paper presents field results and lessons learned from the deployment of aerial robots inside ship ballast tanks. Vessel tanks including ballast tanks and cargo holds present dark, dusty environments having simultaneously very narrow openings and wide open spaces that create several challenges for autonomous navigation and inspection operations. We present a system for vessel tank inspection u…
▽ More
This paper presents field results and lessons learned from the deployment of aerial robots inside ship ballast tanks. Vessel tanks including ballast tanks and cargo holds present dark, dusty environments having simultaneously very narrow openings and wide open spaces that create several challenges for autonomous navigation and inspection operations. We present a system for vessel tank inspection using an aerial robot along with its autonomy modules. We show the results of autonomous exploration and visual inspection in 3 ships spanning across 7 distinct types of sections of the ballast tanks. Additionally, we comment on the lessons learned from the field and possible directions for future work. Finally, we release a dataset consisting of the data from these missions along with data collected with a handheld sensor stick.
△ Less
Submitted 29 April, 2024;
originally announced April 2024.
-
Cyberbully and Online Harassment: Issues Associated with Digital Wellbeing
Authors:
Manasi Kulkarni,
Siddhi Durve,
Bochen Jia
Abstract:
As digital technology becomes increasingly embedded in daily life, its impact on social interactions has become a critical area of study, particularly concerning cyberbullying. This meta-analysis investigates the dual role of technology in cyberbullying both as a catalyst that can exacerbate the issue and as a potential solution. Cyberbullying, characterized by the use of digital platforms to hara…
▽ More
As digital technology becomes increasingly embedded in daily life, its impact on social interactions has become a critical area of study, particularly concerning cyberbullying. This meta-analysis investigates the dual role of technology in cyberbullying both as a catalyst that can exacerbate the issue and as a potential solution. Cyberbullying, characterized by the use of digital platforms to harass, threaten, or humiliate individuals, poses significant challenges to mental and social wellbeing. This research synthesizes empirical findings from diverse studies to evaluate how innovative technological interventions, such as content monitoring algorithms, anonymous reporting systems, and educational initiatives integrated within digital platforms, contribute to reducing the prevalence of cyberbullying. The study focuses on the effectiveness of these interventions in various settings, highlighting the need for adaptive strategies that respond to the dynamic digital landscape. By offering a comprehensive overview of the current state of cyberbullying and the efficacy of technology based solutions, this analysis provides valuable insights for stakeholders, including educators, policymakers, and technology developers, aiming to enhance digital wellbeing and create safer online environments. The findings underscore the importance of leveraging technology not only as a medium of communication but also as a strategic tool to combat the negative impacts of cyberbullying, thus promoting a more inclusive and respectful digital world.
△ Less
Submitted 29 April, 2024;
originally announced April 2024.
-
LabelAId: Just-in-time AI Interventions for Improving Human Labeling Quality and Domain Knowledge in Crowdsourcing Systems
Authors:
Chu Li,
Zhihan Zhang,
Michael Saugstad,
Esteban Safranchik,
Minchu Kulkarni,
Xiaoyu Huang,
Shwetak Patel,
Vikram Iyer,
Tim Althoff,
Jon E. Froehlich
Abstract:
Crowdsourcing platforms have transformed distributed problem-solving, yet quality control remains a persistent challenge. Traditional quality control measures, such as prescreening workers and refining instructions, often focus solely on optimizing economic output. This paper explores just-in-time AI interventions to enhance both labeling quality and domain-specific knowledge among crowdworkers. W…
▽ More
Crowdsourcing platforms have transformed distributed problem-solving, yet quality control remains a persistent challenge. Traditional quality control measures, such as prescreening workers and refining instructions, often focus solely on optimizing economic output. This paper explores just-in-time AI interventions to enhance both labeling quality and domain-specific knowledge among crowdworkers. We introduce LabelAId, an advanced inference model combining Programmatic Weak Supervision (PWS) with FT-Transformers to infer label correctness based on user behavior and domain knowledge. Our technical evaluation shows that our LabelAId pipeline consistently outperforms state-of-the-art ML baselines, improving mistake inference accuracy by 36.7% with 50 downstream samples. We then implemented LabelAId into Project Sidewalk, an open-source crowdsourcing platform for urban accessibility. A between-subjects study with 34 participants demonstrates that LabelAId significantly enhances label precision without compromising efficiency while also increasing labeler confidence. We discuss LabelAId's success factors, limitations, and its generalizability to other crowdsourced science domains.
△ Less
Submitted 14 March, 2024;
originally announced March 2024.
-
Mochi: Fast \& Exact Collision Detection
Authors:
Durga Keerthi Mandarapu,
Nicholas James,
Milind Kulkarni
Abstract:
Collision Detection (CD) has several applications across the domains such as robotics, visual graphics, and fluid mechanics.
Finding exact collisions between the objects in the scene is quite computationally intensive.
To quickly filter the object pairs that do not result in a collision, bounding boxes are built on the objects, indexed using a Bounding Volume Hierarchy(BVH), and tested for int…
▽ More
Collision Detection (CD) has several applications across the domains such as robotics, visual graphics, and fluid mechanics.
Finding exact collisions between the objects in the scene is quite computationally intensive.
To quickly filter the object pairs that do not result in a collision, bounding boxes are built on the objects, indexed using a Bounding Volume Hierarchy(BVH), and tested for intersection before performing the expensive object-object intersection tests.
In state-of-the-art CD libraries, accelerators such as GPUs are used to accelerate BVH traversal by building specialized data structures.
The recent addition of ray tracing architecture to GPU hardware is designed to do the same but in the context of implementing a Ray Tracing algorithm to render a graphical scene in real-time.
We present Mochi, a fast and exact collision detection engine that accelerates both the broad and narrow phases by taking advantage of the capabilities of Ray Tracing cores.
We introduce multiple new reductions to perform generic CD to support three types of objects for CD: simple spherical particles, objects describable by mathematical equations, and complex objects composed of a triangle mesh.
By implementing our reductions, Mochi achieves several orders of magnitude speedups on synthetic datasets and 5x-28x speedups on real-world triangle mesh datasets.
We further evaluate our reductions thoroughly and provide several architectural insights on the ray tracing cores that are otherwise unknown due to their proprietorship.
△ Less
Submitted 22 February, 2024;
originally announced February 2024.
-
Reinforcement Learning for Collision-free Flight Exploiting Deep Collision Encoding
Authors:
Mihir Kulkarni,
Kostas Alexis
Abstract:
This work contributes a novel deep navigation policy that enables collision-free flight of aerial robots based on a modular approach exploiting deep collision encoding and reinforcement learning. The proposed solution builds upon a deep collision encoder that is trained on both simulated and real depth images using supervised learning such that it compresses the high-dimensional depth data to a lo…
▽ More
This work contributes a novel deep navigation policy that enables collision-free flight of aerial robots based on a modular approach exploiting deep collision encoding and reinforcement learning. The proposed solution builds upon a deep collision encoder that is trained on both simulated and real depth images using supervised learning such that it compresses the high-dimensional depth data to a low-dimensional latent space encoding collision information while accounting for the robot size. This compressed encoding is combined with an estimate of the robot's odometry and the desired target location to train a deep reinforcement learning navigation policy that offers low-latency computation and robust sim2real performance. A set of simulation and experimental studies in diverse environments are conducted and demonstrate the efficiency of the emerged behavior and its resilience in real-life deployments.
△ Less
Submitted 6 February, 2024;
originally announced February 2024.
-
Aerial Field Robotics
Authors:
Mihir Kulkarni,
Brady Moon,
Kostas Alexis,
Sebastian Scherer
Abstract:
Aerial field robotics research represents the domain of study that aims to equip unmanned aerial vehicles - and as it pertains to this chapter, specifically Micro Aerial Vehicles (MAVs)- with the ability to operate in real-life environments that present challenges to safe navigation. We present the key elements of autonomy for MAVs that are resilient to collisions and sensing degradation, while op…
▽ More
Aerial field robotics research represents the domain of study that aims to equip unmanned aerial vehicles - and as it pertains to this chapter, specifically Micro Aerial Vehicles (MAVs)- with the ability to operate in real-life environments that present challenges to safe navigation. We present the key elements of autonomy for MAVs that are resilient to collisions and sensing degradation, while operating under constrained computational resources. We overview aspects of the state of the art, outline bottlenecks to resilient navigation autonomy, and overview the field-readiness of MAVs. We conclude with notable contributions and discuss considerations for future research that are essential for resilience in aerial robotics.
△ Less
Submitted 19 January, 2024;
originally announced January 2024.
-
Reinforcement Learning for Optimizing RAG for Domain Chatbots
Authors:
Mandar Kulkarni,
Praveen Tangarajan,
Kyung Kim,
Anusua Trivedi
Abstract:
With the advent of Large Language Models (LLM), conversational assistants have become prevalent for domain use cases. LLMs acquire the ability to contextual question answering through training, and Retrieval Augmented Generation (RAG) further enables the bot to answer domain-specific questions. This paper describes a RAG-based approach for building a chatbot that answers user's queries using Frequ…
▽ More
With the advent of Large Language Models (LLM), conversational assistants have become prevalent for domain use cases. LLMs acquire the ability to contextual question answering through training, and Retrieval Augmented Generation (RAG) further enables the bot to answer domain-specific questions. This paper describes a RAG-based approach for building a chatbot that answers user's queries using Frequently Asked Questions (FAQ) data. We train an in-house retrieval embedding model using infoNCE loss, and experimental results demonstrate that the in-house model works significantly better than the well-known general-purpose public embedding model, both in terms of retrieval accuracy and Out-of-Domain (OOD) query detection. As an LLM, we use an open API-based paid ChatGPT model. We noticed that a previously retrieved-context could be used to generate an answer for specific patterns/sequences of queries (e.g., follow-up queries). Hence, there is a scope to optimize the number of LLM tokens and cost. Assuming a fixed retrieval model and an LLM, we optimize the number of LLM tokens using Reinforcement Learning (RL). Specifically, we propose a policy-based model external to the RAG, which interacts with the RAG pipeline through policy actions and updates the policy to optimize the cost. The policy model can perform two actions: to fetch FAQ context or skip retrieval. We use the open API-based GPT-4 as the reward model. We then train a policy model using policy gradient on multiple training chat sessions. As a policy model, we experimented with a public gpt-2 model and an in-house BERT model. With the proposed RL-based optimization combined with similarity threshold, we are able to achieve significant cost savings while getting a slightly improved accuracy. Though we demonstrate results for the FAQ chatbot, the proposed RL approach is generic and can be experimented with any existing RAG pipeline.
△ Less
Submitted 9 January, 2024;
originally announced January 2024.
-
SparseAuto: An Auto-Scheduler for Sparse Tensor Computations Using Recursive Loop Nest Restructuring
Authors:
Adhitha Dias,
Logan Anderson,
Kirshanthan Sundararajah,
Artem Pelenitsyn,
Milind Kulkarni
Abstract:
Automated code generation and performance optimizations for sparse tensor algebra are cardinal since they have become essential in many real-world applications like quantum computing, physics, chemistry, and machine learning. General sparse tensor algebra compilers are not always versatile enough to generate asymptotically optimal code for sparse tensor contractions. This paper shows how to optimi…
▽ More
Automated code generation and performance optimizations for sparse tensor algebra are cardinal since they have become essential in many real-world applications like quantum computing, physics, chemistry, and machine learning. General sparse tensor algebra compilers are not always versatile enough to generate asymptotically optimal code for sparse tensor contractions. This paper shows how to optimize and generate asymptotically better schedules for complex tensor expressions using kernel fission and fusion. We present a generalized loop transformation to achieve loop nesting for minimized memory footprint and reduced asymptotic complexity.
Furthermore, we present an auto-scheduler that uses a partially ordered set-based cost model that uses both time and auxiliary memory complexities in its pruning stages. In addition, we highlight the use of SMT solvers in sparse auto-schedulers to prune the Pareto frontier of schedules to the smallest number of possible schedules with user-defined constraints available at compile time. Finally, we show that our auto-scheduler can select asymptotically better schedules that use our compiler transformation to generate optimized code. Our results show that the auto-scheduler achieves orders of magnitude speedup compared to the TACO-generated code for several real-world tensor algebra computations on different real-world inputs.
△ Less
Submitted 5 January, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Arkade: k-Nearest Neighbor Search With Non-Euclidean Distances using GPU Ray Tracing
Authors:
Durga Mandarapu,
Vani Nagarajan,
Artem Pelenitsyn,
Milind Kulkarni
Abstract:
High-performance implementations of $k$-Nearest Neighbor Search ($k$NN) in low dimensions use tree-based data structures. Tree algorithms are hard to parallelize on GPUs due to their irregularity. However, newer Nvidia GPUs offer hardware support for tree operations through ray-tracing cores. Recent works have proposed using RT cores to implement $k$NN search, but they all have a hardware-imposed…
▽ More
High-performance implementations of $k$-Nearest Neighbor Search ($k$NN) in low dimensions use tree-based data structures. Tree algorithms are hard to parallelize on GPUs due to their irregularity. However, newer Nvidia GPUs offer hardware support for tree operations through ray-tracing cores. Recent works have proposed using RT cores to implement $k$NN search, but they all have a hardware-imposed constraint on the distance metric used in the search -- the Euclidean distance. We propose and implement two reductions to support $k$NN for a broad range of distances other than the Euclidean distance: Arkade Filter-Refine and Arkade Monotone Transformation, each of which allows non-Euclidean distance-based nearest neighbor queries to be performed in terms of the Euclidean distance. With our reductions, we observe that $k$NN search time speedups range between $1.6$x-$200$x and $1.3$x-$33.1$x over various state-of-the-art GPU shader core and RT core baselines, respectively. In evaluation, we provide several insights on RT architectures' ability to efficiently build and traverse the tree by analyzing the $k$NN search time trends.
△ Less
Submitted 21 April, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Autonomous Exploration and General Visual Inspection of Ship Ballast Water Tanks using Aerial Robots
Authors:
Mihir Dharmadhikari,
Paolo De Petris,
Mihir Kulkarni,
Nikhil Khedekar,
Huan Nguyen,
Arnt Erik Stene,
Eivind Sjøvold,
Kristian Solheim,
Bente Gussiaas,
Kostas Alexis
Abstract:
This paper presents a solution for the autonomous exploration and inspection of Ballast Water Tanks (BWTs) of marine vessels using aerial robots. Ballast tank compartments are critical for a vessel's safety and correspond to confined environments often connected through particularly narrow manholes. The method enables their volumetric exploration combined with visual inspection subject to constrai…
▽ More
This paper presents a solution for the autonomous exploration and inspection of Ballast Water Tanks (BWTs) of marine vessels using aerial robots. Ballast tank compartments are critical for a vessel's safety and correspond to confined environments often connected through particularly narrow manholes. The method enables their volumetric exploration combined with visual inspection subject to constraints regarding the viewing distance from a surface. We present evaluation studies in simulation, in a mission consisting of 18 BWT compartments, and in 3 field experiments inside real vessels. The data from one of the experiments is also post-processed to generate semantically-segmented meshes of inspection-important geometries. Geometric models can be associated with onboard camera images for detailed and intuitive analysis.
△ Less
Submitted 7 November, 2023;
originally announced November 2023.
-
Task-driven Compression for Collision Encoding based on Depth Images
Authors:
Mihir Kulkarni,
Kostas Alexis
Abstract:
This paper contributes a novel learning-based method for aggressive task-driven compression of depth images and their encoding as images tailored to collision prediction for robotic systems. A novel 3D image processing methodology is proposed that accounts for the robot's size in order to appropriately "inflate" the obstacles represented in the depth image and thus obtain the distance that can be…
▽ More
This paper contributes a novel learning-based method for aggressive task-driven compression of depth images and their encoding as images tailored to collision prediction for robotic systems. A novel 3D image processing methodology is proposed that accounts for the robot's size in order to appropriately "inflate" the obstacles represented in the depth image and thus obtain the distance that can be traversed by the robot in a collision-free manner along any given ray within the camera frustum. Such depth-and-collision image pairs are used to train a neural network that follows the architecture of Variational Autoencoders to compress-and-transform the information in the original depth image to derive a latent representation that encodes the collision information for the given depth image. We compare our proposed task-driven encoding method with classical task-agnostic methods and demonstrate superior performance for the task of collision image prediction from extremely low-dimensional latent spaces. A set of comparative studies show that the proposed approach is capable of encoding depth image-and-collision image tuples from complex scenes with thin obstacles at long distances better than the classical methods at compression ratios as high as 4050:1.
△ Less
Submitted 11 September, 2023;
originally announced September 2023.
-
Targeted Control-flow Transformations for Mitigating Path Explosion in Dynamic Symbolic Execution
Authors:
Charitha Saumya,
Rohan Gangaraju,
Kirshanthan Sundararajah,
Milind Kulkarni
Abstract:
Dynamic symbolic execution (DSE) suffers from path explosion problem when the target program has many conditional branches. Classical approach for managing the path explosion problem is dynamic state merging. Dynamic state merging combines similar symbolic program states together to avoid the exponential growth of states in DSE. However, state merging still requires solver invocations at each bran…
▽ More
Dynamic symbolic execution (DSE) suffers from path explosion problem when the target program has many conditional branches. Classical approach for managing the path explosion problem is dynamic state merging. Dynamic state merging combines similar symbolic program states together to avoid the exponential growth of states in DSE. However, state merging still requires solver invocations at each branch point of the program even when both paths of the branch is feasible and, the best path search strategy for DSE may not create the best state merging opportunities. Some drawbacks of state merging can be mitigated by compile-time state merging i.e. branch elimination by converting control-flow into data-flow. In this paper, we propose a non-semantics preserving but failure-preserving compiler technique for removing expensive symbolic branches in a program to improve the scalability of DSE. We develop a framework for detecting spurious bugs that can be inserted by our transformation. Finally, we show that our transformation can significantly improve the performance of exhaustive DSE on variety of benchmarks and helps in achieving more coverage in a large real-world subjects within a limited time budget.
△ Less
Submitted 3 August, 2023;
originally announced August 2023.
-
Overtaking Moving Obstacles with Digit: Path Following for Bipedal Robots via Model Predictive Contouring Control
Authors:
Kunal S. Narkhede,
Dhruv A. Thanki,
Abhijeet M. Kulkarni,
Ioannis Poulakakis
Abstract:
Humanoid robots are expected to navigate in changing environments and perform a variety of tasks. Frequently, these tasks require the robot to make decisions online regarding the speed and precision of following a reference path. For example, a robot may want to decide to temporarily deviate from its path to overtake a slowly moving obstacle that shares the same path and is ahead. In this case, pa…
▽ More
Humanoid robots are expected to navigate in changing environments and perform a variety of tasks. Frequently, these tasks require the robot to make decisions online regarding the speed and precision of following a reference path. For example, a robot may want to decide to temporarily deviate from its path to overtake a slowly moving obstacle that shares the same path and is ahead. In this case, path following performance is compromised in favor of fast path traversal. Available global trajectory tracking approaches typically assume a given -- specified in advance -- time parametrization of the path and seek to minimize the norm of the Cartesian error. As a result, when the robot should be where on the path is fixed and temporary deviations from the path are strongly discouraged. Given a global path, this paper presents a Model Predictive Contouring Control (MPCC) approach to selecting footsteps that maximize path traversal while simultaneously allowing the robot to decide between faithful versus fast path following. The method is evaluated in high-fidelity simulations of the bipedal robot Digit in terms of tracking performance of curved paths under disturbances and is also applied to the case where Digit overtakes a moving obstacle.
△ Less
Submitted 31 July, 2023;
originally announced August 2023.
-
Semantically-enhanced Deep Collision Prediction for Autonomous Navigation using Aerial Robots
Authors:
Mihir Kulkarni,
Huan Nguyen,
Kostas Alexis
Abstract:
This paper contributes a novel and modularized learning-based method for aerial robots navigating cluttered environments containing hard-to-perceive thin obstacles without assuming access to a map or the full pose estimation of the robot. The proposed solution builds upon a semantically-enhanced Variational Autoencoder that is trained with both real-world and simulated depth images to compress the…
▽ More
This paper contributes a novel and modularized learning-based method for aerial robots navigating cluttered environments containing hard-to-perceive thin obstacles without assuming access to a map or the full pose estimation of the robot. The proposed solution builds upon a semantically-enhanced Variational Autoencoder that is trained with both real-world and simulated depth images to compress the input data, while preserving semantically-labeled thin obstacles and handling invalid pixels in the depth sensor's output. This compressed representation, in addition to the robot's partial state involving its linear/angular velocities and its attitude are then utilized to train an uncertainty-aware 3D Collision Prediction Network in simulation to predict collision scores for candidate action sequences in a predefined motion primitives library. A set of simulation and experimental studies in cluttered environments with various sizes and types of obstacles, including multiple hard-to-perceive thin objects, were conducted to evaluate the performance of the proposed method and compare against an end-to-end trained baseline. The results demonstrate the benefits of the proposed semantically-enhanced deep collision prediction for learning-based autonomous navigation.
△ Less
Submitted 21 July, 2023;
originally announced July 2023.
-
RT-kNNS Unbound: Using RT Cores to Accelerate Unrestricted Neighbor Search
Authors:
Vani Nagarajan,
Durga Mandarapu,
Milind Kulkarni
Abstract:
The problem of identifying the k-Nearest Neighbors (kNNS) of a point has proven to be very useful both as a standalone application and as a subroutine in larger applications. Given its far-reaching applicability in areas such as machine learning and point clouds, extensive research has gone into leveraging GPU acceleration to solve this problem. Recent work has shown that using Ray Tracing cores i…
▽ More
The problem of identifying the k-Nearest Neighbors (kNNS) of a point has proven to be very useful both as a standalone application and as a subroutine in larger applications. Given its far-reaching applicability in areas such as machine learning and point clouds, extensive research has gone into leveraging GPU acceleration to solve this problem. Recent work has shown that using Ray Tracing cores in recent GPUs to accelerate kNNS is much more efficient compared to traditional acceleration using shader cores. However, the existing translation of kNNS to a ray tracing problem imposes a constraint on the search space for neighbors. Due to this, we can only use RT cores to accelerate fixed-radius kNNS, which requires the user to set a search radius a priori and hence can miss neighbors. In this work, we propose TrueKNN, the first unbounded RT-accelerated neighbor search. TrueKNN adopts an iterative approach where we incrementally grow the search space until all points have found their k neighbors. We show that our approach is orders of magnitude faster than existing approaches and can even be used to accelerate fixed-radius neighbor searches.
△ Less
Submitted 26 May, 2023;
originally announced May 2023.
-
Aerial Gym -- Isaac Gym Simulator for Aerial Robots
Authors:
Mihir Kulkarni,
Theodor J. L. Forgaard,
Kostas Alexis
Abstract:
Develo** learning-based methods for navigation of aerial robots is an intensive data-driven process that requires highly parallelized simulation. The full utilization of such simulators is hindered by the lack of parallelized high-level control methods that imitate the real-world robot interface. Responding to this need, we develop the Aerial Gym simulator that can simulate millions of multiroto…
▽ More
Develo** learning-based methods for navigation of aerial robots is an intensive data-driven process that requires highly parallelized simulation. The full utilization of such simulators is hindered by the lack of parallelized high-level control methods that imitate the real-world robot interface. Responding to this need, we develop the Aerial Gym simulator that can simulate millions of multirotor vehicles parallelly with nonlinear geometric controllers for the Special Euclidean Group SE(3) for attitude, velocity and position tracking. We also develop functionalities for managing a large number of obstacles in the environment, enabling rapid randomization for learning of navigation tasks. In addition, we also provide sample environments having robots with simulated cameras capable of capturing RGB, depth, segmentation and optical flow data in obstacle-rich environments. This simulator is a step towards develo** a - currently missing - highly parallelized aerial robot simulation with geometric controllers at a large scale, while also providing a customizable obstacle randomization functionality for navigation tasks. We provide training scripts with compatible reinforcement learning frameworks to navigate the robot to a goal setpoint based on attitude and velocity command interfaces. Finally, we open source the simulator and aim to develop it further to speed up rendering using alternate kernel-based frameworks in order to parallelize ray-casting for depth images thus supporting a larger number of robots.
△ Less
Submitted 25 May, 2023;
originally announced May 2023.
-
Overcoming Catastrophic Forgetting in Massively Multilingual Continual Learning
Authors:
Genta Indra Winata,
Lingjue Xie,
Karthik Radhakrishnan,
Shijie Wu,
Xisen **,
Pengxiang Cheng,
Mayank Kulkarni,
Daniel Preotiuc-Pietro
Abstract:
Real-life multilingual systems should be able to efficiently incorporate new languages as data distributions fed to the system evolve and shift over time. To do this, systems need to handle the issue of catastrophic forgetting, where the model performance drops for languages or tasks seen further in its past. In this paper, we study catastrophic forgetting, as well as methods to minimize this, in…
▽ More
Real-life multilingual systems should be able to efficiently incorporate new languages as data distributions fed to the system evolve and shift over time. To do this, systems need to handle the issue of catastrophic forgetting, where the model performance drops for languages or tasks seen further in its past. In this paper, we study catastrophic forgetting, as well as methods to minimize this, in a massively multilingual continual learning framework involving up to 51 languages and covering both classification and sequence labeling tasks. We present LR ADJUST, a learning rate scheduling method that is simple, yet effective in preserving new information without strongly overwriting past knowledge. Furthermore, we show that this method is effective across multiple continual learning approaches. Finally, we provide further insights into the dynamics of catastrophic forgetting in this massively multilingual setup.
△ Less
Submitted 25 May, 2023;
originally announced May 2023.
-
RT-DBSCAN: Accelerating DBSCAN using Ray Tracing Hardware
Authors:
Vani Nagarajan,
Milind Kulkarni
Abstract:
General Purpose computing on Graphical Processing Units (GPGPU) has resulted in unprecedented levels of speedup over its CPU counterparts, allowing programmers to harness the computational power of GPU shader cores to accelerate other computing applications. But this style of acceleration is best suited for regular computations (e.g., linear algebra). Recent GPUs feature new Ray Tracing (RT) cores…
▽ More
General Purpose computing on Graphical Processing Units (GPGPU) has resulted in unprecedented levels of speedup over its CPU counterparts, allowing programmers to harness the computational power of GPU shader cores to accelerate other computing applications. But this style of acceleration is best suited for regular computations (e.g., linear algebra). Recent GPUs feature new Ray Tracing (RT) cores that instead speed up the irregular process of ray tracing using Bounding Volume Hierarchies. While these cores seem limited in functionality, they can be used to accelerate n-body problems by leveraging RT cores to accelerate the required distance computations. In this work, we propose RT-DBSCAN, the first RT-accelerated DBSCAN implementation. We use RT cores to accelerate Density-Based Clustering of Applications with Noise (DBSCAN) by translating fixed-radius nearest neighbor queries to ray tracing queries. We show that leveraging the RT hardware results in speedups between 1.3x to 4x over current state-of-the-art, GPU-based DBSCAN implementations.
△ Less
Submitted 16 March, 2023;
originally announced March 2023.
-
Table Tennis Stroke Detection and Recognition Using Ball Trajectory Data
Authors:
Kaustubh Milind Kulkarni,
Rohan S Jamadagni,
Jeffrey Aaron Paul,
Sucheth Shenoy
Abstract:
In this work, the novel task of detecting and classifying table tennis strokes solely using the ball trajectory has been explored. A single camera setup positioned in the umpire's view has been employed to procure a dataset consisting of six stroke classes executed by four professional table tennis players. Ball tracking using YOLOv4, a traditional object detection model, and TrackNetv2, a tempora…
▽ More
In this work, the novel task of detecting and classifying table tennis strokes solely using the ball trajectory has been explored. A single camera setup positioned in the umpire's view has been employed to procure a dataset consisting of six stroke classes executed by four professional table tennis players. Ball tracking using YOLOv4, a traditional object detection model, and TrackNetv2, a temporal heatmap based model, have been implemented on our dataset and their performances have been benchmarked. A mathematical approach developed to extract temporal boundaries of strokes using the ball trajectory data yielded a total of 2023 valid strokes in our dataset, while also detecting services and missed strokes successfully. The temporal convolutional network developed performed stroke recognition on completely unseen data with an accuracy of 87.155%. Several machine learning and deep learning based model architectures have been trained for stroke recognition using ball trajectory input and benchmarked based on their performances. While stroke recognition in the field of table tennis has been extensively explored based on human action recognition using video data focused on the player's actions, the use of ball trajectory data for the same is an unexplored characteristic of the sport. Hence, the motivation behind the work is to demonstrate that meaningful inferences such as stroke detection and recognition can be drawn using minimal input information.
△ Less
Submitted 19 February, 2023;
originally announced February 2023.
-
Explainable Artificial Intelligence in Retinal Imaging for the detection of Systemic Diseases
Authors:
Ayushi Raj Bhatt,
Rajkumar Vaghashiya,
Meghna Kulkarni,
Dr Prakash Kamaraj
Abstract:
Explainable Artificial Intelligence (AI) in the form of an interpretable and semiautomatic approach to stage grading ocular pathologies such as Diabetic retinopathy, Hypertensive retinopathy, and other retinopathies on the backdrop of major systemic diseases. The experimental study aims to evaluate an explainable staged grading process without using deep Convolutional Neural Networks (CNNs) direct…
▽ More
Explainable Artificial Intelligence (AI) in the form of an interpretable and semiautomatic approach to stage grading ocular pathologies such as Diabetic retinopathy, Hypertensive retinopathy, and other retinopathies on the backdrop of major systemic diseases. The experimental study aims to evaluate an explainable staged grading process without using deep Convolutional Neural Networks (CNNs) directly. Many current CNN-based deep neural networks used for diagnosing retinal disorders might have appreciable performance but fail to pinpoint the basis driving their decisions. To improve these decisions' transparency, we have proposed a clinician-in-the-loop assisted intelligent workflow that performs a retinal vascular assessment on the fundus images to derive quantifiable and descriptive parameters. The retinal vessel parameters meta-data serve as hyper-parameters for better interpretation and explainability of decisions. The semiautomatic methodology aims to have a federated approach to AI in healthcare applications with more inputs and interpretations from clinicians. The baseline process involved in the machine learning pipeline through image processing techniques for optic disc detection, vessel segmentation, and arteriole/venule identification.
△ Less
Submitted 14 December, 2022;
originally announced December 2022.
-
Predicting Treatment Adherence of Tuberculosis Patients at Scale
Authors:
Mihir Kulkarni,
Satvik Golechha,
Rishi Raj,
Jithin Sreedharan,
Ankit Bhardwaj,
Santanu Rathod,
Bhavin Vadera,
Jayakrishna Kurada,
Sanjay Mattoo,
Rajendra Joshi,
Kirankumar Rade,
Alpan Raval
Abstract:
Tuberculosis (TB), an infectious bacterial disease, is a significant cause of death, especially in low-income countries, with an estimated ten million new cases reported globally in $2020$. While TB is treatable, non-adherence to the medication regimen is a significant cause of morbidity and mortality. Thus, proactively identifying patients at risk of drop** off their medication regimen enables…
▽ More
Tuberculosis (TB), an infectious bacterial disease, is a significant cause of death, especially in low-income countries, with an estimated ten million new cases reported globally in $2020$. While TB is treatable, non-adherence to the medication regimen is a significant cause of morbidity and mortality. Thus, proactively identifying patients at risk of drop** off their medication regimen enables corrective measures to mitigate adverse outcomes. Using a proxy measure of extreme non-adherence and a dataset of nearly $700,000$ patients from four states in India, we formulate and solve the machine learning (ML) problem of early prediction of non-adherence based on a custom rank-based metric. We train ML models and evaluate against baselines, achieving a $\sim 100\%$ lift over rule-based baselines and $\sim 214\%$ over a random classifier, taking into account country-wide large-scale future deployment. We deal with various issues in the process, including data quality, high-cardinality categorical data, low target prevalence, distribution shift, variation across cohorts, algorithmic fairness, and the need for robustness and explainability. Our findings indicate that risk stratification of non-adherent patients is a viable, deployable-at-scale ML solution. As the official AI partner of India's Central TB Division, we are working on multiple city and state-level pilots with the goal of pan-India deployment.
△ Less
Submitted 15 November, 2022; v1 submitted 5 November, 2022;
originally announced November 2022.
-
Robust field-level inference with dark matter halos
Authors:
Helen Shao,
Francisco Villaescusa-Navarro,
Pablo Villanueva-Domingo,
Romain Teyssier,
Lehman H. Garrison,
Marco Gatti,
Derek Inman,
Yueying Ni,
Ulrich P. Steinwandel,
Mihir Kulkarni,
Eli Visbal,
Greg L. Bryan,
Daniel Angles-Alcazar,
Tiago Castro,
Elena Hernandez-Martinez,
Klaus Dolag
Abstract:
We train graph neural networks on halo catalogues from Gadget N-body simulations to perform field-level likelihood-free inference of cosmological parameters. The catalogues contain $\lesssim$5,000 halos with masses $\gtrsim 10^{10}~h^{-1}M_\odot$ in a periodic volume of $(25~h^{-1}{\rm Mpc})^3$; every halo in the catalogue is characterized by several properties such as position, mass, velocity, co…
▽ More
We train graph neural networks on halo catalogues from Gadget N-body simulations to perform field-level likelihood-free inference of cosmological parameters. The catalogues contain $\lesssim$5,000 halos with masses $\gtrsim 10^{10}~h^{-1}M_\odot$ in a periodic volume of $(25~h^{-1}{\rm Mpc})^3$; every halo in the catalogue is characterized by several properties such as position, mass, velocity, concentration, and maximum circular velocity. Our models, built to be permutationally, translationally, and rotationally invariant, do not impose a minimum scale on which to extract information and are able to infer the values of $Ω_{\rm m}$ and $σ_8$ with a mean relative error of $\sim6\%$, when using positions plus velocities and positions plus masses, respectively. More importantly, we find that our models are very robust: they can infer the value of $Ω_{\rm m}$ and $σ_8$ when tested using halo catalogues from thousands of N-body simulations run with five different N-body codes: Abacus, CUBEP$^3$M, Enzo, PKDGrav3, and Ramses. Surprisingly, the model trained to infer $Ω_{\rm m}$ also works when tested on thousands of state-of-the-art CAMELS hydrodynamic simulations run with four different codes and subgrid physics implementations. Using halo properties such as concentration and maximum circular velocity allow our models to extract more information, at the expense of breaking the robustness of the models. This may happen because the different N-body codes are not converged on the relevant scales corresponding to these parameters.
△ Less
Submitted 14 September, 2022;
originally announced September 2022.
-
Cornucopia: A Framework for Feedback Guided Generation of Binaries
Authors:
Vidush Singhal,
Akul Abhilash Pillai,
Charitha Saumya,
Milind Kulkarni,
Aravind Machiry
Abstract:
Binary analysis is an important capability required for many security and software engineering applications. Consequently, there are many binary analysis techniques and tools with varied capabilities. However, testing these tools requires a large, varied binary dataset with corresponding source-level information. In this paper, we present Cornucopia, an architecture agnostic automated framework th…
▽ More
Binary analysis is an important capability required for many security and software engineering applications. Consequently, there are many binary analysis techniques and tools with varied capabilities. However, testing these tools requires a large, varied binary dataset with corresponding source-level information. In this paper, we present Cornucopia, an architecture agnostic automated framework that can generate a plethora of binaries from corresponding program source by exploiting compiler optimizations and feedback-guided learning. Our evaluation shows that Cornucopia was able to generate 309K binaries across four architectures (x86, x64, ARM, MIPS) with an average of 403 binaries for each program and outperforms Bintuner, a similar technique. Our experiments revealed issues with the LLVM optimization scheduler resulting in compiler crashes ($\sim$300). Our evaluation of four popular binary analysis tools Angr, Ghidra, Idapro, and Radare, using Cornucopia generated binaries, revealed various issues with these tools. Specifically, we found 263 crashes in Angr and one memory corruption issue in Idapro. Our differential testing on the analysis results revealed various semantic bugs in these tools. We also tested machine learning tools, Asmvec, Safe, and Debin, that claim to capture binary semantics and show that they perform poorly (For instance, Debin F1 score dropped to 12.9% from reported 63.1%) on Cornucopia generated binaries. In summary, our exhaustive evaluation shows that Cornucopia is an effective mechanism to generate binaries for testing binary analysis techniques effectively.
△ Less
Submitted 14 September, 2022;
originally announced September 2022.
-
Synthesis of Distributed Agreement-Based Systems with Efficiently-Decidable Verification (Extended Version)
Authors:
Nouraldin Jaber,
Christopher Wagner,
Swen Jacobs,
Milind Kulkarni,
Roopsha Samanta
Abstract:
Distributed agreement-based (DAB) systems use common distributed agreement protocols such as leader election and consensus as building blocks for their target functionality. While automated verification for DAB systems is undecidable in general, recent work identifies a large class of DAB systems for which verification is efficiently-decidable. Unfortunately, the conditions characterizing such a c…
▽ More
Distributed agreement-based (DAB) systems use common distributed agreement protocols such as leader election and consensus as building blocks for their target functionality. While automated verification for DAB systems is undecidable in general, recent work identifies a large class of DAB systems for which verification is efficiently-decidable. Unfortunately, the conditions characterizing such a class can be opaque and non-intuitive, and can pose a significant challenge to system designers trying to model their systems in this class.
In this paper, we present a synthesis-driven tool, Cinnabar, to help system designers building DAB systems "fit" their intended designs into an efficiently-decidable class. In particular, starting from an initial sketch provided by the designer, Cinnabar generates sketch completions using a counterexample-guided procedure. The core technique relies on a compact encoding of a set of related counterexamples. We demonstrate Cinnabar's effectiveness by successfully and efficiently synthesizing completions for a variety of interesting DAB systems.
△ Less
Submitted 14 January, 2023; v1 submitted 25 August, 2022;
originally announced August 2022.
-
Study of Encoder-Decoder Architectures for Code-Mix Search Query Translation
Authors:
Mandar Kulkarni,
Soumya Chennabasavaraj,
Nikesh Garera
Abstract:
With the broad reach of the internet and smartphones, e-commerce platforms have an increasingly diversified user base. Since native language users are not conversant in English, their preferred browsing mode is their regional language or a combination of their regional language and English. From our recent study on the query data, we noticed that many of the queries we receive are code-mix, specif…
▽ More
With the broad reach of the internet and smartphones, e-commerce platforms have an increasingly diversified user base. Since native language users are not conversant in English, their preferred browsing mode is their regional language or a combination of their regional language and English. From our recent study on the query data, we noticed that many of the queries we receive are code-mix, specifically Hinglish i.e. queries with one or more Hindi words written in English (Latin) script. We propose a transformer-based approach for code-mix query translation to enable users to search with these queries. We demonstrate the effectiveness of pre-trained encoder-decoder models trained on a large corpus of the unlabeled English text for this task. Using generic domain translation models, we created a pseudo-labelled dataset for training the model on the search queries and verified the effectiveness of various data augmentation techniques. Further, to reduce the latency of the model, we use knowledge distillation and weight quantization. Effectiveness of the proposed method has been validated through experimental evaluations and A/B testing. The model is currently live on Flipkart app and website, serving millions of queries.
△ Less
Submitted 7 August, 2022;
originally announced August 2022.
-
Vernacular Search Query Translation with Unsupervised Domain Adaptation
Authors:
Mandar Kulkarni,
Nikesh Garera
Abstract:
With the democratization of e-commerce platforms, an increasingly diversified user base is opting to shop online. To provide a comfortable and reliable shop** experience, it's important to enable users to interact with the platform in the language of their choice. An accurate query translation is essential for Cross-Lingual Information Retrieval (CLIR) with vernacular queries. Due to internet-sc…
▽ More
With the democratization of e-commerce platforms, an increasingly diversified user base is opting to shop online. To provide a comfortable and reliable shop** experience, it's important to enable users to interact with the platform in the language of their choice. An accurate query translation is essential for Cross-Lingual Information Retrieval (CLIR) with vernacular queries. Due to internet-scale operations, e-commerce platforms get millions of search queries every day. However, creating a parallel training set to train an in-domain translation model is cumbersome. This paper proposes an unsupervised domain adaptation approach to translate search queries without using any parallel corpus. We use an open-domain translation model (trained on public corpus) and adapt it to the query data using only the monolingual queries from two languages. In addition, fine-tuning with a small labeled set further improves the result. For demonstration, we show results for Hindi to English query translation and use mBART-large-50 model as the baseline to improve upon. Experimental results show that, without using any parallel corpus, we obtain more than 20 BLEU points improvement over the baseline while fine-tuning with a small 50k labeled set provides more than 27 BLEU points improvement over the baseline.
△ Less
Submitted 7 August, 2022;
originally announced August 2022.
-
Team CERBERUS Wins the DARPA Subterranean Challenge: Technical Overview and Lessons Learned
Authors:
Marco Tranzatto,
Mihir Dharmadhikari,
Lukas Bernreiter,
Marco Camurri,
Shehryar Khattak,
Frank Mascarich,
Patrick Pfreundschuh,
David Wisth,
Samuel Zimmermann,
Mihir Kulkarni,
Victor Reijgwart,
Benoit Casseau,
Timon Homberger,
Paolo De Petris,
Lionel Ott,
Wayne Tubby,
Gabriel Waibel,
Huan Nguyen,
Cesar Cadena,
Russell Buchanan,
Lorenz Wellhausen,
Nikhil Khedekar,
Olov Andersson,
Lintong Zhang,
Takahiro Miki
, et al. (11 additional authors not shown)
Abstract:
This article presents the CERBERUS robotic system-of-systems, which won the DARPA Subterranean Challenge Final Event in 2021. The Subterranean Challenge was organized by DARPA with the vision to facilitate the novel technologies necessary to reliably explore diverse underground environments despite the grueling challenges they present for robotic autonomy. Due to their geometric complexity, degrad…
▽ More
This article presents the CERBERUS robotic system-of-systems, which won the DARPA Subterranean Challenge Final Event in 2021. The Subterranean Challenge was organized by DARPA with the vision to facilitate the novel technologies necessary to reliably explore diverse underground environments despite the grueling challenges they present for robotic autonomy. Due to their geometric complexity, degraded perceptual conditions combined with lack of GPS support, austere navigation conditions, and denied communications, subterranean settings render autonomous operations particularly demanding. In response to this challenge, we developed the CERBERUS system which exploits the synergy of legged and flying robots, coupled with robust control especially for overcoming perilous terrain, multi-modal and multi-robot perception for localization and map** in conditions of sensor degradation, and resilient autonomy through unified exploration path planning and local motion planning that reflects robot-specific limitations. Based on its ability to explore diverse underground environments and its high-level command and control by a single human supervisor, CERBERUS demonstrated efficient exploration, reliable detection of objects of interest, and accurate map**. In this article, we report results from both the preliminary runs and the final Prize Round of the DARPA Subterranean Challenge, and discuss highlights and challenges faced, alongside lessons learned for the benefit of the community.
△ Less
Submitted 11 July, 2022;
originally announced July 2022.
-
SparseLNR: Accelerating Sparse Tensor Computations Using Loop Nest Restructuring
Authors:
Adhitha Dias,
Kirshanthan Sundararajah,
Charitha Saumya,
Milind Kulkarni
Abstract:
Sparse tensor algebra computations have become important in many real-world applications like machine learning, scientific simulations, and data mining. Hence, automated code generation and performance optimizations for tensor algebra kernels are paramount. Recent advancements such as the Tensor Algebra Compiler (TACO) greatly generalize and automate the code generation for tensor algebra expressi…
▽ More
Sparse tensor algebra computations have become important in many real-world applications like machine learning, scientific simulations, and data mining. Hence, automated code generation and performance optimizations for tensor algebra kernels are paramount. Recent advancements such as the Tensor Algebra Compiler (TACO) greatly generalize and automate the code generation for tensor algebra expressions. However, the code generated by TACO for many important tensor computations remains suboptimal due to the absence of a scheduling directive to support transformations such as distribution/fusion.
This paper extends TACO's scheduling space to support kernel distribution/loop fusion in order to reduce asymptotic time complexity and improve locality of complex tensor algebra computations. We develop an intermediate representation (IR) for tensor operations called branched iteration graph which specifies breakdown of the computation into smaller ones (kernel distribution) and then fuse (loop fusion) outermost dimensions of the loop nests, while the innermost dimensions are distributed, to increase data locality. We describe exchanges of intermediate results between space iteration spaces, transformation in the IR, and its programmatic invocation. Finally, we show that the transformation can be used to optimize sparse tensor kernels. Our results show that this new transformation significantly improves the performance of several real-world tensor algebra computations compared to TACO-generated code.
△ Less
Submitted 23 May, 2022;
originally announced May 2022.
-
A Sequential MPC Approach to Reactive Planning for Bipedal Robots
Authors:
Kunal Sanjay Narkhede,
Abhijeet Mangesh Kulkarni,
Dhruv Ashwinkumar Thanki,
Ioannis Poulakakis
Abstract:
This paper presents a sequential Model Predictive Control (MPC) approach to reactive motion planning for bipedal robots in dynamic environments. The approach relies on a sequential polytopic decomposition of the free space, which provides an ordered collection of mutually intersecting obstacle free polytopes and waypoints. These are subsequently used to define a corresponding sequence of MPC progr…
▽ More
This paper presents a sequential Model Predictive Control (MPC) approach to reactive motion planning for bipedal robots in dynamic environments. The approach relies on a sequential polytopic decomposition of the free space, which provides an ordered collection of mutually intersecting obstacle free polytopes and waypoints. These are subsequently used to define a corresponding sequence of MPC programs that drive the system to a goal location avoiding static and moving obstacles. This way, the planner focuses on the free space in the vicinity of the robot, thus alleviating the need to consider all the obstacles simultaneously and reducing computational time. We verify the efficacy of our approach in high-fidelity simulations with the bipedal robot Digit, demonstrating robust reactive planning in the presence of static and moving obstacles.
△ Less
Submitted 30 April, 2022;
originally announced May 2022.
-
EntSUM: A Data Set for Entity-Centric Summarization
Authors:
Mounica Maddela,
Mayank Kulkarni,
Daniel Preotiuc-Pietro
Abstract:
Controllable summarization aims to provide summaries that take into account user-specified aspects and preferences to better assist them with their information need, as opposed to the standard summarization setup which build a single generic summary of a document. We introduce a human-annotated data set EntSUM for controllable summarization with a focus on named entities as the aspects to control.…
▽ More
Controllable summarization aims to provide summaries that take into account user-specified aspects and preferences to better assist them with their information need, as opposed to the standard summarization setup which build a single generic summary of a document. We introduce a human-annotated data set EntSUM for controllable summarization with a focus on named entities as the aspects to control. We conduct an extensive quantitative analysis to motivate the task of entity-centric summarization and show that existing methods for controllable summarization fail to generate entity-centric summaries. We propose extensions to state-of-the-art summarization approaches that achieve substantially better results on our data set. Our analysis and results show the challenging nature of this task and of the proposed data set.
△ Less
Submitted 5 April, 2022;
originally announced April 2022.
-
RMF-Owl: A Collision-Tolerant Flying Robot for Autonomous Subterranean Exploration
Authors:
Paolo De Petris,
Huan Nguyen,
Mihir Dharmadhikari,
Mihir Kulkarni,
Nikhil Khedekar,
Frank Mascarich,
Kostas Alexis
Abstract:
This work presents the design, hardware realization, autonomous exploration and object detection capabilities of RMF-Owl, a new collision-tolerant aerial robot tailored for resilient autonomous subterranean exploration. The system is custom built for underground exploration with focus on collision tolerance, resilient autonomy with robust localization and map**, alongside high-performance explor…
▽ More
This work presents the design, hardware realization, autonomous exploration and object detection capabilities of RMF-Owl, a new collision-tolerant aerial robot tailored for resilient autonomous subterranean exploration. The system is custom built for underground exploration with focus on collision tolerance, resilient autonomy with robust localization and map**, alongside high-performance exploration path planning in confined, obstacle-filled and topologically complex underground environments. Moreover, RMF-Owl offers the ability to search, detect and locate objects of interest which can be particularly useful in search and rescue missions. A series of results from field experiments are presented in order to demonstrate the system's ability to autonomously explore challenging unknown underground environments.
△ Less
Submitted 22 February, 2022;
originally announced February 2022.
-
CERBERUS: Autonomous Legged and Aerial Robotic Exploration in the Tunnel and Urban Circuits of the DARPA Subterranean Challenge
Authors:
Marco Tranzatto,
Frank Mascarich,
Lukas Bernreiter,
Carolina Godinho,
Marco Camurri,
Shehryar Khattak,
Tung Dang,
Victor Reijgwart,
Johannes Loeje,
David Wisth,
Samuel Zimmermann,
Huan Nguyen,
Marius Fehr,
Lukas Solanka,
Russell Buchanan,
Marko Bjelonic,
Nikhil Khedekar,
Mathieu Valceschini,
Fabian Jenelten,
Mihir Dharmadhikari,
Timon Homberger,
Paolo De Petris,
Lorenz Wellhausen,
Mihir Kulkarni,
Takahiro Miki
, et al. (16 additional authors not shown)
Abstract:
Autonomous exploration of subterranean environments constitutes a major frontier for robotic systems as underground settings present key challenges that can render robot autonomy hard to achieve. This has motivated the DARPA Subterranean Challenge, where teams of robots search for objects of interest in various underground environments. In response, the CERBERUS system-of-systems is presented as a…
▽ More
Autonomous exploration of subterranean environments constitutes a major frontier for robotic systems as underground settings present key challenges that can render robot autonomy hard to achieve. This has motivated the DARPA Subterranean Challenge, where teams of robots search for objects of interest in various underground environments. In response, the CERBERUS system-of-systems is presented as a unified strategy towards subterranean exploration using legged and flying robots. As primary robots, ANYmal quadruped systems are deployed considering their endurance and potential to traverse challenging terrain. For aerial robots, both conventional and collision-tolerant multirotors are utilized to explore spaces too narrow or otherwise unreachable by ground systems. Anticipating degraded sensing conditions, a complementary multi-modal sensor fusion approach utilizing camera, LiDAR, and inertial data for resilient robot pose estimation is proposed. Individual robot pose estimates are refined by a centralized multi-robot map optimization approach to improve the reported location accuracy of detected objects of interest in the DARPA-defined coordinate frame. Furthermore, a unified exploration path planning policy is presented to facilitate the autonomous operation of both legged and aerial robots in complex underground networks. Finally, to enable communication between the robots and the base station, CERBERUS utilizes a ground rover with a high-gain antenna and an optical fiber connection to the base station, alongside breadcrumbing of wireless nodes by our legged robots. We report results from the CERBERUS system-of-systems deployment at the DARPA Subterranean Challenge Tunnel and Urban Circuits, along with the current limitations and the lessons learned for the benefit of the community.
△ Less
Submitted 18 January, 2022;
originally announced January 2022.
-
Code-based Signatures from New Proofs of Knowledge for the Syndrome Decoding Problem
Authors:
Loïc Bidoux,
Philippe Gaborit,
Mukul Kulkarni,
Victor Mateu
Abstract:
In this paper, we study code-based signatures constructed from Proof of Knowledge (PoK). This line of work can be traced back to Stern who introduces the first efficient PoK for the syndrome decoding problem in 1993. Afterward, different variations were proposed in order to reduce signature's size. In practice, obtaining a smaller signature size relies on the interaction of two main considerations…
▽ More
In this paper, we study code-based signatures constructed from Proof of Knowledge (PoK). This line of work can be traced back to Stern who introduces the first efficient PoK for the syndrome decoding problem in 1993. Afterward, different variations were proposed in order to reduce signature's size. In practice, obtaining a smaller signature size relies on the interaction of two main considerations: (i) the underlying protocol and its soundness error and (ii) the type of optimizations which are compatible with a given protocol. Over the years, different variations were proposed to improve the Stern scheme such as the Veron scheme (with public key a noisy codeword rather than a syndrome), the AGS scheme which is a 5-pass protocol with cheating probability asymptotically equal to 1/2 and more recently the FJR approach which permits to decrease the cheating probability to 1/N but induces a performance overhead. Overall the length of the signature depends on a trade-off between: the scheme in itself, the possible optimizations and the cost of the implementation. The recent approaches which increase the cost of the implementation opens the door to many different type of trade-offs. In this paper we propose three new schemes and different trade-offs, which are all interesting in themselves, since depending on potential future optimizations a scheme may eventually become more efficient than another. All the schemes we propose use a trusted helper: a first scheme permits to get a 1/2 cheating probability, a second scheme permits to decrease the cheating probability in 1/N but with a different approach than the recent FJR scheme and at last a third scheme propose a Veron-like adaptation of the FJR scheme in which the public key is a noisy codeword rather than a syndrome. We provide an extensive comparison table which lists various trade-offs between our schemes and previous ones.
△ Less
Submitted 14 January, 2022;
originally announced January 2022.
-
Some Strategies to Capture Karaka-Yogyata with Special Reference to apadana
Authors:
Swaraja Salaskar,
Diptesh Kanojia,
Malhar Kulkarni
Abstract:
In today's digital world language technology has gained importance. Several softwares, have been developed and are available in the field of computational linguistics. Such tools play a crucial role in making classical language texts easily accessible. Some Indian philosophical schools have contributed towards various techniques of verbal cognition to analyze sentences correctly. These theories ca…
▽ More
In today's digital world language technology has gained importance. Several softwares, have been developed and are available in the field of computational linguistics. Such tools play a crucial role in making classical language texts easily accessible. Some Indian philosophical schools have contributed towards various techniques of verbal cognition to analyze sentences correctly. These theories can be used to build computational tools for word sense disambiguation (WSD). In the absence of WSD, one cannot have proper verbal cognition. These theories considered the concept of 'Yogyatā' (congruity or compatibility) as the indispensable cause of verbal cognition. In this work, we come up with some insights on the basis of these theories to create a tool that will capture Yogyatā of words. We describe the problem of ambiguity in a text and present a method to resolve it computationally with the help of Yogyatā. Here, only two major schools i.e. Nyāya and Vyākarana are considered. Our paper attempts to show the implication of the creation of our tool in this area. Also, our tool involves the creation of an 'ontological tag-set' as well as strategies to mark up the lexicon. The introductory description of ablation is also covered in this paper. Such strategies and some case studies shall form the core of our paper.
△ Less
Submitted 5 January, 2022;
originally announced January 2022.
-
Strategies of Effective Digitization of Commentaries and Sub-commentaries: Towards the Construction of Textual History
Authors:
Diptesh Kanojia,
Malhar Kulkarni,
Sayali Ghodekar,
Eivind Kahrs,
Pushpak Bhattacharyya
Abstract:
This paper describes additional aspects of a digital tool called the 'Textual History Tool'. We describe its various salient features with special reference to those of its features that may help the philologist digitize commentaries and sub-commentaries on a text. This tool captures the historical evolution of a text through various temporal stages, and interrelated data culled from various types…
▽ More
This paper describes additional aspects of a digital tool called the 'Textual History Tool'. We describe its various salient features with special reference to those of its features that may help the philologist digitize commentaries and sub-commentaries on a text. This tool captures the historical evolution of a text through various temporal stages, and interrelated data culled from various types of related texts. We use the text of the Kāśikāvrtti (KV) as a sample text, and with the help of philologists, we digitize the commentaries available to us. We digitize the Nyāsa (Ny), the Padamañjarī (Pm) and sub commentaries on the KV text known as the Tantrapradīpa (Tp), and the Makaranda (Mk). We divide each commentary and sub-commentary into functional units and describe the methodology and motivation behind the functional unit division. Our functional unit division helps generate more accurate phylogenetic trees for the text, based on distance methods using the data entered in the tool.
△ Less
Submitted 5 January, 2022;
originally announced January 2022.
-
Utilizing Wordnets for Cognate Detection among Indian Languages
Authors:
Diptesh Kanojia,
Kevin Patel,
Pushpak Bhattacharyya,
Malhar Kulkarni,
Gholamreza Haffari
Abstract:
Automatic Cognate Detection (ACD) is a challenging task which has been utilized to help NLP applications like Machine Translation, Information Retrieval and Computational Phylogenetics. Unidentified cognate pairs can pose a challenge to these applications and result in a degradation of performance. In this paper, we detect cognate word pairs among ten Indian languages with Hindi and use deep learn…
▽ More
Automatic Cognate Detection (ACD) is a challenging task which has been utilized to help NLP applications like Machine Translation, Information Retrieval and Computational Phylogenetics. Unidentified cognate pairs can pose a challenge to these applications and result in a degradation of performance. In this paper, we detect cognate word pairs among ten Indian languages with Hindi and use deep learning methodologies to predict whether a word pair is cognate or not. We identify IndoWordnet as a potential resource to detect cognate word pairs based on orthographic similarity-based methods and train neural network models using the data obtained from it. We identify parallel corpora as another potential resource and perform the same experiments for them. We also validate the contribution of Wordnets through further experimentation and report improved performance of up to 26%. We discuss the nuances of cognate detection among closely related Indian languages and release the lists of detected cognates as a dataset. We also observe the behaviour of, to an extent, unrelated Indian language pairs and release the lists of detected cognates among them as well.
△ Less
Submitted 30 December, 2021;
originally announced December 2021.
-
Challenge Dataset of Cognates and False Friend Pairs from Indian Languages
Authors:
Diptesh Kanojia,
Pushpak Bhattacharyya,
Malhar Kulkarni,
Gholamreza Haffari
Abstract:
Cognates are present in multiple variants of the same text across different languages (e.g., "hund" in German and "hound" in English language mean "dog"). They pose a challenge to various Natural Language Processing (NLP) applications such as Machine Translation, Cross-lingual Sense Disambiguation, Computational Phylogenetics, and Information Retrieval. A possible solution to address this challeng…
▽ More
Cognates are present in multiple variants of the same text across different languages (e.g., "hund" in German and "hound" in English language mean "dog"). They pose a challenge to various Natural Language Processing (NLP) applications such as Machine Translation, Cross-lingual Sense Disambiguation, Computational Phylogenetics, and Information Retrieval. A possible solution to address this challenge is to identify cognates across language pairs. In this paper, we describe the creation of two cognate datasets for twelve Indian languages, namely Sanskrit, Hindi, Assamese, Oriya, Kannada, Gujarati, Tamil, Telugu, Punjabi, Bengali, Marathi, and Malayalam. We digitize the cognate data from an Indian language cognate dictionary and utilize linked Indian language Wordnets to generate cognate sets. Additionally, we use the Wordnet data to create a False Friends' dataset for eleven language pairs. We also evaluate the efficacy of our dataset using previously available baseline cognate detection approaches. We also perform a manual evaluation with the help of lexicographers and release the curated gold-standard dataset with this paper.
△ Less
Submitted 17 December, 2021;
originally announced December 2021.
-
Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages
Authors:
Diptesh Kanojia,
Raj Dabre,
Shubham Dewangan,
Pushpak Bhattacharyya,
Gholamreza Haffari,
Malhar Kulkarni
Abstract:
Cognates are variants of the same lexical form across different languages; for example 'fonema' in Spanish and 'phoneme' in English are cognates, both of which mean 'a unit of sound'. The task of automatic detection of cognates among any two languages can help downstream NLP tasks such as Cross-lingual Information Retrieval, Computational Phylogenetics, and Machine Translation. In this paper, we d…
▽ More
Cognates are variants of the same lexical form across different languages; for example 'fonema' in Spanish and 'phoneme' in English are cognates, both of which mean 'a unit of sound'. The task of automatic detection of cognates among any two languages can help downstream NLP tasks such as Cross-lingual Information Retrieval, Computational Phylogenetics, and Machine Translation. In this paper, we demonstrate the use of cross-lingual word embeddings for detecting cognates among fourteen Indian Languages. Our approach introduces the use of context from a knowledge graph to generate improved feature representations for cognate detection. We, then, evaluate the impact of our cognate detection mechanism on neural machine translation (NMT), as a downstream task. We evaluate our methods to detect cognates on a challenging dataset of twelve Indian languages, namely, Sanskrit, Hindi, Assamese, Oriya, Kannada, Gujarati, Tamil, Telugu, Punjabi, Bengali, Marathi, and Malayalam. Additionally, we create evaluation datasets for two more Indian languages, Konkani and Nepali. We observe an improvement of up to 18% points, in terms of F-score, for cognate detection. Furthermore, we observe that cognates extracted using our method help improve NMT quality by up to 2.76 BLEU. We also release our code, newly constructed datasets and cross-lingual models publicly.
△ Less
Submitted 16 December, 2021;
originally announced December 2021.
-
Learning Rich Representation of Keyphrases from Text
Authors:
Mayank Kulkarni,
Debanjan Mahata,
Ravneet Arora,
Rajarshi Bhowmik
Abstract:
In this work, we explore how to train task-specific language models aimed towards learning rich representation of keyphrases from text documents. We experiment with different masking strategies for pre-training transformer language models (LMs) in discriminative as well as generative settings. In the discriminative setting, we introduce a new pre-training objective - Keyphrase Boundary Infilling w…
▽ More
In this work, we explore how to train task-specific language models aimed towards learning rich representation of keyphrases from text documents. We experiment with different masking strategies for pre-training transformer language models (LMs) in discriminative as well as generative settings. In the discriminative setting, we introduce a new pre-training objective - Keyphrase Boundary Infilling with Replacement (KBIR), showing large gains in performance (upto 8.16 points in F1) over SOTA, when the LM pre-trained using KBIR is fine-tuned for the task of keyphrase extraction. In the generative setting, we introduce a new pre-training setup for BART - KeyBART, that reproduces the keyphrases related to the input text in the CatSeq format, instead of the denoised original input. This also led to gains in performance (upto 4.33 points in F1@M) over SOTA for keyphrase generation. Additionally, we also fine-tune the pre-trained language models on named entity recognition (NER), question answering (QA), relation extraction (RE), abstractive summarization and achieve comparable performance with that of the SOTA, showing that learning rich representation of keyphrases is indeed beneficial for many other fundamental NLP tasks.
△ Less
Submitted 10 July, 2022; v1 submitted 15 December, 2021;
originally announced December 2021.
-
Cognition-aware Cognate Detection
Authors:
Diptesh Kanojia,
Prashant Sharma,
Sayali Ghodekar,
Pushpak Bhattacharyya,
Gholamreza Haffari,
Malhar Kulkarni
Abstract:
Automatic detection of cognates helps downstream NLP tasks of Machine Translation, Cross-lingual Information Retrieval, Computational Phylogenetics and Cross-lingual Named Entity Recognition. Previous approaches for the task of cognate detection use orthographic, phonetic and semantic similarity based features sets. In this paper, we propose a novel method for enriching the feature sets, with cogn…
▽ More
Automatic detection of cognates helps downstream NLP tasks of Machine Translation, Cross-lingual Information Retrieval, Computational Phylogenetics and Cross-lingual Named Entity Recognition. Previous approaches for the task of cognate detection use orthographic, phonetic and semantic similarity based features sets. In this paper, we propose a novel method for enriching the feature sets, with cognitive features extracted from human readers' gaze behaviour. We collect gaze behaviour data for a small sample of cognates and show that extracted cognitive features help the task of cognate detection. However, gaze data collection and annotation is a costly task. We use the collected gaze behaviour data to predict cognitive features for a larger sample and show that predicted cognitive features, also, significantly improve the task performance. We report improvements of 10% with the collected gaze features, and 12% using the predicted gaze features, over the previously proposed approaches. Furthermore, we release the collected gaze behaviour data along with our code and cross-lingual models.
△ Less
Submitted 15 December, 2021;
originally announced December 2021.
-
Autonomous Teamed Exploration of Subterranean Environments using Legged and Aerial Robots
Authors:
Mihir Kulkarni,
Mihir Dharmadhikari,
Marco Tranzatto,
Samuel Zimmermann,
Victor Reijgwart,
Paolo De Petris,
Huan Nguyen,
Nikhil Khedekar,
Christos Papachristos,
Lionel Ott,
Roland Siegwart,
Marco Hutter,
Kostas Alexis
Abstract:
This paper presents a novel strategy for autonomous teamed exploration of subterranean environments using legged and aerial robots. Tailored to the fact that subterranean settings, such as cave networks and underground mines, often involve complex, large-scale and multi-branched topologies, while wireless communication within them can be particularly challenging, this work is structured around the…
▽ More
This paper presents a novel strategy for autonomous teamed exploration of subterranean environments using legged and aerial robots. Tailored to the fact that subterranean settings, such as cave networks and underground mines, often involve complex, large-scale and multi-branched topologies, while wireless communication within them can be particularly challenging, this work is structured around the synergy of an onboard exploration path planner that allows for resilient long-term autonomy, and a multi-robot coordination framework. The onboard path planner is unified across legged and flying robots and enables navigation in environments with steep slopes, and diverse geometries. When a communication link is available, each robot of the team shares submaps to a centralized location where a multi-robot coordination framework identifies global frontiers of the exploration space to inform each system about where it should re-position to best continue its mission. The strategy is verified through a field deployment inside an underground mine in Switzerland using a legged and a flying robot collectively exploring for 45 min, as well as a longer simulation study with three systems.
△ Less
Submitted 16 May, 2022; v1 submitted 11 November, 2021;
originally announced November 2021.
-
Quasi-Cyclic Stern Proof of Knowledge
Authors:
Loïc Bidoux,
Philippe Gaborit,
Mukul Kulkarni,
Nicolas Sendrier
Abstract:
The ongoing NIST standardization process has shown that Proof of Knowledge (PoK) based signatures have become an important type of possible post-quantum signatures. Regarding code-based cryptography, the original approach for PoK based signatures is the Stern protocol which allows to prove the knowledge of a small weight vector solving a given instance of the Syndrome Decoding (SD) problem over F2…
▽ More
The ongoing NIST standardization process has shown that Proof of Knowledge (PoK) based signatures have become an important type of possible post-quantum signatures. Regarding code-based cryptography, the original approach for PoK based signatures is the Stern protocol which allows to prove the knowledge of a small weight vector solving a given instance of the Syndrome Decoding (SD) problem over F2. It features a soundness error equal to 2/3. This protocol was improved a few years later by Véron who proposed a variation of the scheme based on the General Syndrome Decoding (GSD) problem which leads to better results in term of communication. A few years later, the AGS protocol introduced a variation of the Véron protocol based on Quasi-Cyclic (QC) matrices. The AGS protocol permits to obtain an asymptotic soundness error of 1/2 and an improvement in term of communications. In the present paper, we introduce the Quasi-Cyclic Stern PoK which constitutes an adaptation of the AGS scheme in a SD context, as well as several new optimizations for code-based PoK. Our main optimization on the size of the signature can't be applied to GSD based protocols such as AGS and therefore motivated the design of our new protocol. In addition, we also provide a special soundness proof that is compatible with the use of the Fiat-Shamir transform for 5-round protocols. This approach is valid for our protocol but also for the AGS protocol which was lacking such a proof. We compare our results with existing signatures including the recent code-based signatures based on PoK leveraging the MPC in the head paradigm. In practice, our new protocol is as fast as AGS while reducing its associated signature length by 20%. As a consequence, it constitutes an interesting trade-off between signature length and execution time for the design of a code-based signature relying only on the difficulty of the SD problem.
△ Less
Submitted 4 February, 2022; v1 submitted 11 October, 2021;
originally announced October 2021.
-
DARM: Control-Flow Melding for SIMT Thread Divergence Reduction -- Extended Version
Authors:
Charitha Saumya,
Kirshanthan Sundararajah,
Milind Kulkarni
Abstract:
GPGPUs use the Single-Instruction-Multiple-Thread (SIMT) execution model where a group of threads-wavefront or warp-execute instructions in lockstep. When threads in a group encounter a branching instruction, not all threads in the group take the same path, a phenomenon known as control-flow divergence. The control-flow divergence causes performance degradation because both paths of the branch mus…
▽ More
GPGPUs use the Single-Instruction-Multiple-Thread (SIMT) execution model where a group of threads-wavefront or warp-execute instructions in lockstep. When threads in a group encounter a branching instruction, not all threads in the group take the same path, a phenomenon known as control-flow divergence. The control-flow divergence causes performance degradation because both paths of the branch must be executed one after the other. Prior research has primarily addressed this issue through architectural modifications. We observe that certain GPGPU kernels with control-flow divergence have similar control-flow structures with similar instructions on both sides of a branch. This structure can be exploited to reduce control-flow divergence by melding the two sides of the branch allowing threads to reconverge early, reducing divergence. In this work, we present DARM, a compiler analysis and transformation framework that can meld divergent control-flow structures with similar instruction sequences. We show that DARM can reduce the performance degradation from control-flow divergence.
△ Less
Submitted 13 January, 2022; v1 submitted 12 July, 2021;
originally announced July 2021.
-
Efficient Tree-Traversals: Reconciling Parallelism and Dense Data Representations
Authors:
Chaitanya Koparkar,
Mike Rainey,
Michael Vollmer,
Milind Kulkarni,
Ryan R. Newton
Abstract:
Recent work showed that compiling functional programs to use dense, serialized memory representations for recursive algebraic datatypes can yield significant constant-factor speedups for sequential programs. But serializing data in a maximally dense format consequently serializes the processing of that data, yielding a tension between density and parallelism. This paper shows that a disciplined, p…
▽ More
Recent work showed that compiling functional programs to use dense, serialized memory representations for recursive algebraic datatypes can yield significant constant-factor speedups for sequential programs. But serializing data in a maximally dense format consequently serializes the processing of that data, yielding a tension between density and parallelism. This paper shows that a disciplined, practical compromise is possible. We present Parallel Gibbon, a compiler that obtains the benefits of dense data formats and parallelism. We formalize the semantics of the parallel location calculus underpinning this novel implementation strategy, and show that it is type-safe. Parallel Gibbon exceeds the parallel performance of existing compilers for purely functional programs that use recursive algebraic datatypes, including, notably, abstract-syntax-tree traversals as in compilers.
△ Less
Submitted 1 July, 2021;
originally announced July 2021.
-
Embedded vs. External Controllers in Software-Defined IoT Networks
Authors:
Miheer Kulkarni,
Michael Baddeley,
Israat Haque
Abstract:
The flexible and programmable architectural model offered by Software-Defined Networking (SDN) has re-imagined modern networks. Supported by powerful hardware and high-speed communications between devices and the controller, SDN provides a means to virtualize control functionality and enable rapid network reconfiguration in response to dynamic application requirements. However, recent efforts to a…
▽ More
The flexible and programmable architectural model offered by Software-Defined Networking (SDN) has re-imagined modern networks. Supported by powerful hardware and high-speed communications between devices and the controller, SDN provides a means to virtualize control functionality and enable rapid network reconfiguration in response to dynamic application requirements. However, recent efforts to apply SDN's centralized control model to the Internet of Things (IoT) have identified significant challenges due to the constraints faced by embedded low-power devices and networks that reside at the IoT edge. In particular, reliance on external SDN controllers on the backbone network introduces a performance bottleneck (e.g., latency). To this end, we advocate a case for supporting Software-Defined IoT networks through the introduction of lightweight SDN controllers directly on the embedded hardware. We firstly explore the performance of two popular SDN implementations for IoT mesh networks, $μ$SDN and SDN-WISE, showing the former demonstrates considerable gains over the latter. We consequently employ $μ$SDN to conduct a study of embedded vs. external SDN controller performance. We highlight how the advantage of an embedded controller is reduced as the network scales, and quantify a point at which an external controller should be used for larger networks.
△ Less
Submitted 6 June, 2021;
originally announced June 2021.
-
Table Tennis Stroke Recognition Using Two-Dimensional Human Pose Estimation
Authors:
Kaustubh Milind Kulkarni,
Sucheth Shenoy
Abstract:
We introduce a novel method for collecting table tennis video data and perform stroke detection and classification. A diverse dataset containing video data of 11 basic strokes obtained from 14 professional table tennis players, summing up to a total of 22111 videos has been collected using the proposed setup. The temporal convolutional neural network model developed using 2D pose estimation perfor…
▽ More
We introduce a novel method for collecting table tennis video data and perform stroke detection and classification. A diverse dataset containing video data of 11 basic strokes obtained from 14 professional table tennis players, summing up to a total of 22111 videos has been collected using the proposed setup. The temporal convolutional neural network model developed using 2D pose estimation performs multiclass classification of these 11 table tennis strokes with a validation accuracy of 99.37%. Moreover, the neural network generalizes well over the data of a player excluded from the training and validation dataset, classifying the fresh strokes with an overall best accuracy of 98.72%. Various model architectures using machine learning and deep learning based approaches have been trained for stroke recognition and their performances have been compared and benchmarked. Inferences such as performance monitoring and stroke comparison of the players using the model have been discussed. Therefore, we are contributing to the development of a computer vision based sports analytics system for the sport of table tennis that focuses on the previously unexploited aspect of the sport i.e., a player's strokes, which is extremely insightful for performance improvement.
△ Less
Submitted 31 May, 2021; v1 submitted 20 April, 2021;
originally announced April 2021.