-
Evaluating NoSQL Databases for OLAP Workloads: A Benchmarking Study of MongoDB, Redis, Kudu and ArangoDB
Authors:
Rishi Kesav Mohan,
Risheek Rakshit Sukumar Kanmani,
Krishna Anandan Ganesan,
Nisha Ramasubramanian
Abstract:
In the era of big data, conventional RDBMS models have become impractical for handling colossal workloads. Consequently, NoSQL databases have emerged as the preferred storage solutions for executing processing-intensive Online Analytical Processing (OLAP) tasks. Within the realm of NoSQL databases, various classifications exist based on their data storage mechanisms, making it challenging to select the most suitable one for a given OLAP workload. While each NoSQL database boasts distinct advantages, inherent scalability, adaptability to diverse data formats, and high data availability are universally recognized benefits crucial for managing OLAP workloads effectively. Existing research predominantly evaluates individual databases within custom data pipeline setups, lacking a standardized approach for comparative analysis across different databases to identify the optimal data pipeline for OLAP workloads. In this paper, we present our experimental insights into how various NoSQL databases handle OLAP workloads within a standardized data processing pipeline. Our experimental pipeline comprises Apache Spark for large-scale transformations, data cleansing, and schema normalization, diverse NoSQL databases as data stores, and a Business Intelligence tool for data analysis and visualization.
Submitted 27 May, 2024;
originally announced May 2024.
-
Amodal Optical Flow
Authors:
Maximilian Luz,
Rohit Mohan,
Ahmed Rida Sekkat,
Oliver Sawade,
Elmar Matthes,
Thomas Brox,
Abhinav Valada
Abstract:
Optical flow estimation is very challenging in situations with transparent or occluded objects. In this work, we address these challenges at the task level by introducing Amodal Optical Flow, which integrates optical flow with amodal perception. Instead of only representing the visible regions, we define amodal optical flow as a multi-layered pixel-level motion field that encompasses both visible and occluded regions of the scene. To facilitate research on this new task, we extend the AmodalSynthDrive dataset to include pixel-level labels for amodal optical flow estimation. We present several strong baselines, along with the Amodal Flow Quality metric to quantify the performance in an interpretable manner. Furthermore, we propose the novel AmodalFlowNet as an initial step toward addressing this task. AmodalFlowNet consists of a transformer-based cost-volume encoder paired with a recurrent transformer decoder which facilitates recurrent hierarchical feature propagation and amodal semantic grounding. We demonstrate the tractability of amodal optical flow in extensive experiments and show its utility for downstream tasks such as panoptic tracking. We make the dataset, code, and trained models publicly available at http://amodal-flow.cs.uni-freiburg.de.
Submitted 7 May, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Panoptic Out-of-Distribution Segmentation
Authors:
Rohit Mohan,
Kiran Kumaraswamy,
Juana Valeria Hurtado,
Kürsat Petek,
Abhinav Valada
Abstract:
Deep learning has led to remarkable strides in scene understanding, with panoptic segmentation emerging as a key holistic scene interpretation task. However, the performance of panoptic segmentation is severely impacted in the presence of out-of-distribution (OOD) objects, i.e., categories of objects that deviate from the training distribution. To overcome this limitation, we propose Panoptic Out-of-Distribution Segmentation for joint pixel-level semantic in-distribution and out-of-distribution classification with instance prediction. We extend two established panoptic segmentation benchmarks, Cityscapes and BDD100K, with out-of-distribution instance segmentation annotations, propose suitable evaluation metrics, and present multiple strong baselines. Importantly, we propose the novel PoDS architecture with a shared backbone, an OOD contextual module for learning global and local OOD object cues, and dual symmetrical decoders with task-specific heads that employ our alignment-mismatch strategy for better OOD generalization. Combined with our data augmentation strategy, this approach facilitates progressive learning of out-of-distribution objects while maintaining in-distribution performance. We perform extensive evaluations that demonstrate that our proposed PoDS network effectively addresses the main challenges and substantially outperforms the baselines. We make the dataset, code, and trained models publicly available at http://pods.cs.uni-freiburg.de.
Submitted 18 October, 2023;
originally announced October 2023.
-
AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving
Authors:
Ahmed Rida Sekkat,
Rohit Mohan,
Oliver Sawade,
Elmar Matthes,
Abhinav Valada
Abstract:
Unlike humans, who can effortlessly estimate the entirety of objects even when partially occluded, modern computer vision algorithms still find this aspect extremely challenging. Leveraging this amodal perception for autonomous driving remains largely untapped due to the lack of suitable datasets. The curation of these datasets is primarily hindered by significant annotation costs and the difficulty of mitigating annotator subjectivity in accurately labeling occluded regions. To address these limitations, we introduce AmodalSynthDrive, a synthetic multi-task multi-modal amodal perception dataset. The dataset provides multi-view camera images, 3D bounding boxes, LiDAR data, and odometry for 150 driving sequences with over 1M object annotations in diverse traffic, weather, and lighting conditions. AmodalSynthDrive supports multiple amodal scene understanding tasks including the introduced amodal depth estimation for enhanced spatial understanding. We evaluate several baselines for each of these tasks to illustrate the challenges and set up public benchmarking servers. The dataset is available at http://amodalsynthdrive.cs.uni-freiburg.de.
Submitted 11 March, 2024; v1 submitted 12 September, 2023;
originally announced September 2023.
-
Syn-Mediverse: A Multimodal Synthetic Dataset for Intelligent Scene Understanding of Healthcare Facilities
Authors:
Rohit Mohan,
José Arce,
Sassan Mokhtar,
Daniele Cattaneo,
Abhinav Valada
Abstract:
Safety and efficiency are paramount in healthcare facilities where the lives of patients are at stake. Despite the adoption of robots to assist medical staff in challenging tasks such as complex surgeries, human expertise is still indispensable. The next generation of autonomous healthcare robots hinges on their capacity to perceive and understand their complex and frenetic environments. While deep learning models are increasingly used for this purpose, they require extensive annotated training data which is impractical to obtain in real-world healthcare settings. To bridge this gap, we present Syn-Mediverse, the first hyper-realistic multimodal synthetic dataset of diverse healthcare facilities. Syn-Mediverse contains over 48,000 images from a simulated industry-standard optical tracking camera and provides more than 1.5M annotations spanning five different scene understanding tasks including depth estimation, object detection, semantic segmentation, instance segmentation, and panoptic segmentation. We demonstrate the complexity of our dataset by evaluating the performance on a broad range of state-of-the-art baselines for each task. To further advance research on scene understanding of healthcare facilities, along with the public dataset we provide an online evaluation benchmark available at http://syn-mediverse.cs.uni-freiburg.de.
Submitted 6 August, 2023;
originally announced August 2023.
-
Controllable Prosody Generation With Partial Inputs
Authors:
Dan Andrei Iliescu,
Devang Savita Ram Mohan,
Tian Huey Teh,
Zack Hodari
Abstract:
We address the problem of human-in-the-loop control for generating prosody in the context of text-to-speech synthesis. Controlling prosody is challenging because existing generative models lack an efficient interface through which users can modify the output quickly and precisely. To solve this, we introduce a novel framework whereby the user provides partial inputs and the generative model generates the missing features. We propose a model that is specifically designed to encode partial prosodic features and output complete audio. We show empirically that our model displays two essential qualities of a human-in-the-loop control mechanism: efficiency and robustness. With even a very small number of input values (~4), our model enables users to improve the quality of the output significantly in terms of listener preference (4:1).
Submitted 15 April, 2024; v1 submitted 14 March, 2023;
originally announced March 2023.
-
NODAGS-Flow: Nonlinear Cyclic Causal Structure Learning
Authors:
Muralikrishnna G. Sethuraman,
Romain Lopez,
Rahul Mohan,
Faramarz Fekri,
Tommaso Biancalani,
Jan-Christian Hütter
Abstract:
Learning causal relationships between variables is a well-studied problem in statistics, with many important applications in science. However, modeling real-world systems remains challenging, as most existing algorithms assume that the underlying causal graph is acyclic. While this is a convenient framework for developing theoretical developments about causal reasoning and inference, the underlying modeling assumption is likely to be violated in real systems, because feedback loops are common (e.g., in biological systems). Although a few methods search for cyclic causal models, they usually rely on some form of linearity, which is also limiting, or lack a clear underlying probabilistic model. In this work, we propose a novel framework for learning nonlinear cyclic causal graphical models from interventional data, called NODAGS-Flow. We perform inference via direct likelihood optimization, employing techniques from residual normalizing flows for likelihood estimation. Through synthetic experiments and an application to single-cell high-content perturbation screening data, we show significant performance improvements with our approach compared to state-of-the-art methods with respect to structure recovery and predictive performance.
Submitted 4 January, 2023;
originally announced January 2023.
-
Perceiving the Invisible: Proposal-Free Amodal Panoptic Segmentation
Authors:
Rohit Mohan,
Abhinav Valada
Abstract:
Amodal panoptic segmentation aims to connect the perception of the world to its cognitive understanding. It entails simultaneously predicting the semantic labels of visible scene regions and the entire shape of traffic participant instances, including regions that may be occluded. In this work, we formulate a proposal-free framework that tackles this task as a multi-label and multi-class problem by first assigning the amodal masks to different layers according to their relative occlusion order and then employing amodal instance regression on each layer independently while learning background semantics. We propose an architecture that incorporates a shared backbone and an asymmetrical dual-decoder consisting of several modules to facilitate within-scale and cross-scale feature aggregations, bilateral feature propagation between decoders, and integration of global instance-level and local pixel-level occlusion reasoning. Further, we propose the amodal mask refiner that resolves the ambiguity in complex occlusion scenarios by explicitly leveraging the embedding of unoccluded instance masks. Extensive evaluations on the BDD100K-APS and KITTI-360-APS datasets demonstrate that our approach sets the new state-of-the-art on both benchmarks.
Submitted 29 May, 2022;
originally announced May 2022.
-
Amodal Panoptic Segmentation
Authors:
Rohit Mohan,
Abhinav Valada
Abstract:
Humans have the remarkable ability to perceive objects as a whole, even when parts of them are occluded. This ability of amodal perception forms the basis of our perceptual and cognitive understanding of our world. To enable robots to reason with this capability, we formulate and propose a novel task that we name amodal panoptic segmentation. The goal of this task is to simultaneously predict the pixel-wise semantic segmentation labels of the visible regions of stuff classes and the instance segmentation labels of both the visible and occluded regions of thing classes. To facilitate research on this new task, we extend two established benchmark datasets with pixel-level amodal panoptic segmentation labels that we make publicly available as KITTI-360-APS and BDD100K-APS. We present several strong baselines, along with the amodal panoptic quality (APQ) and amodal parsing coverage (APC) metrics to quantify the performance in an interpretable manner. Furthermore, we propose the novel amodal panoptic segmentation network (APSNet) as a first step towards addressing this task by explicitly modeling the complex relationships between the occluders and occludees. Extensive experimental evaluations demonstrate that APSNet achieves state-of-the-art performance on both benchmarks and, more importantly, exemplifies the utility of amodal recognition. The benchmarks are available at http://amodal-panoptic.cs.uni-freiburg.de.
Submitted 23 February, 2022;
originally announced February 2022.
-
7th AI Driving Olympics: 1st Place Report for Panoptic Tracking
Authors:
Rohit Mohan,
Abhinav Valada
Abstract:
In this technical report, we describe our EfficientLPT architecture that won the panoptic tracking challenge in the 7th AI Driving Olympics at NeurIPS 2021. Our architecture builds upon the top-down EfficientLPS panoptic segmentation approach. EfficientLPT consists of a shared backbone with a modified EfficientNet-B5 model comprising the proximity convolution module as the encoder followed by the range-aware FPN to aggregate semantically rich range-aware multi-scale features. Subsequently, we employ two task-specific heads, the scale-invariant semantic head and hybrid task cascade with feedback from the semantic head as the instance head. Further, we employ a novel panoptic fusion module to adaptively fuse logits from each of the heads to yield the panoptic tracking output. Our approach exploits three consecutive accumulated scans to predict locally consistent panoptic tracking IDs and also the overlap between the scans to predict globally consistent panoptic tracking IDs for a given sequence. The benchmarking results from the 7th AI Driving Olympics at NeurIPS 2021 show that our model is ranked #1 for the panoptic tracking task on the Panoptic nuScenes dataset.
Submitted 9 December, 2021;
originally announced December 2021.
-
Panoptic nuScenes: A Large-Scale Benchmark for LiDAR Panoptic Segmentation and Tracking
Authors:
Whye Kit Fong,
Rohit Mohan,
Juana Valeria Hurtado,
Lubing Zhou,
Holger Caesar,
Oscar Beijbom,
Abhinav Valada
Abstract:
Panoptic scene understanding and tracking of dynamic agents are essential for robots and automated vehicles to navigate in urban environments. As LiDARs provide accurate illumination-independent geometric depictions of the scene, performing these tasks using LiDAR point clouds provides reliable predictions. However, existing datasets lack diversity in the type of urban scenes and have a limited number of dynamic object instances which hinders both learning of these tasks as well as credible benchmarking of the developed methods. In this paper, we introduce the large-scale Panoptic nuScenes benchmark dataset that extends our popular nuScenes dataset with point-wise groundtruth annotations for semantic segmentation, panoptic segmentation, and panoptic tracking tasks. To facilitate comparison, we provide several strong baselines for each of these tasks on our proposed dataset. Moreover, we analyze the drawbacks of the existing metrics for panoptic tracking and propose the novel instance-centric PAT metric that addresses the concerns. We present exhaustive experiments that demonstrate the utility of Panoptic nuScenes compared to existing datasets and make the online evaluation server available at nuScenes.org. We believe that this extension will accelerate the research of novel methods for scene understanding of dynamic urban environments.
Submitted 23 December, 2021; v1 submitted 8 September, 2021;
originally announced September 2021.
-
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis
Authors:
Devang S Ram Mohan,
Vivian Hu,
Tian Huey Teh,
Alexandra Torresquintero,
Christopher G. R. Wallis,
Marlene Staib,
Lorenzo Foglianti,
Jiameng Gao,
Simon King
Abstract:
Text does not fully specify the spoken form, so text-to-speech models must be able to learn from speech data that vary in ways not explained by the corresponding text. One way to reduce the amount of unexplained variation in training data is to provide acoustic information as an additional learning signal. When generating speech, modifying this acoustic information enables multiple distinct renditions of a text to be produced.
Since much of the unexplained variation is in the prosody, we propose a model that generates speech explicitly conditioned on the three primary acoustic correlates of prosody: $F_{0}$, energy and duration. The model is flexible about how the values of these features are specified: they can be externally provided, or predicted from text, or predicted then subsequently modified.
Compared to a model that employs a variational auto-encoder to learn unsupervised latent features, our model provides more interpretable, temporally-precise, and disentangled control. When automatically predicting the acoustic features from text, it generates speech that is more natural than that from a Tacotron 2 model with reference encoder. Subsequent human-in-the-loop modification of the predicted acoustic features can significantly further increase naturalness.
Submitted 15 June, 2021;
originally announced June 2021.
-
Machine Learning for Performance Prediction of Channel Bonding in Next-Generation IEEE 802.11 WLANs
Authors:
Francesc Wilhelmi,
David Góez,
Paola Soto,
Ramon Vallés,
Mohammad Alfaifi,
Abdulrahman Algunayah,
Jorge Martin-Pérez,
Luigi Girletti,
Rajasekar Mohan,
K Venkat Ramnan,
Boris Bellalta
Abstract:
With the advent of Artificial Intelligence (AI)-empowered communications, industry, academia, and standardization organizations are progressing on the definition of mechanisms and procedures to address the increasing complexity of future 5G and beyond communications. In this context, the International Telecommunication Union (ITU) organized the first AI for 5G Challenge to bring industry and academia together to introduce and solve representative problems related to the application of Machine Learning (ML) to networks. In this paper, we present the results gathered from Problem Statement 13 (PS-013), organized by Universitat Pompeu Fabra (UPF), whose primary goal was predicting the performance of next-generation Wireless Local Area Networks (WLANs) applying Channel Bonding (CB) techniques. In particular, we overview the ML models proposed by participants (including Artificial Neural Networks, Graph Neural Networks, Random Forest regression, and gradient boosting) and analyze their performance on an open dataset generated using the IEEE 802.11ax-oriented Komondor network simulator. The accuracy achieved by the proposed methods demonstrates the suitability of ML for predicting the performance of WLANs. Moreover, we discuss the importance of abstracting WLAN interactions to achieve better results, and we argue that there is certainly room for improvement in throughput prediction through ML.
Submitted 29 May, 2021;
originally announced May 2021.
-
EfficientLPS: Efficient LiDAR Panoptic Segmentation
Authors:
Kshitij Sirohi,
Rohit Mohan,
Daniel Büscher,
Wolfram Burgard,
Abhinav Valada
Abstract:
Panoptic segmentation of point clouds is a crucial task that enables autonomous vehicles to comprehend their vicinity using their highly accurate and reliable LiDAR sensors. Existing top-down approaches tackle this problem by either combining independent task-specific networks or translating methods from the image domain, ignoring the intricacies of LiDAR data and thus often resulting in sub-optimal performance. In this paper, we present the novel top-down Efficient LiDAR Panoptic Segmentation (EfficientLPS) architecture that addresses multiple challenges in segmenting LiDAR point clouds including distance-dependent sparsity, severe occlusions, large scale-variations, and re-projection errors. EfficientLPS comprises a novel shared backbone that encodes with strengthened geometric transformation modeling capacity and aggregates semantically rich range-aware multi-scale features. It incorporates new scale-invariant semantic and instance segmentation heads along with the panoptic fusion module which is supervised by our proposed panoptic periphery loss function. Additionally, we formulate a regularized pseudo labeling framework to further improve the performance of EfficientLPS by training on unlabelled data. We benchmark our proposed model on two large-scale LiDAR datasets: nuScenes, for which we also provide ground truth annotations, and SemanticKITTI. Notably, EfficientLPS sets the new state-of-the-art on both these datasets.
Submitted 4 November, 2021; v1 submitted 16 February, 2021;
originally announced February 2021.
-
Robust Vision Challenge 2020 -- 1st Place Report for Panoptic Segmentation
Authors:
Rohit Mohan,
Abhinav Valada
Abstract:
In this technical report, we present key details of our winning panoptic segmentation architecture EffPS_b1bs4_RVC. Our network is a lightweight version of our state-of-the-art EfficientPS architecture that consists of our proposed shared backbone with a modified EfficientNet-B5 model as the encoder, followed by the 2-way FPN to learn semantically rich multi-scale features. It consists of two task-specific heads, a modified Mask R-CNN instance head and our novel semantic segmentation head that processes features of different scales with specialized modules for coherent feature refinement. Finally, our proposed panoptic fusion module adaptively fuses logits from each of the heads to yield the panoptic segmentation output. The Robust Vision Challenge 2020 benchmarking results show that our model is ranked #1 on Microsoft COCO, VIPER and WildDash, and is ranked #2 on Cityscapes and Mapillary Vistas, thereby achieving the overall rank #1 for the panoptic segmentation task.
Submitted 23 August, 2020;
originally announced August 2020.
-
Phonological Features for 0-shot Multilingual Speech Synthesis
Authors:
Marlene Staib,
Tian Huey Teh,
Alexandra Torresquintero,
Devang S Ram Mohan,
Lorenzo Foglianti,
Raphael Lenain,
Jiameng Gao
Abstract:
Code-switching, the intra-utterance use of multiple languages, is prevalent across the world. Within text-to-speech (TTS), multilingual models have been found to enable code-switching. By modifying the linguistic input to sequence-to-sequence TTS, we show that code-switching is possible for languages unseen during training, even within monolingual models. We use a small set of phonological features derived from the International Phonetic Alphabet (IPA), such as vowel height and frontness, consonant place and manner. This allows the model topology to stay unchanged for different languages, and enables new, previously unseen feature combinations to be interpreted by the model. We show that this allows us to generate intelligible, code-switched speech in a new language at test time, including the approximation of sounds never seen in training.
Submitted 6 August, 2020;
originally announced August 2020.
-
Incremental Text to Speech for Neural Sequence-to-Sequence Models using Reinforcement Learning
Authors:
Devang S Ram Mohan,
Raphael Lenain,
Lorenzo Foglianti,
Tian Huey Teh,
Marlene Staib,
Alexandra Torresquintero,
Jiameng Gao
Abstract:
Modern approaches to text to speech require the entire input character sequence to be processed before any audio is synthesised. This latency limits the suitability of such models for time-sensitive tasks like simultaneous interpretation. Interleaving the action of reading a character with that of synthesising audio reduces this latency. However, the order of this sequence of interleaved actions varies across sentences, which raises the question of how the actions should be chosen. We propose a reinforcement learning based framework to train an agent to make this decision. We compare our performance against that of deterministic, rule-based systems. Our results demonstrate that our agent successfully balances the trade-off between the latency of audio generation and the quality of synthesised audio. More broadly, we show that neural sequence-to-sequence models can be adapted to run in an incremental manner.
Submitted 7 August, 2020;
originally announced August 2020.
-
MOPT: Multi-Object Panoptic Tracking
Authors:
Juana Valeria Hurtado,
Rohit Mohan,
Wolfram Burgard,
Abhinav Valada
Abstract:
Comprehensive understanding of dynamic scenes is a critical prerequisite for intelligent robots to autonomously operate in their environment. Research in this domain, which encompasses diverse perception problems, has primarily been focused on addressing specific tasks individually rather than modeling the ability to understand dynamic scenes holistically. In this paper, we introduce a novel perception task denoted as multi-object panoptic tracking (MOPT), which unifies the conventionally disjoint tasks of semantic segmentation, instance segmentation, and multi-object tracking. MOPT allows for exploiting pixel-level semantic information of 'thing' and 'stuff' classes, temporal coherence, and pixel-level associations over time, for the mutual benefit of each of the individual sub-problems. To facilitate quantitative evaluations of MOPT in a unified manner, we propose the soft panoptic tracking quality (sPTQ) metric. As a first step towards addressing this task, we propose the novel PanopticTrackNet architecture that builds upon the state-of-the-art top-down panoptic segmentation network EfficientPS by adding a new tracking head to simultaneously learn all sub-tasks in an end-to-end manner. Additionally, we present several strong baselines that combine predictions from state-of-the-art panoptic segmentation and multi-object tracking models for comparison. We present extensive quantitative and qualitative evaluations of both vision-based and LiDAR-based MOPT that demonstrate encouraging results.
Submitted 27 May, 2020; v1 submitted 17 April, 2020;
originally announced April 2020.
-
EfficientPS: Efficient Panoptic Segmentation
Authors:
Rohit Mohan,
Abhinav Valada
Abstract:
Understanding the scene in which an autonomous robot operates is critical for its competent functioning. Such scene comprehension necessitates recognizing instances of traffic participants along with general scene semantics which can be effectively addressed by the panoptic segmentation task. In this paper, we introduce the Efficient Panoptic Segmentation (EfficientPS) architecture that consists of a shared backbone which efficiently encodes and fuses semantically rich multi-scale features. We incorporate a new semantic head that aggregates fine and contextual features coherently and a new variant of Mask R-CNN as the instance head. We also propose a novel panoptic fusion module that congruously integrates the output logits from both the heads of our EfficientPS architecture to yield the final panoptic segmentation output. Additionally, we introduce the KITTI panoptic segmentation dataset that contains panoptic annotations for the popularly challenging KITTI benchmark. Extensive evaluations on Cityscapes, KITTI, Mapillary Vistas and Indian Driving Dataset demonstrate that our proposed architecture consistently sets the new state-of-the-art on all these four benchmarks while being the most efficient and fast panoptic segmentation architecture to date.
Submitted 1 February, 2021; v1 submitted 5 April, 2020;
originally announced April 2020.
-
Vision-Based Autonomous UAV Navigation and Landing for Urban Search and Rescue
Authors:
Mayank Mittal,
Rohit Mohan,
Wolfram Burgard,
Abhinav Valada
Abstract:
Unmanned Aerial Vehicles (UAVs) equipped with bioradars are a life-saving technology that can enable identification of survivors under collapsed buildings in the aftermath of natural disasters such as earthquakes or gas explosions. However, these UAVs have to be able to autonomously navigate in disaster struck environments and land on debris piles in order to accurately locate the survivors. This problem is extremely challenging as pre-existing maps cannot be leveraged for navigation due to structural changes that may have occurred. Furthermore, existing landing site detection algorithms are not suitable to identify safe landing regions on debris piles. In this work, we present a computationally efficient system for autonomous UAV navigation and landing that does not require any prior knowledge about the environment. We propose a novel landing site detection algorithm that computes costmaps based on several hazard factors including terrain flatness, steepness, depth accuracy, and energy consumption information. We also introduce a first-of-a-kind synthetic dataset of over 1.2 million images of collapsed buildings with groundtruth depth, surface normals, semantics and camera pose information. We demonstrate the efficacy of our system using experiments from a city scale hyperrealistic simulation environment and in real-world scenarios with collapsed buildings.
Submitted 3 September, 2019; v1 submitted 4 June, 2019;
originally announced June 2019.
-
Robot Localization in Floor Plans Using a Room Layout Edge Extraction Network
Authors:
Federico Boniardi,
Abhinav Valada,
Rohit Mohan,
Tim Caselitz,
Wolfram Burgard
Abstract:
Indoor localization is one of the crucial enablers for deployment of service robots. Although several successful techniques for indoor localization have been proposed, the majority of them relies on maps generated from data gathered with the same sensor modality used for localization. Typically, tedious labor by experts is needed to acquire this data, thus limiting the readiness of the system as well as its ease of installation for inexperienced operators. In this paper, we propose a memory and computationally efficient monocular camera-based localization system that allows a robot to estimate its pose given an architectural floor plan. Our method employs a convolutional neural network to predict room layout edges from a single camera image and estimates the robot pose using a particle filter that matches the extracted edges to the given floor plan. We evaluate our localization system using multiple real-world experiments and demonstrate that it has the robustness and accuracy required for reliable indoor navigation.
Submitted 12 July, 2019; v1 submitted 5 March, 2019;
originally announced March 2019.
-
Self-Supervised Model Adaptation for Multimodal Semantic Segmentation
Authors:
Abhinav Valada,
Rohit Mohan,
Wolfram Burgard
Abstract:
Learning to reliably perceive and understand the scene is an integral enabler for robots to operate in the real world. This problem is inherently challenging due to the multitude of object types as well as appearance changes caused by varying illumination and weather conditions. Leveraging complementary modalities can enable learning of semantically richer representations that are resilient to such perturbations. Despite the tremendous progress in recent years, most multimodal convolutional neural network approaches directly concatenate feature maps from individual modality streams, rendering the model incapable of focusing only on relevant complementary information for fusion. To address this limitation, we propose a multimodal semantic segmentation framework that dynamically adapts the fusion of modality-specific features while being sensitive to the object category, spatial location and scene context in a self-supervised manner. Specifically, we propose an architecture consisting of two modality-specific encoder streams that fuse intermediate encoder representations into a single decoder using our proposed self-supervised model adaptation fusion mechanism which optimally combines complementary features. As intermediate representations are not aligned across modalities, we introduce an attention scheme for better correlation. In addition, we propose a computationally efficient unimodal segmentation architecture termed AdapNet++ that incorporates a new encoder with multiscale residual units and an efficient atrous spatial pyramid pooling that has a larger effective receptive field with more than 10x fewer parameters, complemented with a strong decoder with a multi-resolution supervision scheme that recovers high-resolution details. Comprehensive empirical evaluations on several benchmarks demonstrate that both our unimodal and multimodal architectures achieve state-of-the-art performance.
Submitted 8 July, 2019; v1 submitted 11 August, 2018;
originally announced August 2018.
-
Ranking academic institutions on potential paper acceptance in upcoming conferences
Authors:
Jobin Wilson,
Ram Mohan,
Muhammad Arif,
Santanu Chaudhury,
Brejesh Lall
Abstract:
The crux of the problem in KDD Cup 2016 involves developing data mining techniques to rank research institutions based on publications. Rank importance of research institutions is derived from predictions on the number of full research papers that would potentially get accepted in upcoming top-tier conferences, utilizing public information on the web. This paper describes our solution to KDD Cup 2016. We used a two-step approach in which we first identify full research papers corresponding to each conference of interest and then train two variants of exponential smoothing models to make predictions. Our solution achieves an overall score of 0.7508, while the winning submission scored 0.7656 in the overall results.
Submitted 10 October, 2016;
originally announced October 2016.
-
Real World Applications of Machine Learning Techniques over Large Mobile Subscriber Datasets
Authors:
Jobin Wilson,
Chitharanj Kachappilly,
Rakesh Mohan,
Prateek Kapadia,
Arun Soman,
Santanu Chaudhury
Abstract:
Communication Service Providers (CSPs) are in a unique position to utilize their vast transactional data assets generated from interactions of subscribers with network elements as well as with other subscribers. CSPs could leverage their data assets for a gamut of applications such as service personalization, predictive offer management, loyalty management, revenue forecasting, network capacity planning, product bundle optimization and churn management to gain significant competitive advantage. However, due to the sheer data volume, variety, velocity and veracity of mobile subscriber datasets, sophisticated data analytics techniques and frameworks are necessary to derive actionable insights in a usable timeframe. In this paper, we describe our journey from a relational database management system (RDBMS) based campaign management solution, which allowed data scientists and marketers to use hand-written rules for service personalization and targeted promotions, to a distributed Big Data Analytics platform capable of performing large-scale machine learning and data mining to deliver real-time service personalization, predictive modelling and product optimization. Our work involves a careful blend of technology, processes and best practices, which facilitate man-machine collaboration and continuous experimentation to derive measurable economic value from data. Our platform has a reach of more than 500 million mobile subscribers worldwide, delivering over 1 billion personalized recommendations annually, processing a total data volume of 64 Petabytes, corresponding to 8.5 trillion events.
Submitted 8 February, 2015;
originally announced February 2015.
-
Deep Deconvolutional Networks for Scene Parsing
Authors:
Rahul Mohan
Abstract:
Scene parsing is an important and challenging problem in computer vision. It requires labeling each pixel in an image with the category it belongs to. Traditionally, it has been approached with hand-engineered features from color information in images. Recently convolutional neural networks (CNNs), which automatically learn hierarchies of features, have achieved record performance on the task. These approaches typically include a post-processing technique, such as superpixels, to produce the final labeling. In this paper, we propose a novel network architecture that combines deep deconvolutional neural networks with CNNs. Our experiments show that deconvolutional neural networks are capable of learning higher order image structure beyond edge primitives in comparison to CNNs. The new network architecture is employed for multi-patch training, introduced as part of this work. Multi-patch training makes it possible to effectively learn spatial priors from scenes. The proposed approach yields state-of-the-art performance on four scene parsing datasets, namely Stanford Background, SIFT Flow, CamVid, and KITTI. In addition, our system has the added advantage of having a training system that can be completely automated end-to-end without requiring any post-processing.
Submitted 14 November, 2014;
originally announced November 2014.
-
Network Analysis and Application Control Software based on Client-Server Architecture
Authors:
Ramya Mohan
Abstract:
This paper outlines a comprehensive model to increase system efficiency, preserve network bandwidth, monitor incoming and outgoing packets, ensure the security of confidential files and reduce power wastage in an organization. This model illustrates the use and potential application of a Network Analysis Tool (NAT) in a multi-computer set-up of any scale. The model is designed to run in the background and not hamper any currently executing applications, while using minimum system resources. It was developed as open source software, using VB.NET, with a view to overcoming limitations of legacy systems and financial restrictions in small- to mid-level organizations like businesses and educational institutes. It is fully customizable and serves as a simple and open-source alternative to existing software. The NAT relies on a simple client-server architecture and uses remote access to monitor and maintain the computers on a network, for example logging off a user or shutting down a computer after a certain "idle" time, enabling and disabling applications, troubleshooting and so on. The NAT was tested in a laboratory and the resultant data is presented, along with the results of a survey that was conducted among users.
Submitted 18 April, 2013;
originally announced April 2013.
-
On fractionally linear functions over a finite field
Authors:
V. M. Siddlenikov,
R. N. Mohan,
Moon Ho Lee
Abstract:
In this note, by considering fractionally linear functions over a finite field and consequently developing an abstract sequence, we study some of its properties.
Submitted 11 May, 2006;
originally announced May 2006.
-
On Orthogonalities in Matrices
Authors:
R. N. Mohan
Abstract:
In this paper we have discussed different possible orthogonalities in matrices, namely orthogonal, quasi-orthogonal, semi-orthogonal and non-orthogonal matrices including completely positive matrices, while giving some of their constructions besides studying some of their properties.
Submitted 9 May, 2006;
originally announced May 2006.
-
Certain t-partite graphs
Authors:
R. N. Mohan,
Moon Ho Lee,
Subhash Pokrel
Abstract:
By making use of the generalized concept of orthogonality in Latin squares, certain t-partite graphs have been constructed, and a suggestion for a network system and some applications have been made.
Submitted 10 May, 2006; v1 submitted 18 April, 2006;
originally announced April 2006.
-
A New Fault-Tolerant M-network and its Analysis
Authors:
R. N. Mohan,
P. T. Kulkarni
Abstract:
This paper introduces a new class of efficient interconnection networks, called M-graphs, for large multi-processor systems. The concept of M-matrix and M-graph is an extension of Mn-matrices and Mn-graphs. We analyze these M-graphs regarding their suitability for large multi-processor systems. A (p,N) M-graph consists of N nodes, where p is the degree of each node. The topology is found to have many attractive features, prominent among them the capability of maximal fault tolerance, high density and constant diameter. It is found that these combinatorial structures exhibit properties such as symmetry and an inter-relation with the nodes and degree of the concerned graph, which can be utilized for interconnection networks. However, many of the properties of these mathematical and graphical structures remain unexplored, and the aim of the present paper is to study and analyze some of the properties of these M-graphs and explore their application in networks and multi-processor systems.
Submitted 12 April, 2006;
originally announced April 2006.
-
On Hadamard Conjecture
Authors:
R. N. Mohan
Abstract:
In this note, we give an overview of the state of the art of the well-known Hadamard conjecture, which is more than a century old, and report that it has now been established by using the methods given in the two papers by Mohan et al. [6,7].
Submitted 11 April, 2006;
originally announced April 2006.
-
A new M-matrix of Type III, its properties and applications
Authors:
R. N. Mohan,
Moon Ho Lee,
Ram Paudal
Abstract:
Some binary matrices like (1,-1) and (1,0) were studied by many authors like Cohn, Wang, Ehrlich, Ehrlich and Zeller, and Mohan, Kageyama, Lee, and Gao. The recent paper by Mohan et al. considered the M-matrices of Types I and II by studying some of their properties and applications. The present paper discusses the M-matrices of Type III and studies their properties and applications. It gives some constructions of SPBIB designs and some corresponding M-graphs, which are constructed from them. This is the continuation of our earlier research work in this direction, and these papers establish the importance of non-orthogonal matrices as well.
Submitted 11 April, 2006;
originally announced April 2006.
-
On Orthogonality of Latin Squares
Authors:
R. N. Mohan,
Moon Ho Lee,
Subash Pokreal
Abstract:
An arrangement of s elements in s rows and s columns, such that no element repeats in any row or any column, is called a Latin square of order s. If two Latin squares of the same order are superimposed one on the other and each ordered pair occurs once and only once in the resultant array, then they are called orthogonal Latin squares. A frequency square is an n x n matrix such that each element from the list of n elements occurs t times in each row and in each column. These two concepts lead to a third concept, called t-orthogonal Latin squares: if, from a set of m orthogonal Latin squares, t orthogonal Latin squares are superimposed and each ordered t-tuple in the resultant array occurs once and only once, then they form a t-orthogonal Latin square. In this paper it is proposed to construct such t-orthogonal Latin squares.
Submitted 5 June, 2006; v1 submitted 10 April, 2006;
originally announced April 2006.
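The two definitions given in the abstract of "On Orthogonality of Latin Squares" (a Latin square of order s, and orthogonality of a pair of such squares) can be illustrated with a minimal sketch. The function names and the cyclic construction below are assumptions chosen for illustration only; they are not taken from the paper, and the paper's t-orthogonal construction is not reproduced here.

```python
from itertools import product


def is_latin_square(square):
    """Return True if every symbol occurs exactly once in each row and column."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok


def are_orthogonal(a, b):
    """Return True if superimposing a and b yields every ordered pair exactly once."""
    n = len(a)
    pairs = {(a[r][c], b[r][c]) for r, c in product(range(n), repeat=2)}
    # n*n cells and n*n possible pairs: all pairs distinct means each occurs once.
    return len(pairs) == n * n


def cyclic_square(n, k):
    """Hypothetical helper: the square with entry (k*r + c) mod n.

    For prime n and 1 <= k < n this is a Latin square, and squares built
    from different k are mutually orthogonal (a standard textbook construction).
    """
    return [[(k * r + c) % n for c in range(n)] for r in range(n)]


if __name__ == "__main__":
    a = cyclic_square(3, 1)
    b = cyclic_square(3, 2)
    print(is_latin_square(a), is_latin_square(b), are_orthogonal(a, b))
```

The orthogonality check relies on a counting argument: an order-n pair of squares has n*n cells and n*n possible ordered pairs, so collecting the pairs into a set and comparing its size to n*n is enough.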
-
Certain new M-matrices and their properties and applications
Authors:
R. N. Mohan,
Sanpei Kageyama,
Moon Ho Lee,
Gao Yang
Abstract:
The Mn-matrix was defined by Mohan [20] in which he has shown a method of constructing (1,-1)-matrices and studied some of their properties. The (1,-1)-matrices were constructed and studied by Cohn [5], Wang [33], Ehrlich [8] and Ehrlich and Zeller [9]. But in this paper, while giving some resemblances of this matrix with Hadamard matrix, and by naming it as M-matrix, we show how to construct partially balanced incomplete block (PBIB) designs and some regular bipartite graphs by it. We have considered two types of these M-matrices. Also we will make a mention of certain applications of these M-matrices in signal and communication processing, and network systems and end with some open problems.
Submitted 9 April, 2006;
originally announced April 2006.