
Exploring the Task-agnostic Trait of Self-supervised Learning in the Context of Detecting Mental Disorders

Abstract: Self-supervised learning (SSL) has been investigated to generate task-agnostic representations across various domains. However, such investigation has not been conducted for detecting multiple mental disorders. The rationale behind the existence of a task-agnostic representation lies in the overlapping symptoms among multiple mental disorders. Consequently, the behavioural data collected for mental health assessment may carry a mixed bag of attributes related to multiple disorders. Motivated by that, in this study, we explore a task-agnostic representation derived through SSL in the context of detecting major depressive disorder (MDD) and post-traumatic stress disorder (PTSD) using audio and video data collected during interactive sessions. This study employs SSL models trained by predicting multiple fixed targets or masked frames. We propose a list of fixed targets to make the generated representation more efficient for detecting MDD and PTSD. Furthermore, we modify the hyper-parameters of the SSL encoder predicting fixed targets to generate global representations that capture varying temporal contexts. Both these innovations are noted to yield improved detection performance for the considered mental disorders and to exhibit task-agnostic traits. In the context of the SSL model predicting masked frames, the generated global representations are also noted to exhibit task-agnostic traits.

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2309.08603

Closing the Loop on Runtime Monitors with Fallback-Safe MPC

Authors: Rohan Sinha, Edward Schmerling, Marco Pavone

Abstract: When we rely on deep-learned models for robotic perception, we must recognize that these models may behave unreliably on inputs dissimilar from the training data, compromising the closed-loop system's safety. This raises fundamental questions on how we can assess confidence in perception systems and to what extent we can take safety-preserving actions when external environmental changes degrade our perception model's performance. Therefore, we present a framework to certify the safety of a perception-enabled system deployed in novel contexts. To do so, we leverage robust model predictive control (MPC) to control the system using the perception estimates while maintaining the feasibility of a safety-preserving fallback plan that does not rely on the perception system. In addition, we calibrate a runtime monitor using recently proposed conformal prediction techniques to certifiably detect when the perception system degrades beyond the tolerance of the MPC controller, resulting in an end-to-end safety assurance. We show that this control framework and calibration technique allow us to certify the system's safety with orders of magnitude fewer samples than required to retrain the perception network when we deploy in a novel context on a photo-realistic aircraft taxiing simulator. Furthermore, we illustrate the safety-preserving behavior of the MPC on simulated examples of a quadrotor. We open-source our simulation platform and provide videos of our results at our project page: https://tinyurl.com/fallback-safe-mpc.

Submitted 17 September, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: Accepted to the 2023 IEEE Conference on Decision and Control
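The runtime-monitor calibration step rests on conformal prediction. The Python sketch below shows only the generic split-conformal recipe, not the paper's actual monitor: the nonconformity score (here one plain float per sample), the calibration values, and the names `conformal_threshold` and `monitor_alarm` are all hypothetical. Under the usual exchangeability assumption, a new score drawn like the calibration scores exceeds the returned threshold with probability at most `alpha`.

```python
import math
from typing import Sequence


def conformal_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold from calibration scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score.
    If a new score is exchangeable with the calibration scores, it exceeds
    this threshold with probability at most alpha.
    """
    n = len(cal_scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration samples for this alpha: no finite threshold works.
        return math.inf
    return sorted(cal_scores)[k - 1]


def monitor_alarm(score: float, threshold: float) -> bool:
    """Flag the perception output as untrustworthy when its score exceeds the threshold."""
    return score > threshold


# Hypothetical calibration scores, e.g. perception errors measured in-distribution.
calibration = [0.12, 0.40, 0.25, 0.31, 0.05, 0.18, 0.22, 0.27, 0.09, 0.35]
tau = conformal_threshold(calibration, alpha=0.2)  # 9th smallest of the 10 scores
print(tau)
print(monitor_alarm(0.50, tau))  # a large score would trigger the fallback
```

In a fallback-safe design, an alarm would hand control to the fallback plan; how that plan is computed (robust MPC in the paper) is outside this sketch.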

arXiv:2308.05133

Analyzing the Effect of Data Impurity on the Detection Performances of Mental Disorders

Authors: Rohan Kumar Gupta, Rohit Sinha

Abstract: The primary method for identifying mental disorders automatically has traditionally involved using binary classifiers. These classifiers are trained using behavioral data obtained from an interview setup. In this training process, data from individuals with the specific disorder under consideration are categorized as the positive class, while data from all other participants constitute the negative class. In practice, it is widely recognized that certain mental disorders share similar symptoms, causing the collected behavioral data to encompass a variety of attributes associated with multiple disorders. Consequently, attributes linked to the targeted mental disorder might also be present within the negative class. This data impurity may lead to sub-optimal training of the classifier for a mental disorder of interest. In this study, we investigate this hypothesis in the context of major depressive disorder (MDD) and post-traumatic stress disorder (PTSD) detection. The results show that upon removal of such data impurity, MDD and PTSD detection performances are significantly improved.

Submitted 9 August, 2023; originally announced August 2023.
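To make the positive/negative split concrete, the Python sketch below builds a binary training set for one disorder. It assumes, purely for illustration, that an "impure" negative is a subject carrying a diagnosis label for the other disorder; the abstract does not say exactly how impure samples were identified or removed, and the `Subject` record and `build_binary_set` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subject:
    """Hypothetical per-subject record with ground-truth labels for both disorders."""
    subject_id: str
    has_mdd: bool
    has_ptsd: bool


def build_binary_set(subjects: List[Subject], target: str,
                     drop_impure_negatives: bool) -> List[Tuple[str, int]]:
    """Label subjects 1 (target disorder) or 0 (negative class).

    With drop_impure_negatives=True, negative-class subjects diagnosed with the
    other disorder are left out, on the assumption that they carry overlapping
    symptoms of the target disorder.
    """
    if target not in ("mdd", "ptsd"):
        raise ValueError("target must be 'mdd' or 'ptsd'")
    other = "ptsd" if target == "mdd" else "mdd"
    dataset = []
    for s in subjects:
        if getattr(s, f"has_{target}"):
            dataset.append((s.subject_id, 1))
        elif drop_impure_negatives and getattr(s, f"has_{other}"):
            continue  # assumed "impure" negative: excluded from training
        else:
            dataset.append((s.subject_id, 0))
    return dataset


subjects = [
    Subject("s1", has_mdd=True, has_ptsd=False),
    Subject("s2", has_mdd=False, has_ptsd=True),
    Subject("s3", has_mdd=False, has_ptsd=False),
    Subject("s4", has_mdd=True, has_ptsd=True),
]
print(build_binary_set(subjects, "mdd", drop_impure_negatives=False))
print(build_binary_set(subjects, "mdd", drop_impure_negatives=True))
```

Dropping samples shrinks the negative class, so in practice one would also check class balance after filtering.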

arXiv:2212.14020

A System-Level View on Out-of-Distribution Data in Robotics

Authors: Rohan Sinha, Apoorva Sharma, Somrita Banerjee, Thomas Lew, Rachel Luo, Spencer M. Richards, Yixiao Sun, Edward Schmerling, Marco Pavone

Abstract: When testing conditions differ from those represented in training data, so-called out-of-distribution (OOD) inputs can mar the reliability of learned components in the modern robot autonomy stack. Therefore, coping with OOD data is an important challenge on the path towards trustworthy learning-enabled open-world autonomy. In this paper, we aim to demystify the topic of OOD data and its associated challenges in the context of data-driven robotic systems, drawing connections to emerging paradigms in the ML community that study the effect of OOD data on learned models in isolation. We argue that as roboticists, we should reason about the overall system-level competence of a robot as it operates in OOD conditions. We highlight key research questions around this system-level view of OOD problems to guide future research toward safe and reliable learning-enabled autonomy.

Submitted 25 August, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

arXiv:2212.01371

Adaptive Robust Model Predictive Control via Uncertainty Cancellation

Authors: Rohan Sinha, James Harrison, Spencer M. Richards, Marco Pavone

Abstract: We propose a learning-based robust predictive control algorithm that compensates for significant uncertainty in the dynamics for a class of discrete-time systems that are nominally linear with an additive nonlinear component. Such systems commonly model the nonlinear effects of an unknown environment on a nominal system. We optimize over a class of nonlinear feedback policies inspired by certainty equivalent "estimate-and-cancel" control laws pioneered in classical adaptive control to achieve significant performance improvements in the presence of uncertainties of large magnitude, a setting in which existing learning-based predictive control algorithms often struggle to guarantee safety. In contrast to previous work in robust adaptive MPC, our approach allows us to take advantage of structure (i.e., the numerical predictions) in the a priori unknown dynamics learned online through function approximation. Our approach also extends typical nonlinear adaptive control methods to systems with state and input constraints even when we cannot directly cancel the additive uncertain function from the dynamics. We apply contemporary statistical estimation techniques to certify the system's safety through persistent constraint satisfaction with high probability. Moreover, we propose using Bayesian meta-learning algorithms that learn calibrated model priors to help satisfy the assumptions of the control design in challenging settings. Finally, we show in simulation that our method can accommodate more significant unknown dynamics terms than existing methods and that the use of Bayesian meta-learning allows us to adapt to the test environments more rapidly.

Submitted 2 December, 2022; originally announced December 2022.

Comments: Under review for the IEEE Transactions on Automatic Control, special issue on learning and control. arXiv admin note: text overlap with arXiv:2104.08261
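The classical "estimate-and-cancel" idea that the abstract builds on can be illustrated on a scalar, unconstrained example. The Python sketch below shows only the certainty-equivalent law, assuming the uncertainty enters through the input channel so that it can be cancelled exactly; it is not the paper's robust MPC, which additionally handles state and input constraints and cases where cancellation is impossible. All numbers, the plant, and the function names are hypothetical.

```python
from typing import Callable, List

Fn = Callable[[float], float]


def true_step(x: float, u: float, a: float, b: float, g: Fn) -> float:
    """Scalar plant x+ = a*x + b*(u + g(x)); g is unknown to the controller."""
    return a * x + b * (u + g(x))


def estimate_and_cancel(x: float, k: float, g_hat: Fn) -> float:
    """Certainty-equivalent law: nominal feedback k*x minus the estimate g_hat(x)."""
    return k * x - g_hat(x)


def simulate(x0: float, steps: int, a: float, b: float, k: float,
             g: Fn, g_hat: Fn) -> List[float]:
    traj = [x0]
    x = x0
    for _ in range(steps):
        x = true_step(x, estimate_and_cancel(x, k, g_hat), a, b, g)
        traj.append(x)
    return traj


def g(x: float) -> float:
    """Hypothetical 'unknown' term acting on the plant."""
    return 0.6 * x


def no_estimate(x: float) -> float:
    return 0.0


a, b, k = 1.2, 1.0, -0.7  # nominal closed loop a + b*k = 0.5 is stable
perfect = simulate(1.0, 10, a, b, k, g, g_hat=g)           # exact cancellation
ignored = simulate(1.0, 10, a, b, k, g, g_hat=no_estimate)  # uncertainty ignored
print(perfect[-1], ignored[-1])
```

With exact cancellation the state contracts by 0.5 per step; ignoring the same term gives an effective gain of 1.1 and the state grows, which is the failure mode a learned estimate of g is meant to prevent.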

arXiv:2208.02463

Exploring the Role of Emotion Regulation Difficulties in the Assessment of Mental Disorders

Authors: Rohan Kumar Gupta, Rohit Sinha

Abstract: Several studies have been reported in the literature for the automatic detection of mental disorders. It is reported that mental disorders are highly correlated. This correlation has yet to be exploited for the automatic detection of mental disorders. Emotion regulation difficulties (ERD) characterize several mental disorders. Motivated by that, we investigated the use of ERD for the detection of two selected mental disorders in this study. For this, we have collected audio-video data of human subjects while they conversed with a computer agent based on a specific questionnaire. Subsequently, each subject's responses are collected to obtain the ground truth for that subject's audio-video data. The results indicate that ERD can be used as an intermediate representation of audio-video data for detecting mental disorders.

Submitted 4 August, 2022; originally announced August 2022.

arXiv:2205.13851

Speaker-conditioning Single-channel Target Speaker Extraction using Conformer-based Architectures

Authors: Ragini Sinha, Marvin Tammen, Christian Rollwage, Simon Doclo

Abstract: Target speaker extraction aims at extracting the target speaker from a mixture of multiple speakers by exploiting auxiliary information about the target speaker. In this paper, we consider a complete time-domain target speaker extraction system consisting of a speaker embedder network and a speaker separator network, which are jointly trained in an end-to-end learning process. We propose two different architectures for the speaker separator network, both based on the convolution-augmented transformer (conformer). The first architecture uses stacks of conformer and external feed-forward blocks (Conformer-FFN), while the second architecture uses stacks of temporal convolutional network (TCN) and conformer blocks (TCN-Conformer). Experimental results for 2-speaker mixtures, 3-speaker mixtures, and noisy 2-speaker mixtures show that among the proposed separator networks, the TCN-Conformer significantly improves the target speaker extraction performance compared to the Conformer-FFN and a TCN-based baseline system.

Submitted 27 May, 2022; originally announced May 2022.

Comments: submitted to IWAENC 2022

arXiv:2112.12046

doi 10.1109/SEAA51224.2020.00087

Graph-Theoretic Models of Resource Distribution for Cyber-Physical Systems of Disaster-Affected Regions

Authors: Kenneth Johnson, Samaneh Madanian, Roopak Sinha

Abstract: We propose a tool-supported framework to reason about requirements constraining resource distributions and devise strategies for routing essential services in a disaster-affected region. At the core of our approach is the Route Advisor for Disaster-Affected Regions (RADAR) framework that operates on high-level algebraic representations of the region, modelled as a cyber-physical system (CPS) where resource distribution is carried out over an infrastructure connecting physical geographical locations. The Satisfiability Modulo Theories (SMT) and graph-theoretic algorithms used by the framework support disaster management decision-making during the response and preparedness phases. We demonstrate our approach on a case study in disaster management and describe scenarios to illustrate the usefulness of RADAR.

Submitted 7 September, 2021; originally announced December 2021.

Comments: Conference paper, 9 pages, 2 figures, 1 table

Journal ref: Proceedings of the 46th Euromicro Conference on Software Engineering and Advanced Applications (SEAA2020). Portoroz, Slovenia, IEEE Computer Society Press, pp.521-528

arXiv:2112.05322

doi 10.1007/s00521-018-3656-1

Dynamic hardware system for cascade SVM classification of melanoma

Authors: Shereen Afifi, Hamid GholamHosseini, Roopak Sinha

Abstract: Melanoma is the most dangerous form of skin cancer and is responsible for the majority of skin cancer-related deaths. Early diagnosis of melanoma can significantly reduce mortality rates and treatment costs. Therefore, skin cancer specialists are using image-based diagnostic tools for detecting melanoma earlier. We aim to develop a low-cost, high-performance handheld device to enhance early detection of melanoma in primary healthcare. However, developing this device is very challenging due to the complicated computations required by the embedded diagnosis system. Thus, we aim to exploit recent hardware technology in reconfigurable computing to achieve a high-performance embedded system at low cost. Support vector machine (SVM) is a common classifier that shows high accuracy for classifying melanoma within the diagnosis system and is considered the most compute-intensive task in the system. In this paper, we propose a dynamic hardware system for implementing a cascade SVM classifier on FPGA for early melanoma detection. A multi-core architecture is proposed to implement a two-stage cascade classifier using two classifiers with accuracies of 98% and 73%. The hardware implementation results were optimized by using dynamic partial reconfiguration technology, where very low resource utilization of 1% slices and power consumption of 1.5 W were achieved. Consequently, the implemented dynamic hardware system meets vital embedded system constraints of high performance and low cost, resource utilization, and power consumption, while achieving efficient classification with high accuracy.

Submitted 9 December, 2021; originally announced December 2021.

Comments: Journal paper, 9 pages, 4 figures, 4 tables

Journal ref: Neural Computing & Applications 32 (2020) pp.1777-1788
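A cascade classifier spends the second stage only on the samples the first stage is unsure about. The abstract does not specify how the two SVM stages interact, so the Python sketch below assumes a common margin-based deferral rule between two linear SVM decision functions with made-up weights; `LinearSVM` and `cascade_predict` are hypothetical names, and this is a software illustration of the cascading logic, not the FPGA design.

```python
from typing import Sequence, Tuple


class LinearSVM:
    """Decision function of an already-trained linear SVM: f(x) = w . x + b."""

    def __init__(self, w: Sequence[float], b: float) -> None:
        self.w = list(w)
        self.b = b

    def decision(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b


def cascade_predict(x: Sequence[float], stage1: LinearSVM, stage2: LinearSVM,
                    margin: float) -> Tuple[int, int]:
    """Two-stage cascade returning (label, stage_used).

    Stage 1 decides on its own when |f1(x)| >= margin, i.e. when x lies far
    from its decision boundary; otherwise the sample is passed to stage 2.
    """
    f1 = stage1.decision(x)
    if abs(f1) >= margin:
        return (1 if f1 > 0 else 0), 1
    f2 = stage2.decision(x)
    return (1 if f2 > 0 else 0), 2


# Hypothetical weights; label 1 = melanoma, 0 = benign.
stage1 = LinearSVM([1.0, 0.0], 0.0)
stage2 = LinearSVM([0.0, 1.0], 0.0)
print(cascade_predict([2.0, -1.0], stage1, stage2, margin=0.5))  # decided by stage 1
print(cascade_predict([0.1, -3.0], stage1, stage2, margin=0.5))  # deferred to stage 2
```

The margin trades accuracy against compute: a larger margin defers more samples to the second stage.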

arXiv:2111.01914

Reduction of Subjective Listening Effort for TV Broadcast Signals with Recurrent Neural Networks

Authors: Nils L. Westhausen, Rainer Huber, Hannah Baumgartner, Ragini Sinha, Jan Rennies, Bernd T. Meyer

Abstract: Listening to the audio of TV broadcast signals can be challenging for hearing-impaired as well as normal-hearing listeners, especially when background sounds are prominent or too loud compared to the speech signal. This can result in reduced satisfaction and increased listening effort for the listeners. Since the broadcast sound is usually premixed, we perform a subjective evaluation to quantify the potential of speech enhancement systems based on audio source separation and recurrent neural networks (RNNs). Recently, RNNs have shown promising results in the context of sound source separation and real-time signal processing. In this paper, we separate the speech from the background signals and remix the separated sounds at a higher signal-to-noise ratio. This differs from classic speech enhancement, where usually only the extracted speech signal is exploited. The subjective evaluation with 20 normal-hearing subjects on real TV-broadcast material shows that our proposed enhancement system is able to reduce the listening effort by around 2 points on a 13-point listening effort rating scale and increases the perceived sound quality compared to the original mixture.

Submitted 2 November, 2021; originally announced November 2021.

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing. This version is the authors' version and may vary from the final publication in details
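The remixing step amounts to choosing a gain for the separated background. The Python sketch below assumes the RNN separation has already produced a speech stem and a background stem (toy signals here) and solves for the background gain that reaches a chosen target SNR; the 12 dB target and the helper names `mean_power`, `snr_db`, and `remix` are illustrative assumptions, not values or code from the paper.

```python
import math
from typing import List, Sequence, Tuple


def mean_power(x: Sequence[float]) -> float:
    return sum(v * v for v in x) / len(x)


def snr_db(speech: Sequence[float], background: Sequence[float]) -> float:
    return 10.0 * math.log10(mean_power(speech) / mean_power(background))


def remix(speech: Sequence[float], background: Sequence[float],
          target_snr_db: float) -> Tuple[List[float], float]:
    """Rescale the separated background so the speech-to-background SNR equals target_snr_db."""
    ps, pn = mean_power(speech), mean_power(background)
    if pn == 0.0:
        raise ValueError("background is silent; SNR is undefined")
    # Solve 10*log10(ps / (gain**2 * pn)) = target_snr_db for the background gain.
    gain = math.sqrt(ps / (pn * 10.0 ** (target_snr_db / 10.0)))
    mix = [s + gain * n for s, n in zip(speech, background)]
    return mix, gain


speech = [1.0, -1.0, 1.0, -1.0]      # toy separated speech stem
background = [0.5, 0.5, -0.5, -0.5]  # toy separated background stem
print(round(snr_db(speech, background), 2))  # original SNR, about 6 dB
mix, gain = remix(speech, background, target_snr_db=12.0)
print(round(gain, 3))
```

Keeping an attenuated background, rather than discarding it, is what distinguishes this remix from classic enhancement that outputs speech alone.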

arXiv:2110.00797

Significance of Data Augmentation for Improving Cleft Lip and Palate Speech Recognition

Authors: Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha, S. R. Mahadeva Prasanna

Abstract: The automatic recognition of pathological speech, particularly from children with any articulatory impairment, is a challenging task for various reasons. The lack of available domain-specific data is one such obstacle that hinders its usage for different speech-based applications targeting pathological speakers. In line with this challenge, in this work, we investigate a few data augmentation techniques to simulate training data for improving children's speech recognition, considering the case of cleft lip and palate (CLP) speech. The augmentation techniques explored in this study include vocal tract length perturbation (VTLP), reverberation, speaking rate, pitch modification, and speech feature modification using cycle-consistent adversarial networks (CycleGAN). Our study finds that the data augmentation methods significantly improve the CLP speech recognition performance, with the improvement being more evident for feature modification using CycleGAN, VTLP, and reverberation-based methods. More specifically, the results from this study show that our systems produce an improved phone error rate compared to the systems without data augmentation.

Submitted 2 October, 2021; originally announced October 2021.
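Of the augmentation methods listed, VTLP is easy to show compactly. The Python sketch below implements the widely used piecewise-linear VTLP frequency warp (as in Jaitly and Hinton's original formulation), applied to filterbank center frequencies; whether the paper used this exact warp, the 4800 Hz breakpoint `f_hi`, or the warp factors shown is an assumption, and `vtlp_warp` is a hypothetical name. In practice a warp factor near 1 is typically drawn at random per utterance.

```python
def vtlp_warp(freq_hz: float, alpha: float, sample_rate: float,
              f_hi: float = 4800.0) -> float:
    """Piecewise-linear VTLP frequency warp.

    Frequencies up to a boundary are scaled by alpha; above it, a second linear
    segment maps the rest of the band so that the Nyquist frequency stays fixed.
    """
    nyquist = sample_rate / 2.0
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if not 0.0 < f_hi < nyquist:
        raise ValueError("f_hi must lie strictly between 0 and Nyquist")
    boundary = f_hi * min(alpha, 1.0) / alpha
    if freq_hz <= boundary:
        return alpha * freq_hz
    slope = (nyquist - f_hi * min(alpha, 1.0)) / (nyquist - boundary)
    return nyquist - slope * (nyquist - freq_hz)


# Warp hypothetical filterbank center frequencies for a 16 kHz recording.
centers = [250.0, 1000.0, 4000.0, 6000.0, 8000.0]
for alpha in (0.9, 1.1):
    print(alpha, [round(vtlp_warp(f, alpha, 16000.0), 1) for f in centers])
```

The two segments meet at the boundary, so the warp is continuous, and pinning the Nyquist frequency keeps the warped filterbank inside the available band.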

arXiv:2110.00794

Processing Phoneme Specific Segments for Cleft Lip and Palate Speech Enhancement

Authors: Protima Nomo Sudro, Rohit Sinha, S. R. Mahadeva Prasanna

Abstract: The intelligibility of cleft lip and palate (CLP) speech is degraded by deformation of the speaker's articulatory system. To address this, a few previous works perform phoneme-specific modification of CLP speech. In CLP speech, both articulation errors and nasalization distort the intelligibility of a word. Consequently, modifying a specific phoneme may not always enhance the intelligibility of the entire word. For such cases, it is important to identify and isolate the phoneme-specific errors based on knowledge of acoustic events. Accordingly, phoneme-specific error modification algorithms can be exploited to transform these errors and enhance word-level intelligibility. Motivated by that, in this work, we combine some salient phoneme-specific enhancement approaches and demonstrate their effectiveness in improving the word-level intelligibility of CLP speech. The enhanced speech samples are evaluated using subjective and objective evaluation metrics.

Submitted 2 October, 2021; originally announced October 2021.

arXiv:2109.14840

doi 10.1016/j.micpro.2018.12.005

A system on chip for melanoma detection using FPGA-based SVM classifier

Authors: Shereen Afifi, Hamid GholamHosseini, Roopak Sinha

Abstract: Support Vector Machine (SVM) is a robust machine learning model that shows high accuracy on different classification problems and is widely used in various embedded applications. However, implementing embedded SVM classifiers is challenging due to the inherently complicated computations required. This motivates implementing the SVM on hardware platforms to achieve high-performance computing at low cost and power consumption. Melanoma is the most aggressive form of skin cancer and increases the mortality rate. We aim to develop an optimized embedded SVM classifier dedicated to a low-cost handheld device for early detection of melanoma in primary healthcare. In this paper, we propose a hardware/software co-design for implementing the SVM classifier onto FPGA to realize melanoma detection on a chip. The SVM implemented on a recent hybrid FPGA (Zynq) platform, utilizing the modern UltraFast High-Level Synthesis design methodology, achieves efficient melanoma classification on chip. The hardware implementation results demonstrate a classification accuracy of 97.9% and a significant hardware acceleration rate of 21 with only 3% resource utilization and 1.69 W power consumption. These results show that the implemented system on chip meets crucial embedded system constraints of high performance and low resource utilization, power consumption, and cost, while achieving efficient classification with high classification accuracy.

Submitted 30 September, 2021; originally announced September 2021.

Comments: Journal paper, 13 pages, 3 figures, 9 tables

Journal ref: A system on chip for melanoma detection using FPGA-based SVM classifier, Microprocessors and Microsystems 65(2019) pp.57-68

arXiv:2108.11957

doi 10.1109/EMBC.2017.8036814

SVM Classifier on Chip for Melanoma Detection

Authors: Shereen Afifi, Hamid GholamHosseini, Roopak Sinha

Abstract: Support Vector Machine (SVM) is a common classifier used for efficient classification with high accuracy. SVM shows high accuracy for classifying melanoma (skin cancer) clinical images within computer-aided diagnosis systems used by skin cancer specialists to detect melanoma early and save lives. We aim to develop a low-cost medical handheld device that runs a real-time embedded SVM-based diagnosis system for use in primary care for early detection of melanoma. In this paper, an optimized SVM classifier is implemented onto a recent FPGA platform using the latest design methodology, to be embedded into the proposed device for realizing online, efficient melanoma detection on a single system on chip/device. The hardware implementation results demonstrate a high classification accuracy of 97.9% and a significant acceleration factor of 26 over an equivalent software implementation on an embedded processor, with 34% resource utilization and 2 W power consumption. Consequently, the implemented system meets crucial embedded systems constraints of high performance and low cost, resource utilization and power consumption, while achieving high classification accuracy.

Submitted 26 August, 2021; originally announced August 2021.

Comments: Conference paper, 5 pages, 4 figures, 1 table

Journal ref: Proceedings of the 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC2017). Jeju, Korea (South), IEEE Computer Society Press, pp.270-274

arXiv:2108.07933

doi 10.1109/INDIN.2018.8471951

Assessing the Integration of Software Agents and Industrial Automation Systems with ISO/IEC 25010

Authors: Stamatis Karnouskos, Roopak Sinha, Paulo Leitão, Luis Ribeiro, Thomas I. Strasser

Abstract: Agent technologies have been used for higher-level decision making in addition to carrying out lower-level automation and control functions in industrial systems. Recent research has identified a number of architectural patterns for the use of agents in industrial automation systems, but these practices vary in several ways, including how closely agents are coupled with physical systems and their control functions. Such practices may play a pivotal role in Cyber-Physical System integration and interaction. Hence, there is a clear need for a common set of criteria for assessing available practices and identifying a best-fit practice for a given industrial use case. Unfortunately, no such common criteria currently exist. This work proposes an assessment criteria approach as well as a methodology to enable the use-case-based selection of a best practice for integrating agents and industrial systems. The software product quality model proposed by the ISO/IEC 25010 family of standards is used as a starting point and is put in the industrial automation context. Subsequently, the proposed methodology is applied, and a survey of experts in the domain is carried out in order to reveal some insights on the key characteristics of the subject matter.

Submitted 17 August, 2021; originally announced August 2021.

Comments: Conference paper, 7 pages, 2 figures, 1 table

Journal ref: Proceedings of the 16th International Conference on Industrial Informatics (INDIN2018). Porto, Portugal, IEEE Computer Society Press, pp.61-66

arXiv:2107.08232

doi 10.1109/IECON43393.2020.9254313

Dynamic Prioritization of Emergency Vehicles For Self-Organizing Traffic using VTL+EV

Authors: Subash Humagain, Roopak Sinha

Abstract: Cooperative vehicular technology has recently aided in realizing some state-of-the-art technologies like autonomous driving. Effective and efficient prioritization of emergency vehicles (EVs) using cooperative vehicular technology can undoubtedly aid in saving property and lives. Contemporary EV prioritization, called preemption, is highly dependent on existing traffic infrastructure. Accessing crucial decision parameters for preemption, like speed, position and acceleration data, in real time is almost impossible in current systems. Connected vehicles can provide such data in real time, which makes EV preemption more responsive and effective. Also, autonomous vehicles can help in optimizing the timing of traffic phases and minimize human-related losses like higher headway times and inconsistent inter-vehicle spacing when following each other. In this paper, we introduce a self-coordinating, decentralized traffic control system termed Virtual Traffic Light plus for Emergency Vehicle (VTL+EV) to prioritize EVs at an intersection. The proposed system can expedite EV movement through intersections and impose minimal waiting time on ordinary vehicles. The VTL+EV algorithm can also improve overall throughput, making an intersection more efficient.

Submitted 17 July, 2021; originally announced July 2021.

Comments: Conference paper, 6 pages, 8 figures

Journal ref: Proceedings of the 46th Annual Conference of the IEEE Industrial Electronics Society (IECON2020). Singapore, IEEE Computer Society Press, pp.789-794

arXiv:2107.08167 [pdf]

doi 10.1109/INDIN41052.2019.8972218

Routing Autonomous Emergency Vehicles in Smart Cities Using Real Time Systems Analogy: A Conceptual Model

Authors: Subash Humagain, Roopak Sinha

Abstract: Emergency service vehicles such as ambulances, fire engines, and police cars should respond to emergencies on time. Existing barriers such as increased congestion, multiple signalized intersections, queued vehicles, and traffic phase timing can prevent emergency vehicles (EVs) from achieving the desired response times. Existing solutions for routing EVs have not been successful because they do not use dynamic traffic parameters. Real-time information on increased congestion, halts on the road, pedestrian flow, queued vehicles, and real and adaptive speed can be used to actuate pre-emption properly and minimise the impact that EV movement can have on other traffic. Smart cities provide the necessary infrastructure to enable two critical factors in EV routing: real-time traffic data and connectivity. In addition, using autonomous vehicles (AVs) in place of conventional emergency service vehicles can offer further advantages in terms of safety and adaptability in smart city environments. AVs feature multiple sensors and connectivity that can help them make real-time decisions. We propose the novel idea of using autonomous emergency vehicles (AEVs) that can meet the critical response time and drive through a complex road network in smart cities efficiently and safely. This is achieved by treating the traffic network as analogous to a real-time system (RTS) and using mixed-criticality real-time system (MCRTS) task scheduling to schedule AEVs so that they meet their response times.
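To illustrate the scheduling analogy, here is a minimal Python sketch of the classic dual-criticality task model (each job has a criticality level plus an optimistic and a pessimistic execution-time budget; in HI mode, LO-criticality jobs are dropped), scheduled with non-preemptive earliest-deadline-first (EDF) on a single resource. Treating an AEV as a HI job and ordinary traffic as LO jobs, and all job parameters, are assumptions for illustration, not the paper's MCRTS formulation.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    criticality: str  # "HI" (e.g. an autonomous emergency vehicle) or "LO" (ordinary traffic)
    deadline: float   # absolute deadline; all jobs are released at t = 0
    c_lo: float       # optimistic execution-time budget
    c_hi: float       # pessimistic budget (equal to c_lo for LO jobs)


def edf_schedule(jobs, mode="LO"):
    """Non-preemptive EDF on one resource.

    LO mode: every job runs with its optimistic budget.
    HI mode: LO-criticality jobs are dropped and HI jobs get their pessimistic budget.
    Returns (name, finish_time, deadline_met) tuples in execution order.
    """
    active = [j for j in jobs if mode == "LO" or j.criticality == "HI"]
    t, plan = 0.0, []
    for job in sorted(active, key=lambda j: j.deadline):
        t += job.c_hi if mode == "HI" else job.c_lo
        plan.append((job.name, t, t <= job.deadline))
    return plan


jobs = [
    Job("aev-1", "HI", deadline=10, c_lo=3, c_hi=6),
    Job("car-1", "LO", deadline=5, c_lo=2, c_hi=2),
    Job("car-2", "LO", deadline=8, c_lo=3, c_hi=3),
]
print(edf_schedule(jobs, "LO"))  # [('car-1', 2.0, True), ('car-2', 5.0, True), ('aev-1', 8.0, True)]
print(edf_schedule(jobs, "HI"))  # [('aev-1', 6.0, True)]
```

If the AEV needed its pessimistic budget of 6 while both LO jobs still ran first, it would finish at 11 and miss its deadline of 10; shedding LO work in HI mode is what protects the critical job.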

Submitted 16 July, 2021; originally announced July 2021.

Comments: Conference paper, 6 pages, 3 figures, 1 table

Journal ref: Proceedings of the 17th International Conference on Industrial Informatics (INDIN2019). Helsinki, Finland, IEEE Computer Society Press, pp.1097-1102

arXiv:2104.08261 [pdf, other]

Adaptive Robust Model Predictive Control with Matched and Unmatched Uncertainty

Authors: Rohan Sinha, James Harrison, Spencer M. Richards, Marco Pavone

Abstract: We propose a learning-based robust predictive control algorithm that compensates for significant uncertainty in the dynamics for a class of discrete-time systems that are nominally linear with an additive nonlinear component. Such systems commonly model the nonlinear effects of an unknown environment on a nominal system. We optimize over a class of nonlinear feedback policies inspired by certainty-equivalent "estimate-and-cancel" control laws pioneered in classical adaptive control to achieve significant performance improvements in the presence of uncertainties of large magnitude, a setting in which existing learning-based predictive control algorithms often struggle to guarantee safety. In contrast to previous work in robust adaptive MPC, our approach allows us to take advantage of structure (i.e., the numerical predictions) in the a priori unknown dynamics learned online through function approximation. Our approach also extends typical nonlinear adaptive control methods to systems with state and input constraints even when we cannot directly cancel the additive uncertain function from the dynamics. Moreover, we apply contemporary statistical estimation techniques to certify the system's safety through persistent constraint satisfaction with high probability. Finally, we show in simulation that our method can accommodate more significant unknown dynamics terms than existing methods.
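The certainty-equivalent "estimate-and-cancel" idea that the abstract cites from classical adaptive control can be sketched for a hypothetical scalar system x[t+1] = a*x[t] + b*u[t] + g(x[t]): fit g from data with a linear-in-features approximator, then subtract the estimate in the control law. The system, the feature map, and the gain below are assumptions, and the sketch deliberately omits what the paper adds (state and input constraints, robust MPC, and probabilistic safety certification).

```python
import numpy as np

# Hypothetical scalar system x[t+1] = a*x[t] + b*u[t] + g(x[t]); g is unknown to the controller.
a, b = 1.2, 1.0


def g_true(x):
    return 0.5 * np.sin(x)


def step(x, u):
    return a * x + b * u + g_true(x)


def features(x):
    # Assumed feature map for the linear-in-parameters approximator g_hat(x) = features(x) @ theta.
    return np.array([np.sin(x), x, 1.0])


# 1) Estimate: sample transitions and fit g by least squares on the residual x[t+1] - a*x - b*u.
rng = np.random.default_rng(0)
xs = rng.uniform(-2.0, 2.0, size=50)
us = rng.uniform(-1.0, 1.0, size=50)
residuals = np.array([step(x, u) for x, u in zip(xs, us)]) - a * xs - b * us
Phi = np.stack([features(x) for x in xs])
theta, *_ = np.linalg.lstsq(Phi, residuals, rcond=None)


def g_hat(x):
    return features(x) @ theta


# 2) Cancel: nominal linear feedback plus subtraction of the learned term.
K = 0.9  # places the nominal closed-loop pole at a - b*K = 0.3


def controller(x):
    return -K * x - g_hat(x) / b


x = 1.5
for _ in range(30):
    x = step(x, controller(x))
```

Because the assumed true g lies in the span of the features, least squares recovers it almost exactly and the closed loop reduces to x[t+1] ≈ 0.3*x[t]. When it does not, the residual g - g_hat is the uncancelled uncertainty that a robust method still has to handle.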

Submitted 13 October, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

Comments: Major revision

arXiv:2104.04234 [pdf, other]

Speaker-conditioned Target Speaker Extraction based on Customized LSTM Cells

Authors: Ragini Sinha, Marvin Tammen, Christian Rollwage, Simon Doclo

Abstract: Speaker-conditioned target speaker extraction systems rely on auxiliary information about the target speaker to extract the target speaker's signal from a mixture of multiple speakers. Typically, a deep neural network is applied to isolate the relevant target speaker characteristics. In this paper, we focus on a single-channel target speaker extraction system based on a CNN-LSTM separator network and a speaker embedder network requiring reference speech of the target speaker. In the LSTM layer of the separator network, we propose customizing the LSTM cells so that they remember only the specific voice patterns corresponding to the target speaker, by modifying the information processing in the forget gate. Experimental results for two-speaker mixtures using the LibriSpeech dataset show that this customization significantly improves the target speaker extraction performance compared to using standard LSTM cells.
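The abstract says the forget gate is modified so that the LSTM retains only target-speaker patterns, but it does not give the equations. As one plausible illustration only, this NumPy sketch implements a standard LSTM step and optionally adds a speaker-embedding term to the forget-gate pre-activation. The `V_f` projection and the additive conditioning are assumptions, not the authors' exact modification.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h, c, params, spk_emb=None):
    """One LSTM time step; gate pre-activations are stacked as [input, forget, output, cell].

    If spk_emb is given, a projection of it is added to the forget-gate
    pre-activation only (hypothetical speaker conditioning).
    """
    n = h.shape[0]
    z = params["W"] @ x + params["U"] @ h + params["b"]
    z_i, z_f, z_o, z_g = z[:n], z[n:2 * n], z[2 * n:3 * n], z[3 * n:]
    if spk_emb is not None:
        z_f = z_f + params["V_f"] @ spk_emb
    i, f, o = sigmoid(z_i), sigmoid(z_f), sigmoid(z_o)
    g = np.tanh(z_g)
    c_new = f * c + i * g        # the forget gate scales how much of the old cell state is kept
    h_new = o * np.tanh(c_new)
    return h_new, c_new, f


rng = np.random.default_rng(0)
x_dim, hid, emb_dim = 4, 3, 2
params = {
    "W": rng.normal(size=(4 * hid, x_dim)),
    "U": rng.normal(size=(4 * hid, hid)),
    "b": np.zeros(4 * hid),
    "V_f": rng.normal(size=(hid, emb_dim)),   # hypothetical speaker-to-forget-gate projection
}
x, h, c = rng.normal(size=x_dim), np.zeros(hid), rng.normal(size=hid)
spk = rng.normal(size=emb_dim)

h_std, c_std, f_std = lstm_step(x, h, c, params)              # standard cell
h_spk, c_spk, f_spk = lstm_step(x, h, c, params, spk_emb=spk)  # speaker-conditioned forget gate
```

Only the forget gate sees the embedding, so a zero embedding reproduces the standard cell exactly, while a speaker embedding changes what is retained in the cell state.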

Submitted 9 April, 2021; originally announced April 2021.

arXiv:2102.00270 [pdf, other]

Enhancing the Intelligibility of Cleft Lip and Palate Speech using Cycle-consistent Adversarial Networks

Authors: Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha, S R Mahadeva Prasanna

Abstract: Cleft lip and palate (CLP) is a congenital craniofacial condition that causes various speech-related disorders. As a result of structural and functional deformities, the affected subjects' speech intelligibility is significantly degraded, limiting the accessibility and usability of speech-controlled devices. To address this problem, it is desirable to improve the intelligibility of CLP speech, which would also be useful during speech therapy. In this study, the cycle-consistent adversarial network (CycleGAN) method is exploited to improve CLP speech intelligibility. The model is trained on speech data from native Kannada-speaking children. The effectiveness of the proposed approach is also measured using automatic speech recognition performance. Further, a subjective evaluation is performed, and its results also confirm the intelligibility improvement of the enhanced speech over the original.
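For reference, the cycle-consistency term that gives CycleGAN its name is shown below in its standard form, where G maps domain X to domain Y, F maps Y back to X, and D_X, D_Y are the two discriminators. Reading X as CLP speech features and Y as typical speech features is an assumption about this paper's setup; the abstract does not state the exact loss, features, or any additional terms used.

```latex
\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_X}\!\left[\lVert F(G(x)) - x \rVert_1\right]
+ \mathbb{E}_{y \sim p_Y}\!\left[\lVert G(F(y)) - y \rVert_1\right]

\mathcal{L}(G, F, D_X, D_Y) =
  \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
+ \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X)
+ \lambda \, \mathcal{L}_{\mathrm{cyc}}(G, F)
```

The cycle term is what allows training on unpaired recordings: each conversion must map back to its own input, and λ weights this constraint against the two adversarial terms.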

Submitted 30 January, 2021; originally announced February 2021.

Comments: 8 pages, 4 figures, IEEE Spoken Language Technology Workshop

arXiv:1907.08293 [pdf, other]

Investigating Target Set Reduction for End-to-End Speech Recognition of Hindi-English Code-Switching Data

Authors: Kunal Dhawan, Ganji Sreeram, Kumar Priyadarshi, Rohit Sinha

Abstract: End-to-end (E2E) systems are fast replacing conventional systems in automatic speech recognition. Because the target labels are learned directly from speech data, E2E systems need a larger corpus for effective training. In the context of the code-switching task, E2E systems face two challenges: (i) the expansion of the target set due to the multiple languages involved, and (ii) the lack of a sufficiently large domain-specific corpus. To address these challenges, we propose an approach for reducing the number of target labels for reliable training of E2E systems on limited data. The efficacy of the proposed approach has been demonstrated on two prominent architectures, namely CTC-based and attention-based E2E networks. The experimental validations are performed on a recently created Hindi-English code-switching corpus. For comparison, the results for an E2E system based on the full target set and for a hybrid DNN-HMM system are also reported.
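The target set matters because a CTC-based E2E network emits, at every frame, a distribution over all target labels plus one blank symbol, so every label added by a second language widens the output layer and must be learned from the same limited data. The Python sketch below shows that output size and the standard greedy (best-path) CTC decoding rule. It does not implement the paper's reduction scheme, which the abstract does not specify; the blank symbol `_` is an arbitrary placeholder.

```python
BLANK = "_"  # placeholder for the CTC blank symbol


def ctc_output_size(targets):
    """Softmax width of a CTC network: one unit per distinct target label plus the blank."""
    return len(set(targets)) + 1


def ctc_greedy_decode(frame_labels):
    """Best-path CTC decoding: merge consecutive repeats, then drop blanks."""
    decoded, prev = [], None
    for label in frame_labels:
        if label != prev and label != BLANK:
            decoded.append(label)
        prev = label
    return "".join(decoded)


# Per-frame argmax labels; the blank between the two 'l' runs keeps them distinct.
frames = ["h", "h", BLANK, "e", "l", "l", BLANK, "l", "o"]
print(ctc_greedy_decode(frames))                        # hello
print(ctc_output_size("abcdefghijklmnopqrstuvwxyz "))   # 28
```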

Submitted 15 July, 2019; originally announced July 2019.

arXiv:1907.06342 [pdf, other]

Joint Language Identification of Code-Switching Speech using Attention based E2E Network

Authors: Sreeram Ganji, Kunal Dhawan, Kumar Priyadarshi, Rohit Sinha

Abstract: Language identification (LID) is relevant to many speech processing applications. For the automatic recognition of code-switching speech, conventional approaches often employ an LID system to detect the languages present within an utterance. In existing works, LID on code-switching speech involves modelling the underlying languages separately. In this work, we propose a joint-modelling-based LID system for code-switching speech. To this end, an attention-based end-to-end (E2E) network has been explored. For the development and evaluation of the proposed approach, a recently created Hindi-English code-switching corpus has been used. For comparison, an LID system employing a connectionist temporal classification (CTC)-based E2E network has also been developed. Comparing the two LID systems, the attention-based approach is noted to yield better LID accuracy. The ability of the proposed approach to locate code-switching boundaries within an utterance has been demonstrated by plotting the attention weights of the E2E network.
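The abstract says code-switching boundaries become visible when the attention weights are plotted. The attention variant used is not stated; assuming scaled dot-product attention for illustration, this NumPy sketch shows how a decoder query produces one weight per encoder frame, which is what such a plot displays. The toy keys and query are invented.

```python
import numpy as np


def attention_weights(query, keys):
    """Scaled dot-product attention: one non-negative weight per encoder frame, summing to 1."""
    scores = keys @ query / np.sqrt(query.shape[-1])
    scores = scores - scores.max()   # subtract the max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()


# Toy encoder output: frames 0-2 resemble "language A", frames 3-5 resemble "language B".
keys = 4.0 * np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
query_b = np.array([0.0, 1.0])   # a decoder state that is "looking for" language B
w = attention_weights(query_b, keys)
print(np.round(w, 3))
```

Most of the weight lands on frames 3-5, so a plot of these weights over time would show a jump where the encoder frames switch from one language to the other.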

Submitted 15 July, 2019; originally announced July 2019.

arXiv:1806.06579 [pdf, other]

doi 10.1016/j.conengprac.2018.09.026

Resetting Disturbance Observers with application in Compensation of bounded nonlinearities like Hysteresis in Piezo-Actuators

Authors: Niranjan Saikumar, Rahul Kumar Sinha, S. Hassan HosseinNia

Abstract: This paper presents a novel nonlinear (reset) disturbance observer for dynamic compensation of bounded nonlinearities such as hysteresis in piezoelectric actuators. The proposed Resetting Disturbance Observer (RDOB) utilizes a novel Constant-gain Lead-phase (CgLp) element based on the concept of reset control. The fundamental limitations of linear DOB, which result in contradictory requirements and in a design in which the DOB and the feedback controller depend on each other, are analysed. Two different configurations of RDOB that attempt to alleviate these problems from different perspectives are presented, and an example plant is used to highlight the improvement. Stability criteria are presented for both configurations. The performance improvement of both RDOB configurations over linear DOB is also verified on a practical piezoelectric setup for hysteresis compensation, and the results are analysed.
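For context on the baseline the paper improves upon, here is a minimal Python sketch of a conventional linear disturbance observer in discrete time: invert the nominal plant model to estimate the lumped input disturbance, low-pass it with a Q filter, and subtract the estimate from the control input. The plant, the first-order Q filter, and the assumption that the nominal model matches the plant exactly are illustrative; this is not the RDOB or the CgLp element.

```python
import numpy as np

# Hypothetical first-order discrete-time plant: x[k+1] = A*x[k] + B*(u[k] + d[k]).
A, B = 0.9, 0.5
ALPHA = 0.8  # pole of the first-order Q filter (unity DC gain)


def simulate(d=0.5, steps=100, use_dob=True):
    """Regulate x to 0 under a constant input disturbance d, with or without a linear DOB."""
    x, d_hat, xs = 0.0, 0.0, []
    for _ in range(steps):
        u = -d_hat if use_dob else 0.0                 # cancel the current disturbance estimate
        x_next = A * x + B * (u + d)                   # true plant (here identical to the nominal model)
        d_raw = (x_next - A * x) / B - u               # invert nominal model: (u + d) minus the known u
        d_hat = ALPHA * d_hat + (1 - ALPHA) * d_raw    # low-pass Q filter
        x = x_next
        xs.append(x)
    return np.array(xs), d_hat


xs_dob, d_hat = simulate(use_dob=True)
xs_open, _ = simulate(use_dob=False)
```

With the DOB the state returns towards zero, whereas without it the constant disturbance leaves an offset of B*d/(1 - A) = 2.5. ALPHA sets the Q-filter bandwidth; faster disturbance estimation typically comes at the cost of robustness and noise sensitivity, which is one face of the contradictory requirements a linear DOB faces.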

Submitted 18 June, 2018; originally announced June 2018.

arXiv:1805.12406 [pdf, other]

doi 10.1109/TMECH.2019.2909082

'Constant in gain Lead in phase' element - Application in precision motion control

Authors: Niranjan Saikumar, Rahul Kumar Sinha, S. Hassan HosseinNia

Abstract: This work presents a novel 'Constant in gain Lead in phase' (CgLp) element using a nonlinear reset technique. PID remains the industrial workhorse in high-tech precision positioning applications even today. However, Bode's gain-phase relationship and the waterbed effect fundamentally limit the performance of PID and other linear controllers. This paper presents CgLp as a controlled nonlinear element that can be introduced within the PID framework, allowing for wide applicability and overcoming linear control limitations. The design of CgLp with the generalized first-order reset element (GFORE) and the generalized second-order reset element (GSORE), introduced in this work, is presented using describing function analysis. For this purpose, a more detailed frequency-domain analysis of reset elements than in the existing literature is first carried out. Finally, CgLp is integrated with PID and tested on one of the DOFs of a planar precision positioning stage. Performance improvement is shown in terms of tracking, steady-state precision, and bandwidth.
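Describing function analysis, which the abstract uses to design CgLp, approximates a reset element by the first harmonic of its response to a sinusoid. The simplest reset element is the Clegg integrator (an integrator whose state is reset to zero whenever its input crosses zero); its known describing function is N(jω) = (1 + j4/π)/(jω), i.e. a gain of about 1.62/ω with only about 38.1° of phase lag instead of a linear integrator's 90°. The Python sketch below checks that value numerically. It is a generic illustration under stated assumptions (full reset, forward-Euler integration, ω = 2), not the paper's GFORE, GSORE, or CgLp design.

```python
import numpy as np


def clegg_response(omega=1.0, periods=4, n_per_period=10_000):
    """Forward-Euler simulation of a Clegg integrator driven by e(t) = sin(omega*t):
    the state integrates e and is reset to 0 whenever e changes sign."""
    dt = 2 * np.pi / omega / n_per_period
    t = np.arange(periods * n_per_period) * dt
    e = np.sin(omega * t)
    x = np.zeros_like(t)
    for k in range(1, len(t)):
        if np.sign(e[k]) != np.sign(e[k - 1]):
            x[k] = 0.0                        # reset at the input zero crossing
        else:
            x[k] = x[k - 1] + e[k - 1] * dt   # ordinary integration
    return t, x


def first_harmonic_gain(t, x, omega, n_per_period):
    """Describing function: first-harmonic output phasor over a unit-amplitude sine input."""
    tt, xx = t[-n_per_period:], x[-n_per_period:]                  # last full period
    in_phase = 2 / n_per_period * np.sum(xx * np.sin(omega * tt))
    quadrature = 2 / n_per_period * np.sum(xx * np.cos(omega * tt))
    return in_phase + 1j * quadrature


omega = 2.0
t, x = clegg_response(omega)
N_sim = first_harmonic_gain(t, x, omega, 10_000)
N_theory = (1 + 4j / np.pi) / (1j * omega)
print(abs(N_sim), np.degrees(np.angle(N_sim)))
```

The gain keeps the 1/ω slope of an integrator while the phase lag drops from 90° to roughly 38°; this kind of phase advantage is what reset elements are used to trade against Bode's gain-phase relationship.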

Submitted 28 September, 2018; v1 submitted 31 May, 2018; originally announced May 2018.

Showing 1–24 of 24 results for author: Sinha, R