Search | arXiv e-print repository

Speakers Unembedded: Embedding-free Approach to Long-form Neural Diarization

Authors: Xiang Li, Vivek Govindan, Rohit Paturi, Sundararajan Srinivasan

Abstract: End-to-end neural diarization (EEND) models offer significant improvements over traditional embedding-based Speaker Diarization (SD) approaches but fall short on generalizing to long-form audio with a large number of speakers. The EEND-vector-clustering method mitigates this by combining local EEND with global clustering of speaker embeddings from local windows, but this requires an additional speaker embedding framework alongside the EEND module. In this paper, we propose a novel framework applying EEND both locally and globally for long-form audio without separate speaker embeddings. This approach achieves significant relative DER reduction of 13% and 10% over the conventional 1-pass EEND on Callhome American English and RT03-CTS datasets respectively and marginal improvements over EEND-vector-clustering without the need for additional speaker embeddings. Furthermore, we discuss the computational complexity of our proposed framework and explore strategies for reducing processing times.

Submitted 26 June, 2024; originally announced June 2024.

Comments: Accepted at INTERSPEECH 2024
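
The "relative DER reduction" figures quoted in the abstract above are ratios against a baseline error rate, not absolute percentage-point differences. A minimal sketch of that arithmetic follows; the function name and the sample error rates are illustrative assumptions and are not taken from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the reduction of an error rate as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


# Hypothetical numbers, NOT from the paper: a 10.0% baseline DER lowered to 8.7%.
baseline_der = 10.0
improved_der = 8.7
print(f"relative reduction: {relative_reduction(baseline_der, improved_der):.0%}")
```

With these assumed values, an absolute drop of 1.3 points corresponds to a relative reduction of about 13%.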

arXiv:2406.17266 [pdf, other]

AG-LSEC: Audio Grounded Lexical Speaker Error Correction

Authors: Rohit Paturi, Xiang Li, Sundararajan Srinivasan

Abstract: Speaker Diarization (SD) systems are typically audio-based and operate independently of the ASR system in traditional speech transcription pipelines and can have speaker errors due to SD and/or ASR reconciliation, especially around speaker turns and regions of speech overlap. To reduce these errors, a Lexical Speaker Error Correction (LSEC), in which an external language model provides lexical information to correct the speaker errors, was recently proposed. Though the approach achieves good Word Diarization error rate (WDER) improvements, it does not use any additional acoustic information and is prone to miscorrections. In this paper, we propose to enhance and acoustically ground the LSEC system with speaker scores directly derived from the existing SD pipeline. This approach achieves significant relative WDER reductions in the range of 25-40% over the audio-based SD, ASR system and beats the LSEC system by 15-25% relative on RT03-CTS, Callhome American English and Fisher datasets.

Submitted 25 June, 2024; originally announced June 2024.

Comments: Accepted at INTERSPEECH 2024

arXiv:2405.09023 [pdf, other]

The Rise of Recommerce: Ownership and Sustainability with Overlapping Generations

Authors: Rubing Li, Arun Sundararajan

Abstract: The emergence of the branded recommerce channel - digitally enabled and branded marketplaces that facilitate purchasing pre-owned items directly from a manufacturer's e-commerce site - leads to new variants of classic IS and economic questions relating to secondary markets. Such branded recommerce is increasingly platform-enabled, creating opportunities for greater sustainability and stronger brand experience control but posing a greater risk of cannibalization of the sales of new items. We model the effects that the sales of pre-owned items have on market segmentation and product durability choices for a monopolist facing heterogeneous customers, contrasting outcomes when the trade of pre-owned goods takes place through a third-party marketplace with outcomes under branded recommerce. We show that the direct revenue benefits of branded recommerce are not their primary source of value to the monopolist, and rather, there are three indirect effects that alter profits and sustainability. Product durability increases, a seller finds it optimal to forgo marketplace fees altogether, and there are greater seller incentives to lower the quality uncertainty associated with pre-owned items. We establish these results for a simple two-period model as well as developing a new infinite horizon model with overlapping generations. Our paper sheds new insight into this emerging digital channel phenomenon, underscoring the importance of recommerce platforms in aligning seller profits with sustainability goals.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.08317 [pdf, other]

SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models

Authors: Raghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, Saket Dingliwal, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Srikanth Vishnubhotla, Daniel Garcia-Romero, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

Abstract: Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remains largely unclear. In this work, we investigate the potential vulnerabilities of such instruction-following speech-language models to adversarial attacks and jailbreaking. Specifically, we design algorithms that can generate adversarial examples to jailbreak SLMs in both white-box and black-box attack settings without human involvement. Additionally, we propose countermeasures to thwart such jailbreaking attacks. Our models, trained on dialog data with speech instructions, achieve state-of-the-art performance on spoken question-answering task, scoring over 80% on both safety and helpfulness metrics. Despite safety guardrails, experiments on jailbreaking demonstrate the vulnerability of SLMs to adversarial perturbations and transfer attacks, with average attack success rates of 90% and 10% respectively when evaluated on a dataset of carefully designed harmful questions spanning 12 different toxic categories. However, we demonstrate that our proposed countermeasures reduce the attack success significantly.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 9+6 pages, Submitted to ACL 2024

arXiv:2405.08295 [pdf, other]

SpeechVerse: A Large-scale Generalizable Audio Language Model

Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

Abstract: Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore develop SpeechVerse, a robust multi-task training and curriculum learning framework that combines pre-trained speech and text foundation models via a small set of learnable parameters, while keeping the pre-trained models frozen during training. The models are instruction finetuned using continuous latent representations extracted from the speech foundation model to achieve optimal zero-shot performance on a diverse range of speech processing tasks using natural language instructions. We perform extensive benchmarking that includes comparing our model performance against traditional baselines across several datasets and tasks. Furthermore, we evaluate the model's capability for generalized instruction following by testing on out-of-domain datasets, novel prompts, and unseen tasks. Our empirical experiments reveal that our multi-task SpeechVerse model is even superior to conventional task-specific baselines on 9 out of the 11 tasks.

Submitted 31 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

Comments: Single Column, 13 page

arXiv:2404.19534 [pdf, other]

MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.

Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

arXiv:2404.14353 [pdf]

Electroporation-mediated Metformin for effective anticancer treatment of triple-negative breast cancer cells

Authors: Praveen Sahu, Ignacio G. Camarillo, Pragatheiswar Giri, Raji Sundararajan

Abstract: In this research, we investigated the efficacy of Metformin, the most commonly administered type-2 diabetes drug, for triple negative breast cancer (TNBC) treatment, due to its various anticancer properties. It is a plant-based bio-compound, synthesized as a novel biguanide, called dimethyl biguanide or metformin. One of the ways it operates is by hindering electron transport chain-complex I in mitochondria, which causes a drop in energy (ATP) generation. This eventually builds energetic stress and a decline in energy. Therefore, the natural cellular processes and proliferating tumor cells are obstructed. Here, we used electroporation, where the MDA-MB-231 human TNBC cells were subjected to high intensity, short-duration electrical pulses (EP) in the presence of Metformin. The cell viability results indicate lower cell viability of 43.45% as compared to 85.20% with drug alone at 5mM concentration. This indicates that Metformin, the most common diabetes drug, could also be explored for cancer treatment.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.06977 [pdf]

Accurate Tennis Court Line Detection on Amateur Recorded Matches

Authors: Sameer Agrawal, Ragoth Sundararajan, Vishak Sagar

Abstract: Typically, tennis court line detection is done by running Hough-Line-Detection to find straight lines in the image, and then computing a transformation matrix from the detected lines to create the final court structure. We propose numerous improvements and enhancements to this algorithm, including using pretrained State-of-the-Art shadow-removal and object-detection ML models to make our line-detection more robust. Compared to the original algorithm, our method can accurately detect lines on amateur, dirty courts. When combined with a robust ball-tracking system, our method will enable accurate, automatic refereeing for amateur and professional tennis matches alike.

Submitted 10 April, 2024; originally announced April 2024.

Comments: Accepted to 5th International conference on Image, Video Processing and Artificial Intelligence

ACM Class: I.4.6

arXiv:2404.04103 [pdf, other]

Improving Factual Accuracy of Neural Table-to-Text Output by Addressing Input Problems in ToTTo

Authors: Barkavi Sundararajan, Somayajulu Sripada, Ehud Reiter

Abstract: Neural Table-to-Text models tend to hallucinate, producing texts that contain factual errors. We investigate whether such errors in the output can be traced back to problems with the input. We manually annotated 1,837 texts generated by multiple models in the politics domain of the ToTTo dataset. We identify the input problems that are responsible for many output errors and show that fixing these inputs reduces factual errors by between 52% and 76% (depending on the model). In addition, we observe that models struggle in processing tabular inputs that are structured in a non-standard way, particularly when the input lacks distinct row and column values or when the column headers are not correctly mapped to corresponding values.

Submitted 5 April, 2024; originally announced April 2024.

Comments: Added link to human evaluation guidelines and error annotations

arXiv:2403.09148 [pdf, other]

Evaluating LLMs for Gender Disparities in Notable Persons

Authors: Lauren Rhue, Sofie Goethals, Arun Sundararajan

Abstract: This study examines the use of Large Language Models (LLMs) for retrieving factual information, addressing concerns over their propensity to produce factually incorrect "hallucinated" responses or to decline to answer the prompt at all. Specifically, it investigates the presence of gender-based biases in LLMs' responses to factual inquiries. This paper takes a multi-pronged approach to evaluating GPT models by evaluating fairness across multiple dimensions of recall, hallucinations and declinations. Our findings reveal discernible gender disparities in the responses generated by GPT-3.5. While advancements in GPT-4 have led to improvements in performance, they have not fully eradicated these gender disparities, notably in instances where responses are declined. The study further explores the origins of these disparities by examining the influence of gender associations in prompts and the homogeneity in the responses.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2311.00697 [pdf, other]

End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation

Authors: Juan Zuluaga-Gomez, Zhaocheng Huang, Xing Niu, Rohit Paturi, Sundararajan Srinivasan, Prashant Mathur, Brian Thompson, Marcello Federico

Abstract: Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers. In this paper, we tackle single-channel multi-speaker conversational ST with an end-to-end and multi-task training model, named Speaker-Turn Aware Conversational Speech Translation, that combines automatic speech recognition, speech translation and speaker turn detection using special tokens in a serialized labeling format. We run experiments on the Fisher-CALLHOME corpus, which we adapted by merging the two single-speaker channels into one multi-speaker channel, thus representing the more realistic and challenging scenario with multi-speaker turns and cross-talk. Experimental results across single- and multi-speaker conditions and against conventional ST systems, show that our model outperforms the reference systems on the multi-speaker condition, while attaining comparable performance on the single-speaker condition. We release scripts for data processing and model training.

Submitted 1 November, 2023; originally announced November 2023.

Comments: Accepted at EMNLP 2023. Code: https://github.com/amazon-science/stac-speech-translation

arXiv:2310.01892 [pdf, ps, other]

FiGURe: Simple and Efficient Unsupervised Node Representations with Filter Augmentations

Authors: Chanakya Ekbote, Ajinkya Pankaj Deshpande, Arun Iyer, Ramakrishna Bairi, Sundararajan Sellamanickam

Abstract: Unsupervised node representations learnt using contrastive learning-based methods have shown good performance on downstream tasks. However, these methods rely on augmentations that mimic low-pass filters, limiting their performance on tasks requiring different eigen-spectrum parts. This paper presents a simple filter-based augmentation method to capture different parts of the eigen-spectrum. We show significant improvements using these augmentations. Further, we show that sharing the same weights across these different filter augmentations is possible, reducing the computational load. In addition, previous works have shown that good performance on downstream tasks requires high dimensional representations. Working with high dimensions increases the computations, especially when multiple augmentations are involved. We mitigate this problem and recover good performance through lower dimensional embeddings using simple random Fourier feature projections. Our method, FiGURe achieves an average gain of up to 4.4%, compared to the state-of-the-art unsupervised models, across all datasets in consideration, both homophilic and heterophilic. Our code can be found at: https://github.com/microsoft/figure.

Submitted 4 October, 2023; v1 submitted 3 October, 2023; originally announced October 2023.
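
The FiGURe abstract above mentions "random Fourier feature projections". As a rough illustration of that general technique only (not the paper's pipeline), the sketch below builds a random cosine feature map whose inner products approximate a Gaussian kernel exp(-gamma * ||x - y||^2). The function name `make_rff`, the parameter values, and the sample vectors are all assumptions made for this example.

```python
import math
import random


def make_rff(dim_in, dim_out, gamma, seed=0):
    """Build a random Fourier feature map approximating exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    # Frequencies are drawn from N(0, 2 * gamma), the spectral density of the kernel.
    std = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, std) for _ in range(dim_in)] for _ in range(dim_out)]
    # Random phase offsets in [0, 2*pi).
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(dim_out)]
    norm = math.sqrt(2.0 / dim_out)

    def project(x):
        return [
            norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return project


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Illustrative vectors and settings (assumed, not from the paper).
x = [0.1, 0.2, 0.3, 0.4]
y = [0.2, 0.1, 0.4, 0.3]
gamma = 0.5
project = make_rff(dim_in=4, dim_out=2000, gamma=gamma)

exact = math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
approx = dot(project(x), project(y))
print(f"exact kernel: {exact:.3f}, RFF approximation: {approx:.3f}")
```

The approximation error shrinks roughly with the inverse square root of `dim_out`, so the number of random features trades accuracy against compute.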

arXiv:2309.11516 [pdf, other]

Private Matrix Factorization with Public Item Features

Authors: Mihaela Curmei, Walid Krichene, Li Zhang, Mukund Sundararajan

Abstract: We consider the problem of training private recommendation models with access to public item features. Training with Differential Privacy (DP) offers strong privacy guarantees, at the expense of loss in recommendation quality. We show that incorporating public item features during training can help mitigate this loss in quality. We propose a general approach based on collective matrix factorization (CMF), that works by simultaneously factorizing two matrices: the user feedback matrix (representing sensitive data) and an item feature matrix that encodes publicly available (non-sensitive) item information. The method is conceptually simple, easy to tune, and highly scalable. It can be applied to different types of public item data, including: (1) categorical item features; (2) item-item similarities learned from public sources; and (3) publicly available user feedback. Furthermore, these data modalities can be collectively utilized to fully leverage public data. Evaluating our method on a standard DP recommendation benchmark, we find that using public item features significantly narrows the quality gap between private models and their non-private counterparts. As privacy constraints become more stringent, models rely more heavily on public side features for recommendation. This results in a smooth transition from collaborative filtering to item-based contextual recommendations.

Submitted 17 September, 2023; originally announced September 2023.

Comments: Presented at ACM Recsys 2023

arXiv:2309.10881 [pdf]

doi 10.33218/001c.92224

Nanorobotics in Medicine: A Systematic Review of Advances, Challenges, and Future Prospects

Authors: Shishir Rajendran, Prathic Sundararajan, Ashi Awasthi, Suraj Rajendran

Abstract: Nanorobotics offers an emerging frontier in biomedicine, holding the potential to revolutionize diagnostic and therapeutic applications through its unique capabilities in manipulating biological systems at the nanoscale. Following PRISMA guidelines, a comprehensive literature search was conducted using IEEE Xplore and PubMed databases, resulting in the identification and analysis of a total of 414 papers. The studies were filtered to include only those that addressed both nanorobotics and direct medical applications. Our analysis traces the technology's evolution, highlighting its growing prominence in medicine as evidenced by the increasing number of publications over time. Applications ranged from targeted drug delivery and single-cell manipulation to minimally invasive surgery and biosensing. Despite the promise, limitations such as biocompatibility, precise control, and ethical concerns were also identified. This review aims to offer a thorough overview of the state of nanorobotics in medicine, drawing attention to current challenges and opportunities, and providing directions for future research in this rapidly advancing field.

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2308.13773 [pdf, other]

Solving the insecurity problem for assertions

Authors: R Ramanujam, Vaishnavi Sundararajan, S P Suresh

Abstract: In the symbolic verification of cryptographic protocols, a central problem is deciding whether a protocol admits an execution which leaks a designated secret to the malicious intruder. Rusinowitch & Turuani (2003) show that, when considering finitely many sessions, this "insecurity problem" is NP-complete. Central to their proof strategy is the observation that any execution of a protocol can be simulated by one where the intruder only communicates terms of bounded size. However, when we consider models where, in addition to terms, one can also communicate logical statements about terms, the analysis of the insecurity problem becomes tricky when both these inference systems are considered together. In this paper we consider the insecurity problem for protocols with logical statements that include equality on terms and existential quantification. Witnesses for existential quantifiers may be unbounded, and obtaining small witness terms while maintaining equality proofs complicates the analysis considerably. We extend techniques from Rusinowitch & Turuani (2003) to show that this problem is also in NP.

Submitted 26 January, 2024; v1 submitted 26 August, 2023; originally announced August 2023.

arXiv:2308.10882 [pdf, other]

Giraffe: Adventures in Expanding Context Lengths in LLMs

Authors: Arka Pal, Deep Karkhanis, Manley Roberts, Samuel Dooley, Arvind Sundararajan, Siddartha Naidu

Abstract: Modern large language models (LLMs) that rely on attention mechanisms are typically trained with fixed context lengths which enforce upper limits on the length of input sequences that they can handle at evaluation time. To use these models on sequences longer than the train-time context length, one might employ techniques from the growing family of context length extrapolation methods -- most of which focus on modifying the system of positional encodings used in the attention mechanism to indicate where tokens or activations are located in the input sequence. We conduct a wide survey of existing methods of context length extrapolation on a base LLaMA or LLaMA 2 model, and introduce some of our own design as well -- in particular, a new truncation strategy for modifying the basis for the position encoding. We test these methods using three new evaluation tasks (FreeFormQA, AlteredNumericQA, and LongChat-Lines) as well as perplexity, which we find to be less fine-grained as a measure of long context performance of LLMs. We release the three tasks publicly as datasets on HuggingFace. We discover that linear scaling is the best method for extending context length, and show that further gains can be achieved by using longer scales at evaluation time. We also discover promising extrapolation capabilities in the truncated basis. To support further research in this area, we release three new 13B parameter long-context models which we call Giraffe: 4k and 16k context models trained from base LLaMA-13B, and a 32k context model trained from base LLaMA2-13B. We also release the code to replicate our results.

Submitted 21 August, 2023; originally announced August 2023.
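
The "linear scaling" mentioned in the Giraffe abstract above can be illustrated with a minimal sketch. It assumes rotary position encodings (the scheme used by LLaMA) and the common formulation in which the position index is divided by a scale factor before the rotation angles are computed. The abstract itself gives no formula, so the function names, the `base` default, and the `scale` parameter here are illustrative assumptions, not the authors' code.

```python
import math


def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotation angles for one token position under rotary position encoding.

    Assumption: linear scaling divides the position index by `scale`, so a
    sequence `scale` times longer than the training context is mapped back
    into the range of positions seen during training.
    """
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    scaled_position = position / scale
    # One angle per pair of feature dimensions, with geometrically spaced frequencies.
    return [scaled_position * base ** (-2.0 * k / head_dim)
            for k in range(head_dim // 2)]


def rotate_pair(x0, x1, angle):
    """Rotate one (x0, x1) feature pair by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return (x0 * c - x1 * s, x0 * s + x1 * c)
```

With `scale=4.0`, position 8192 receives exactly the angles that position 2048 receives without scaling, which is the sense in which a model trained on a shorter context can address a longer one.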

arXiv:2308.02160 [pdf, other]

Speaker Diarization of Scripted Audiovisual Content

Authors: Yogesh Virkar, Brian Thompson, Rohit Paturi, Sundararajan Srinivasan, Marcello Federico

Abstract: The media localization industry usually requires a verbatim script of the final film or TV production in order to create subtitles or dubbing scripts in a foreign language. In particular, the verbatim script (i.e. as-broadcast script) must be structured into a sequence of dialogue lines each including time codes, speaker name and transcript. Current speech recognition technology alleviates the transcription step. However, state-of-the-art speaker diarization models still fall short on TV shows for two main reasons: (i) their inability to track a large number of speakers, (ii) their low accuracy in detecting frequent speaker changes. To mitigate this problem, we present a novel approach to leverage production scripts used during the shooting process, to extract pseudo-labeled data for the speaker diarization task. We propose a novel semi-supervised approach and demonstrate improvements of 51.7% relative to two unsupervised baseline models on our metrics on a 66 show test set.

Submitted 4 August, 2023; originally announced August 2023.

Comments: 5 pages, 3 figures

arXiv:2307.09954 [pdf, other]

Priority-based DREAM Approach for Highly Manoeuvring Intruders in A Perimeter Defense Problem

Authors: Shridhar Velhal, Suresh Sundaram, Narasimhan Sundararajan

Abstract: In this paper, a Priority-based Dynamic REsource Allocation with decentralized Multi-task assignment (P-DREAM) approach is presented to protect a territory from highly manoeuvring intruders. In the first part, static optimization problems are formulated to compute the following parameters of the perimeter defense problem: the number of reserve stations, their locations, the priority region, the monitoring region, and the minimum number of defenders required for monitoring. The concept of a prioritized intruder is proposed here to identify those critical intruders (computed based on the velocity ratio and location) that must be tackled on a priority basis. The computed priority region helps to assign reserve defenders early enough that they can neutralize the prioritized intruders. The monitoring region defines the minimum region to be monitored and is sufficient to handle the intruders. In the second part, the earlier developed DREAM approach is modified to incorporate the priority of an intruder. The proposed P-DREAM approach assigns the defenders to the prioritized intruders as the first task. A convex territory protection problem is simulated to illustrate the P-DREAM approach. It involves the computation of static parameters and solving the prioritized task assignments with dynamic resource allocation. Monte Carlo simulations were conducted to verify the performance of P-DREAM, and the results clearly show that the P-DREAM approach can protect the territory with consistent performance against highly manoeuvring intruders.

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2306.17776 [pdf, other]

A multivariate heavy-tailed integer-valued GARCH process with EM algorithm-based inference

Authors: Yuhyeong Jang, Raanju R. Sundararajan, Wagner Barreto-Souza

Abstract: A new multivariate integer-valued Generalized AutoRegressive Conditional Heteroscedastic process based on a multivariate Poisson generalized inverse Gaussian distribution is proposed. The estimation of parameters of the proposed multivariate heavy-tailed count time series model via the maximum likelihood method is challenging since the likelihood function involves a Bessel function that depends on the multivariate counts and its dimension. As a consequence, numerical instability is often experienced in optimization procedures. To overcome this computational problem, two feasible variants of the Expectation-Maximization (EM) algorithm are proposed for estimating parameters of our model under low and high-dimensional settings. These EM algorithm variants provide computational benefits and help avoid the difficult direct optimization of the likelihood function from the proposed model. Our model and proposed estimation procedures can handle multiple features such as modeling of multivariate counts, heavy-tailedness, overdispersion, accommodation of outliers, allowances for both positive and negative autocorrelations, estimation of cross/contemporaneous-correlation, and the efficient estimation of parameters from both statistical and computational points of view. Extensive Monte Carlo simulation studies are presented to assess the performance of the proposed EM algorithms. An application to modeling bivariate count time series data on cannabis possession-related offenses in Australia is discussed.

Submitted 30 June, 2023; originally announced June 2023.

Comments: 32 pages, 14 figures

MSC Class: 62M10 (Primary); 62M09; 62P25 (Secondary)

arXiv:2306.09313 [pdf, other]

Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction

Authors: Rohit Paturi, Sundararajan Srinivasan, Xiang Li

Abstract: Speaker diarization (SD) is typically used with an automatic speech recognition (ASR) system to ascribe speaker labels to recognized words. The conventional approach reconciles outputs from independently optimized ASR and SD systems, where the SD system typically uses only acoustic information to identify the speakers in the audio stream. This approach can lead to speaker errors especially around speaker turns and regions of speaker overlap. In this paper, we propose a novel second-pass speaker error correction system using lexical information, leveraging the power of modern language models (LMs). Our experiments across multiple telephony datasets show that our approach is both effective and robust. Training and tuning only on the Fisher dataset, this error correction approach leads to relative word-level diarization error rate (WDER) reductions of 15-30% on three telephony datasets: RT03-CTS, Callhome American English and held-out portions of Fisher.

Submitted 15 June, 2023; originally announced June 2023.

Comments: Accepted at INTERSPEECH 2023

arXiv:2306.09055 [pdf, other]

Predictive Maneuver Planning with Deep Reinforcement Learning (PMP-DRL) for comfortable and safe autonomous driving

Authors: Jayabrata Chowdhury, Vishruth Veerendranath, Suresh Sundaram, Narasimhan Sundararajan

Abstract: This paper presents a Predictive Maneuver Planning with Deep Reinforcement Learning (PMP-DRL) model for maneuver planning. Traditional rule-based maneuver planning approaches often need to improve their ability to handle the variability of real-world driving scenarios. By learning from its experience, a Reinforcement Learning (RL)-based driving agent can adapt to changing driving conditions and improve its performance over time. Our proposed approach combines a predictive model and an RL agent to plan for comfortable and safe maneuvers. The predictive model is trained using historical driving data to predict the future positions of other surrounding vehicles. The surrounding vehicles' past and predicted future positions are embedded in context-aware grid maps. At the same time, the RL agent learns to make maneuvers based on this spatio-temporal context information. Performance evaluation of PMP-DRL has been carried out using simulated environments generated from the publicly available NGSIM US101 and I80 datasets. The training sequence shows continuous improvement in the driving experience and shows that the proposed PMP-DRL can learn the trade-off between safety and comfort. The decisions generated by a recent imitation learning-based model are compared with those of the proposed PMP-DRL for unseen scenarios. The results clearly show that PMP-DRL can handle complex real-world scenarios and make more comfortable and safer maneuver decisions than rule-based and imitative models.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.06234 [pdf, other]

Using Foundation Models to Detect Policy Violations with Minimal Supervision

Authors: Sid Mittal, Vineet Gupta, Frederick Liu, Mukund Sundararajan

Abstract: Foundation models, i.e. large neural networks pre-trained on large text corpora, have revolutionized NLP. They can be instructed directly (e.g. arXiv:2005.14165) - this is called hard prompting - and they can be tuned using very little data (e.g. arXiv:2104.08691) - this technique is called soft prompting. We seek to leverage their capabilities to detect policy violations. Our contributions are: We identify a hard prompt that adapts chain-of-thought prompting to policy violation tasks. This prompt produces policy violation classifications, along with extractive explanations that justify the classification. We compose the hard prompts with soft prompt tuning to produce a classifier that attains high accuracy with very little supervision; the same classifier also produces explanations. Though the supervision only acts on the classifications, we find that the modified explanations remain consistent with the (tuned) model's response. Along the way, we identify several unintuitive aspects of foundation models. For instance, adding an example from a specific class can actually reduce predictions of that class and, separately, tokenization has unintuitive effects on scoring. Based on our technical results, we identify a simple workflow for product teams to quickly develop effective policy violation detectors.

Submitted 9 June, 2023; originally announced June 2023.

Comments: 16 pages

arXiv:2305.04497 [pdf, other]

IIITD-20K: Dense captioning for Text-Image ReID

Authors: A V Subramanyam, Niranjan Sundararajan, Vibhu Dubey, Brejesh Lall

Abstract: Text-to-Image (T2I) ReID has attracted a lot of attention in the recent past. CUHK-PEDES, RSTPReid and ICFG-PEDES are the three available benchmarks to evaluate T2I ReID methods. RSTPReid and ICFG-PEDES comprise identities from MSMT17, but due to the limited number of unique persons, the diversity is limited. On the other hand, CUHK-PEDES comprises 13,003 identities but has relatively shorter text descriptions on average. Further, these datasets are captured in a restricted environment with a limited number of cameras. In order to further diversify the identities and provide dense captions, we propose a novel dataset called IIITD-20K. IIITD-20K comprises 20,000 unique identities captured in the wild and provides a rich dataset for text-to-image ReID. With a minimum of 26 words for a description, each image is densely captioned. We further synthetically generate images and fine-grained captions using Stable-diffusion and BLIP models trained on our dataset. We perform elaborate experiments using state-of-the-art text-to-image ReID models and vision-language pre-trained models and present a comprehensive analysis of the dataset. Our experiments also reveal that synthetically generated data leads to a substantial performance improvement in both same-dataset and cross-dataset settings. Our dataset is available at https://bit.ly/3pkA3Rj.

Submitted 8 May, 2023; originally announced May 2023.

arXiv:2302.07975 [pdf, other]

Multi-Task Differential Privacy Under Distribution Skew

Authors: Walid Krichene, Prateek Jain, Shuang Song, Mukund Sundararajan, Abhradeep Thakurta, Li Zhang

Abstract: We study the problem of multi-task learning under user-level differential privacy, in which $n$ users contribute data to $m$ tasks, each involving a subset of users. One important aspect of the problem that can significantly impact quality is the distribution skew among tasks. Certain tasks may have far fewer data samples than others, making them more susceptible to the noise added for privacy. It is natural to ask whether algorithms can adapt to this skew to improve the overall utility. We give a systematic analysis of the problem, by studying how to optimally allocate a user's privacy budget among tasks. We propose a generic algorithm, based on an adaptive reweighting of the empirical loss, and show that when there is task distribution skew, this gives a quantifiable improvement of excess empirical risk. Experimental studies on recommendation problems that exhibit a long tail of small tasks demonstrate that our methods significantly improve utility, achieving the state of the art on two standard benchmarks.

Submitted 15 February, 2023; originally announced February 2023.

arXiv:2301.03664 [pdf, other]

Frequency Band Analysis of Nonstationary Multivariate Time Series

Authors: Raanju R. Sundararajan, Scott A. Bruce

Abstract: Information from frequency bands in biomedical time series provides useful summaries of the observed signal. Many existing methods consider summaries of the time series obtained over a few well-known, pre-defined frequency bands of interest. However, these methods do not provide data-driven methods for identifying frequency bands that optimally summarize frequency-domain information in the time series. A new method to identify partition points in the frequency space of a multivariate locally stationary time series is proposed. These partition points signify changes across frequencies in the time-varying behavior of the signal and provide frequency band summary measures that best preserve the nonstationary dynamics of the observed series. An $L_2$ norm-based discrepancy measure that finds differences in the time-varying spectral density matrix is constructed, and its asymptotic properties are derived. New nonparametric bootstrap tests are also provided to identify significant frequency partition points and to identify components and cross-components of the spectral matrix exhibiting changes over frequencies. Finite-sample performance of the proposed method is illustrated via simulations. The proposed method is used to develop optimal frequency band summary measures for characterizing time-varying behavior in resting-state electroencephalography (EEG) time series, as well as identifying components and cross-components associated with each frequency partition point.

Submitted 9 January, 2023; originally announced January 2023.

MSC Class: 62M10; 62M15

arXiv:2301.02032 [pdf, other]

On the fractional transversely isotropic functionally graded nature of soft biological tissues

Authors: Sachin Gunda, Sundararajan Natarajan, Olga Barrera

Abstract: This paper focuses on the origin of the poroelastic anisotropic behaviour of the meniscal tissue and its spatially varying properties. We present confined compression creep test results on samples extracted from three parts of the tissue (Central body, Anterior horn and Posterior horn) in three orientations (Circumferential, Radial and Vertical). We show that a poroelastic model in which the fluid flow evolution is ruled by non-integer order operators (fractional Darcy's law) provides accurate agreement with the experimental creep data. The model is validated against two additional sets of experimental data: stress relaxation and fluid loss during the consolidation process measured as weight reduction. Results show that the meniscus can be considered as a transversely isotropic poroelastic material. This behaviour is due to the fluid flow rate being about three times higher in the circumferential direction than in the radial and vertical directions in the body region of the meniscus. In the anterior horn, the elastic properties are transversely isotropic, with the aggregate modulus higher in the radial direction than in the circumferential and vertical directions. The 3D fractional poroelastic model is implemented in finite element software and quantities such as flux of interstitial fluid during the consolidation process, a non-trivial experimental measure, are determined.

Submitted 5 January, 2023; originally announced January 2023.

arXiv:2212.11950 [pdf, other]

doi 10.1109/ITSC48978.2021.9564420

Peek into the Future Camera-based Occupant Sensing in Configurable Cabins for Autonomous Vehicles

Authors: Avinash Prabu, Renran Tian, Lingxi Li, Jialiang Le, Srinivasan Sundararajan, Saeed Barbat

Abstract: The development of fully autonomous vehicles (AVs) can potentially eliminate drivers and introduce unprecedented seating designs. However, highly flexible seat configurations may lead to unconventional occupant poses and actions. Understanding occupant behaviors and prioritizing safety features have become prominent topics at the AV research frontier. Visual sensors have the advantages of cost-efficiency and high-fidelity imaging and are becoming more widely applied for in-car sensing purposes. Occlusion is one big concern for this type of system in crowded car cabins. It is important, but largely unknown, what a visual-sensing framework should look like to support 2-D and 3-D human pose tracking for highly configurable seats. As one of the first studies to touch on this topic, we peek into the future camera-based sensing framework via a simulation experiment. Using constructed representative car cabins, seat layouts, and occupant sizes, camera coverage from different angles and positions is simulated and calculated. The comprehensive coverage data are synthesized through an optimization process to determine the camera layout and overall occupant coverage. The results show the need for, and design of, different numbers of cameras to fully or partially cover all the occupants with changeable configurations of up to six seats.

Submitted 22 December, 2022; originally announced December 2022.

Comments: Conference: 2021 IEEE International Intelligent Transportation Systems Conference (ITSC) Link: https://ieeexplore.ieee.org/document/9564420

arXiv:2212.07084 [pdf, other]

Fully Complex-valued Fully Convolutional Multi-feature Fusion Network (FC2MFN) for Building Segmentation of InSAR images

Authors: Aniruddh Sikdar, Sumanth Udupa, Suresh Sundaram, Narasimhan Sundararajan

Abstract: Building segmentation in high-resolution InSAR images is a challenging task that can be useful for large-scale surveillance. Although complex-valued deep learning networks perform better than their real-valued counterparts for complex-valued SAR data, phase information is not retained throughout the network, which causes a loss of information. This paper proposes a Fully Complex-valued, Fully Convolutional Multi-feature Fusion Network (FC2MFN) for building semantic segmentation on InSAR images using a novel, fully complex-valued learning scheme. The network learns multi-scale features, performs multi-feature fusion, and has a complex-valued output. To suit the particularities of complex-valued InSAR data, a new complex-valued pooling layer is proposed that compares complex numbers considering their magnitude and phase. This helps the network retain the phase information even through the pooling layer. Experimental results on the simulated InSAR dataset show that FC2MFN achieves better results compared to other state-of-the-art methods in terms of segmentation performance and model complexity.

Submitted 14 December, 2022; originally announced December 2022.

Comments: Accepted for publication in IEEE Symposium Series On Computational Intelligence 2022, 8 pages, 6 figures

arXiv:2211.13280 [pdf, other]

Device Directedness with Contextual Cues for Spoken Dialog Systems

Authors: Dhanush Bekal, Sundararajan Srinivasan, Sravan Bodapati, Srikanth Ronanki, Katrin Kirchhoff

Abstract: In this work, we define barge-in verification as a supervised learning task where audio-only information is used to classify user spoken dialogue into true and false barge-ins. Following the success of pre-trained models, we use low-level speech representations from a self-supervised representation learning model for our downstream classification task. Further, we propose a novel technique to infuse lexical information directly into speech representations to improve the domain-specific language information implicitly learned during pre-training. Experiments conducted on spoken dialog data show that our proposed model trained to validate barge-in entirely from speech representations is faster by 38% relative and achieves 4.5% relative F1 score improvement over a baseline LSTM model that uses both audio and Automatic Speech Recognition (ASR) 1-best hypotheses. On top of this, our best proposed model with lexically infused representations along with contextual features provides a further relative improvement of 5.7% in the F1 score but is only 22% faster than the baseline.

Submitted 23 November, 2022; originally announced November 2022.

arXiv:2208.14494 [pdf, other]

Theoretical analysis of cargo transport by catch bonded motors in optical trapping assays

Authors: Naren Sundararajan, Sougata Guha, Sudipto Muhuri, Mithun K. Mitra

Abstract: Dynein motors exhibit catch bonding, where the unbinding rate of the motors from microtubule filaments decreases with increasing opposing load. The implications of this catch bond on the transport properties of dynein-driven cargo are yet to be fully understood. In this context, optical trapping assays constitute an important means of accurately measuring the forces generated by molecular motor proteins. We investigate, using theory and stochastic simulations, the transport properties of cargo transported by catch bonded dynein molecular motors - both singly and in teams - in a harmonic potential, which mimics the variable force experienced by cargo in an optical trap. We estimate the biologically relevant measures of first passage time (the time during which the cargo remains bound to the microtubule) and detachment force (the force at which the cargo unbinds from the microtubule), using both two-dimensional and one-dimensional force balance frameworks. Our results suggest that even for cargo transported by a single motor, catch bonding may play a role depending on the force scale which marks the onset of the catch bond. By comparing with experimental measurements on single dynein-driven transport, we estimate realistic bounds of this catch bond force scale. Generically, catch bonding results in increased persistent motion, and can also generate non-monotonic behaviour of first passage times. For cargo transported by multiple motors, emergent collective effects due to catch bonding can result in non-trivial re-entrant phenomena wherein average first passage times and detachment forces exhibit non-monotonic behaviour as a function of the stall force and the motor velocity.

Submitted 16 November, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

Comments: 12 pages, 7 figures

arXiv:2205.11781 [pdf, other]

Attributing AUC-ROC to Analyze Binary Classifier Performance

Authors: Arya Tafvizi, Besim Avci, Mukund Sundararajan

Abstract: Area Under the Receiver Operating Characteristic Curve (AUC-ROC) is a popular evaluation metric for binary classifiers. In this paper, we discuss techniques to segment the AUC-ROC along human-interpretable dimensions. AUC-ROC is not an additive/linear function over the data samples; therefore, segmenting the overall AUC-ROC is different from tabulating the AUC-ROC of data segments. To segment the overall AUC-ROC, we must first solve an attribution problem to identify credit for individual examples. We observe that AUC-ROC, though non-linear over examples, is linear over pairs of examples. This observation leads to a simple, efficient attribution technique for examples (example attributions), and for pairs of examples (pair attributions). We automatically slice these attributions using decision trees by making the tree predict the attributions; we use the notion of honest estimates along with a t-test to mitigate false discovery. Our experiments with the method show that an inferior model can outperform a superior model (trained to optimize a different training objective) on the inferior model's own training objective, a manifestation of Goodhart's Law. In contrast, AUC attributions enable a reasonable comparison. Example attributions can be used to slice this comparison. Pair attributions are used to categorize pairs of items -- one positively labeled and one negatively -- that the model has trouble separating. These categories identify the decision boundary of the classifier and the headroom to improve AUC.

Submitted 24 May, 2022; originally announced May 2022.

arXiv:2205.10092 [pdf, other]

An efficient Deep Spatio-Temporal Context Aware decision Network (DST-CAN) for Predictive Manoeuvre Planning

Authors: Jayabrata Chowdhury, Suresh Sundaram, Nishant Rao, Narasimhan Sundararajan

Abstract: To ensure the safety and efficiency of its manoeuvres, an Autonomous Vehicle (AV) should anticipate the future intentions of surrounding vehicles using its sensor information. If an AV can predict its surrounding vehicles' future trajectories, it can make safe and efficient manoeuvre decisions. In this paper, we present such a Deep Spatio-Temporal Context-Aware decision Network (DST-CAN) model for predictive manoeuvre planning of AVs. A memory neuron network is used to predict future trajectories of its surrounding vehicles. The driving environment's spatio-temporal information (past, present, and predicted future trajectories) is embedded into a context-aware grid. The proposed DST-CAN model employs these context-aware grids as inputs to a convolutional neural network to understand the spatial relationships between the vehicles and determine a safe and efficient manoeuvre decision. The DST-CAN model also uses information of human driving behavior on a highway. Performance evaluation of DST-CAN has been carried out using two publicly available NGSIM datasets, US-101 and I-80. Also, rule-based ground truth decisions have been compared with those generated by DST-CAN. The results clearly show that DST-CAN can make much better decisions with 3 seconds of predicted trajectories of neighboring vehicles compared to currently existing methods that do not use this prediction.

Submitted 20 May, 2022; originally announced May 2022.

Comments: 11 pages, 9 figures

arXiv:2202.13870 [pdf, other]

Simulating Network Paths with Recurrent Buffering Units

Authors: Divyam Anshumaan, Sriram Balasubramanian, Shubham Tiwari, Nagarajan Natarajan, Sundararajan Sellamanickam, Venkata N. Padmanabhan

Abstract: Simulating physical network paths (e.g., Internet) is a cornerstone research problem in the emerging sub-field of AI-for-networking. We seek a model that generates end-to-end packet delay values in response to the time-varying load offered by a sender, which is typically a function of the previously output delays. The problem setting is unique, and renders the state-of-the-art text and time-series generative models inapplicable or ineffective. We formulate an ML problem at the intersection of dynamical systems, sequential decision making, and time-series modeling. We propose a novel grey-box approach to network simulation that embeds the semantics of a physical network path in a new RNN-style model called RBU, providing the interpretability of standard network simulator tools, the power of neural models, and the efficiency of SGD-based techniques for learning, and yielding promising results on synthetic and real-world network traces.

Submitted 6 December, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

Comments: Accepted in AAAI 2023, 19 pages, 14 figures

arXiv:2202.11844 [pdf, other]

First is Better Than Last for Language Data Influence

Authors: Chih-Kuan Yeh, Ankur Taly, Mukund Sundararajan, Frederick Liu, Pradeep Ravikumar

Abstract: The ability to identify influential training examples enables us to debug training data and explain model behavior. Existing techniques to do so are based on the flow of training data influence through the model parameters. For large models in NLP applications, it is often computationally infeasible to study this flow through all model parameters, therefore techniques usually pick the last layer of weights. However, we observe that since the activation connected to the last layer of weights contains "shared logic", the data influence calculated via the last-layer weights is prone to a "cancellation effect", where the data influences of different examples have large magnitudes that contradict each other. The cancellation effect lowers the discriminative power of the influence score, and deleting influential examples according to this measure often does not change the model's behavior by much. To mitigate this, we propose a technique called TracIn-WE that modifies a method called TracIn to operate on the word embedding layer instead of the last layer, where the cancellation effect is less severe. One potential concern is that influence based on the word embedding layer may not encode sufficient high level information. However, we find that gradients (unlike embeddings) do not suffer from this, possibly because they chain through higher layers. We show that TracIn-WE significantly outperforms other data influence methods applied on the last layer on the case deletion evaluation on three language classification tasks for different models. In addition, TracIn-WE can produce scores not just at the level of the overall training input, but also at the level of words within the training input, a further aid in debugging.

Submitted 27 October, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

arXiv:2202.09480 [pdf, other]

Reciprocity in Machine Learning

Authors: Mukund Sundararajan, Walid Krichene

Abstract: Machine learning is pervasive. It powers recommender systems such as Spotify, Instagram and YouTube, and health-care systems via models that predict sleep patterns, or the risk of disease. Individuals contribute data to these models and benefit from them. Are these contributions (outflows of influence) and benefits (inflows of influence) reciprocal? We propose measures of outflows, inflows and reciprocity building on previously proposed measures of training data influence. Our initial theoretical and empirical results indicate that under certain distributional assumptions, some classes of models are approximately reciprocal. We conclude with several open directions.

Submitted 18 February, 2022; originally announced February 2022.

arXiv:2202.04518 [pdf, ps, other]

Insecurity problem for assertions remains in NP

Authors: R. Ramanujam, Vaishnavi Sundararajan, S. P. Suresh

Abstract: In the symbolic verification of cryptographic protocols, a central problem is deciding whether a protocol admits an execution which leaks a designated secret to the malicious intruder. Rusinowitch and Turuani (2003) show that, when considering finitely many sessions and a protocol model where only terms are communicated, this "insecurity problem" is NP-complete. Central to their proof strategy is the observation that any execution of a protocol can be simulated by one where the intruder only communicates terms of bounded size. However, when we consider models where, in addition to terms, one can also communicate logical formulas, the analysis of the insecurity problem becomes tricky. In this paper we consider the insecurity problem for protocols with logical statements that include equality on terms and existential quantification. Witnesses for existential quantifiers may be of unbounded size, and obtaining small witnesses while maintaining equality proofs complicates the analysis. We use a notion of "typed" equality proofs, and extend techniques from [RT03] to show that this problem is also in NP. We also show that these techniques can be used to analyze the insecurity problem for systems such as the one proposed in Ramanujam, Sundararajan and Suresh (2017).

Submitted 25 January, 2023; v1 submitted 9 February, 2022; originally announced February 2022.

arXiv:2112.05863 [pdf, other]

Directed Speech Separation for Automatic Speech Recognition of Long Form Conversational Speech

Authors: Rohit Paturi, Sundararajan Srinivasan, Katrin Kirchhoff, Daniel Garcia-Romero

Abstract: Many of the recent advances in speech separation are primarily aimed at synthetic mixtures of short audio utterances with high degrees of overlap. Most of these approaches need an additional stitching step to stitch the separated speech chunks for long form audio. Since most of the approaches involve Permutation Invariant Training (PIT), the order of separated speech chunks is nondeterministic and leads to difficulty in accurately stitching homogeneous speaker chunks for downstream tasks like Automatic Speech Recognition (ASR). Also, most of these models are trained with synthetic mixtures and do not generalize to real conversational data. In this paper, we propose a speaker conditioned separator trained on speaker embeddings extracted directly from the mixed signal using an over-clustering based approach. This model naturally regulates the order of the separated chunks without the need for an additional stitching step. We also introduce a data sampling strategy with real and synthetic mixtures which generalizes well to real conversation speech. With this model and data sampling technique, we show significant improvements in speaker-attributed word error rate (SA-WER) on Hub5 data.

Submitted 6 September, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

Comments: Accepted for publication at Interspeech 2022

arXiv:2112.04960 [pdf, other]

mechanoChemML: A software library for machine learning in computational materials physics

Authors: X. Zhang, G. H. Teichert, Z. Wang, M. Duschenes, S. Srivastava, E. Livingston, J. Holber, M. Faghih Shojaei, A. Sundararajan, K. Garikipati

Abstract: We present mechanoChemML, a machine learning software library for computational materials physics. mechanoChemML is designed to function as an interface between platforms that are widely used for machine learning on one hand, and others for solution of partial differential equations-based models of physics. Of special interest here, and the focus of mechanoChemML, are applications to computational materials physics. These typically feature the coupled solution of material transport, reaction, phase transformation, mechanics, heat transport and electrochemistry. Central to the organization of mechanoChemML are machine learning workflows that arise in the context of data-driven computational materials physics. The mechanoChemML code structure is described, the machine learning workflows are laid out and their application to the solution of several problems in materials physics is outlined.

Submitted 30 April, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

arXiv:2112.03499 [pdf, other]

A Piece-wise Polynomial Filtering Approach for Graph Neural Networks

Authors: Vijay Lingam, Chanakya Ekbote, Manan Sharma, Rahul Ragesh, Arun Iyer, Sundararajan Sellamanickam

Abstract: Graph Neural Networks (GNNs) exploit signals from node features and the input graph topology to improve node classification task performance. However, these models tend to perform poorly on heterophilic graphs, where connected nodes have different labels. Recently proposed GNNs work across graphs having varying levels of homophily. Among these, models relying on polynomial graph filters have shown promise. We observe that solutions to these polynomial graph filter models are also solutions to an overdetermined system of equations. This suggests that in some instances, the model needs to learn a reasonably high order polynomial. On investigation, we find the proposed models ineffective at learning such polynomials due to their designs. To mitigate this issue, we perform an eigendecomposition of the graph and propose to learn multiple adaptive polynomial filters acting on different subsets of the spectrum. We theoretically and empirically show that our proposed model learns a better filter, thereby improving classification accuracy. We study various aspects of our proposed model, including dependency on the number of eigencomponents utilized, latent polynomial filters learned, and performance of the individual polynomials on the node classification task. We further show that our model is scalable by evaluating over large graphs. Our model achieves performance gains of up to 5% over the state-of-the-art models and outperforms existing polynomial filter-based approaches in general.

Submitted 7 December, 2021; originally announced December 2021.

Comments: 28 pages, 9 figures, Under Review

arXiv:2112.00158 [pdf]

Representation learning through cross-modal conditional teacher-student training for speech emotion recognition

Authors: Sundararajan Srinivasan, Zhaocheng Huang, Katrin Kirchhoff

Abstract: Generic pre-trained speech and text representations promise to reduce the need for large labeled datasets on specific speech and language tasks. However, it is not clear how to effectively adapt these representations for speech emotion recognition. Recent public benchmarks show the efficacy of several popular self-supervised speech representations for emotion classification. In this study, we show that the primary difference between the top-performing representations is in predicting valence while the differences in predicting activation and dominance dimensions are less pronounced. However, we show that even the best-performing HuBERT representation underperforms on valence prediction compared to a multimodal model that also incorporates text representation. We address this shortcoming by injecting lexical information into the speech representation using the multimodal model as a teacher. To improve the efficacy of our approach, we propose a novel estimate of the quality of the emotion predictions, to condition teacher-student training. We report new audio-only state-of-the-art concordance correlation coefficient (CCC) values of 0.757, 0.627, 0.671 for activation, valence and dominance predictions, respectively, on the MSP-Podcast corpus, and also state-of-the-art values of 0.667, 0.582, 0.545 on the IEMOCAP corpus.

Submitted 27 January, 2022; v1 submitted 30 November, 2021; originally announced December 2021.

Comments: Accepted for publication at IEEE ICASSP 2022

arXiv:2109.13995 [pdf, other]

IGLU: Efficient GCN Training via Lazy Updates

Authors: S Deepak Narayanan, Aditya Sinha, Prateek Jain, Purushottam Kar, Sundararajan Sellamanickam

Abstract: Training multi-layer Graph Convolution Networks (GCN) using standard SGD techniques scales poorly as each descent step ends up updating node embeddings for a large portion of the graph. Recent attempts to remedy this sub-sample the graph, which reduces compute but introduces additional variance and may offer suboptimal performance. This paper develops the IGLU method that caches intermediate computations at various GCN layers, thus enabling lazy updates that significantly reduce the compute cost of descent. IGLU introduces bounded bias into the gradients but nevertheless converges to a first-order saddle point under standard assumptions such as objective smoothness. Benchmark experiments show that IGLU offers up to 1.2% better accuracy despite requiring up to 88% less compute.

Submitted 3 April, 2022; v1 submitted 28 September, 2021; originally announced September 2021.

Comments: Published as Conference Paper at ICLR 2022, 36 Pages

arXiv:2107.13312 [pdf, other]

Effective Eigendecomposition based Graph Adaptation for Heterophilic Networks

Authors: Vijay Lingam, Rahul Ragesh, Arun Iyer, Sundararajan Sellamanickam

Abstract: Graph Neural Networks (GNNs) exhibit excellent performance when graphs have a strong homophily property, i.e. connected nodes have the same labels. However, they perform poorly on heterophilic graphs. Several approaches address the issue of heterophily by proposing models that adapt the graph by optimizing a task-specific loss function using labelled data. These adaptations are made either via attention or by attenuating or enhancing various low-frequency/high-frequency signals, as needed for the task at hand. More recent approaches adapt the eigenvalues of the graph. One important interpretation of this adaptation is that these models select/weigh the eigenvectors of the graph. Based on this interpretation, we present an eigendecomposition based approach and propose EigenNetwork models that improve the performance of GNNs on heterophilic graphs. Performance improvement is achieved by learning flexible graph adaptation functions that modulate the eigenvalues of the graph. Regularization of these functions via parameter sharing helps to improve the performance even more. Our approach achieves up to 11% improvement in performance over the state-of-the-art methods on heterophilic graphs.

Submitted 28 July, 2021; originally announced July 2021.

Comments: arXiv admin note: text overlap with arXiv:2106.12807

arXiv:2107.10135 [pdf, other]

Global Outliers Detection in Wireless Sensor Networks: A Novel Approach Integrating Time-Series Analysis, Entropy, and Random Forest-based Classification

Authors: Mahmood Safaei, Maha Driss, Wadii Boulila, Elankovan A Sundararajan, Mitra Safaei

Abstract: Wireless Sensor Networks (WSNs) have recently attracted greater attention worldwide due to their practicality in monitoring, communicating, and reporting specific physical phenomena. The data collected by WSNs is often inaccurate as a result of unavoidable environmental factors, which may include noise, signal weakness, or intrusion attacks depending on the specific situation. Sending high-noise data has negative effects not just on data accuracy and network reliability, but also on the decision-making processes in the base station. Anomaly detection, or outlier detection, is the process of detecting noisy data amidst the contexts thus described. The literature contains relatively few noise detection techniques in the context of WSNs, particularly for outlier-detection algorithms applying time series analysis, which considers the effective neighbors to ensure a global-collaborative detection. Hence, the research presented in this paper is intended to design and implement a global outlier-detection approach, which allows us to find and select appropriate neighbors to ensure an adaptive collaborative detection based on time-series analysis and entropy techniques. The proposed approach applies a random forest algorithm for identifying the best results. To measure the effectiveness and efficiency of the proposed approach, a comprehensive and real scenario provided by the Intel Berkeley Research lab has been simulated. Noisy data have been injected into the collected data randomly. The results of the experiment demonstrate that our approach can detect anomalies with up to 99% accuracy.

Submitted 21 July, 2021; originally announced July 2021.

arXiv:2106.12807 [pdf, other]

Simple Truncated SVD based Model for Node Classification on Heterophilic Graphs

Authors: Vijay Lingam, Rahul Ragesh, Arun Iyer, Sundararajan Sellamanickam

Abstract: Graph Neural Networks (GNNs) have shown excellent performance on graphs that exhibit strong homophily with respect to the node labels, i.e. connected nodes have the same labels. However, they perform poorly on heterophilic graphs. Recent approaches have typically modified aggregation schemes, designed adaptive graph filters, etc. to address this limitation. In spite of this, the performance on heterophilic graphs can still be poor. We propose a simple alternative method that exploits Truncated Singular Value Decomposition (TSVD) of topological structure and node features. Our approach achieves up to ~30% improvement in performance over state-of-the-art methods on heterophilic graphs. This work is an early investigation into methods that differ from aggregation based approaches. Our experimental results suggest that it might be important to explore other alternatives to aggregation methods for the heterophilic setting.

Submitted 24 June, 2021; originally announced June 2021.

Comments: Accepted at Deep Learning on Graphs: Method and Applications (DLG-KDD 2021)

arXiv:2106.11716 [pdf, ps, other]

doi 10.1016/j.engappai.2022.104717

Robust EMRAN-aided Coupled Controller for Autonomous Vehicles

Authors: Sauranil Debarshi, Suresh Sundaram, Narasimhan Sundararajan

Abstract: This paper presents a coupled, neural network-aided longitudinal cruise and lateral path-tracking controller for an autonomous vehicle with model uncertainties and experiencing unknown external disturbances. Using a feedback error learning mechanism, an inverse vehicle dynamics learning scheme utilizing an adaptive Radial Basis Function (RBF) neural network, referred to as the Extended Minimal Resource Allocating Network (EMRAN), is employed. EMRAN uses an extended Kalman filter for online learning and weight updates, and also incorporates a growing/pruning strategy for maintaining a compact network for easier real-time implementation. The online learning algorithm handles the parametric uncertainties and eliminates the effect of unknown disturbances on the road. Combined with a self-regulating learning scheme for improving generalization performance, the proposed EMRAN-aided control architecture aids basic PID cruise and Stanley path-tracking controllers in a coupled form. Its performance and robustness to various disturbances and uncertainties are compared with the conventional PID and Stanley controllers, along with a comparison with a fuzzy-based PID controller and an active disturbance rejection control (ADRC) scheme. Simulation results are presented for both slow and high speed scenarios. The root mean square (RMS) and maximum tracking errors clearly indicate the effectiveness of the proposed control scheme in achieving better tracking performance in autonomous vehicles under unknown environments.

Submitted 8 January, 2022; v1 submitted 22 June, 2021; originally announced June 2021.

Report number: Engineering Applications of Artificial Intelligence, vol. 110, p. 104717

arXiv:2106.05792 [pdf, other]

Speaker-conversation factorial designs for diarization error analysis

Authors: Scott Seyfarth, Sundararajan Srinivasan, Katrin Kirchhoff

Abstract: Speaker diarization accuracy can be affected by both acoustics and conversation characteristics. Determining the cause of diarization errors is difficult because speaker voice acoustics and conversation structure co-vary, and the interactions between acoustics, conversational structure, and diarization accuracy are complex. This paper proposes a methodology that can distinguish independent marginal effects of acoustic and conversation characteristics on diarization accuracy by remixing conversations in a factorial design. As an illustration, this approach is used to investigate gender-related and language-related accuracy differences with three diarization systems: a baseline system using subsegment x-vector clustering, a variant of it with shorter subsegments, and a third system based on a Bayesian hidden Markov model. Our analysis shows large accuracy disparities for the baseline system primarily due to conversational structure, which are partially mitigated in the other two systems. The illustration thus demonstrates how the methodology can be used to identify and guide diarization model improvements.

Submitted 10 June, 2021; originally announced June 2021.

Comments: 5 pages, 2 figures, Interspeech 2021

arXiv:2106.01977 [pdf, other]

Safe RAN control: A Symbolic Reinforcement Learning Approach

Authors: Alexandros Nikou, Anusha Mujumdar, Vaishnavi Sundararajan, Marin Orlic, Aneta Vulgarakis Feljan

Abstract: In this paper, we present a Symbolic Reinforcement Learning (SRL) based architecture for safety control of Radio Access Network (RAN) applications. In particular, we provide a purely automated procedure in which a user can specify high-level logical safety specifications for a given cellular network topology in order for the latter to execute optimal safe performance, which is measured through certain Key Performance Indicators (KPIs). The network consists of a set of fixed Base Stations (BS) which are equipped with antennas, which one can control by adjusting their vertical tilt angle. The aforementioned process is called Remote Electrical Tilt (RET) optimization. Recent research has focused on performing this RET optimization by employing Reinforcement Learning (RL) strategies because they have self-learning capabilities to adapt in uncertain environments. The term safety refers to particular constraint bounds on the network KPIs that guarantee that when the algorithms are deployed in a live network, the performance is maintained. In our proposed architecture, safety is ensured through model-checking techniques over combined discrete system models (automata) that are abstracted through the learning process. We introduce a user interface (UI) developed to help a user set intent specifications to the system, and inspect the difference between agent-proposed actions and those that are allowed and blocked according to the safety specification.

Submitted 25 April, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

Comments: To appear in International Conference of Control and Automation (ICCA) 2022

arXiv:2105.14418 [pdf, other]

Virtual element approximation of two-dimensional parabolic variational inequalities

Authors: Dibyendu Adak, Gianmarco Manzini, Sundararajan Natarajan

Abstract: We design a virtual element method for the numerical treatment of the two-dimensional parabolic variational inequality problem on unstructured polygonal meshes. Due to the expected low regularity of the exact solution, the virtual element method is based on the lowest-order virtual element space that contains the subspace of the linear polynomials defined on each element. The connection between the nonnegativity of the virtual element functions and the nonnegativity of the degrees of freedom, i.e., the values at the mesh vertices, is established by applying the Maximum and Minimum Principle Theorem. The mass matrix is computed through an approximate $L^2$ polynomial projection, whose properties are carefully investigated in the paper. We prove the well-posedness of the resulting scheme in two different ways that reveal the contractive nature of the VEM and its connection with the minimization of quadratic functionals. The convergence analysis requires the existence of a nonnegative quasi-interpolation operator, whose construction is also discussed in the paper. The variational crime introduced by the virtual element setting produces five error terms that we control by estimating a suitable upper bound. Numerical experiments confirm the theoretical convergence rate for the refinement in space and time on three different mesh families including distorted squares, nonconvex elements, and Voronoi tessellations.

Submitted 29 May, 2021; originally announced May 2021.

Comments: 33 pages, 3 figures

MSC Class: 65M12; 65M60

Showing 1–50 of 195 results for author: Sundararajan