Search | arXiv e-print repository

arXiv:2405.20735 [pdf, other]

Language Augmentation in CLIP for Improved Anatomy Detection on Multi-modal Medical Images

Authors: Mansi Kakkar, Dattesh Shanbhag, Chandan Aladahalli, Gurunath Reddy M

Abstract: Vision-language models have emerged as a powerful tool for previously challenging multi-modal classification problems in the medical domain. This development has led to the exploration of automated image description generation for multi-modal clinical scans, particularly for radiology report generation. Existing research has focused on clinical descriptions for specific modalities or body regions, leaving a gap for a model providing entire-body multi-modal descriptions. In this paper, we address this gap by automating the generation of standardized body station(s) and list of organ(s) across the whole body in multi-modal MR and CT radiological images. Leveraging the versatility of Contrastive Language-Image Pre-training (CLIP), we refine and augment the existing approach through multiple experiments, including baseline model fine-tuning, adding station(s) as a superset for better correlation between organs, along with image and language augmentations. Our proposed approach demonstrates a 47.6% performance improvement over baseline PubMedCLIP.

Submitted 31 May, 2024; originally announced May 2024.

Comments: © 2024 IEEE. Accepted in 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2024

arXiv:2405.12018 [pdf, other]

Continuous Sign Language Recognition with Adapted Conformer via Unsupervised Pretraining

Authors: Neena Aloysius, Geetha M, Prema Nedungadi

Abstract: Conventional Deep Learning frameworks for continuous sign language recognition (CSLR) comprise a single or multi-modal feature extractor, a sequence-learning module, and a decoder for outputting the glosses. The sequence-learning module is a crucial part, wherein transformers have demonstrated their efficacy in sequence-to-sequence tasks. Analyzing the research progress in the fields of Natural Language Processing and Speech Recognition, a rapid introduction of various transformer variants is observed. However, in the realm of sign language, experimentation in the sequence-learning component is limited. In this work, the state-of-the-art Conformer model for Speech Recognition is adapted for CSLR and the proposed model is termed ConSignformer. This marks the first instance of employing Conformer for a vision-based task. ConSignformer has a bimodal pipeline of a CNN as feature extractor and a Conformer for sequence learning. For improved context learning we also introduce Cross-Modal Relative Attention (CMRA). By incorporating CMRA into the model, it becomes more adept at learning and utilizing complex relationships within the data. To further enhance the Conformer model, unsupervised pretraining called Regressional Feature Extraction is conducted on a curated sign language dataset. The pretrained Conformer is then fine-tuned for the downstream recognition task. The experimental results confirm the effectiveness of the adopted pretraining strategy and demonstrate how CMRA contributes to the recognition process. Remarkably, leveraging a Conformer-based backbone, our model achieves state-of-the-art performance on the benchmark datasets PHOENIX-2014 and PHOENIX-2014T.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2401.13891 [pdf]

Text to speech synthesis

Authors: Harini s, Manoj G M

Abstract: Text-to-speech (TTS) synthesis is a technology that converts written text into spoken words, enabling a natural and accessible means of communication. This abstract explores the key aspects of TTS synthesis, encompassing its underlying technologies, applications, and implications for various sectors. The technology utilizes advanced algorithms and linguistic models to convert textual information into lifelike speech, allowing for enhanced user experiences in diverse contexts such as accessibility tools, navigation systems, and virtual assistants. The abstract delves into the challenges and advancements in TTS synthesis, including considerations for naturalness, multilingual support, and emotional expression in synthesized speech.

Submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.02541 [pdf]

Autonomous Multi-Rotor UAVs: A Holistic Approach to Design, Optimization, and Fabrication

Authors: Aniruth A, Chirag Satpathy, Jothika K, Nitteesh M, Gokulraj M, Venkatram K, Harshith G, Shristi S, Anushka Vani, Jonathan Spurgeon

Abstract: Unmanned Aerial Vehicles (UAVs) have become pivotal in domains spanning military, agriculture, surveillance, and logistics, revolutionizing data collection and environmental interaction. With the advancement in drone technology, there is a compelling need to develop a holistic methodology for designing UAVs. This research focuses on establishing a procedure encompassing conceptual design, use of composite materials, weight optimization, stability analysis, avionics integration, advanced manufacturing, and incorporation of autonomous payload delivery through object detection models tailored to satisfy specific applications while maintaining cost efficiency. The study conducts a comparative assessment of potential composite materials and various quadcopter frame configurations. The novel features include a payload-dropping mechanism, a unibody arm fixture, and the utilization of carbon-fibre-balsa composites. A quadcopter is designed and analyzed using the proposed methodology, followed by its fabrication using additive manufacturing and vacuum bagging techniques. A computer vision-based deep learning model enables precise delivery of payloads by autonomously detecting targets.

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2310.18642 [pdf]

One-shot Localization and Segmentation of Medical Images with Foundation Models

Authors: Deepa Anand, Gurunath Reddy M, Vanika Singhal, Dattesh D. Shanbhag, Shriram KS, Uday Patil, Chitresh Bhushan, Kavitha Manickam, Dawei Gui, Rakesh Mullick, Avinash Gopal, Parminder Bhatia, Taha Kass-Hout

Abstract: Recent advances in Vision Transformers (ViT) and Stable Diffusion (SD) models, with their ability to capture rich semantic features of the image, have been used for image correspondence tasks on natural images. In this paper, we examine the ability of a variety of pre-trained ViT (DINO, DINOv2, SAM, CLIP) and SD models, trained exclusively on natural images, for solving the correspondence problems on medical images. While many works have made a case for in-domain training, we show that the models trained on natural images can offer good performance on medical images across different modalities (CT, MR, Ultrasound) sourced from various manufacturers, over multiple anatomical regions (brain, thorax, abdomen, extremities), and on a wide variety of tasks. Further, we leverage the correspondence with respect to a template image to prompt a Segment Anything (SAM) model to arrive at single-shot segmentation, achieving a Dice range of 62%-90% across tasks, using just one image as reference. We also show that our single-shot method outperforms the recently proposed few-shot segmentation method - UniverSeg (Dice range 47%-80%) on most of the semantic segmentation tasks (six out of seven) across medical imaging modalities.

Submitted 28 October, 2023; originally announced October 2023.

Comments: Accepted at NeurIPS 2023 R0-FoMo Workshop

arXiv:2301.10015 [pdf, other]

Deep Attention-Based Alignment Network for Melody Generation from Incomplete Lyrics

Authors: Gurunath Reddy M, Zhe Zhang, Yi Yu, Florian Harscoet, Simon Canales, Suhua Tang

Abstract: We propose a deep attention-based alignment network, which aims to automatically predict lyrics and melody from incomplete lyrics given as input, in a way similar to the music creation of humans. Most importantly, a deep neural lyrics-to-melody net is trained in an encoder-decoder way to predict possible pairs of lyrics-melody when given incomplete lyrics (a few keywords). The attention mechanism is exploited to align the predicted lyrics with the melody during the lyrics-to-melody generation. The qualitative and quantitative evaluation metrics reveal that the proposed method is indeed capable of generating proper lyrics and corresponding melody for composing new songs given a piece of incomplete seed lyrics.

Submitted 22 January, 2023; originally announced January 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2011.06380

arXiv:2209.15186 [pdf, other]

Leveraging Probabilistic Switching in Superparamagnets for Temporal Information Encoding in Neuromorphic Systems

Authors: Kezhou Yang, Dhuruva Priyan G M, Abhronil Sengupta

Abstract: Brain-inspired computing - leveraging neuroscientific principles underpinning the unparalleled efficiency of the brain in solving cognitive tasks - is emerging as a promising pathway to solve several algorithmic and computational challenges faced by deep learning today. Nonetheless, current research in neuromorphic computing is driven by our well-developed notions of running deep learning algorithms on computing platforms that perform deterministic operations. In this article, we argue that taking a different route of performing temporal information encoding in probabilistic neuromorphic systems may help solve some of the current challenges in the field. The article considers superparamagnetic tunnel junctions as a potential pathway to enable a new generation of brain-inspired computing that combines the facets and associated advantages of two complementary insights from computational neuroscience -- how information is encoded and how computing occurs in the brain. Hardware-algorithm co-design analysis demonstrates 97.41% accuracy of a state-compressed 3-layer spintronics-enabled stochastic spiking network on the MNIST dataset with high spiking sparsity due to temporal information encoding.

Submitted 11 January, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

arXiv:2209.03149 [pdf, other]

MultiViz: A Gephi Plugin for Scalable Visualization of Multi-Layer Networks

Authors: Jayamohan Pillai C. S., Ayan Chatterjee, Geetha M., Amitava Mukherjee

Abstract: The process of visually presenting networks is an effective way to understand entity relationships within the networks since it reveals the overall structure and topology of the network. Real networks are extremely difficult to visualize due to their immense complexity, which includes vast amounts of data, several types of interactions, various subsystems and several levels of connectivity as well as changes over time. This paper introduces the "MultiViz Plugin," a plugin for Gephi, an open-source software tool for graph visualization and modification, to visualize complex networks in a multi-layer manner. A collection of settings is available through the plugin to transform an existing network into a multi-layered network. The plugin supports several layout algorithms and lets the user choose which property of the network is used as the layer. The goal of the study is to give the user complete control over how the network is visualized in a multi-layer fashion. We demonstrate the ability of the plugin to visualize multi-layer data using a real-life complex multi-layer dataset.

Submitted 6 September, 2022; originally announced September 2022.

arXiv:2206.07910 [pdf, ps, other]

Introducing the Huber mechanism for differentially private low-rank matrix completion

Authors: R Adithya Gowtham, Gokularam M, Thulasi Tholeti, Sheetal Kalyani

Abstract: Performing low-rank matrix completion with sensitive user data calls for privacy-preserving approaches. In this work, we propose a novel noise addition mechanism for preserving differential privacy where the noise distribution is inspired by Huber loss, a well-known loss function in robust statistics. The proposed Huber mechanism is evaluated against existing differential privacy mechanisms while solving the matrix completion problem using the Alternating Least Squares approach. We also propose using the Iteratively Re-Weighted Least Squares algorithm to complete low-rank matrices and study the performance of different noise mechanisms on both synthetic and real datasets. We prove that the proposed mechanism achieves ε-differential privacy similar to the Laplace mechanism. Furthermore, empirical results indicate that the Huber mechanism outperforms the Laplace and Gaussian mechanisms in some cases and is comparable otherwise.

Submitted 16 June, 2022; originally announced June 2022.

Comments: 13 pages

arXiv:2202.01078 [pdf, other]

Melody Extraction from Polyphonic Music by Deep Learning Approaches: A Review

Authors: Gurunath Reddy M, K. Sreenivasa Rao, Partha Pratim Das

Abstract: Melody extraction is a vital music information retrieval task among music researchers for its potential applications in education pedagogy and the music industry. Melody extraction is a notoriously challenging task due to the presence of background instruments. Also, the melodic source often exhibits characteristics similar to those of the other instruments. The interfering background accompaniment with the vocals makes extracting the melody from the mixture signal much more challenging. Until recently, classical signal processing-based melody extraction methods were quite popular among melody extraction researchers. The ability of deep learning models to model large-scale data and to learn features automatically by exploiting spatial and temporal dependencies inspired many researchers to adopt deep learning models for melody extraction. In this paper, an attempt has been made to review the up-to-date data-driven deep learning approaches for melody extraction from polyphonic music. The available deep models have been categorized based on the type of neural network used and the output representation they use for predicting melody. Further, the architectures of the 25 melody extraction models are briefly presented. The loss functions used to optimize the model parameters of the melody extraction models are broadly grouped into four categories, and the loss functions used by various melody extraction models are briefly described. Also, the various input representations adopted by the melody extraction models and the parameter settings are described in detail. A section describing the explainability of the black-box melody extraction deep neural networks is included. The performance of 25 melody extraction methods is compared. The possible future directions to explore/improve the melody extraction methods are also presented in the paper.

Submitted 2 February, 2022; originally announced February 2022.

Comments: 72 pages

arXiv:2106.06980 [pdf]

An Approach Towards Physics Informed Lung Ultrasound Image Scoring Neural Network for Diagnostic Assistance in COVID-19

Authors: Mahesh Raveendranatha Panicker, Yale Tung Chen, Gayathri M, Madhavanunni A N, Kiran Vishnu Narayan, C Kesavadas, A P Vinod

Abstract: Ultrasound is fast becoming an indispensable diagnostic tool for regular and continuous monitoring of the lung with the recent outbreak of COVID-19. In this work, a novel approach is presented to extract acoustic propagation-based features to automatically highlight the region below the pleura, which is an important landmark in lung ultrasound (LUS). Subsequently, a multichannel input formed by using the acoustic physics-based feature maps is fused to train a neural network, referred to as LUSNet, to classify the LUS images into five classes of varying severity of lung infection to track the progression of COVID-19. In order to ensure that the proposed approach is agnostic to the type of acquisition, the LUSNet, which consists of a U-net architecture, is trained in an unsupervised manner with the acoustic feature maps to ensure that the encoder-decoder architecture is learning features in the pleural region of interest. A novel combination of the U-net output and the U-net encoder output is employed for the classification of severity of infection in the lung. A detailed analysis of the proposed approach on LUS images over the infection to full recovery period of ten confirmed COVID-19 subjects shows an average five-fold cross-validation accuracy, sensitivity, and specificity of 97%, 93%, and 98% respectively over 5000 frames of COVID-19 videos. The analysis also shows that, when the input dataset is limited and diverse as in the case of the COVID-19 pandemic, an aided effort of combining acoustic propagation-based features along with the gray scale images, as proposed in this work, improves the performance of the neural network significantly and also aids the labelling and triaging process.

Submitted 13 June, 2021; originally announced June 2021.

Comments: 8 pages, 8 figures, 3 tables, submitted to Springer SIVP Special Issue for COVID19

arXiv:2011.04297 [pdf, other]

Knowledge Distillation for Singing Voice Detection

Authors: Soumava Paul, Gurunath Reddy M, K Sreenivasa Rao, Partha Pratim Das

Abstract: Singing Voice Detection (SVD) has been an active area of research in music information retrieval (MIR). Currently, two deep neural network-based methods, one based on CNN and the other on RNN, exist in the literature that learn optimized features for the voice detection (VD) task and achieve state-of-the-art performance on common datasets. Both these models have a huge number of parameters (1.4M for CNN and 65.7K for RNN) and are hence not suitable for deployment on devices like smartphones or embedded sensors with limited capacity in terms of memory and computation power. The most popular method to address this issue is known as knowledge distillation in the deep learning literature (in addition to model compression), where a large pre-trained network known as the teacher is used to train a smaller student network. Despite the wide applications of SVD in music information retrieval, to the best of our knowledge, model compression for practical deployment has not yet been explored. In this paper, efforts have been made to investigate this issue using both conventional and ensemble knowledge distillation techniques.

Submitted 19 August, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

Comments: Accepted at INTERSPEECH 2021. 5 pages, 3 figures
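
The teacher-student training mentioned in the abstract above can be illustrated with a minimal sketch of a commonly used distillation loss. This is not taken from the paper: the function names, the temperature of 2.0, and the weighting `alpha` are illustrative assumptions, and the paper's actual loss and ensemble variants may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    Hypothetical helper: temperature and alpha are assumed values,
    not settings reported in the abstract.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Cross-entropy between softened teacher and student distributions.
    soft = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
    # Ordinary cross-entropy against the ground-truth class index.
    hard = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft term's gradient scale comparable.
    return alpha * temperature ** 2 * soft + (1.0 - alpha) * hard
```

A student whose logits agree with the teacher and the label receives a lower loss than one that disagrees, which is the signal the smaller network is trained on.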

arXiv:2010.06142 [pdf, other]

Hindsight Experience Replay with Kronecker Product Approximate Curvature

Authors: Dhuruva Priyan G M, Abhik Singla, Shalabh Bhatnagar

Abstract: Hindsight Experience Replay (HER) is an efficient algorithm for solving Reinforcement Learning tasks in sparse-reward environments. But due to its reduced sample efficiency and slower convergence, HER fails to perform effectively. Natural gradients address these challenges by converging the model parameters better. They avoid taking bad actions that collapse the training performance. However, updating parameters in neural networks requires expensive computation and thus increases training time. Our proposed method solves the above-mentioned challenges with better sample efficiency and faster convergence with increased success rate. A common failure mode for DDPG is that the learned Q-function begins to dramatically overestimate Q-values, which then leads to the policy breaking, because it exploits the errors in the Q-function. We solve this issue by including Twin Delayed Deep Deterministic Policy Gradients (TD3) in HER. TD3 learns two Q-functions instead of one and adds noise to the target action, to make it harder for the policy to exploit Q-function errors. The experiments are done with the help of OpenAI's MuJoCo environments. Results on these environments show that our algorithm (TDHER+KFAC) performs better in most of the scenarios.

Submitted 9 October, 2020; originally announced October 2020.

Comments: arXiv admin note: text overlap with arXiv:1708.05144 by other authors
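
The two TD3 ingredients named in the abstract above (two Q-functions and noise on the target action) can be sketched as follows. This is a scalar illustration of the standard TD3 formulation, not the paper's TDHER+KFAC code: taking the minimum of the two critics comes from the general TD3 method rather than from the abstract, and the function names, noise scale, clipping range, and action bounds are assumed values.

```python
import random


def smoothed_target_action(action, noise_std=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0, rng=random):
    """Add clipped Gaussian noise to the target action, then clip to bounds.

    All default values here are illustrative assumptions.
    """
    noise = rng.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))
    return max(low, min(high, action + noise))


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Bootstrapped target using the smaller of the two target-critic values.

    q1_next and q2_next stand for the two critics evaluated at the next
    state and the smoothed target action; done is 1.0 at episode end.
    """
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)
```

Using the smaller of the two estimates counteracts the Q-value overestimation that the abstract identifies as a failure mode of DDPG.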

arXiv:2008.00106 [pdf, other]

Utilising Visual Attention Cues for Vehicle Detection and Tracking

Authors: Feiyan Hu, Venkatesh G M, Noel E. O'Connor, Alan F. Smeaton, Suzanne Little

Abstract: Advanced Driver-Assistance Systems (ADAS) have been attracting attention from many researchers. Vision-based sensors are the closest way to emulate human driver visual behavior while driving. In this paper, we explore possible ways to use visual attention (saliency) for object detection and tracking. We investigate: 1) How a visual attention map such as a subjectness attention or saliency map and an objectness attention map can facilitate region proposal generation in a 2-stage object detector; 2) How a visual attention map can be used for tracking multiple objects. We propose a neural network that can simultaneously detect objects and generate objectness and subjectness maps to save computational power. We further exploit the visual attention map during tracking using a sequential Monte Carlo probability hypothesis density (PHD) filter. The experiments are conducted on the KITTI and DETRAC datasets. The use of visual attention and hierarchical features has shown a considerable improvement of ≈8% in object detection, which effectively increased tracking performance by ≈4% on the KITTI dataset.

Submitted 31 July, 2020; originally announced August 2020.

Comments: Accepted in ICPR2020

arXiv:2006.00782 [pdf, other]

Learning to Recognize Code-switched Speech Without Forgetting Monolingual Speech Recognition

Authors: Sanket Shah, Basil Abraham, Gurunath Reddy M, Sunayana Sitaram, Vikas Joshi

Abstract: Recently, there has been significant progress made in Automatic Speech Recognition (ASR) of code-switched speech, leading to gains in accuracy on code-switched datasets in many language pairs. Code-switched speech co-occurs with monolingual speech in one or both languages being mixed. In this work, we show that fine-tuning ASR models on code-switched speech harms performance on monolingual speech. We point out the need to optimize models for code-switching while also ensuring that monolingual performance is not sacrificed. Monolingual models may be trained on thousands of hours of speech which may not be available for re-training a new model. We propose using the Learning Without Forgetting (LWF) framework for code-switched ASR when we only have access to a monolingual model and do not have the data it was trained on. We show that it is possible to train models using this framework that perform well on both code-switched and monolingual test sets. In cases where we have access to monolingual training data as well, we propose regularization strategies for fine-tuning models for code-switching without sacrificing monolingual accuracy. We report improvements in Word Error Rate (WER) in monolingual and code-switched test sets compared to baselines that use pooled data and simple fine-tuning.

Submitted 1 June, 2020; originally announced June 2020.

Comments: 5 pages (4 pages + 1 page references), 5 tables, 1 figure, 1 algorithm, 16 references

arXiv:2003.12017 [pdf]

Prediction of number of cases expected and estimation of the final size of coronavirus epidemic in India using the logistic model and genetic algorithm

Authors: Ganesh Kumar M, Soman K. P, Gopalakrishnan E. A, Vijay Krishna Menon, Sowmya V

Abstract: In this paper, we have applied the logistic growth regression model and a genetic algorithm to predict the number of coronavirus-infected cases that can be expected in the upcoming days in India, and to estimate the final size and peak time of the coronavirus epidemic in India.

Submitted 26 March, 2020; originally announced March 2020.
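
As a rough illustration of the logistic growth model named in the abstract above, the sketch below evaluates the textbook logistic curve for cumulative cases. The parameterization (final size, growth rate, peak time) is the standard form and is assumed here; the paper's exact formulation and its genetic-algorithm fitting step are not reproduced, and the function name and numbers in the usage note are hypothetical.

```python
import math


def logistic_cases(t, final_size, growth_rate, peak_time):
    """Cumulative cases at time t under a standard logistic growth curve.

    final_size is the curve's upper limit, and peak_time is the
    inflection point, where new cases per unit time are highest.
    """
    return final_size / (1.0 + math.exp(-growth_rate * (t - peak_time)))
```

For example, with the made-up values `final_size=1000`, `growth_rate=0.2`, and `peak_time=10`, the curve reaches half of the final size at `t=10` and approaches 1000 as `t` grows.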

arXiv:1909.04406 [pdf, ps, other]

doi 10.1109/TSP.2020.3018665

Subspace clustering without knowing the number of clusters: A parameter free approach

Authors: Vishnu Menon, Gokularam M, Sheetal Kalyani

Abstract: Subspace clustering, the task of clustering high-dimensional data when the data points come from a union of subspaces, is one of the fundamental tasks in unsupervised machine learning. Most of the existing algorithms for this task require prior knowledge of the number of clusters along with a few additional parameters which need to be set or tuned a priori according to the type of data to be clustered. In this work, a parameter-free method for subspace clustering is proposed, where the data points are clustered on the basis of the difference in statistical distribution of the angles subtended by the data points within a subspace and those by points belonging to different subspaces. Given an initial fine clustering, the proposed algorithm merges the clusters until a final clustering is obtained. This, unlike many existing methods, does not require the number of clusters a priori. Also, the proposed algorithm does not involve the use of an unknown parameter or tuning for one. A parameter-free method for producing a fine initial clustering is also discussed, making the whole process of subspace clustering parameter-free. A comparison of the proposed algorithm's performance with that of existing state-of-the-art techniques on synthetic and real data sets shows the significance of the proposed method.

Submitted 20 June, 2020; v1 submitted 10 September, 2019; originally announced September 2019.

arXiv:1905.09231 [pdf, other]

Separating Overlapping Tissue Layers from Microscopy Images

Authors: Zahra Montazeri, Gopi M

Abstract: Manual preparation of tissue slices for microscopy imaging can introduce tissue tears and overlaps. Typically, further digital processing algorithms such as registration and 3D reconstruction from tissue image stacks cannot handle images with tissue tear/overlap artifacts, and so such images are usually discarded. In this paper, we propose an imaging model and an algorithm to digitally separate overlapping tissue data of mouse brain images into two layers. We show the correctness of our model and the algorithm by comparing our results with the ground truth.

Submitted 22 May, 2019; originally announced May 2019.

arXiv:1904.09765 [pdf, other]

hf0: A hybrid pitch extraction method for multimodal voice

Authors: Pradeep Rengaswamy, Gurunath Reddy M, Krothapalli Sreenivasa Rao

Abstract: Pitch or fundamental frequency (f0) extraction is a fundamental problem studied extensively for its potential applications in speech and clinical applications. In the literature, explicit mode specific (modal speech or singing voice or emotional/expressive speech or noisy speech) signal processing and deep learning f0 extraction methods have been developed that exploit the quasi periodic nature of the signal in time, its harmonic property in the spectral domain, or a combined form to extract the pitch. Hence, there is no single unified method which can reliably extract the pitch from various modes of the acoustic signal. In this work, we propose a hybrid f0 extraction method which seamlessly extracts the pitch across modes of speech production with the very high accuracy required for many applications. The proposed hybrid model exploits the advantages of deep learning and signal processing methods to minimize the pitch detection error and adapts to various modes of acoustic signal. Specifically, we propose an ordinal regression convolutional neural network to map the periodicity rich input representation to nominal pitch classes, which drastically reduces the number of classes required for pitch detection unlike other deep learning approaches. Further, the accurate f0 is estimated from the nominal pitch class labels by filtering and autocorrelation. We show that the proposed method generalizes to unseen modes of voice production and various noises for large scale datasets. Also, the proposed hybrid model significantly reduces the learning parameters required to train the deep model compared to other methods. Furthermore, the evaluation measures showed that the proposed method is significantly better than the state-of-the-art signal processing and deep learning approaches.

Submitted 22 April, 2019; originally announced April 2019.

Comments: Pitch Extraction, F0 extraction, harmonic signals, speech, monophonic songs, Convolutional Neural Network, 5 pages, 5 figures

arXiv:1811.09956 [pdf, other]

Glottal Closure Instants Detection From Pathological Acoustic Speech Signal Using Deep Learning

Authors: Gurunath Reddy M, Tanumay Mandal, Krothapalli Sreenivasa Rao

Abstract: In this paper, we propose a classification based glottal closure instants (GCI) detection from pathological acoustic speech signal, which finds many applications in vocal disorder analysis. To date, GCI for pathological disorder is extracted from the laryngeal (glottal source) signal recorded from an Electroglottograph, a dedicated device designed to measure the vocal fold vibration around the larynx. We have created a pathological dataset which consists of simultaneous recordings of glottal source and acoustic speech signal of six different disorders from vocal disordered patients. The GCI locations are manually annotated for disorder analysis and supervised learning. We have proposed a convolutional neural network based GCI detection method that fuses deep acoustic speech and linear prediction residual features for robust GCI detection. The experimental results showed that the proposed method is significantly better than the state-of-the-art GCI detection methods.

Submitted 25 November, 2018; originally announced November 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/39

arXiv:1501.01364 [pdf]

Leader Follower Formation Control of Ground Vehicles Using Camshift Based Guidance

Authors: S. M. Vaitheeswaran, Bharath M. K., Gokul M

Abstract: Autonomous ground vehicles have been designed for formation control that relies on ranging and bearing information received from a forward looking camera. A visual guidance control algorithm is designed where real time image processing is used to provide feedback signals. The vision subsystem and control subsystem work in parallel to accomplish formation control. Proportional navigation and line of sight guidance laws are used to estimate the range and bearing information from the leader vehicle using the vision subsystem. The algorithms for vision detection and localization used here are similar to approaches for many computer vision tasks such as face tracking and detection that are based on color and texture features, and on non-parametric Continuously Adaptive Mean-shift algorithms to keep track of the leader. This is being proposed for the first time in the leader follower framework. The algorithms are simple but effective for real time and provide an alternative to traditional approaches like the Viola Jones algorithm. Further, to stabilize the follower to the leader trajectory, a sliding mode controller is used to dynamically track the leader. The performance of the approach is demonstrated in simulation and in practical experiments.

Submitted 6 January, 2015; originally announced January 2015.

arXiv:1410.7654 [pdf]

XML Information Retrieval: An overview

Authors: Suma D., U. Dinesh Acharya, Geetha M., Raviraja Holla M

Abstract: Locating and distilling the valuable relevant information continue to be the major challenges of Information Retrieval (IR) systems owing to the explosive growth of online web information. These challenges can be considered the XML Information Retrieval challenges, as XML has become a de facto standard over the Web. The research on XML IR starts with the classical IR strategies customized to XML IR. Later, novel IR strategies specific to XML IR evolved. Meanwhile, the literature reveals the development of rapid and intelligent IR systems. Despite their success in their specified constrained domains, they have additional limitations in the complex information space. The effectiveness of IR systems is thus unsolved in satisfying the most. This article attempts an overview of earlier efforts and the gaps in XML IR.

Submitted 27 October, 2014; originally announced October 2014.

Comments: 7 pages, 0 figures

Journal ref: International Global Journal For Engineering Research, Volume 10 Issue 1, 2014 pg. 26-32

arXiv:1312.3787 [pdf]

Analysis and Understanding of Various Models for Efficient Representation and Accurate Recognition of Human Faces

Authors: Dharini S., Guru Prasad M., Hari haran. V., Kiran Tej J. L., Kunal Ghosh

Abstract: In this paper we have tried to compare the various face recognition models against their classical problems. We look at the methods followed by these approaches and evaluate to what extent they are able to solve the problems. All methods proposed have some drawbacks under certain conditions. To overcome these drawbacks we propose a multi-model approach.

Submitted 14 February, 2015; v1 submitted 13 December, 2013; originally announced December 2013.

Comments: Proceedings of National Conference on "Emerging Trends in IT" - eit10, March 2010

arXiv:1305.3213 [pdf]

The Product Promotion and Consumer Retention Gap in Online Shopping

Authors: Senthur Balan S, Sowmyan Jegatheesan, Sakthi Ganesh M

Abstract: As the number of online shopping websites increases day by day, so do the online advertisement strategies and promotional techniques. The number of people who use the internet keeps on increasing daily, and it has become a vast marketplace to promote products; surely it will be a prime reason to drive any company's growth in the future. This paper primarily focuses on the areas in which online shopping lags in product promotion and customer retention. Sellers must concentrate on the areas in which online marketing lags in product promotion techniques; they should also introduce new strategies to increase their market share and to gain customers' attention towards their products.

Submitted 14 May, 2013; originally announced May 2013.

Comments: 4 Pages,1 Table, 2012 4th International Conference on Electronics Computer Technology (ICECT 2012) 978-1-4673-1850-1/12 2012 IEEE Page 158-161

Showing 1–24 of 24 results for author: M, G