
arXiv:2405.20735 [pdf, other]

Language Augmentation in CLIP for Improved Anatomy Detection on Multi-modal Medical Images

Authors: Mansi Kakkar, Dattesh Shanbhag, Chandan Aladahalli, Gurunath Reddy M

Abstract: Vision-language models have emerged as a powerful tool for previously challenging multi-modal classification problems in the medical domain. This development has led to the exploration of automated image description generation for multi-modal clinical scans, particularly for radiology report generation. Existing research has focused on clinical descriptions for specific modalities or body regions, leaving a gap for a model providing entire-body multi-modal descriptions. In this paper, we address this gap by automating the generation of standardized body station(s) and list of organ(s) across the whole body in multi-modal MR and CT radiological images. Leveraging the versatility of Contrastive Language-Image Pre-training (CLIP), we refine and augment the existing approach through multiple experiments, including baseline model fine-tuning, adding station(s) as a superset for better correlation between organs, along with image and language augmentations. Our proposed approach demonstrates a 47.6% performance improvement over the baseline PubMedCLIP.
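CLIP-style fine-tuning optimizes a symmetric contrastive (InfoNCE) objective over matched image-text pairs. The sketch below illustrates that objective in plain Python under stated assumptions: the temperature of 0.07 and the toy 2-D embeddings are illustrative choices, not the paper's settings, and the paper's prompt format for stations and organs is not shown.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_row(logits, target):
    # -log softmax(logits)[target], computed with the max-subtraction trick for stability
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss: pair i of the batch is the positive for row i and column i."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine(image_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    image_to_text = sum(cross_entropy_row(logits[i], i) for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_image = sum(cross_entropy_row(columns[j], j) for j in range(n)) / n
    return (image_to_text + text_to_image) / 2
```

Correctly paired embeddings give a loss near zero, while swapped pairs are heavily penalized. Fine-tuning pushes the text embedding of a caption such as a station or organ list toward its matching scan.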

Submitted 31 May, 2024; originally announced May 2024.

Comments: © 2024 IEEE. Accepted at the 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2024

arXiv:2402.14693 [pdf, ps, other]

Joint AP-UE Association and Power Factor Optimization for Distributed Massive MIMO

Authors: Mohd Saif Ali Khan, Samar Agnihotri, Karthik R. M

Abstract: The uplink sum-throughput of distributed massive multiple-input-multiple-output (mMIMO) networks depends largely on Access point (AP)-User Equipment (UE) association and power control. AP-UE association and power control are both important problems in their own right in distributed mMIMO networks: the former improves scalability and reduces the front-haul load of the network, while the latter enhances system performance by mitigating interference and boosting the desired signals. Unlike previous studies, which focused primarily on addressing these two problems separately, this work addresses the uplink sum-throughput maximization problem in distributed mMIMO networks by solving the joint AP-UE association and power control problem, while maintaining Quality-of-Service (QoS) requirements for each UE. To improve scalability, we present an l1-penalty function that delicately balances the trade-off between spectral efficiency (SE) and front-haul signaling load. Our proposed methodology leverages fractional programming, Lagrangian dual formulation, and penalty functions to provide an elegant and effective iterative solution with guaranteed convergence. Extensive numerical simulations validate the efficacy of the proposed technique for maximizing sum-throughput while considering the joint AP-UE association and power control problem, demonstrating its superiority over approaches that address these problems individually. Furthermore, the results show that the introduced penalty function can help us effectively control the maximum front-haul load.
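The abstract does not give the penalty's exact form. As a generic illustration only (not the paper's algorithm, which relies on fractional programming and Lagrangian duality), the sketch below shows the standard reason an l1 penalty controls front-haul load. Its proximal step, soft-thresholding, drives small association weights exactly to zero, so a larger penalty weight `lam` prunes more AP-UE links. The weight matrix and its interpretation are hypothetical.

```python
def soft_threshold(x, lam):
    """Proximal operator of lam * |x|: shrink x toward zero by lam, clamping to zero inside [-lam, lam]."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def sparsify_association(weights, lam):
    """Apply the l1 proximal step to every AP-UE association weight.

    weights[m][k] is a hypothetical nonnegative weight linking AP m to UE k;
    a zero entry means AP m no longer serves UE k (no front-haul traffic for that pair).
    """
    return [[soft_threshold(w, lam) for w in row] for row in weights]

def active_links(weights):
    """Count AP-UE pairs that remain active (a rough proxy for front-haul load)."""
    return sum(1 for row in weights for w in row if w > 0.0)
```

With weights `[[0.9, 0.05], [0.2, 0.6]]`, a penalty of 0.1 keeps three links and a penalty of 0.25 keeps two. This mirrors the SE versus front-haul trade-off the abstract describes.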

Submitted 28 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: Accepted at the IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC) 2024

arXiv:2402.07912 [pdf, other]

Spatial Computing: Concept, Applications, Challenges and Future Directions

Authors: Gokul Yenduri, Ramalingam M, Praveen Kumar Reddy Maddikunta, Thippa Reddy Gadekallu, Rutvij H Jhaveri, Ajay Bandi, Junxin Chen, Wei Wang, Adarsh Arunkumar Shirawalmath, Raghav Ravishankar, Weizheng Wang

Abstract: Spatial computing is a technological advancement that facilitates the seamless integration of devices into the physical environment, resulting in a more natural and intuitive digital world user experience. Spatial computing has the potential to become a significant advancement in the field of computing. From GPS and location-based services to healthcare, spatial computing technologies have influenced and improved our interactions with the digital world. The use of spatial computing in creating interactive digital environments has become increasingly popular and effective. Its increasing significance among researchers and industrial organisations motivated us to conduct this review. This review provides a detailed overview of spatial computing, including its enabling technologies and its impact on various applications. Projects related to spatial computing are also discussed. We also explore the potential challenges and limitations of spatial computing, and discuss potential solutions and future directions. Overall, this paper aims to provide a comprehensive understanding of spatial computing, its enabling technologies, their impact on various applications, emerging challenges, and potential solutions.

Submitted 30 January, 2024; originally announced February 2024.

Comments: Submitted to peer review

arXiv:2401.05422 [pdf, ps, other]

Machine Learning (ML)-assisted Beam Management in millimeter (mm)Wave Distributed Multiple Input Multiple Output (D-MIMO) systems

Authors: Karthik R M, Dhiraj Nagaraja Hegde, Muris Sarajlic, Abhishek Sarkar

Abstract: Beam management (BM) protocols are critical for establishing and maintaining connectivity between network radio nodes and User Equipments (UEs). In Distributed Multiple Input Multiple Output (D-MIMO) systems, a number of access points (APs), coordinated by a central processing unit (CPU), serve a number of UEs. At mmWave frequencies, finding the best AP and beam to serve the UEs is challenging due to the large number of beams that need to be sounded with Downlink (DL) reference signals. The objective of this paper is to investigate whether the best AP/beam can be reliably inferred from sounding only a small subset of beams and leveraging AI/ML for inference of the best beam/AP. We use Random Forest (RF), MissForest (MF) and conditional Generative Adversarial Networks (c-GAN) to demonstrate the performance benefits of inference.
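The paper uses RF, MissForest and c-GAN. The sketch below deliberately substitutes a much simpler technique, nearest-neighbour lookup, to illustrate the underlying idea: measure received power on a few sounded beams, match that partial profile against past full-sweep measurements, and read the best beam off the closest match. The RSRP values and the training set are hypothetical.

```python
import math

def infer_best_beam(partial, sounded, training):
    """Guess the best beam index from RSRP measured on a subset of beams.

    partial:  RSRP values (dBm) for the sounded beams, in the order of `sounded`
    sounded:  indices of the beams that were actually sounded
    training: list of full RSRP vectors (one value per beam) from past full sweeps
    """
    def distance(full):
        # compare only on the beams that were actually measured
        return math.sqrt(sum((full[b] - r) ** 2 for b, r in zip(sounded, partial)))

    nearest = min(training, key=distance)
    # best beam = index of the strongest entry in the nearest full profile
    return max(range(len(nearest)), key=lambda b: nearest[b])
```

Sounding only beams 0 and 1 can still point to beam 3 as the best one when a past sweep with a similar partial profile peaked there. The learned models in the paper replace this lookup with a trained predictor.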

Submitted 30 December, 2023; originally announced January 2024.

arXiv:2401.02472 [pdf, ps, other]

Code Generation for a Variety of Accelerators for a Graph DSL

Authors: Ashwina Kumar, M. Venkata Krishna, Prasanna Bartakke, Rahul Kumar, Rajesh Pandian M, Nibedita Behera, Rupesh Nasre

Abstract: Sparse graphs are ubiquitous in real and virtual worlds. With the phenomenal growth in semi-structured and unstructured data, the sizes of the underlying graphs have grown rapidly over the years. Analyzing such large structures necessitates parallel processing, which is challenged by the intrinsic irregularity of sparse computation, memory access, and communication. Ideally, programmers and domain experts would focus only on the sequential computation while a compiler takes care of auto-generating the parallel code. On the other hand, there is a wide variety of target hardware devices, and achieving optimal performance often demands coding in specific languages or frameworks. Our goal in this work is to focus on a graph DSL which allows domain experts to write almost-sequential code, and to generate parallel code for different accelerators from the same algorithmic specification. In particular, we illustrate code generation from the StarPlat graph DSL for NVIDIA, AMD, and Intel GPUs using CUDA, OpenCL, SYCL, and OpenACC programming languages. Using a suite of ten large graphs and four popular algorithms, we present the efficacy of StarPlat's versatile code generator.
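StarPlat's own syntax is not reproduced in this listing. The code below is therefore plain sequential Python, not DSL code, for triangle counting. It is meant only as an example of the "almost-sequential" style of graph kernel a domain expert might write. The outer loop over vertices is the kind of loop a parallelizing code generator would typically distribute across threads.

```python
def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs of comparable ids."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    total = 0
    for u in neighbors:  # candidate loop for parallelization
        for v in neighbors[u]:
            if v > u:  # visit each edge once, with u < v
                # a common neighbour w > v closes the triangle u < v < w exactly once
                total += sum(1 for w in neighbors[u] & neighbors[v] if w > v)
    return total
```

Ordering the three vertices (u < v < w) ensures each triangle is counted exactly once without a final division.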

Submitted 4 January, 2024; originally announced January 2024.

Comments: arXiv admin note: text overlap with arXiv:2305.03317

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks, notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.05609 [pdf, other]

Comprehensive Analysis of BB84, A Quantum Key Distribution Protocol

Authors: SujayKumar Reddy M, Chandra Mohan B

Abstract: Quantum Key Distribution (QKD) is a technique that enables secure communication between two parties by sharing a secret key. One of the most well-known QKD protocols is the BB84 protocol, proposed by Charles Bennett and Gilles Brassard in 1984. In this protocol, Alice and Bob use a quantum channel to exchange qubits, allowing them to generate a shared key that is resistant to eavesdropping. This paper presents a comparative study of existing QKD schemes, including the BB84 protocol, and highlights the advancements made in the BB84 protocol over the years. The study aims to provide a comprehensive overview of the different QKD schemes and their strengths and weaknesses, and to demonstrate QKD's working principles through existing simulations and implementations. Through this study, we show that the BB84 protocol is a highly secure QKD scheme that has been extensively studied and implemented in various settings. Furthermore, we discuss the improvements made to the BB84 protocol to enhance its security and practicality, including the use of decoy states and advanced error correction techniques. Overall, this paper provides a comprehensive analysis of QKD schemes, focusing on the BB84 protocol in secure communication technologies.
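The sifting step of BB84 and its eavesdropping signature can be illustrated with a small classical simulation. This is not taken from the paper. It is a minimal, idealized sketch that assumes a noise-free channel and a full intercept-resend attacker. Without Eve, the sifted key has a quantum bit error rate (QBER) of zero. With Eve, the QBER approaches 25%, which Alice and Bob detect by publicly comparing a sample of the sifted bits.

```python
import random

def bb84_sift(n, eavesdrop=False, seed=0):
    """Simulate idealized BB84 sifting; return (sifted_key_length, qber)."""
    rng = random.Random(seed)
    alice_bits = [rng.randint(0, 1) for _ in range(n)]
    alice_bases = [rng.randint(0, 1) for _ in range(n)]  # 0 = rectilinear, 1 = diagonal
    bob_bases = [rng.randint(0, 1) for _ in range(n)]

    def measure(bit, prep_basis, meas_basis):
        # same basis: deterministic outcome; different basis: uniformly random outcome
        return bit if prep_basis == meas_basis else rng.randint(0, 1)

    bob_bits = []
    for bit, a_basis, b_basis in zip(alice_bits, alice_bases, bob_bases):
        if eavesdrop:
            # intercept-resend: Eve measures in a random basis and resends in that basis
            e_basis = rng.randint(0, 1)
            bit, a_basis = measure(bit, a_basis, e_basis), e_basis
        bob_bits.append(measure(bit, a_basis, b_basis))

    # sifting: keep only positions where Alice's and Bob's (publicly announced) bases match
    kept = [i for i in range(n) if alice_bases[i] == bob_bases[i]]
    errors = sum(1 for i in kept if alice_bits[i] != bob_bits[i])
    qber = errors / len(kept) if kept else 0.0
    return len(kept), qber
```

About half of the transmitted qubits survive sifting. The 25% error rate under attack arises because Eve picks the wrong basis half the time, and each wrong pick then gives Bob a random bit.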

Submitted 9 December, 2023; originally announced December 2023.

Comments: 16 pages, 17 figures

arXiv:2310.18642 [pdf]

One-shot Localization and Segmentation of Medical Images with Foundation Models

Authors: Deepa Anand, Gurunath Reddy M, Vanika Singhal, Dattesh D. Shanbhag, Shriram KS, Uday Patil, Chitresh Bhushan, Kavitha Manickam, Dawei Gui, Rakesh Mullick, Avinash Gopal, Parminder Bhatia, Taha Kass-Hout

Abstract: Recent advances in Vision Transformers (ViT) and Stable Diffusion (SD) models, with their ability to capture rich semantic features of the image, have been used for image correspondence tasks on natural images. In this paper, we examine the ability of a variety of pre-trained ViT (DINO, DINOv2, SAM, CLIP) and SD models, trained exclusively on natural images, to solve correspondence problems on medical images. While many works have made a case for in-domain training, we show that models trained on natural images can offer good performance on medical images across different modalities (CT, MR, Ultrasound) sourced from various manufacturers, over multiple anatomical regions (brain, thorax, abdomen, extremities), and on a wide variety of tasks. Further, we leverage the correspondence with respect to a template image to prompt a Segment Anything (SAM) model to arrive at single-shot segmentation, achieving a Dice range of 62%-90% across tasks, using just one image as reference. We also show that our single-shot method outperforms the recently proposed few-shot segmentation method UniverSeg (Dice range 47%-80%) on most of the semantic segmentation tasks (six out of seven) across medical imaging modalities.
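The Dice ranges quoted above use the standard overlap score between a predicted and a reference mask, Dice = 2|P ∩ T| / (|P| + |T|). The sketch below computes it with binary masks represented as sets of pixel coordinates. This representation and the empty-mask convention are assumptions made for brevity.

```python
def dice_coefficient(pred, truth):
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks given as sets of pixel coordinates."""
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))
```

A 4-pixel prediction that shares 2 pixels with a 3-pixel reference scores 2·2/(4+3) ≈ 0.57, that is, 57%.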

Submitted 28 October, 2023; originally announced October 2023.

Comments: Accepted at NeurIPS 2023 R0-FoMo Workshop

arXiv:2310.07264 [pdf, other]

Classification of Dysarthria based on the Levels of Severity. A Systematic Review

Authors: Afnan Al-Ali, Somaya Al-Maadeed, Moutaz Saleh, Rani Chinnappa Naidu, Zachariah C Alex, Prakash Ramachandran, Rajeev Khoodeeram, Rajesh Kumar M

Abstract: Dysarthria is a neurological speech disorder that can significantly impact affected individuals' communication abilities and overall quality of life. The accurate and objective classification of dysarthria and the determination of its severity are crucial for effective therapeutic intervention. While traditional assessments by speech-language pathologists (SLPs) are common, they are often subjective, time-consuming, and can vary between practitioners. Emerging machine learning-based models have shown the potential to provide a more objective dysarthria assessment, enhancing diagnostic accuracy and reliability. This systematic review aims to comprehensively analyze current methodologies for classifying dysarthria based on severity levels. Specifically, this review will focus on determining the most effective set and type of features that can be used for automatic patient classification and evaluating the best AI techniques for this purpose. We will systematically review the literature on the automatic classification of dysarthria severity levels. Sources of information will include electronic databases and grey literature. Selection criteria will be established based on relevance to the research questions. Data extraction will include methodologies used, the type of features extracted for classification, and AI techniques employed. The findings of this systematic review will contribute to the current understanding of dysarthria classification, inform future research, and support the development of improved diagnostic tools.
The implications of these findings could be significant in advancing patient care and improving therapeutic outcomes for individuals affected by dysarthria.

Submitted 11 October, 2023; originally announced October 2023.

Comments: no comments

arXiv:2309.15709 [pdf, ps, other]

Distributed Pilot Assignment for Distributed Massive-MIMO Networks

Authors: Mohd Saif Ali Khan, Samar Agnihotri, Karthik R. M

Abstract: Pilot contamination is a critical issue in distributed massive MIMO networks, where the reuse of pilot sequences due to limited availability of orthogonal pilots for channel estimation leads to performance degradation. In this work, we propose a novel distributed pilot assignment scheme to effectively mitigate the impact of pilot contamination. Our proposed scheme not only reduces signaling overhead, but it also enhances fault-tolerance. Extensive numerical simulations are conducted to evaluate the performance of the proposed scheme. Our results establish that the proposed scheme outperforms existing centralized and distributed schemes in terms of mitigating pilot contamination and significantly enhancing network throughput.
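The abstract does not describe how the proposed scheme picks pilots. To illustrate the underlying contamination-avoidance idea, the sketch below uses a generic greedy heuristic, not the authors' distributed algorithm. Each UE takes the pilot least used by earlier UEs that share at least one serving AP with it. The serving-AP sets and the sharing-based conflict rule are hypothetical.

```python
def greedy_pilot_assignment(serving_aps, num_pilots):
    """Assign pilot indices so that UEs sharing a serving AP avoid reusing a pilot.

    serving_aps: dict mapping UE id -> set of AP ids that serve it (hypothetical input)
    Returns a dict mapping UE id -> pilot index in range(num_pilots).
    """
    assignment = {}
    for ue in sorted(serving_aps):
        # count how often each pilot is already used by UEs that share an AP with `ue`
        conflicts = [0] * num_pilots
        for other, pilot in assignment.items():
            if serving_aps[ue] & serving_aps[other]:
                conflicts[pilot] += 1
        # choose the least-contaminated pilot (lowest index on ties)
        assignment[ue] = min(range(num_pilots), key=lambda p: conflicts[p])
    return assignment
```

Three UEs served by the same AP receive three distinct pilots, while a UE served only by a different AP can safely reuse pilot 0. A distributed scheme would make these choices locally at APs rather than in one sequential loop.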

Submitted 1 July, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

Comments: Presented at the IEEE Wireless Communications and Networking Conference (WCNC) 2024

arXiv:2307.16745 [pdf, other]

Advancing Smart Malnutrition Monitoring: A Multi-Modal Learning Approach for Vital Health Parameter Estimation

Authors: Ashish Marisetty, Prathistith Raj M, Praneeth Nemani, Venkanna Udutalapally, Debanjan Das

Abstract: Malnutrition poses a significant threat to global health, resulting from an inadequate intake of essential nutrients that adversely impacts vital organs and overall bodily functioning. Periodic examinations and mass screenings, incorporating both conventional and non-invasive techniques, have been employed to combat this challenge. However, these approaches suffer from critical limitations, such as the need for additional equipment, lack of comprehensive feature representation, absence of suitable health indicators, and the unavailability of smartphone implementations for precise estimations of Body Fat Percentage (BFP), Basal Metabolic Rate (BMR), and Body Mass Index (BMI) to enable efficient smart-malnutrition monitoring. To address these constraints, this study presents a groundbreaking, scalable, and robust smart malnutrition-monitoring system that leverages a single full-body image of an individual to estimate height, weight, and other crucial health parameters within a multi-modal learning framework. Our proposed methodology involves the reconstruction of a highly precise 3D point cloud, from which 512-dimensional feature embeddings are extracted using a headless-3D classification network. Concurrently, facial and body embeddings are also extracted, and through the application of learnable parameters, these features are then utilized to estimate weight accurately.
Furthermore, essential health metrics, including BMR, BFP, and BMI, are computed to conduct a comprehensive analysis of the subject's health, subsequently facilitating the provision of personalized nutrition plans. While being robust to a wide range of lighting conditions across multiple devices, our model achieves a low Mean Absolute Error (MAE) of ±4.7 cm and ±5.3 kg in estimating height and weight.
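The abstract does not say which formulas turn the estimated height and weight into BMI, BMR and BFP. The sketch below assumes common textbook choices: the Mifflin-St Jeor equation for BMR and the Deurenberg BMI-based formula for adult body fat. Age and sex are extra inputs that the abstract does not mention, and all three functions are illustrations rather than the paper's implementation.

```python
def bmi(weight_kg, height_cm):
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2

def bmr_mifflin_st_jeor(weight_kg, height_cm, age_years, male):
    """Basal Metabolic Rate in kcal/day using the Mifflin-St Jeor equation (assumed choice)."""
    base = 10 * weight_kg + 6.25 * height_cm - 5 * age_years
    return base + 5 if male else base - 161

def body_fat_deurenberg(bmi_value, age_years, male):
    """Adult body-fat percentage from BMI using the Deurenberg formula (assumed choice)."""
    return 1.20 * bmi_value + 0.23 * age_years - 10.8 * (1 if male else 0) - 5.4
```

For a 30-year-old man estimated at 175 cm and 70 kg, these give a BMI of about 22.9, a BMR of 1648.75 kcal/day and roughly 18.1% body fat. An MAE of ±5.3 kg in the weight estimate would shift the BMI by about ±1.7.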

Submitted 31 July, 2023; originally announced July 2023.

arXiv:2307.06659 [pdf, other]

A Comprehensive Analysis of Blockchain Applications for Securing Computer Vision Systems

Authors: Ramalingam M, Chemmalar Selvi, Nancy Victor, Rajeswari Chengoden, Sweta Bhattacharya, Praveen Kumar Reddy Maddikunta, Duehee Lee, Md. Jalil Piran, Neelu Khare, Gokul Yendri, Thippa Reddy Gadekallu

Abstract: Blockchain (BC) and Computer Vision (CV) are two emerging fields with the potential to transform various sectors. BC can offer decentralized and secure data storage, while CV allows machines to learn and understand visual data. The integration of the two technologies holds massive promise for developing innovative applications that can provide solutions to challenges in various sectors such as supply chain management, healthcare, smart cities, and defense. This review presents a comprehensive analysis of the integration of BC and CV by examining their combination and potential applications. It also provides a detailed analysis of the fundamental concepts of both technologies, highlighting their strengths and limitations. This paper also explores current research efforts that make use of the benefits offered by this combination, including how BC can be used as an added layer of security in CV systems, ensure data integrity, and enable decentralized image and video analytics. The challenges and open issues associated with this integration are also identified, and potential future directions are proposed.
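The data-integrity role of BC in CV systems rests on hash linking. The sketch below builds only a hash-linked ledger of image or video frames. It has no consensus protocol, replication or smart contracts, so it demonstrates tamper evidence rather than a full blockchain. The block layout is a hypothetical choice.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, prev_hash, frame_digest):
    payload = json.dumps({"index": index, "prev": prev_hash, "frame": frame_digest}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def build_chain(frames):
    """Link a sequence of frames (raw bytes) into a SHA-256 hash chain."""
    chain, prev = [], GENESIS
    for i, frame in enumerate(frames):
        digest = hashlib.sha256(frame).hexdigest()
        h = block_hash(i, prev, digest)
        chain.append({"index": i, "prev": prev, "frame": digest, "hash": h})
        prev = h
    return chain

def verify(chain, frames):
    """Return True only if every frame and every link in the chain is unchanged."""
    prev = GENESIS
    for block, frame in zip(chain, frames):
        if block["prev"] != prev or block["frame"] != hashlib.sha256(frame).hexdigest():
            return False
        if block["hash"] != block_hash(block["index"], prev, block["frame"]):
            return False
        prev = block["hash"]
    return len(chain) == len(frames)
```

Altering a single frame, or rewriting any link, makes `verify` fail. A CV pipeline could therefore check that the images it analyzes match what was originally recorded.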

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2305.18734 [pdf, other]

IcSDE+ -- An Indicator for Constrained Multi-Objective Optimization

Authors: Oladayo S. Ajani, Rammohan Mallipeddi, Sri Srinivasa Raju M

Abstract: The effectiveness of Constrained Multi-Objective Evolutionary Algorithms (CMOEAs) depends on their ability to reach the different feasible regions during evolution by exploiting the information present in infeasible solutions, in addition to optimizing the several conflicting objectives. Over the years, researchers have proposed several CMOEAs to handle Constrained Multi-objective Optimization Problems (CMOPs). However, most of the proposed CMOEAs are either decomposition-based or Pareto-based, with little focus on indicator-based CMOEAs. In the literature, most indicator-based CMOEAs either (a) use traditional indicators designed for unconstrained multi-objective problems to compute indicator values from the objective values and combine them with the overall constraint violation, solving the CMOP as a single-objective constrained problem, or (b) treat each constraint, or the overall constraint violation, as an additional objective alongside the actual objectives. In this paper, we propose an effective single-population indicator-based CMOEA, referred to as IcSDE+, that can explore the different feasible regions in the search space. IcSDE+ is an (I)ndicator that is an efficient fusion of constraint violation (c), shift-based density estimation (SDE) and sum of objectives (+). The performance of CMOEA with IcSDE+ is favorably compared against 9 state-of-the-art CMOEAs on 6 different benchmark suites with diverse characteristics.
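The abstract names IcSDE+'s ingredients but not how they are fused. The sketch below therefore covers only the shift-based density estimation (SDE) component, under a minimization convention. When measuring p's crowding, each other solution q is shifted so that every objective on which q beats p is raised to p's value. A solution that is dominated, or sits close to a neighbour, then looks crowded. Constraint violation and the final fusion are left out, and using the nearest neighbour as the density measure is an assumption.

```python
import math

def shifted_distance(p, q):
    """Distance from p to q after shifting q toward p (minimization convention).

    Objectives on which q is better (smaller) than p are moved up to p's value,
    so only the objectives on which q is worse than p contribute to the distance.
    """
    shifted_q = [max(qi, pi) for pi, qi in zip(p, q)]
    return math.dist(p, shifted_q)

def sde_crowding(i, population):
    """Nearest shifted distance for population[i]; smaller means more crowded (or dominated)."""
    p = population[i]
    return min(shifted_distance(p, q) for j, q in enumerate(population) if j != i)
```

If q dominates p, the shifted q coincides with p and the distance is 0, so p receives the worst possible density value.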

Submitted 30 May, 2023; originally announced May 2023.

Comments: 13 pages, 2 main figures

arXiv:2305.10435 [pdf, other]

Generative Pre-trained Transformer: A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions

Authors: Gokul Yenduri, Ramalingam M, Chemmalar Selvi G, Supriya Y, Gautam Srivastava, Praveen Kumar Reddy Maddikunta, Deepti Raj G, Rutvij H Jhaveri, Prabadevi B, Weizheng Wang, Athanasios V. Vasilakos, Thippa Reddy Gadekallu

Abstract: The Generative Pre-trained Transformer (GPT) represents a notable breakthrough in the domain of natural language processing, propelling us toward the development of machines that can understand and communicate using language in a manner that closely resembles that of humans. GPT is based on the transformer architecture, a deep neural network designed for natural language processing tasks. Due to their impressive performance on natural language processing tasks and their ability to converse effectively, GPT models have gained significant popularity among researchers and industrial communities, making them one of the most widely used and effective models in natural language processing and related fields, which motivated us to conduct this review. This review provides a detailed overview of GPT, including its architecture, working process, training procedures, enabling technologies, and its impact on various applications. We also explore the potential challenges and limitations of GPT, and discuss potential solutions and future directions. Overall, this paper aims to provide a comprehensive understanding of GPT, its enabling technologies, their impact on various applications, emerging challenges, and potential solutions.
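GPT's working process at inference time is autoregressive: the model predicts a distribution over the next token, one token is chosen, and the result is fed back as input. The sketch below shows greedy decoding, one common choice among several (sampling, beam search). A hypothetical toy bigram table stands in for the transformer's softmax output.

```python
def greedy_generate(next_token_probs, prompt, max_new_tokens, end_token="<eos>"):
    """Autoregressive greedy decoding: repeatedly append the most probable next token.

    next_token_probs: callable taking the token list so far and returning a dict
    {token: probability}; in a GPT this role is played by the transformer.
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        best = max(probs, key=probs.get)
        tokens.append(best)
        if best == end_token:
            break
    return tokens
```

With a toy table where "the" → {"cat": 0.6, "dog": 0.4}, "cat" → {"sat": 0.7, "<eos>": 0.3} and "sat" → {"<eos>": 1.0}, the prompt ["the"] expands to the, cat, sat, <eos>. A real GPT conditions on the whole prefix, not just the last token.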

Submitted 21 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: Submitted to peer review

arXiv:2305.03317 [pdf, other]

StarPlat: A Versatile DSL for Graph Analytics

Authors: Nibedita Behera, Ashwina Kumar, Ebenezer Rajadurai T, Sai Nitish, Rajesh Pandian M, Rupesh Nasre

Abstract: Graphs model several real-world phenomena. With the growth of unstructured and semi-structured data, parallelization of graph algorithms is inevitable. Unfortunately, due to the inherent irregularity of computation, memory access, and communication, graph algorithms are traditionally challenging to parallelize. To tame this challenge, several libraries, frameworks, and domain-specific languages (DSLs) have been proposed to reduce the parallel programming burden of the users, who are often domain experts. However, existing frameworks to model graph algorithms typically target a single architecture. In this paper, we present a graph DSL, named StarPlat, that allows programmers to specify graph algorithms in a high-level format, but generates code for three different backends from the same algorithmic specification. In particular, the DSL compiler generates OpenMP for multi-core, MPI for distributed, and CUDA for many-core GPUs. Since these three are completely different parallel programming paradigms, binding them together under the same language is challenging. We share our experience with the language design. Central to our compiler is an intermediate representation which allows a common representation of the high-level program, from which individual backend code generations begin. We demonstrate the expressiveness of StarPlat by specifying four graph algorithms: betweenness centrality computation, page rank computation, single-source shortest paths, and triangle counting.
We illustrate the effectiveness of our approach by comparing the performance of the generated codes with that obtained with hand-crafted library codes. We find that the generated code is competitive with library-based codes in many cases. More importantly, we show the feasibility of generating efficient codes for different target architectures from the same algorithmic specification of graph algorithms.
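PageRank is one of the four algorithms listed above. The plain sequential Python below (not StarPlat syntax) is a minimal power-iteration reference version, the kind of single-threaded baseline against which generated parallel code can be checked. The damping factor of 0.85 and the fixed iteration count are conventional assumptions, and the graph must list every link target as a key.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    graph: dict mapping each node to a list of nodes it links to (all targets must be keys).
    Dangling nodes (no out-links) spread their rank uniformly over all nodes.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        # teleportation term plus the redistributed mass of dangling nodes
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:  # each node pushes its rank along its out-links
            out = graph[v]
            for w in out:
                new_rank[w] += damping * rank[v] / len(out)
        rank = new_rank
    return rank
```

The ranks always sum to 1. On a directed 3-cycle every node converges to 1/3. In a parallel backend, the inner "push" loop becomes a set of concurrent updates that need atomic additions, or it is rewritten as a "pull" over in-links.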

Submitted 5 May, 2023; originally announced May 2023.

Comments: 30 pages, 21 figures

arXiv:2301.10015 [pdf, other]

Deep Attention-Based Alignment Network for Melody Generation from Incomplete Lyrics

Authors: Gurunath Reddy M, Zhe Zhang, Yi Yu, Florian Harscoet, Simon Canales, Suhua Tang

Abstract: We propose a deep attention-based alignment network, which aims to automatically predict lyrics and melody with given incomplete lyrics as input in a way similar to the music creation of humans. Most importantly, a deep neural lyrics-to-melody net is trained in an encoder-decoder way to predict possible pairs of lyrics-melody when given incomplete lyrics (few keywords). The attention mechanism is exploited to align the predicted lyrics with the melody during the lyrics-to-melody generation. The qualitative and quantitative evaluation metrics reveal that the proposed method is indeed capable of generating proper lyrics and corresponding melody for composing new songs given a piece of incomplete seed lyrics.
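The abstract does not give the network's attention formulation. Assuming standard scaled dot-product attention, the sketch below computes alignment weights between melody steps (queries from the decoder) and lyric tokens (keys from the encoder). Toy vectors stand in for learned hidden states. Taking the argmax of each row yields a hard lyric-to-note alignment.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_alignment(queries, keys):
    """Scaled dot-product attention weights.

    queries: one vector per melody step (decoder side)
    keys:    one vector per lyric token (encoder side)
    Row i gives how strongly melody step i attends to each lyric token (rows sum to 1).
    """
    d = len(keys[0])
    return [softmax([sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys])
            for query in queries]

def hard_alignment(weights):
    """Map each melody step to its most-attended lyric token."""
    return [max(range(len(row)), key=row.__getitem__) for row in weights]
```

With two lyric tokens encoded as [1, 0] and [0, 1], melody-step queries [2, 0], [0, 2] and [2, 0] align to tokens 0, 1 and 0 respectively.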

Submitted 22 January, 2023; originally announced January 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2011.06380

arXiv:2211.01338 [pdf, other]

Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages

Authors: Anusha Prakash, Arun Kumar, Ashish Seth, Bhagyashree Mukherjee, Ishika Gupta, Jom Kuriakose, Jordan Fernandes, K V Vikram, Mano Ranjith Kumar M, Metilda Sagaya Mary, Mohammad Wajahat, Mohana N, Mudit Batra, Navina K, Nihal John George, Nithya Ravi, Pruthwik Mishra, Sudhanshu Srivastava, Vasista Sai Lodagala, Vandan Mujadia, Kada Sai Venkata Vineeth, Vrunda Sukhadia, Dipti Sharma, Hema Murthy, Pushpak Bhattacharya, et al. (2 additional authors not shown)

Abstract: Cross-lingual dubbing of lecture videos requires transcription of the original audio, correction and removal of disfluencies, domain term discovery, text-to-text translation into the target language, chunking of text using the target language's rhythm, and text-to-speech synthesis followed by isochronous lip-syncing to the original video. This task becomes challenging when the source and target languages belong to different language families, resulting in differences in generated audio duration. This is further compounded by the original speaker's rhythm, especially for extempore speech. This paper describes the challenges in semi-automatically regenerating English lecture videos in Indian languages. A prototype is developed for dubbing lectures into 9 Indian languages. A mean opinion score (MOS) is obtained for two languages, Hindi and Tamil, on two different courses. The output video is compared with the original video in terms of MOS (1-5) and lip synchronisation, with scores of 4.09 and 3.74, respectively. Human effort is also reduced by 75%.

Submitted 1 November, 2022; originally announced November 2022.

arXiv:2210.08629 [pdf, other]

A Note On $\ell$-Rauzy Graphs for the Infinite Fibonacci Word

Authors: Rajavel Praveen M, Rama R

Abstract: The $\ell$-Rauzy graph of order $k$ for any infinite word is a directed graph in which an arc $(v_1,v_2)$ is formed if the concatenation of the word $v_1$ and the suffix of $v_2$ of length $k-\ell$ is a subword of the infinite word. In this paper, we consider one of the important aperiodic recurrent words, the infinite Fibonacci word. We prove a few basic properties of the $\ell$-Rauzy graph of the infinite Fibonacci word. We also prove that the $\ell$-Rauzy graphs for the infinite Fibonacci word are strongly connected.
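To make the definition concrete, the Python sketch below builds the graph from a finite prefix of the Fibonacci word, generated by the substitution $a \mapsto ab$, $b \mapsto a$. Two points are assumptions: the vertices are taken to be the factors of length $k$ (as in ordinary Rauzy graphs), and the arc test follows the abstract's wording literally, checking whether $v_1$ followed by the length-$(k-\ell)$ suffix of $v_2$ is a factor. Because only a finite prefix is scanned, it must be long enough to contain every factor of length $2k-\ell$; 2000 letters is ample for small $k$. All function names are hypothetical.

```python
def fibonacci_word(min_len):
    """Prefix of the infinite Fibonacci word via the morphism a -> ab, b -> a."""
    w = "a"
    while len(w) < min_len:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w


def factors(word, n):
    """All distinct subwords (factors) of length n occurring in `word`."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def l_rauzy_graph(word, k, l):
    """Adjacency sets over length-k factors (assumes 0 <= l <= k).

    Arc (v1, v2) iff v1 followed by the length-(k - l) suffix of v2 is a
    factor, i.e. v1 + v2[l:] is a factor of length 2k - l.
    """
    vertices = factors(word, k)
    long_factors = factors(word, 2 * k - l)
    return {v1: {v2 for v2 in vertices if v1 + v2[l:] in long_factors} for v1 in vertices}


def is_strongly_connected(graph):
    """True iff every vertex reaches, and is reached from, one start vertex."""
    def reachable(start, adj):
        seen, stack = {start}, [start]
        while stack:
            for nxt in adj[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    start = next(iter(graph))
    reverse = {v: set() for v in graph}
    for v, outs in graph.items():
        for w in outs:
            reverse[w].add(v)
    return reachable(start, graph) == set(graph) and reachable(start, reverse) == set(graph)
```

For example, with $k = 2$ and $\ell = 1$ the vertices are `ab`, `ba`, and `aa`, the only arc leaving `aa` goes to `ab` (since `aaa` is not a factor), and `is_strongly_connected(l_rauzy_graph(fibonacci_word(2000), 2, 1))` returns `True`.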

Submitted 27 October, 2022; v1 submitted 16 October, 2022; originally announced October 2022.

Comments: 10 pages, 4 figures

arXiv:2210.03948 [pdf, other]

Optimizing the Placement and Beamforming of RIS in Cellular Networks: A System-Level Modeling Perspective

Authors: Pavan Reddy M., SaiDhiraj Amuru, Kiran Kuchi

Abstract: In this letter, we present in detail the system-level modeling of reconfigurable intelligent surface (RIS)-assisted cellular systems by considering a 3-dimensional channel model between the base station, RIS, and user. We prove that, under the constraint of a single RIS in each sector, the optimal placement of the RIS for wider coverage is directly opposite the base station. We propose a novel beamforming design for RIS-assisted cellular systems and derive the achievable sum rate in the presence of ideal, discrete, and random phase shifters at the RIS. Through extensive system-level evaluations, we then show that the proposed beamforming design achieves significant improvements compared to state-of-the-art algorithms.
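The letter's beamforming design for a 3-dimensional, system-level setting is not reproduced here. As a simpler point of reference, the Python sketch below shows the textbook single-user case with a single-antenna base station and user (our assumption): each RIS element's phase is chosen to co-phase its reflected path $g_n h_n$ with the direct path $h_d$, and the same channel is then evaluated with ideal, discrete ($b$-bit quantized), and random phase shifters, mirroring the three cases the abstract names. Function names are hypothetical and any channel values are synthetic.

```python
import cmath
import math
import random


def ris_gain(h_d, g, h, phases):
    """|h_d + sum_n g_n * exp(j*theta_n) * h_n|^2 for a single-antenna link."""
    total = h_d + sum(gn * cmath.exp(1j * th) * hn for gn, hn, th in zip(g, h, phases))
    return abs(total) ** 2


def ideal_phases(h_d, g, h):
    """Co-phase every reflected path with the direct path."""
    return [cmath.phase(h_d) - cmath.phase(gn * hn) for gn, hn in zip(g, h)]


def quantize(phases, bits):
    """Round each phase to the nearest of 2**bits uniformly spaced levels."""
    step = 2 * math.pi / (2 ** bits)
    return [step * round(th / step) for th in phases]


def random_phases(n, rng=random):
    """Phases drawn uniformly from [0, 2*pi): the 'random phase shifter' case."""
    return [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]
```

With ideal phases the received amplitude reaches $|h_d| + \sum_n |g_n h_n|$, the triangle-inequality maximum, so discrete and random phases can only do worse; with $b$-bit quantization each reflected path is misaligned by at most $\pi/2^b$, which keeps the gain at least $\cos^2(\pi/2^b)$ times the ideal value.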

Submitted 2 May, 2023; v1 submitted 8 October, 2022; originally announced October 2022.

arXiv:2210.03908 [pdf]

doi 10.14445/22315381/IJETT-V70I8P237

Variability Analysis of Isolated Intersections Through Case Study

Authors: Savithramma R M, R Sumathi, Sudhira H S

Abstract: Population and economic growth of urban areas have led to intensive use of private vehicles, increasing traffic volume and congestion on roads. Traffic management in the city is a challenge for the concerned authorities, and signalized intersections are its primary focus. Interpreting traffic patterns and current traffic signal operations can provide thorough insights for taking appropriate action. With this in view, a comprehensive study is conducted at selected intersections in Tumakuru (a tier-2 city), Karnataka, India. The collected data are used to estimate traffic parameters such as saturation flow, composition, volume, and volume-to-capacity ratio. The statistical results confirm that traffic conditions are currently stable but do not ensure sustainability. The volume-to-capacity ratio is greater than 0.73 along three major arterial roads of the study intersections, indicating future congestion as traffic volume increases gradually, as per the Directorate of Urban Land Use and Transportation, Government of Karnataka. The statistical results obtained in the current study uphold that report. The empirical results show 40% green time wastage at one of the study intersections, which causes additional waiting delay and thereby increases fuel consumption and emissions. The overall level of service of the study intersections is class C, based on computed delay and volume-to-capacity ratio.
The study suggests possible treatments for improving the level of service of intersection operations and sustaining the city's stable traffic condition. It supports city traffic management authorities in identifying suitable treatments and implementing them accordingly.
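The quantities named in the abstract follow standard signalized-intersection relations. The Python sketch below is an assumption-labeled illustration, not the study's procedure: lane-group capacity is $c = s \cdot g / C$ (saturation flow times the effective-green-to-cycle ratio), the volume-to-capacity ratio is $v/c$, and the level of service is graded from average control delay using the HCM 2010 thresholds for signalized intersections, with any $v/c > 1$ treated as LOS F. The study may use different (for example, Indian) criteria, and the numbers in the example are hypothetical.

```python
def lane_group_capacity(saturation_flow_vph, effective_green_s, cycle_length_s):
    """Capacity c = s * g / C in veh/h."""
    return saturation_flow_vph * effective_green_s / cycle_length_s


def volume_to_capacity(volume_vph, capacity_vph):
    """Degree of saturation v/c (dimensionless)."""
    return volume_vph / capacity_vph


def hcm_signal_los(control_delay_s, v_c):
    """LOS from average control delay (s/veh), HCM 2010 signalized thresholds.

    Assumption: a v/c ratio above 1.0 is graded F regardless of delay.
    """
    if v_c > 1.0:
        return "F"
    for grade, limit in (("A", 10), ("B", 20), ("C", 35), ("D", 55), ("E", 80)):
        if control_delay_s <= limit:
            return grade
    return "F"
```

For instance, a lane group with a saturation flow of 1800 veh/h and 30 s of effective green in a 90 s cycle has a capacity of 600 veh/h; a demand of 438 veh/h then gives $v/c = 0.73$, and an average control delay of 28 s/veh falls in LOS C.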

Submitted 8 October, 2022; originally announced October 2022.

arXiv:2205.02645 [pdf, other]

Discovering stochastic dynamical equations from biological time series data

Authors: Arshed Nabeel, Ashwin Karichannavar, Shuaib Palathingal, Jitesh Jhawar, David B. Brückner, Danny Raj M., Vishwesha Guttal

Abstract: Stochastic differential equations (SDEs) are an important framework for modeling dynamics with randomness, as is common in most biological systems. The inverse problem of integrating these models with empirical data remains a major challenge. Here, we present an equation discovery methodology that takes time series data as input, analyses fine-scale fluctuations, and outputs an interpretable SDE that can correctly capture the long-time dynamics of the data. We achieve this by combining traditional approaches from the stochastic calculus literature with state-of-the-art equation discovery techniques. We validate our approach on synthetic datasets and demonstrate the generality and applicability of the method on two real-world datasets of vastly different spatiotemporal scales: (i) collective movement of a fish school, where stochasticity plays a crucial role, and (ii) confined migration of a single cell, primarily following a relaxed oscillation. We make the method available as an easy-to-use, open-source Python package, PyDaddy (Python Library for Data Driven Dynamics).
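PyDaddy's own API is not shown here. As a hedged illustration of the kind of "traditional" stochastic-calculus ingredient the abstract mentions, assumed here to be Kramers-Moyal conditional moments, the Python sketch below simulates a synthetic Ornstein-Uhlenbeck process $dx = -\theta x\,dt + \sigma\,dW$ with Euler-Maruyama and recovers drift and diffusion from the increments: the drift is fitted as $f(x) \approx a x$ by least squares on $\Delta x/\Delta t$, and the (assumed constant) diffusion as $g^2 \approx \langle \Delta x^2 \rangle/\Delta t$. A full equation-discovery step would instead regress onto a library of candidate terms with a sparsity penalty; a single linear term keeps the sketch short. Function names are hypothetical.

```python
import random


def simulate_ou(theta, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama simulation of dx = -theta*x dt + sigma dW, starting at 0."""
    rng = random.Random(seed)
    x, xs = 0.0, [0.0]
    for _ in range(n_steps):
        x += -theta * x * dt + sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs


def estimate_linear_sde(xs, dt):
    """Conditional-moment estimates from increments.

    Drift:     E[dx | x] / dt, fitted as f(x) = a * x by least squares.
    Diffusion: E[dx^2] / dt, assumed constant (additive noise).
    """
    x_prev = xs[:-1]
    dx = [b - a for a, b in zip(xs, xs[1:])]
    a = sum(x * d for x, d in zip(x_prev, dx)) / (dt * sum(x * x for x in x_prev))
    g2 = sum(d * d for d in dx) / (dt * len(dx))
    return a, g2
```

With $\theta = 1$, $\sigma = 0.5$, $\Delta t = 0.01$ and 200,000 steps, the estimates come out close to $a = -1$ and $g^2 = 0.25$; the diffusion estimate carries a small positive bias of order $\theta^2 \langle x^2 \rangle \Delta t$, which shrinks as the sampling interval decreases.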

Submitted 17 February, 2024; v1 submitted 5 May, 2022; originally announced May 2022.

Comments: Updates: v3: Significantly reorganized the paper and added a section analysing a cell migration dataset. v4: Updated the arXiv title to match the updated title of the manuscript. v5: Added sections detailing the limitations of the approach.

arXiv:2202.01078 [pdf, other]

Melody Extraction from Polyphonic Music by Deep Learning Approaches: A Review

Authors: Gurunath Reddy M, K. Sreenivasa Rao, Partha Pratim Das

Abstract: Melody extraction is a vital music information retrieval task, owing to its potential applications in music education and the music industry. It is notoriously challenging due to the presence of background instruments. Moreover, the melodic source often exhibits characteristics similar to those of the other instruments, and the background accompaniment interfering with the vocals makes extracting the melody from the mixture signal much more challenging. Until recently, classical signal processing-based melody extraction methods were quite popular among melody extraction researchers. The ability of deep learning models to model large-scale data and to learn features automatically by exploiting spatial and temporal dependencies has inspired many researchers to adopt deep learning models for melody extraction. In this paper, we review up-to-date data-driven deep learning approaches for melody extraction from polyphonic music. The available deep models are categorized based on the type of neural network used and the output representation used for predicting melody. Further, the architectures of 25 melody extraction models are briefly presented. The loss functions used to optimize the parameters of the melody extraction models are broadly grouped into four categories, and the loss functions used by the various models are briefly described.
The various input representations adopted by the melody extraction models and their parameter settings are also described in depth. A section on the explainability of black-box melody extraction deep neural networks is included. The performance of 25 melody extraction methods is compared. Possible future directions for exploring and improving melody extraction methods are also presented.

Submitted 2 February, 2022; originally announced February 2022.

Comments: 72 pages

arXiv:2201.02129 [pdf, other]

Spectral and Energy Efficient User Pairing for RIS-assisted Uplink NOMA Systems with Imperfect Phase Compensation

Authors: Kusuma Priya P., Pavan Reddy M., Abhinav Kumar

Abstract: Non-orthogonal multiple access (NOMA) is considered a key technology for improving the spectral efficiency of fifth-generation (5G) and beyond-5G cellular networks. NOMA is beneficial when the channel vectors of the users are in the same direction, which is not always the case in conventional wireless systems. With the help of a reconfigurable intelligent surface (RIS), the base station can control the directions of the users' channel vectors. Thus, by combining both technologies, RIS-assisted NOMA systems are expected to achieve greater improvements in network throughput. However, ideal phase control at the RIS is unrealizable in practice because of imperfect channel estimation and hardware limitations. This imperfection in phase control can have a significant impact on system performance. Motivated by this, we consider an RIS-assisted uplink NOMA system in the presence of imperfect phase compensation. We formulate a user-pairing criterion that achieves the minimum required data rates. We propose adaptive user pairing algorithms that maximize spectral or energy efficiency. We then derive various bounds on the power allocation factors for the paired users. Through extensive simulations, we show that the proposed algorithms significantly outperform state-of-the-art algorithms in terms of spectral and energy efficiency.

Submitted 6 January, 2022; originally announced January 2022.

arXiv:2111.04003 [pdf]

Predictive Model for Gross Community Production Rate of Coral Reefs using Ensemble Learning Methodologies

Authors: Umanandini S, Rishivardhan M, Aouthithiye Barathwaj SR Y, Jasline Augusta J, Shrirang Sapate, Reenasree S, Vigneash M

Abstract: Coral reefs play a vital role in maintaining the ecological balance of the marine ecosystem. Various marine organisms depend on coral reefs for their existence and natural processes. Coral reefs provide the necessary habitat for the reproduction and growth of various exotic species in the marine ecosystem. In this article, we discuss the most important parameters that influence the lifecycle of corals and coral reefs, such as ocean acidification, deoxygenation, and other physical parameters such as flow rate and surface area. Ocean acidification depends on the amount of dissolved carbon dioxide (CO2). This is due to the release of H+ ions upon the reaction of the dissolved CO2 gases with the calcium carbonate compounds in the ocean. Deoxygenation is another problem that leads to hypoxia, which is characterized by less dissolved oxygen in the water than marine organisms require to survive. In this article, we highlight the importance of physical parameters such as flow rate, which influence gas exchange, heat dissipation, bleaching sensitivity, nutrient supply, feeding, waste and sediment removal, growth, and reproduction. We also bring out these important parameters and propose an ensemble machine learning-based model for analyzing them and providing better rates that can help us understand and suitably improve the ocean composition, which in turn can improve the sustainability of the marine ecosystem, mainly the coral reefs.

Submitted 23 January, 2023; v1 submitted 7 November, 2021; originally announced November 2021.

Comments: 8 pages, 18 figures

MSC Class: 68T20 ACM Class: I.2.8

arXiv:2110.05864 [pdf, other]

doi 10.1063/5.0093682

Disentangling intrinsic motion from neighbourhood effects in heterogeneous collective motion

Authors: Arshed Nabeel, Danny Raj M

Abstract: Most real-world collectives, including active particles, living cells, and grains, are heterogeneous: individuals with differing properties interact. The differences among individuals in their intrinsic properties have emergent effects at the group level. It is often of interest to infer how the intrinsic properties differ among the individuals, based on their observed movement patterns. However, the true individual properties may be masked by emergent effects in the collective. We investigate the inference problem in the context of a bidisperse collective with two types of agents, where the goal is to observe the motion of the collective and classify the agents according to their types. Since collective effects such as jamming and clustering affect individual motion, an agent's own movement does not contain sufficient information to perform the classification well: a simple observer algorithm based only on individual velocities cannot accurately estimate the level of heterogeneity of the system, and often misclassifies agents. We propose a novel approach to the classification problem in which collective effects on an agent's motion are explicitly accounted for. We use insights about the physics of collective motion to quantify the effect of the neighbourhood on an agent using a neighbourhood parameter. Such an approach can distinguish between agents of the two types even when their observed motion is identical. This approach estimates the level of heterogeneity much more accurately and achieves significant improvements in classification.
Our results demonstrate that explicitly accounting for neighbourhood effects is often necessary to correctly infer the intrinsic properties of individuals.

Submitted 5 March, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

Comments: Supplementary movies can be found in: https://www.dannyraj.com/obsinf-supp-info

Journal ref: Chaos 32, 063119 (2022)

arXiv:2106.07938 [pdf, ps, other]

User Pairing and Power Allocation for IRS-Assisted NOMA Systems with Imperfect Phase Compensation

Authors: Pavan Reddy M., Abhinav Kumar

Abstract: In this letter, we analyze the performance of intelligent reflecting surface (IRS)-assisted downlink non-orthogonal multiple access (NOMA) systems in the presence of imperfect phase compensation. We derive an upper bound on the phase compensation error that still allows each user to achieve its minimum required data rate. Using this bound, we propose an adaptive user pairing algorithm to maximize the network throughput. We then derive bounds on the power allocation factors and propose power allocation algorithms for the paired users to achieve the maximum sum rate or ensure fairness. Through extensive simulations, we show that the proposed algorithms significantly outperform state-of-the-art algorithms.
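The letter's bounds account for imperfect phase compensation at the IRS, which is not modeled here. As a simpler baseline under our own assumptions (perfect phase alignment, perfect successive interference cancellation (SIC) at the near user, and $g_{\text{near}} \ge g_{\text{far}}$), the Python sketch below evaluates the standard two-user downlink NOMA rates and the smallest power fraction $\alpha$ for the far user that meets a minimum rate $R_{\min}$. Solving $\log_2\!\left(1 + \frac{\alpha P g_{\text{far}}}{(1-\alpha) P g_{\text{far}} + N_0}\right) \ge R_{\min}$ with $\gamma = 2^{R_{\min}} - 1$ gives $\alpha \ge \frac{\gamma (P g_{\text{far}} + N_0)}{(1+\gamma) P g_{\text{far}}}$. Function names are hypothetical.

```python
import math


def downlink_noma_rates(alpha_far, p_tx, g_near, g_far, noise):
    """Two-user downlink NOMA rates (bit/s/Hz) with perfect SIC at the near user.

    alpha_far: fraction of transmit power given to the far (weak) user.
    g_near, g_far: channel power gains |h|^2; assumes g_near >= g_far so
    that the near user can decode and cancel the far user's signal.
    """
    alpha_near = 1.0 - alpha_far
    # Far user decodes its own signal, treating the near user's as interference.
    r_far = math.log2(1 + alpha_far * p_tx * g_far / (alpha_near * p_tx * g_far + noise))
    # Near user cancels the far user's signal first, then decodes its own.
    r_near = math.log2(1 + alpha_near * p_tx * g_near / noise)
    return r_near, r_far


def min_alpha_far(r_min, p_tx, g_far, noise):
    """Smallest alpha_far with r_far >= r_min; a value > 1 means infeasible."""
    gamma = 2 ** r_min - 1
    return gamma * (p_tx * g_far + noise) / ((1 + gamma) * p_tx * g_far)
```

For example, with $P = 1$, $g_{\text{far}} = 0.5$, $N_0 = 0.1$ and $R_{\min} = 1$ bit/s/Hz, the bound is $\alpha = 0.6$, at which the far user's rate is exactly 1 bit/s/Hz; the remaining power goes to the near user.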

Submitted 15 June, 2021; originally announced June 2021.

arXiv:2104.13049 [pdf, other]

doi 10.1109/IOTM.0001.2000191

Toward Blockchain for Edge-of-Things: A New Paradigm, Opportunities, and Future Directions

Authors: Prabadevi B, N Deepa, Quoc-Viet Pham, Dinh C. Nguyen, Praveen Kumar Reddy M, Thippa Reddy G, Pubudu N. Pathirana, Octavia Dobre

Abstract: Blockchain is gaining momentum as a promising technology for many application domains, one of them being the Edge-of-Things (EoT), which is enabled by the integration of edge computing and the Internet-of-Things (IoT). In particular, the amalgamation of blockchain and EoT leads to a new paradigm, called blockchain-enabled EoT (BEoT), that is crucial for enabling future low-latency and high-security services and applications. This article envisions a novel BEoT architecture for supporting industrial applications under the management of blockchain at the network edge in a wide range of IoT use cases such as smart home, smart healthcare, smart grid, and smart transportation. The potential of BEoT for providing security services is also explored, including access authentication, data privacy preservation, attack detection, and trust management. Finally, we point out some key research challenges and future directions in this emerging area.

Submitted 27 April, 2021; originally announced April 2021.

Comments: Accepted at the IEEE Internet of Things Magazine

arXiv:2101.00798 [pdf, other]

Fusion of Federated Learning and Industrial Internet of Things: A Survey

Authors: Parimala M, Swarna Priya R M, Quoc-Viet Pham, Kapal Dev, Praveen Kumar Reddy Maddikunta, Thippa Reddy Gadekallu, Thien Huynh-The

Abstract: The Industrial Internet of Things (IIoT) lays a new paradigm for Industry 4.0 and paves the way for a new industrial era. Nowadays, smart machines and smart factories use machine learning/deep learning-based models to provide intelligence. However, storing and communicating data to the cloud and end devices raises issues in preserving privacy. To address this issue, researchers are now implementing federated learning (FL) in IIoT to provide safe, accurate, robust, and unbiased models. Integrating FL in IIoT ensures that no sensitive local data are exchanged, since FL distributes the learning models over the edge devices; only encrypted notifications and parameters are communicated to the central server. In this paper, we provide a thorough overview of integrating FL with IIoT in terms of privacy, resource, and data management. The survey starts by articulating IIoT characteristics and the fundamentals of distributed learning and FL. The motivations for integrating IIoT and FL to achieve data privacy preservation and on-device learning are summarized. Then we discuss the potential of using machine learning, deep learning, and blockchain techniques for FL in secure IIoT. Further, we analyze and summarize ways to handle heterogeneous and huge data. Comprehensive background on data and resource management is then presented, followed by applications of IIoT with FL in the healthcare and automobile industries.
Finally, we shed light on challenges, some possible solutions, and potential directions for future research.
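The survey itself does not fix an aggregation rule. To illustrate the point that only parameters, not raw data, reach the central server, the Python sketch below implements federated averaging (FedAvg), the common baseline in which the server averages client parameter vectors weighted by each client's number of local samples. Encryption, local training, and communication are omitted, and the function name is hypothetical.

```python
def fed_avg(client_params, client_sizes):
    """Server-side FedAvg step.

    client_params: one flat parameter vector (list of floats) per client.
    client_sizes:  number of local training samples per client; clients
                   with more data get proportionally more weight.
    Only these vectors are sent to the server; raw data stay on-device.
    """
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(n * params[i] for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]
```

For two clients holding 1 and 3 samples with parameters `[1.0, 2.0]` and `[3.0, 6.0]`, `fed_avg` returns `[2.5, 5.0]`; in a full round, the server would broadcast this vector back to the clients for the next local update.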

Submitted 4 January, 2021; originally announced January 2021.

Comments: This work has been submitted for possible publication. Any comments and suggestions are appreciated

arXiv:2011.04297 [pdf, other]

Knowledge Distillation for Singing Voice Detection

Authors: Soumava Paul, Gurunath Reddy M, K Sreenivasa Rao, Partha Pratim Das

Abstract: Singing Voice Detection (SVD) has been an active area of research in music information retrieval (MIR). Currently, two deep neural network-based methods, one based on CNN and the other on RNN, exist in the literature that learn optimized features for the voice detection (VD) task and achieve state-of-the-art performance on common datasets. Both models have a large number of parameters (1.4M for the CNN and 65.7K for the RNN) and hence are not suitable for deployment on devices such as smartphones or embedded sensors with limited memory and computation power. The most popular method for addressing this issue in the deep learning literature (in addition to model compression) is knowledge distillation, where a large pre-trained network, known as the teacher, is used to train a smaller student network. Despite the wide applications of SVD in music information retrieval, to the best of our knowledge, model compression for practical deployment has not yet been explored. In this paper, we investigate this issue using both conventional and ensemble knowledge distillation techniques.
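The abstract does not give the distillation objective. Assuming the common soft-target formulation of Hinton et al., the Python sketch below combines a temperature-softened KL divergence between teacher and student outputs (scaled by $T^2$) with the usual cross-entropy on the hard label, for a two-class voice/non-voice decision. The temperature and weighting values are placeholders, not settings from the paper, and the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.7):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label).

    The T^2 factor keeps the soft-target gradients on a scale comparable
    to the hard-label term as the temperature changes.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    hard = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the weighted cross-entropy remains. For the ensemble variant, one simple option (again an assumption) is to average the softened probabilities of several teachers before computing the KL term.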

Submitted 19 August, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

Comments: Accepted at INTERSPEECH 2021. 5 pages, 3 figures

arXiv:2009.05783 [pdf, other]

doi 10.1109/ACCESS.2020.3028595

Multiclass Model for Agriculture development using Multivariate Statistical method

Authors: N Deepa, Mohammad Zubair Khan, Prabadevi B, Durai Raj Vincent P M, Praveen Kumar Reddy Maddikunta, Thippa Reddy Gadekallu

Abstract: The Mahalanobis-Taguchi System (MTS) is a multivariate statistical method extensively used for feature selection and binary classification problems. The calculation of the orthogonal array and signal-to-noise ratio in MTS makes the algorithm complicated when many factors are involved in the classification problem. Also, the decision is based on the accuracy of normal and abnormal observations of the dataset. In this paper, a multiclass model using an Improved Mahalanobis-Taguchi System (IMTS), based on normal observations and the Mahalanobis distance, is proposed for agriculture development. Twenty-six input factors relevant to crop cultivation have been identified and clustered into six main factors for the development of the model. The multiclass model is developed considering the relative importance of the factors. An objective function is defined for the classification of three crops, namely paddy, sugarcane, and groundnut. The classification results are verified against results obtained from agriculture experts working in the field. Compared with other traditional classifier models, the proposed classifier provides 100% accuracy, recall, and precision and a 0% error rate.
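The IMTS-specific steps (factor clustering, weighting, and the multiclass objective) are not detailed in the abstract. The Python sketch below shows only the standard MTS building block they rest on: standardize each feature using the normal (reference) group, then compute the scaled Mahalanobis distance $MD = z^\top R^{-1} z / k$, where $R$ is the correlation matrix of the normal group and $k$ the number of features. Here the standard deviations use the population ($1/n$) form, which makes the average $MD$ over the normal group exactly 1; observations with $MD$ well above 1 are candidates for "abnormal", with the threshold left as a user choice. Function names are hypothetical.

```python
import math


def _invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mts_reference(normal_rows):
    """Means, standard deviations, and inverse correlation matrix of the normal group."""
    n, k = len(normal_rows), len(normal_rows[0])
    means = [sum(r[j] for r in normal_rows) / n for j in range(k)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in normal_rows) / n) for j in range(k)]
    z = [[(r[j] - means[j]) / stds[j] for j in range(k)] for r in normal_rows]
    corr = [[sum(zr[a] * zr[b] for zr in z) / n for b in range(k)] for a in range(k)]
    return means, stds, _invert(corr)


def scaled_md(row, reference):
    """Scaled Mahalanobis distance MD = z^T R^{-1} z / k for one observation."""
    means, stds, r_inv = reference
    k = len(row)
    z = [(row[j] - means[j]) / stds[j] for j in range(k)]
    return sum(z[a] * r_inv[a][b] * z[b] for a in range(k) for b in range(k)) / k
```

For the five normal rows `[1, 2], [2, 1], [3, 4], [4, 3], [2.5, 2.5]`, the mean MD over those rows is 1, while the outlier `[10, -5]` scores 140.625.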

Submitted 7 October, 2020; v1 submitted 12 September, 2020; originally announced September 2020.

Comments: in IEEE Access

arXiv:2007.06021 [pdf, other]

NISP: A Multi-lingual Multi-accent Dataset for Speaker Profiling

Authors: Shareef Babu Kalluri, Deepu Vijayasenan, Sriram Ganapathy, Ragesh Rajan M, Prashant Krishnan

Abstract: Many commercial and forensic applications of speech require extracting information about speaker characteristics, which falls into the broad category of speaker profiling. The speaker characteristics needed for profiling include physical traits such as height, age, and gender, along with the speaker's native language. Many available datasets have only partial information for speaker profiling. In this paper, we attempt to overcome this limitation by developing a new dataset that has speech data from five different Indian languages along with English. Metadata for speaker profiling applications, such as linguistic information, regional information, and physical characteristics of a speaker, are also collected. We call this dataset the NITK-IISc Multilingual Multi-accent Speaker Profiling (NISP) dataset. The description of the dataset, potential applications, and baseline results for speaker profiling on this dataset are provided in this paper.

Submitted 12 July, 2020; originally announced July 2020.

Comments: 5 pages. Initial version submitted to Interspeech 2020

arXiv:2007.05853 [pdf]

Complex Wavelet SSIM based Image Data Augmentation

Authors: Ritin Raveendran, Aviral Singh, Rajesh Kumar M

Abstract: One of the biggest problems in training neural networks is the lack of available training data. Data augmentation techniques have therefore been developed over the past few years to increase the amount of artificial training data from a limited number of real-world samples. In this paper, we look particularly at the MNIST handwritten digit dataset, an image dataset used for digit recognition, and the data augmentation methods applied to it. We then take a detailed look at one of the most popular augmentation techniques used for this dataset, elastic deformation, and highlight its drawback: it degrades data quality, which introduces irrelevant data into the training set. To reduce this irrelevancy, we propose using a similarity measure called the Complex Wavelet Structural Similarity Index Measure (CWSSIM) to selectively filter out the irrelevant data before augmenting the dataset. We compare our observations with the existing augmentation technique and find that our proposed method yields better results than the existing technique.
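Computing complex wavelet coefficients requires a complex steerable pyramid (or a similar transform), which is omitted here; the Python sketch below assumes the coefficients for the original and the elastically deformed image are already available as lists of complex numbers. It implements the CW-SSIM index $\frac{2\left|\sum_i c_{1,i} c_{2,i}^*\right| + K}{\sum_i |c_{1,i}|^2 + \sum_i |c_{2,i}|^2 + K}$ for one set of coefficients, plus the filter-before-augmenting step the abstract describes. The threshold and $K$ are placeholder values, the function names are hypothetical, and in practice the index is averaged over local windows and subbands.

```python
def cw_ssim_index(c1, c2, k=0.01):
    """CW-SSIM for two equal-length lists of complex wavelet coefficients.

    Only the magnitude of the cross-correlation enters, so a consistent
    phase shift between the two coefficient sets does not lower the score.
    k is a small stabilizing constant.
    """
    cross = abs(sum(a * b.conjugate() for a, b in zip(c1, c2)))
    energy = sum(abs(a) ** 2 for a in c1) + sum(abs(b) ** 2 for b in c2)
    return (2 * cross + k) / (energy + k)


def filter_augmented(pairs, threshold=0.8):
    """Keep augmented samples whose coefficients stay similar to the original's.

    pairs: iterable of (sample, original_coeffs, augmented_coeffs).
    """
    return [sample for sample, orig, aug in pairs if cw_ssim_index(orig, aug) >= threshold]
```

Because a uniform phase shift of the coefficients leaves the index at 1, CW-SSIM tolerates the small, structure-preserving distortions that consistent phase changes roughly correspond to, while a deformation that destroys the structure drives the score toward 0 and the sample is dropped.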

Submitted 11 July, 2020; originally announced July 2020.

arXiv:2006.14107 [pdf, other]

Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation

Authors: Jogendra Nath Kundu, Siddharth Seth, Rahul M V, Mugalodi Rakesh, R. Venkatesh Babu, Anirban Chakraborty

Abstract: Estimation of 3D human pose from a monocular image has gained considerable attention as a key step in several human-centric applications. However, the generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable, as these models often perform unsatisfactorily in unseen in-the-wild environments. Though weakly-supervised models have been proposed to address this shortcoming, the performance of such models relies on the availability of paired supervision on some related tasks, such as 2D pose or multi-view image pairs. In contrast, we propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervision. Our pose estimation framework relies on a minimal set of prior knowledge that defines the underlying kinematic 3D structure, such as skeletal joint connectivity information with bone-length ratios in a fixed canonical scale. The proposed model employs three consecutive differentiable transformations, named forward-kinematics, camera-projection, and spatial-map transformation. This design not only acts as a suitable bottleneck stimulating effective pose disentanglement but also yields interpretable latent pose representations, avoiding the training of an explicit latent-embedding-to-pose mapper. Furthermore, without an unstable adversarial setup, we reuse the decoder to formalize an energy-based loss, which enables us to learn from in-the-wild videos, beyond laboratory settings.
Comprehensive experiments demonstrate our state-of-the-art unsupervised and weakly-supervised pose estimation performance on both the Human3.6M and MPI-INF-3DHP datasets. Qualitative results on unseen environments further establish our superior generalization ability.
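The first two transformations named above, forward kinematics under fixed bone-length ratios followed by camera projection, can be illustrated in a few lines. This is a sketch under stated assumptions: the 4-joint chain, the bone-ratio values, the unit-direction parametrization, and the pinhole camera with `focal=1.0` and `depth_offset=5.0` are all hypothetical, and the spatial-map transformation is not shown.

```python
import numpy as np

# Hypothetical 4-joint chain (root -> hip -> knee -> ankle); the paper's skeleton is larger.
PARENTS = [-1, 0, 1, 2]
# Assumed bone-length ratios in a fixed canonical scale (illustrative values only).
BONE_RATIOS = np.array([0.0, 0.2, 0.45, 0.45])


def forward_kinematics(directions, root=None):
    """Place each joint at its parent's position plus a fixed-length bone.

    directions: (J, 3) array; row j is the (unnormalized) direction of the bone
    from PARENTS[j] to joint j. Row 0 (the root) is ignored.
    """
    joints = np.zeros((len(PARENTS), 3))
    joints[0] = np.zeros(3) if root is None else root
    for j in range(1, len(PARENTS)):
        unit = directions[j] / np.linalg.norm(directions[j])
        joints[j] = joints[PARENTS[j]] + BONE_RATIOS[j] * unit
    return joints


def perspective_project(joints, focal=1.0, depth_offset=5.0):
    """Pinhole projection u = f * X / Z, v = f * Y / Z, with the skeleton placed in front of the camera."""
    z = joints[:, 2] + depth_offset
    return focal * joints[:, :2] / z[:, None]
```

Whatever directions a network predicts, the bone lengths stay fixed by construction, which is the sense in which such a bottleneck "preserves" kinematic structure.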

Submitted 24 June, 2020; originally announced June 2020.

Comments: AAAI 2020 (Oral)

arXiv:2006.00782 [pdf, other]

Learning to Recognize Code-switched Speech Without Forgetting Monolingual Speech Recognition

Authors: Sanket Shah, Basil Abraham, Gurunath Reddy M, Sunayana Sitaram, Vikas Joshi

Abstract: Recently, there has been significant progress in Automatic Speech Recognition (ASR) of code-switched speech, leading to gains in accuracy on code-switched datasets in many language pairs. Code-switched speech co-occurs with monolingual speech in one or both of the languages being mixed. In this work, we show that fine-tuning ASR models on code-switched speech harms performance on monolingual speech. We point out the need to optimize models for code-switching while also ensuring that monolingual performance is not sacrificed. Monolingual models may be trained on thousands of hours of speech, which may not be available for re-training a new model. We propose using the Learning Without Forgetting (LWF) framework for code-switched ASR when we only have access to a monolingual model and not the data it was trained on. We show that it is possible to train models using this framework that perform well on both code-switched and monolingual test sets. In cases where we also have access to monolingual training data, we propose regularization strategies for fine-tuning models for code-switching without sacrificing monolingual accuracy. We report improvements in Word Error Rate (WER) on monolingual and code-switched test sets compared to baselines that use pooled data and simple fine-tuning.
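The core of a Learning Without Forgetting objective is a cross-entropy term on the new labels plus a distillation term that keeps the fine-tuned model's outputs close to those of the frozen original model on the same inputs. The NumPy sketch below shows that generic form only; the weighting `lam`, the temperature of 2, and the frame-level framing are assumptions, not the paper's exact loss or regularization strategies.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def lwf_loss(new_logits, old_logits, labels, lam=1.0, temperature=2.0, eps=1e-12):
    """Generic LwF-style loss.

    new_logits: outputs of the model being fine-tuned on code-switched frames.
    old_logits: outputs of the frozen monolingual model on the same frames.
    labels: integer targets for the code-switched data.
    """
    probs = softmax(new_logits)
    ce = -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits, temperature)
    # KL(p_old || p_new): penalizes drifting away from the original model's behavior.
    kl = np.mean(np.sum(p_old * (np.log(p_old + eps) - np.log(p_new + eps)), axis=-1))
    return ce + lam * temperature ** 2 * kl
```

Setting `lam=0` reduces this to plain fine-tuning, the baseline the abstract reports as harming monolingual accuracy; larger `lam` trades code-switched fit for retention.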

Submitted 1 June, 2020; originally announced June 2020.

Comments: 5 pages (4 pages + 1 page references), 5 tables, 1 figure, 1 algorithm, 16 references

arXiv:2005.00229

doi 10.1729/Journal.22894

Deepfake Forensics Using Recurrent Neural Networks

Authors: Rahul U, Ragul M, Raja Vignesh K, Tejeswinee K

Abstract: Recently, free AI-based software tools have made it easy to create realistic face swaps in videos that leave few traces of manipulation, in what are known as "deepfake" videos. Scenarios in which such realistic fake videos are used to create political distress, blackmail someone, or fake terrorism events are easily imagined. This paper proposes a temporal-aware pipeline to automatically detect deepfake videos. Our system uses a convolutional neural network (CNN) to extract frame-level features. These features are then used to train a recurrent neural network (RNN) that learns to classify whether a video has been subject to manipulation or not. We evaluate our method against a large set of deepfake videos collected from multiple video websites. We show how our system can achieve competitive results in this task while using a simple architecture.

Submitted 1 May, 2020; originally announced May 2020.

Comments: This submission has been removed by arXiv administrators due to copyright infringement

arXiv:2004.14178

doi 10.35940/ijrte.F9747.038620

Deepfake Video Forensics based on Transfer Learning

Authors: Rahul U, Ragul M, Raja Vignesh K, Tejeswinee K

Abstract: Deep learning has been used to solve complex problems in various domains. As it advances, it also creates applications that become a major threat to our privacy, our security, and even our democracy. One such recently developed application is the "Deepfake". Deepfake models can create fake images and videos that humans cannot distinguish from genuine ones. Therefore, a counter-application that automatically detects and analyzes digital visual media is necessary in today's world. This paper details retraining image classification models to learn features from each deepfake video frame. Different sets of deepfake video clips are fed through a pretrained network, and a bottleneck-layer representation is computed for every video frame; this layer contains condensed data for all images and exposes artificial manipulations in Deepfake videos. When checking Deepfake videos, this technique achieved more than 87 percent accuracy. This technique has been tested on the Face Forensics dataset and obtained good detection accuracy.

Submitted 29 April, 2020; originally announced April 2020.

Comments: This submission has been removed by arXiv administrators due to copyright infringement

Report number: F9747038620

arXiv:2004.04393 [pdf, other]

Universal Source-Free Domain Adaptation

Authors: Jogendra Nath Kundu, Naveen Venkat, Rahul M V, R. Venkatesh Babu

Abstract: There is a strong incentive to develop versatile learning techniques that can transfer the knowledge of class-separability from a labeled source domain to an unlabeled target domain in the presence of a domain-shift. Existing domain adaptation (DA) approaches are not equipped for practical DA scenarios as a result of their reliance on knowledge of the source-target label-set relationship (e.g. Closed-set, Open-set or Partial DA). Furthermore, almost all prior unsupervised DA works require coexistence of source and target samples even during deployment, making them unsuitable for real-time adaptation. Free of such impractical assumptions, we propose a novel two-stage learning process. 1) In the Procurement stage, we aim to equip the model for future source-free deployment, assuming no prior knowledge of the upcoming category-gap and domain-shift. To achieve this, we enhance the model's ability to reject out-of-source distribution samples by leveraging the available source data, in a novel generative classifier framework. 2) In the Deployment stage, the goal is to design a unified adaptation algorithm capable of operating across a wide range of category-gaps, with no access to the previously seen source samples. To this end, in contrast to the usage of complex adversarial training regimes, we define a simple yet effective source-free adaptation objective by utilizing a novel instance-level weighting mechanism, named Source Similarity Metric (SSM).
A thorough evaluation shows the practical usability of the proposed learning framework, with superior DA performance even over state-of-the-art source-dependent approaches.

Submitted 9 April, 2020; originally announced April 2020.

Comments: CVPR 2020. Code available at https://github.com/val-iisc/usfda

arXiv:2004.04388 [pdf, other]

Towards Inheritable Models for Open-Set Domain Adaptation

Authors: Jogendra Nath Kundu, Naveen Venkat, Ambareesh Revanur, Rahul M V, R. Venkatesh Babu

Abstract: There has been tremendous progress in Domain Adaptation (DA) for visual recognition tasks. In particular, open-set DA has gained considerable attention, wherein the target domain contains additional unseen categories. Existing open-set DA approaches demand access to a labeled source dataset along with unlabeled target instances. However, this reliance on co-existing source and target data is highly impractical in scenarios where data-sharing is restricted due to its proprietary nature or privacy concerns. Addressing this, we introduce a practical DA paradigm where a source-trained model is used to facilitate adaptation in the absence of the source dataset in the future. To this end, we formalize knowledge inheritability as a novel concept and propose a simple yet effective solution to realize inheritable models suitable for the above practical paradigm. Further, we present an objective way to quantify inheritability to enable the selection of the most suitable source model for a given target domain, even in the absence of the source data. We provide theoretical insights followed by a thorough empirical evaluation demonstrating state-of-the-art open-set domain adaptation performance.

Submitted 9 April, 2020; originally announced April 2020.

Comments: CVPR 2020 (Oral). Code available at https://github.com/val-iisc/inheritune

arXiv:2002.02370 [pdf]

Data hiding in speech signal using steganography and encryption

Authors: Hanisha Chowdary N, Karan K, Bharath K P, Rajesh Kumar M

Abstract: Data privacy and data security are always a top priority. We need a reliable method to encrypt data so that it reaches its destination safely. Encryption is a simple yet effective way to protect our data while transmitting it to a destination. The proposed method combines state-of-the-art steganography and encryption techniques. This paper puts forward a different approach to data hiding in speech signals: a ten-digit number is hidden within a speech signal using audio steganography and encrypted with a unique key for better security. At the receiver end, the same unique key is used to decrypt the received signal, and the hidden numbers are then extracted. The performance of the proposed approach is evaluated using PSNR, MSE, SSIM and bit-error rate. The simulation results show better performance compared to the existing approach.
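The embed, encrypt, decrypt, and extract flow described above can be sketched with 16-bit PCM samples. The abstract does not specify the hiding or encryption scheme, so this sketch assumes least-significant-bit (LSB) substitution, a 4-bit-per-digit payload, and a toy XOR keystream seeded by the key (not cryptographically secure); all function names are hypothetical.

```python
import numpy as np


def digits_to_bits(number: str) -> np.ndarray:
    """Encode each decimal digit as 4 bits (BCD), an assumed payload format."""
    return np.array([int(b) for d in number for b in format(int(d), "04b")], dtype=np.int16)


def bits_to_digits(bits) -> str:
    return "".join(str(int("".join(str(int(b)) for b in bits[i:i + 4]), 2))
                   for i in range(0, len(bits), 4))


def keystream(key: int, n: int) -> np.ndarray:
    """Pseudo-random key bits for a toy XOR cipher (NOT cryptographically secure)."""
    return np.random.default_rng(key).integers(0, 2, size=n, dtype=np.int16)


def embed(signal: np.ndarray, number: str, key: int) -> np.ndarray:
    """Encrypt the payload bits with the key, then write them into the LSBs of int16 samples."""
    bits = digits_to_bits(number) ^ keystream(key, 4 * len(number))
    stego = signal.copy()
    stego[:len(bits)] = (stego[:len(bits)] & ~1) | bits
    return stego


def extract(stego: np.ndarray, n_digits: int, key: int) -> str:
    """Read the LSBs back and decrypt them with the same key."""
    bits = (stego[:4 * n_digits] & 1) ^ keystream(key, 4 * n_digits)
    return bits_to_digits(bits)


def mse(a, b) -> float:
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))


def psnr(a, b, peak=32767.0) -> float:
    m = mse(a, b)
    return float("inf") if m == 0 else float(10 * np.log10(peak ** 2 / m))
```

Since each carrier sample changes by at most one quantization step, MSE stays tiny and PSNR very high; the bit-error rate is zero on a clean channel, and any channel noise or re-encoding would corrupt LSB payloads first.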

Submitted 13 January, 2020; originally announced February 2020.

arXiv:2001.10094 [pdf]

OMAP-L138 LCDK Development Kit

Authors: Bharath K P, Sylash K, Pravina K, Rajesh Kumar M

Abstract: Low-cost, low-power processors play a vital role in the field of Digital Signal Processing (DSP). The OMAP-L138 development kit is low-cost, low-power, easy to use, and fast, with a wide variety of applications including digital signal processing, image processing, and video processing. This paper presents a basic introduction to the OMAP-L138 processor and quick procedural steps for real-time and non-real-time implementations with a set of programs. The real-time experiments are audio applications: audio loopback, delay, and echo, whereas the non-real-time experiments are sine-wave generation and low-pass and high-pass filtering.
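The signal-processing content of these experiments (sine generation, delay, echo, and low-pass/high-pass FIR filtering) can be prototyped offline before porting to the board. The sketch below is a NumPy model of those operations only, not the kit's C code; the 8 kHz sampling rate, the single-tap echo form, and the 51-tap Hamming-windowed-sinc design are assumptions.

```python
import numpy as np

FS = 8000  # assumed sampling rate in Hz (a common audio-codec rate, not taken from the paper)


def sine(freq, n, fs=FS, amp=1.0):
    """Generate n samples of a sine wave."""
    return amp * np.sin(2 * np.pi * freq * np.arange(n) / fs)


def delay(x, d):
    """Delay x by d samples (zeros shifted in), keeping the original length."""
    return np.concatenate([np.zeros(d), x])[:len(x)]


def echo(x, d, gain=0.5):
    """Single-tap echo: y[n] = x[n] + gain * x[n - d]."""
    return x + gain * delay(x, d)


def lowpass_fir(cutoff, taps=51, fs=FS):
    """Windowed-sinc low-pass FIR (Hamming window), normalized to unity DC gain."""
    n = np.arange(taps) - (taps - 1) / 2
    h = np.sinc(2 * cutoff / fs * n) * np.hamming(taps)
    return h / h.sum()


def highpass_fir(cutoff, taps=51, fs=FS):
    """High-pass by spectral inversion of the low-pass (taps must be odd)."""
    h = -lowpass_fir(cutoff, taps, fs)
    h[(taps - 1) // 2] += 1.0
    return h


def gain_at(h, freq, fs=FS):
    """Magnitude response |H(f)| of an FIR filter at a single frequency."""
    n = np.arange(len(h))
    return abs(np.sum(h * np.exp(-2j * np.pi * freq * n / fs)))
```

On the DSP itself, the same delay and echo would typically run sample by sample with a circular buffer inside the codec interrupt; the array form here is only for checking coefficients and responses.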

Submitted 13 January, 2020; originally announced January 2020.

arXiv:2001.04215 [pdf]

Radial Based Analysis of GRNN in Non-Textured Image Inpainting

Authors: Karthik R, Anvita Dwivedi, Haripriya M, Bharath K P, Rajesh Kumar M

Abstract: Image inpainting algorithms are used to restore damaged or missing regions of an image based on the surrounding information. The method proposed in this paper applies radial-based analysis to image inpainting with a GRNN. The damaged areas are first isolated from the rest of the image, sorted by size, and then inpainted using the GRNN. The neural network is trained using different radii to achieve a better outcome. A comparative analysis is done for different regression-based algorithms, and the overall results are compared with those achieved by other algorithms, such as LS-SVM, in terms of PSNR.
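A GRNN (General Regression Neural Network) prediction is a Gaussian-kernel-weighted average of its training targets, so inpainting with it amounts to regressing pixel intensity on pixel coordinates. The sketch below assumes that "radius" means the width of the ring of known pixels around the hole used as the training set; the size-sorting step is omitted, and `radius`, `sigma`, and the function names are hypothetical.

```python
import numpy as np


def grnn_predict(train_x, train_y, query_x, sigma):
    """GRNN output: Gaussian-kernel-weighted average of training targets."""
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ train_y) / w.sum(axis=1)


def inpaint_grnn(img, mask, radius=3.0, sigma=1.5):
    """Fill pixels where the boolean mask is True, training on known pixels within `radius` of the hole."""
    out = img.astype(float).copy()
    holes = np.argwhere(mask).astype(float)
    known = np.argwhere(~mask).astype(float)
    # Distance from every known pixel to its nearest missing pixel.
    dist = np.min(np.linalg.norm(known[:, None, :] - holes[None, :, :], axis=-1), axis=1)
    ring = known[dist <= radius]
    ring_idx = ring.astype(int)
    values = out[ring_idx[:, 0], ring_idx[:, 1]]
    hole_idx = holes.astype(int)
    out[hole_idx[:, 0], hole_idx[:, 1]] = grnn_predict(ring, values, holes, sigma)
    return out
```

A larger radius brings in more context but also more distant, less relevant pixels, which is one plausible reason to compare several radii. For very large holes, interior weights can underflow, so processing holes individually (as the size-sorting step suggests) helps.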

Submitted 13 January, 2020; originally announced January 2020.

arXiv:2001.04208 [pdf]

Handwritten Character Recognition Using Unique Feature Extraction Technique

Authors: Sai Abhishikth Ayyadevara, P N V Sai Ram Teja, Bharath K P, Rajesh Kumar M

Abstract: One of the most arduous and captivating domains in image processing is handwritten character recognition. In this paper, we propose a feature extraction technique that combines unique features from geometric, zone-based hybrid, and gradient feature extraction approaches, and we implement three different neural networks, namely the Multilayer Perceptron network using the Backpropagation algorithm (MLP BP), the Multilayer Perceptron network using the Levenberg-Marquardt algorithm (MLP LM), and the Convolutional Neural Network (CNN), along with the Minimum Distance Classifier (MDC). The results lead to the conclusion that the proposed feature extraction algorithm is more accurate than its individual counterparts and that the CNN is the most efficient of the three neural networks considered.
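Two of the simpler components mentioned here, a zone-based feature and the Minimum Distance Classifier, can be sketched briefly. The zoning below (mean ink density per cell of a 4x4 grid) is one common choice and an assumption, not necessarily the paper's zone-based hybrid feature; geometric and gradient features are not shown.

```python
import numpy as np


def zone_features(img, zones=4):
    """Mean pixel density of each cell in a zones x zones grid."""
    h, w = img.shape
    zh, zw = h // zones, w // zones
    return np.array([img[i * zh:(i + 1) * zh, j * zw:(j + 1) * zw].mean()
                     for i in range(zones) for j in range(zones)])


class MinimumDistanceClassifier:
    """Assign each sample to the class whose mean feature vector is nearest (Euclidean)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=-1)
        return self.classes_[np.argmin(d, axis=1)]
```

The MDC stores only one prototype per class, which makes it a cheap baseline for judging how separable a feature set is before training the heavier MLP and CNN models.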

Submitted 13 January, 2020; originally announced January 2020.

arXiv:1905.05520 [pdf, other]

A Novel Beamformed Control Channel Design for LTE with Full Dimension-MIMO

Authors: Pavan Reddy M., Harish Kumar D., Saidhiraj Amuru, Kiran Kuchi

Abstract: Full Dimension-MIMO (FD-MIMO) technology can achieve large improvements in network throughput with simultaneous connectivity of a large number of mobile wireless devices, unmanned aerial vehicles, and the Internet of Things (IoT). In FD-MIMO, with a large number of antennas at the base station and the ability to perform beamforming, the capacity of the physical downlink shared channel (PDSCH) has increased substantially. However, the current specifications of the 3rd Generation Partnership Project (3GPP) do not allow the base station to perform beamforming for the physical downlink control channel (PDCCH), and hence the PDCCH has neither the capacity nor the coverage of the PDSCH. Therefore, PDCCH capacity will still limit the performance of a network, as it dictates the number of users that can be scheduled at a given time instant. In Release 11, 3GPP introduced the enhanced PDCCH (EPDCCH) to increase PDCCH capacity at the cost of sacrificing PDSCH resources. The problem of enhancing PDCCH capacity within the available control channel resources has not yet been addressed in the literature. Hence, in this paper, we propose a novel beamformed PDCCH (BF-PDCCH) design that is aligned with the 3GPP specifications and requires only simple software changes at the base station. We rely on the sounding reference signals transmitted in the uplink to decide the best beam for a user and ingeniously schedule the users in the PDCCH.
We perform system-level simulations to evaluate the performance of the proposed design and show that the proposed BF-PDCCH achieves larger network throughput when compared with the current state-of-the-art algorithms, the PDCCH and EPDCCH schemes.
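Choosing "the best beam for a user" from an uplink channel estimate can be illustrated as picking the codeword that maximizes received power. This sketch assumes a uniform linear array, a DFT beam codebook, and channel reciprocity (as in TDD) so that the SRS-based uplink estimate `h` can be reused for the downlink; none of these specifics are stated in the abstract, and the function names are hypothetical.

```python
import numpy as np


def dft_codebook(n_ant, n_beams):
    """Columns are unit-norm DFT beams for a uniform linear array (an assumed antenna layout)."""
    k = np.arange(n_ant)[:, None]
    b = np.arange(n_beams)[None, :]
    return np.exp(-2j * np.pi * k * b / n_beams) / np.sqrt(n_ant)


def best_beam(h, codebook):
    """Pick the beam maximizing received power |w^H h|^2 for a channel estimate h."""
    gains = np.abs(codebook.conj().T @ h) ** 2
    return int(np.argmax(gains)), gains
```

With an orthogonal codebook, users whose best beams differ see little mutual interference, which is the kind of property a scheduler could exploit when placing several beamformed users in the same control resources.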

Submitted 14 May, 2019; originally announced May 2019.

arXiv:1904.09765 [pdf, other]

hf0: A hybrid pitch extraction method for multimodal voice

Authors: Pradeep Rengaswamy, Gurunath Reddy M, Krothapalli Sreenivasa Rao

Abstract: Pitch or fundamental frequency (f0) extraction is a fundamental problem studied extensively for its potential in speech and clinical applications. In the literature, mode-specific (modal speech, singing voice, emotional/expressive speech, or noisy speech) signal processing and deep learning f0 extraction methods have been developed that exploit the quasi-periodic nature of the signal in time, its harmonic property in the spectrum, or both, to extract the pitch. Hence, there is no single unified method that can reliably extract the pitch from various modes of the acoustic signal. In this work, we propose a hybrid f0 extraction method that seamlessly extracts the pitch across modes of speech production with the very high accuracy required for many applications. The proposed hybrid model exploits the advantages of deep learning and signal processing methods to minimize the pitch detection error and adapts to various modes of the acoustic signal. Specifically, we propose an ordinal regression convolutional neural network to map the periodicity-rich input representation to nominal pitch classes, which drastically reduces the number of classes required for pitch detection, unlike other deep learning approaches. Further, the accurate f0 is estimated from the nominal pitch class labels by filtering and autocorrelation. We show that the proposed method generalizes to unseen modes of voice production and various noises for large-scale datasets.
Also, the proposed hybrid model significantly reduces the number of learning parameters required to train the deep model compared to other methods. Furthermore, the evaluation measures showed that the proposed method is significantly better than state-of-the-art signal processing and deep learning approaches.
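The refinement step, going from a coarse pitch class to an exact f0 by autocorrelation, can be sketched as restricting the autocorrelation peak search to the lag range implied by the class band. The band limits `f_lo`/`f_hi` stand in for a nominal class predicted by the network; the mean-removal used as "filtering" and the integer-lag resolution are simplifying assumptions, not the paper's procedure.

```python
import numpy as np


def refine_f0(frame, fs, f_lo, f_hi):
    """Return the f0 (Hz) whose autocorrelation lag peaks inside the class band [f_lo, f_hi]."""
    x = frame - frame.mean()  # remove DC before correlating
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # keep non-negative lags only
    lag_min = int(np.floor(fs / f_hi))
    lag_max = min(int(np.ceil(fs / f_lo)), len(x) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag
```

Restricting the search to one class band is what suppresses the octave errors that an unconstrained autocorrelation peak-picker tends to make. Because the biased autocorrelation slightly favors shorter lags and lags are integers, a parabolic interpolation around the peak would be a natural next refinement.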

Submitted 22 April, 2019; originally announced April 2019.

Comments: Pitch Extraction, F0 extraction, harmonic signals, speech, monophonic songs, Convolutional Neural Network, 5 pages, 5 figures

arXiv:1903.01902 [pdf, other]

BacSoft: A Tool to Archive Data on Bacteria

Authors: Amay Agrawal, Dixita Limbachiya, Ravikumar M., Taslimarif Saiyed, Manish K. Gupta

Abstract: Recently, DNA data storage systems have attracted many researchers worldwide. Motivated by the success stories of such systems, in this work we propose software called BacSoft to clone data into a bacterial plasmid using the concept of genetic engineering. We consider encoding schemes that satisfy constraints significant for bacterial data storage.
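The abstract does not list the constraints BacSoft enforces. As an illustration only, the sketch below assumes one constraint common in DNA data storage, avoiding homopolymer runs, and meets it with a rotating base-3 code in the style of Goldman et al. (2013): each ternary digit selects one of the three bases that differ from the previous base. This is not necessarily BacSoft's scheme, and the function names are hypothetical.

```python
BASES = "ACGT"


def byte_to_trits(value):
    """Six base-3 digits, most significant first (3**6 = 729 >= 256)."""
    trits = []
    for _ in range(6):
        trits.append(value % 3)
        value //= 3
    return trits[::-1]


def encode(data: bytes, prev: str = "A") -> str:
    """Each trit picks one of the three bases that differ from the previous base,
    so no base is ever repeated back to back (no homopolymers)."""
    out = []
    for byte in data:
        for t in byte_to_trits(byte):
            prev = [b for b in BASES if b != prev][t]
            out.append(prev)
    return "".join(out)


def decode(dna: str, prev: str = "A") -> bytes:
    """Invert encode(): recover each trit from the base and its predecessor."""
    trits = []
    for base in dna:
        trits.append([b for b in BASES if b != prev].index(base))
        prev = base
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for t in trits[i:i + 6]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


def gc_content(dna: str) -> float:
    """Fraction of G and C bases, another quantity often constrained in DNA storage."""
    return (dna.count("G") + dna.count("C")) / len(dna)
```

The price of the homopolymer guarantee is density: six bases per byte instead of the four a plain 2-bits-per-base mapping would need.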

Submitted 5 March, 2019; originally announced March 2019.

Comments: 8 pages, 13 figures, poster abstract DNA Computing and Molecular Programming, DNA24 conference, **an, China, Oct 2018

arXiv:1811.09956 [pdf, other]

Glottal Closure Instants Detection From Pathological Acoustic Speech Signal Using Deep Learning

Authors: Gurunath Reddy M, Tanumay Mandal, Krothapalli Sreenivasa Rao

Abstract: In this paper, we propose a classification-based method for detecting glottal closure instants (GCIs) from pathological acoustic speech signals, which finds many applications in vocal disorder analysis. To date, GCIs for pathological disorders have been extracted from the laryngeal (glottal source) signal recorded with an Electroglottograph, a dedicated device designed to measure vocal fold vibration around the larynx. We have created a pathological dataset consisting of simultaneous recordings of the glottal source and acoustic speech signals for six different disorders from patients with vocal disorders. The GCI locations are manually annotated for disorder analysis and supervised learning. We propose a convolutional neural network based GCI detection method that fuses deep acoustic speech and linear prediction residual features for robust GCI detection. The experimental results show that the proposed method is significantly better than the state-of-the-art GCI detection methods.
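The linear prediction (LP) residual mentioned above is obtained by fitting an all-pole predictor to the signal and inverse-filtering; in voiced speech, GCIs tend to show up as large residual peaks. The sketch below uses the standard autocorrelation method and is not the paper's feature pipeline; the prediction order and the absence of framing or windowing are simplifying assumptions.

```python
import numpy as np


def lpc(x, order):
    """Autocorrelation-method LP coefficients a_1..a_p, so that x[n] ~ sum_k a_k x[n-k]."""
    x = x - x.mean()
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])


def lp_residual(x, a):
    """Inverse-filter x with A(z) = 1 - sum_k a_k z^-k; what remains approximates the excitation."""
    pred = np.zeros(len(x))
    for k in range(1, len(a) + 1):
        pred[k:] += a[k - 1] * x[:-k]
    return x - pred
```

For real speech, this would be applied frame by frame (for example 20 to 30 ms windows with an order around 10 to 16 at typical sampling rates) rather than once over the whole signal.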

Submitted 25 November, 2018; originally announced November 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/39

arXiv:1810.06635 [pdf, other]

doi 10.21437/Interspeech.2018-2486

Semi-supervised and Active-learning Scenarios: Efficient Acoustic Model Refinement for a Low Resource Indian Language

Authors: Maharajan Chellapriyadharshini, Anoop Toffy, Srinivasa Raghavan K. M., V Ramasubramanian

Abstract: We address the problem of efficient acoustic-model refinement (continuous retraining) using semi-supervised and active learning for a low-resource Indian language, where the low-resource constraints are i) a small labeled corpus from which to train a baseline 'seed' acoustic model and ii) a large training corpus without orthographic labels, or from which data can be selected for manual labeling at low cost. The proposed semi-supervised learning decodes the large unlabeled training corpus using the seed model and, through various protocols, selects the decoded utterances with high reliability using confidence levels (which correlate with the WER of the decoded utterances) and iterative bootstrapping. The proposed active learning protocol uses a confidence-level-based metric to select decoded utterances from the large unlabeled corpus for further labeling. The semi-supervised learning protocols can offer a WER reduction, from a poorly trained seed model, of as much as 50% of the best WER reduction realizable from the seed model's WER if the large corpus were labeled and used for acoustic-model training. The active learning protocols require only 60% of the entire training corpus to be manually labeled to reach the same performance as the entire data.
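The two confidence-driven selection rules can be sketched as simple filters over decoder output. This is a hypothetical illustration: it assumes an utterance-level confidence in [0, 1], keeps high-confidence hypotheses as pseudo-labels for semi-supervised training, and (a common active-learning choice, assumed here) sends the least confident utterances for manual labeling. The paper's actual protocols and thresholds are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Decoded:
    utt_id: str
    hypothesis: str
    confidence: float  # assumed utterance-level decoder confidence in [0, 1]


def select_for_self_training(decoded, threshold):
    """Semi-supervised step: keep reliable hypotheses as pseudo-labels."""
    return [d for d in decoded if d.confidence >= threshold]


def select_for_labeling(decoded, budget_fraction):
    """Active-learning step: send the least confident utterances to human annotators."""
    k = int(round(budget_fraction * len(decoded)))
    return sorted(decoded, key=lambda d: d.confidence)[:k]
```

Iterative bootstrapping would wrap the first rule in a loop: retrain on seed data plus selected pseudo-labels, re-decode the pool, and reselect, typically until the WER on a held-out set stops improving.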

Submitted 2 October, 2018; originally announced October 2018.

Journal ref: Proc. Interspeech 2018

arXiv:1809.04154 [pdf]

Intensity and Rescale Invariant Copy Move Forgery Detection Techniques

Authors: Tejas K, Swathi C, Rajesh Kumar M

Abstract: In today's world, digital media such as videos and images act as an active medium for carrying valuable information across the globe on all fronts. However, several techniques have evolved to tamper with images, making their authenticity untrustworthy. Copy-Move Forgery (CMF) is one of the most common forgeries in an image, where a cluster of pixels is duplicated within the same image, potentially with post-processing. Various state-of-the-art techniques that are effective in detecting passive image forgery have been developed in recent years. However, most methods fail when the copied region is rescaled or has its intensity altered before being pasted, due to de-synchronization of pixels in the search process. To tackle this problem, the paper proposes distinct novel algorithms that use Hu's invariant moments and the Discrete Cosine Transform (DCT) to achieve rescale-invariant and intensity-invariant CMF detection, respectively. Quantitative and qualitative experiments demonstrate the effectiveness of the algorithms.
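One way a DCT can yield intensity invariance is by discarding the DC coefficient: adding a constant to every pixel of a block changes only the DC term of its 2D DCT. The sketch below shows that property; it is an assumption about how such a feature might be built, not the paper's algorithm, and it covers only additive (not multiplicative) intensity changes on square blocks. The `keep` parameter is hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ B @ C.T is the 2D DCT of an n x n block B."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def intensity_invariant_features(block, keep=4):
    """Low-frequency 2D DCT coefficients of a square block with the DC term dropped.

    A uniform shift of all pixels alters only coeffs[0, 0], so the
    returned coefficients are unchanged by it.
    """
    c = dct_matrix(block.shape[0])
    coeffs = c @ block.astype(float) @ c.T
    low = coeffs[:keep, :keep].ravel()
    return low[1:]  # drop coeffs[0, 0], the DC term
```

Keeping only the low-frequency coefficients also makes block features tolerant of mild noise or compression applied after pasting.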

Submitted 11 September, 2018; originally announced September 2018.

Comments: Further research on this paper is ongoing at VIT University; hence, the paper is not yet published.

arXiv:1806.02907 [pdf]

Copy Move Forgery using Hu's Invariant Moments and Log Polar Transformations

Authors: Tejas K, Swathi C, Rajesh Kumar M

Abstract: With the increase in the interchange of data, there is a growing need for security. Considering the volume of digital data that is transmitted, it needs to be secured. Among the many possible forms of tampering, one widespread technique is Copy-Move Forgery (CMF). This forgery occurs when parts of an image are copied and duplicated elsewhere in the same image. A number of algorithms exist to detect such forgery, in which the primary step is feature extraction. The feature extraction techniques employed must have low time and space complexity for efficient and fast processing of media. Also, the majority of existing state-of-the-art techniques tend to falsely match similar genuine objects as copy-move forged during the detection process. To tackle these problems, the paper proposes a novel algorithm that uses Hu's invariant moments and log-polar transformations to reduce the feature vector dimension to one feature per block while detecting CMF among genuine similar objects in an image. The qualitative and quantitative results obtained demonstrate the effectiveness of this algorithm.
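The "one feature per block" idea can be illustrated with a generic block-matching pipeline: compute one scalar per overlapping block, sort, compare neighbors in sorted order, and let matching pairs vote for a shift vector. This sketch uses Hu's first invariant moment as the scalar and omits the log-polar step, so it is not the paper's algorithm; the block size, tolerance, and `min_shift` values are hypothetical.

```python
from collections import Counter

import numpy as np


def hu_phi1(block):
    """First Hu invariant, eta20 + eta02, from normalized central moments."""
    b = block.astype(float)
    m00 = b.sum()
    if m00 == 0:
        return 0.0
    y, x = np.mgrid[:b.shape[0], :b.shape[1]]
    xc, yc = (x * b).sum() / m00, (y * b).sum() / m00
    mu20 = (((x - xc) ** 2) * b).sum()
    mu02 = (((y - yc) ** 2) * b).sum()
    return (mu20 + mu02) / m00 ** 2


def detect_copy_move(img, bsize=8, tol=1e-9, min_shift=4):
    """One scalar feature per overlapping block; blocks whose features match
    (adjacent after sorting) and lie far apart vote for their shift vector."""
    h, w = img.shape
    feats = []
    for r in range(h - bsize + 1):
        for c in range(w - bsize + 1):
            feats.append((hu_phi1(img[r:r + bsize, c:c + bsize]), r, c))
    feats.sort()
    shifts = Counter()
    for (f1, r1, c1), (f2, r2, c2) in zip(feats, feats[1:]):
        if abs(f1 - f2) <= tol and max(abs(r1 - r2), abs(c1 - c2)) >= min_shift:
            (ra, ca), (rb, cb) = sorted([(r1, c1), (r2, c2)])
            shifts[(rb - ra, cb - ca)] += 1
    return shifts
```

A single genuine object that merely resembles another produces scattered, low-count shift vectors, whereas a copied region produces many block pairs voting for the same shift. This is one reason shift-vector voting helps separate forgeries from similar genuine objects.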

Submitted 7 June, 2018; originally announced June 2018.

Comments: This paper was submitted to, accepted at, and presented at the 3rd International Conference on RTEICT, an IEEE conference

arXiv:1802.06288 [pdf]

Implementation of Neural Network and feature extraction to classify ECG signals

Authors: R Karthik, Dhruv Tyagi, Amogh Raut, Soumya Saxena, Rajesh Kumar M

Abstract: This paper presents a suitable and efficient implementation of a feature extraction algorithm (the Pan-Tompkins algorithm) on electrocardiography (ECG) signals for the detection and classification of four cardiac diseases: Sleep Apnea, Arrhythmia, Supraventricular Arrhythmia and Long Term Atrial Fibrillation (AF), and for differentiating them from the normal heartbeat, using Pan-Tompkins RR detection followed by feature extraction for classification. The paper also presents a new approach to signal classification using existing neural network classifiers.
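The Pan-Tompkins stages that feed RR-interval features can be sketched as derivative, squaring, moving-window integration, and thresholding. This is a simplified sketch, not the paper's implementation: the band-pass stage and the adaptive dual thresholds of the full algorithm are omitted, and the fixed 30% threshold, 150 ms window, and 200 ms refractory period are assumptions.

```python
import numpy as np


def pan_tompkins_like(ecg, fs):
    """Simplified Pan-Tompkins chain returning R-peak indices and RR intervals (s)."""
    ecg = np.asarray(ecg, dtype=float)
    # Five-point derivative: y[n] = (fs / 8) * (-x[n-2] - 2x[n-1] + 2x[n+1] + x[n+2])
    d = np.zeros_like(ecg)
    d[2:-2] = (ecg[4:] + 2 * ecg[3:-1] - 2 * ecg[1:-3] - ecg[:-4]) * fs / 8.0
    squared = d ** 2  # emphasizes the steep QRS slopes
    win = int(0.15 * fs)  # ~150 ms moving-window integration
    mwi = np.convolve(squared, np.ones(win) / win, mode="same")
    above = mwi > 0.3 * mwi.max()  # assumed fixed threshold
    onsets = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    r_peaks = []
    for o in onsets:
        lo = max(o - win, 0)
        r = lo + int(np.argmax(ecg[lo:o + win]))  # largest raw sample near the detection
        if not r_peaks or r - r_peaks[-1] > int(0.2 * fs):  # 200 ms refractory period
            r_peaks.append(r)
    r_peaks = np.array(r_peaks)
    return r_peaks, np.diff(r_peaks) / fs
```

Statistics of the returned RR intervals (mean, variability, irregularity) are the kind of features a downstream neural network classifier could use to separate rhythm disorders such as AF from normal sinus rhythm.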

Submitted 17 February, 2018; originally announced February 2018.

Comments: SPRINGER LNEE

Showing 1–50 of 60 results for author: M, R