-
Image Based Character Recognition, Documentation System To Decode Inscription From Temple
Authors:
Velmathi G,
Shangavelan M,
Harish D,
Krithikshun M S
Abstract:
This project undertakes the training and analysis of optical character recognition (OCR) methods applied to 10th-century ancient Tamil inscriptions discovered on the walls of the Brihadeeswarar Temple. The chosen OCR methods include Tesseract, a widely used OCR engine, together with modern ICR techniques to preprocess the raw data and box-editing software to fine-tune our model. The analysis with Tesseract aims to evaluate its effectiveness in accurately deciphering the nuances of the ancient Tamil characters. The performance of our model on the dataset is determined by its accuracy rate, with the evaluated dataset divided into a training set and a testing set. By addressing the unique challenges posed by the script's historical context, this study seeks to contribute valuable insights to the broader field of OCR, facilitating improved preservation and interpretation of ancient inscriptions.
Submitted 21 May, 2024;
originally announced May 2024.
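The abstract above reports an accuracy rate but does not define it. As a minimal sketch, assuming a character-level accuracy based on Levenshtein edit distance (a common OCR convention, not necessarily the authors' metric), the computation could look like this. The function names are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed metric: 1 - (edit distance / reference length), clipped at 0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1.0 - edit_distance(reference, hypothesis) / len(reference))
```

Python strings are compared code point by code point here. For Tamil text, where one visible character may span several code points, a grapheme-level comparison might be more appropriate.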
-
Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis
Authors:
Vinotha R,
Hepsiba D,
L. D. Vijay Anand,
Deepak John Reji
Abstract:
Neural text-to-speech (TTS) synthesis is a powerful technology that can generate speech using neural networks. One of the most remarkable features of TTS synthesis is its capability to produce speech in the voice of different speakers. This paper introduces voice cloning and speech synthesis (https://pypi.org/project/voice-cloning/), an open-source Python package for helping people with speech disorders to communicate more effectively, as well as for professionals seeking to integrate voice cloning or speech synthesis capabilities into their projects. This package aims to generate synthetic speech that sounds like the natural voice of an individual, but it does not replace the natural human voice. The architecture of the system comprises a speaker verification system, a synthesizer, a vocoder, and noise reduction. The speaker verification system is trained on a varied set of speakers to achieve optimal generalization performance without relying on transcriptions. The synthesizer is trained on both audio and transcriptions to generate a Mel spectrogram from text, and the vocoder converts the generated Mel spectrogram into the corresponding audio signal. The audio signal is then processed by a noise reduction algorithm to eliminate unwanted noise and enhance speech clarity. The performance of synthesized speech from seen and unseen speakers is then evaluated using subjective and objective measures such as Mean Opinion Score (MOS), Gross Pitch Error (GPE), and Spectral Distortion (SD). The model can create speech in distinct voices by including speaker characteristics that are chosen randomly.
Submitted 16 February, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
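Of the objective measures named in the abstract above, Gross Pitch Error is simple enough to sketch. The version below assumes the commonly used definition: the fraction of frames voiced in both tracks whose pitch deviates from the reference by more than 20%. The abstract does not state the threshold, so `tolerance=0.2` is an assumption, as is the convention that unvoiced frames carry an F0 of 0 and that both tracks are already frame-aligned.

```python
def gross_pitch_error(reference_f0, synthesized_f0, tolerance=0.2):
    """Fraction of mutually voiced frames with a pitch deviation above `tolerance`.

    Both inputs are frame-aligned F0 tracks in Hz; 0 marks an unvoiced frame
    (an assumed convention, not taken from the paper).
    """
    voiced = [(r, s) for r, s in zip(reference_f0, synthesized_f0) if r > 0 and s > 0]
    if not voiced:
        return 0.0
    gross = sum(1 for r, s in voiced if abs(s - r) > tolerance * r)
    return gross / len(voiced)
```

For example, with reference frames `[100, 100, 0, 200]` and synthesized frames `[110, 130, 120, 0]`, only the first two frames are voiced in both tracks, and one of them deviates by more than 20%, giving a GPE of 0.5.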
-
Smart Summarizer for Blind People
Authors:
Mona teja K,
Mohan Sai. S,
H S S S Raviteja D,
Sai Kushagra P V
Abstract:
In today's world, time is a very important resource. In our busy lives, most of us hardly have time to read the complete news, so we just go through the headlines and satisfy ourselves with that. As a result, we might miss a part of the news or misinterpret it entirely. The situation is even worse for people who are visually impaired or have lost their ability to see. The inability of these people to read text has a huge impact on their lives. There are a number of methods for blind people to read text. Braille script is one example, but it is a highly inefficient method, as it is time-consuming and requires a lot of practice. So, we present a method for visually impaired people based on the sense of sound, which is better and more accurate than the sense of touch. This paper deals with an efficient method to summarize news into important keywords so as to save the effort of going through the complete text every single time. It covers several APIs and modules, such as Tesseract and gTTS, and several algorithms that are discussed and implemented in detail, such as Luhn's algorithm, the Latent Semantic Analysis algorithm, and the Text Ranking algorithm. The other functionality that this paper deals with is converting the summarized text to speech so that the system can aid even blind people.
Submitted 1 January, 2020;
originally announced January 2020.
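The abstract above names Luhn's algorithm as one of the summarizers. The following is a simplified sketch of the idea, not the authors' implementation: words that occur often (outside a small stopword list) count as significant, and each sentence is scored as the squared number of significant words divided by the span they cover. The stopword list, the `min_count` threshold, and the single-cluster simplification are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "and", "it", "that", "for", "on", "was"}


def luhn_summarize(text, max_sentences=1, min_count=2):
    """Return the top-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    counts = Counter(w for words in tokenized for w in words if w not in STOPWORDS)
    significant = {w for w, c in counts.items() if c >= min_count}

    def score(words):
        positions = [i for i, w in enumerate(words) if w in significant]
        if not positions:
            return 0.0
        span = positions[-1] - positions[0] + 1  # first to last significant word
        return len(positions) ** 2 / span

    ranked = sorted(range(len(sentences)), key=lambda i: score(tokenized[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore document order
    return [sentences[i] for i in chosen]
```

Luhn's original formulation splits a sentence into clusters separated by runs of insignificant words and scores the best cluster. Treating the whole sentence as one cluster keeps the sketch short at the cost of penalizing long sentences with scattered keywords.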
-
A Novel Beamformed Control Channel Design for LTE with Full Dimension-MIMO
Authors:
Pavan Reddy M.,
Harish Kumar D.,
Saidhiraj Amuru,
Kiran Kuchi
Abstract:
The Full Dimension-MIMO (FD-MIMO) technology is capable of achieving huge improvements in network throughput with simultaneous connectivity of a large number of mobile wireless devices, unmanned aerial vehicles, and the Internet of Things (IoT). In FD-MIMO, with a large number of antennas at the base station and the ability to perform beamforming, the capacity of the physical downlink shared channel (PDSCH) has increased substantially. However, the current specifications of the 3rd Generation Partnership Project (3GPP) do not allow the base station to perform beamforming for the physical downlink control channel (PDCCH), and hence PDCCH has neither the capacity nor the coverage of PDSCH. Therefore, PDCCH capacity will still limit the performance of a network, as it dictates the number of users that can be scheduled at a given time instant. In Release 11, 3GPP introduced enhanced PDCCH (EPDCCH) to increase the PDCCH capacity at the cost of sacrificing PDSCH resources. The problem of enhancing the PDCCH capacity within the available control channel resources has not yet been addressed in the literature. Hence, in this paper, we propose a novel beamformed PDCCH (BF-PDCCH) design which is aligned with the 3GPP specifications and requires only simple software changes at the base station. We rely on the sounding reference signals transmitted in the uplink to decide the best beam for a user and ingeniously schedule the users in PDCCH. We perform system-level simulations to evaluate the performance of the proposed design and show that the proposed BF-PDCCH achieves larger network throughput when compared with the current state-of-the-art PDCCH and EPDCCH schemes.
Submitted 14 May, 2019;
originally announced May 2019.
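The abstract above says the best beam for a user is decided from uplink sounding reference signals, without giving the selection rule. One plausible sketch, under stated assumptions, is to pick the codebook beam with the largest gain against the uplink channel estimate. The DFT codebook, the uniform linear array, and the use of the uplink estimate for downlink beam choice are all assumptions here, and the function names are hypothetical.

```python
import cmath


def dft_codebook(num_antennas, num_beams):
    """Unit-norm DFT beams for a uniform linear array (assumed codebook)."""
    scale = num_antennas ** 0.5
    return [
        [cmath.exp(2j * cmath.pi * n * b / num_beams) / scale for n in range(num_antennas)]
        for b in range(num_beams)
    ]


def best_beam(channel, codebook):
    """Index of the beam w that maximizes |w^H h|^2 for channel estimate h."""
    def gain(weights):
        projection = sum(w.conjugate() * h for w, h in zip(weights, channel))
        return abs(projection) ** 2

    return max(range(len(codebook)), key=lambda b: gain(codebook[b]))


# Usage: a channel that is a scaled copy of beam 3 should select beam 3.
codebook = dft_codebook(num_antennas=4, num_beams=8)
channel_estimate = [2 * w for w in codebook[3]]
selected = best_beam(channel_estimate, codebook)
```

The scheduling step that the abstract mentions, deciding which users share control channel resources once their beams are known, is not sketched here because the abstract gives no detail on it.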
-
Parametic Classification of Handvein Patterns Based on Texture Features
Authors:
Harbi AlMahafzah,
Mohammad Imran,
Supreetha Gowda H. D.
Abstract:
In this paper, we develop a biometric recognition system based on the hand-based modality Handvein, which has a unique pattern for each individual and is impossible to counterfeit or fabricate because it is an internal feature. We chose feature extraction algorithms such as LBP (a visual descriptor), LPQ (a blur-insensitive texture operator), and Log-Gabor (a texture descriptor). We chose the well-known classifiers KNN and SVM for classification. We experimented with each algorithm individually and tabulated its recognition rate for Handvein under different distance measures and kernel options. Feature-level fusion was then carried out, which increased the performance level.
Submitted 21 March, 2019;
originally announced March 2019.
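LBP, the first texture descriptor listed in the abstract above, can be illustrated with a basic 8-neighbour version: each interior pixel gets an 8-bit code with one bit per neighbour that is at least as bright as the centre, and the image is described by the histogram of those codes. This is a minimal sketch of the basic operator only. The neighbour ordering and bit assignment are conventions chosen for illustration, and the paper may use a different LBP variant.

```python
def lbp_code(patch):
    """8-bit LBP code for the centre pixel of a 3x3 patch (list of three rows)."""
    center = patch[1][1]
    # Neighbours visited clockwise from the top-left corner (assumed ordering).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(offsets):
        if patch[row][col] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over the interior pixels of a 2D list."""
    histogram = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            histogram[lbp_code(patch)] += 1
    return histogram
```

A histogram like this would be the feature vector handed to a classifier such as KNN or SVM. Feature-level fusion, as mentioned in the abstract, would typically concatenate such vectors from several descriptors, though the abstract does not say how the fusion is done.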
-
Dynamic smooth compressed quadtrees (Fullversion)
Authors:
Ivor Hoog v. d.,
Elena Khramtcova,
Maarten Löffler
Abstract:
We introduce dynamic smooth (a.k.a. balanced) compressed quadtrees with worst-case constant time updates in constant dimensions. We distinguish two versions of the problem. First, we show that quadtrees as a space-division data structure can be made smooth and dynamic subject to split and merge operations on the quadtree cells. Second, we show that quadtrees used to store a set of points in $\mathbb{R}^d$ can be made smooth and dynamic subject to insertions and deletions of points. The second version uses the first but must additionally deal with compression and alignment of quadtree components. In both cases our updates take $2^{\mathcal{O}(d\log d)}$ time, except for the point location part in the second version, which has a lower bound of $\Theta(\log n)$; but if a pointer (finger) to the correct quadtree cell is given, the rest of the updates take worst-case constant time. Our result implies that several classic and recent results (ranging from ray tracing to planar point location) in computational geometry which use quadtrees can deal with arbitrary point sets on a real RAM pointer machine.
Submitted 22 February, 2018; v1 submitted 15 December, 2017;
originally announced December 2017.
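To make the smoothness condition in the abstract above concrete, here is a sketch of a plain (uncompressed) 2D quadtree whose split operation keeps edge-adjacent leaves within one level of each other. It uses the textbook cascading approach, where splitting a cell first splits any coarser neighbour, so a single split can trigger a chain of further splits. It is therefore not the paper's worst-case constant-time update, and the class and method names are hypothetical.

```python
class SmoothQuadtree:
    """Leaves of a quadtree over the unit square, stored as (depth, x, y) grid cells."""

    def __init__(self):
        self.leaves = {(0, 0, 0)}  # the root starts as the only leaf

    def leaf_containing(self, depth, x, y):
        """Leaf equal to or coarser than the given cell, or None if it is subdivided."""
        while (depth, x, y) not in self.leaves:
            if depth == 0:
                return None
            depth, x, y = depth - 1, x // 2, y // 2
        return (depth, x, y)

    def edge_neighbours(self, cell):
        """Same-depth grid positions sharing an edge with `cell`, inside the square."""
        depth, x, y = cell
        size = 1 << depth
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < size and 0 <= ny < size:
                yield depth, nx, ny

    def split(self, cell):
        """Split a leaf into four children, first refining any coarser neighbour."""
        if cell not in self.leaves:
            raise ValueError("only leaves can be split")
        depth, x, y = cell
        for position in list(self.edge_neighbours(cell)):
            neighbour = self.leaf_containing(*position)
            while neighbour is not None and neighbour[0] < depth:
                self.split(neighbour)  # cascade: the neighbour is too coarse
                neighbour = self.leaf_containing(*position)
        self.leaves.remove(cell)
        for i in (0, 1):
            for j in (0, 1):
                self.leaves.add((depth + 1, 2 * x + i, 2 * y + j))

    def is_smooth(self):
        """True if no leaf has an edge neighbour more than one level coarser."""
        for cell in self.leaves:
            for position in self.edge_neighbours(cell):
                neighbour = self.leaf_containing(*position)
                if neighbour is not None and neighbour[0] < cell[0] - 1:
                    return False
        return True
```

Only edge adjacency is enforced in this sketch, so cells touching at a corner may still differ by more than one level. Merge operations, compression, and point storage from the abstract are left out.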