doi 10.1109/ICASSP49357.2023.10095168

Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks

Authors: Esaú Villatoro-Tello, Srikanth Madikeri, Juan Zuluaga-Gomez, Bidisha Sharma, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Petr Motlicek, Alexei V. Ivanov, Aravind Ganapathiraju

Abstract: In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems to perform the SLU intent detection task: 1) text-based, 2) lattice-based, and a novel 3) multimodal approach. Our work provides a comprehensive analysis of what could be the achievable performance of different state-of-the-art SLU systems under different circumstances, e.g., automatically- vs. manually-generated transcripts. We evaluate the systems on the publicly available SLURP spoken language resource corpus. Our results indicate that using richer forms of Automatic Speech Recognition (ASR) outputs, namely word-consensus-networks, allows the SLU system to improve in comparison to the 1-best setup (5.5% relative improvement). However, crossmodal approaches, i.e., learning from acoustic and text embeddings, obtain performance similar to the oracle setup, a relative improvement of 17.8% over the 1-best configuration, making them a recommended alternative to overcome the limitations of working with automatically generated transcripts.

Submitted 17 March, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

Comments: Accepted in ICASSP 2023

ACM Class: I.2.7

Journal ref: ICASSP 2023
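
Note: the relative improvements quoted in the abstract above (5.5% and 17.8% over the 1-best setup) can be read as in the sketch below. It assumes the figures are relative gains on a higher-is-better metric such as accuracy. The listing gives no absolute scores, so the numbers used here are hypothetical placeholders, and `relative_improvement` is an illustrative helper, not code from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline` for a higher-is-better metric."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical accuracies, chosen only to illustrate the arithmetic;
# they are NOT results reported in the paper.
one_best_acc = 0.800   # assumed 1-best ASR transcript accuracy
wcn_acc = 0.844        # assumed word-consensus-network accuracy

gain = relative_improvement(one_best_acc, wcn_acc)
print(f"relative improvement: {gain:.1%}")
```

If the paper instead reports a relative reduction of an error rate, the same formula applies with the sign flipped.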

arXiv:2211.01731 [pdf]

Data Converter Design Space Exploration for IoT Applications: An Overview of Challenges and Future Directions

Authors: Buddhi Prakash Sharma, Anu Gupta, Chandra Shekhar

Abstract: Human lives are improving with the widespread use of cutting-edge digital technology like the Internet of Things (IoT). Recently, the pandemic has shown the demand for more digitally advanced IoT-based devices. International Data Corporation (IDC) forecasts that by 2025, there will be approximately 42 billion of these devices in use, capable of producing around 80 ZB (zettabytes) of data. Data acquisition, processing, communication, and visualization are therefore necessary from a functional standpoint, which makes sensors and data converters the key components of IoT-based applications. The efficiency of such applications is measured in terms of the latency, power, and resolution of their data converters, which motivates designers to make them efficient. Sensors capture and convert physical features from their chosen environment into detectable quantities. The data converter gives meaningful information and connects the real analog world to the digital component of the devices. The received data is interpreted and analyzed by the digital processing circuitry. Ultimately, it is used as information by a network of internet-connected smart devices, because IoT technologies are adaptable to nearly any technology that can provide information about its operational activity and environmental conditions. However, challenges arise with power consumption, as the complete IoT framework is battery operated and replacing a battery is a daunting task. The goal of this chapter is therefore to unveil the requirements for designing energy-efficient data converters for IoT applications.

Submitted 3 November, 2022; originally announced November 2022.

arXiv:2205.01606 [pdf, ps, other]

GITz: Graphene-assisted IRS Design for THz Communication

Authors: Bhupendra Sharma, Anirudh Agarwal, Deepak Mishra, Soumitra Debnath

Abstract: Graphene-based intelligent reflecting surface (GIRS) has been shown to provide a promising propagation environment to enhance the quality of high-frequency terahertz (THz) wireless communication. In this paper, we characterize GIRS for THz communication (GITz) using material-specific parameters of graphene to tune the reflection of the incident wave at the IRS. In particular, we propose a GITz design model that considers the incident signal frequency and material-level parameters such as conductivity, Fermi level, and patch width to control the reflection amplitude (RA) at the communication receiver. We obtain a closed-form expression of RA for accurate design and characterization of GIRS, which is incomplete in the existing research because only the phase shift is included. The numerical simulation results demonstrate the effectiveness of the proposed characterization by providing key insights.

Submitted 3 May, 2022; originally announced May 2022.

Comments: 5 pages, 8 figures, 1 table, accepted in IEEE VTC 2022 (ETTCOM Workshop)

arXiv:2112.05944 [pdf, ps, other]

Circuit Characterization of IRS to Control Beamforming Design for Efficient Wireless Communication

Authors: Bhupendra Sharma, Anirudh Agarwal, Deepak Mishra, Soumitra Debnath

Abstract: Intelligent reflecting surface (IRS) has emerged as a transforming solution to enrich wireless communications by efficiently reconfiguring the propagation environment. In this paper, a novel IRS circuit characterization model is proposed for practical beamforming design incorporating various electrical parameters of the meta-surface unit cell. Specifically, we have modelled the IRS control parameters, phase shift (PS) and reflection amplitude (RA) at the communication receiver, in addition to the circuit level parameter, variable effective capacitance $C$ of IRS unit cell. We have obtained closed-form expressions of PS, RA and $C$ in terms of transmission frequency of signal incident to IRS and various electrical parameters of IRS circuit, with a novel touch towards an accurate analytical model for a better beamforming design perspective. Numerical results demonstrate the efficacy of the proposed characterization thereby providing key design insights.

Submitted 11 December, 2021; originally announced December 2021.

Comments: 6 pages, 5 figures, Accepted in 2022 IEEE Wireless Communications and Networking Conference (WCNC)

arXiv:2108.02539 [pdf, other]

SLoClas: A Database for Joint Sound Localization and Classification

Authors: Xinyuan Qian, Bidisha Sharma, Amine El Abridi, Haizhou Li

Abstract: In this work, we present the development of a new database, namely the Sound Localization and Classification (SLoClas) corpus, for studying and analyzing sound localization and classification. The corpus contains a total of 23.27 hours of data recorded using a 4-channel microphone array. 10 classes of sounds are played over a loudspeaker at 1.5 meters distance from the array by varying the Direction-of-Arrival (DoA) from 1 degree to 360 degrees at an interval of 5 degrees. To facilitate the study of noise robustness, 6 types of outdoor noise are recorded at 4 DoAs, using the same devices. Moreover, we propose a baseline method, namely the Sound Localization and Classification Network (SLCnet), and present the experimental results and analysis conducted on the collected SLoClas database. We achieve accuracies of 95.21% and 80.01% for sound localization and classification, respectively. We publicly release this database and the source code for research purposes.

Submitted 5 August, 2021; originally announced August 2021.

Comments: Submitted to O-COCOSDA 2021
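
Note: the DoA sweep described in the abstract above ("from 1 degree to 360 degrees at an interval of 5 degrees") can be enumerated as in the sketch below. It assumes the grid starts at exactly 1 degree and advances in exact 5-degree steps; the actual recording angles in the released corpus may differ, and `doa_grid` is a hypothetical helper, not part of the SLoClas code.

```python
def doa_grid(start_deg: int = 1, stop_deg: int = 360, step_deg: int = 5) -> list[int]:
    """Enumerate candidate directions of arrival, in degrees, inclusive of stop_deg."""
    return list(range(start_deg, stop_deg + 1, step_deg))


# Assumed grid: 1, 6, 11, ... up to the last value not exceeding 360.
angles = doa_grid()
print(len(angles), angles[0], angles[-1])
```

Under these assumptions the sweep covers 72 distinct directions, which is the number of classes a localization model would have to distinguish at this angular resolution.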

arXiv:2107.00297 [pdf, ps, other]

doi 10.1109/TASLP.2016.2641901

Sonority Measurement Using System, Source, and Suprasegmental Information

Authors: Bidisha Sharma, S. R. Mahadeva Prasanna

Abstract: Sonorant sounds are characterized by regions with prominent formant structure, high energy and high degree of periodicity. In this work, the vocal-tract system, excitation source and suprasegmental features derived from the speech signal are analyzed to measure the sonority information present in each of them. Vocal-tract system information is extracted from the Hilbert envelope of numerator of group delay function. It is derived from zero time windowed speech signal that provides better resolution of the formants. A five-dimensional feature set is computed from the estimated formants to measure the prominence of the spectral peaks. A feature representing strength of excitation is derived from the Hilbert envelope of linear prediction residual, which represents the source information. Correlation of speech over ten consecutive pitch periods is used as the suprasegmental feature representing periodicity information. The combination of evidences from the three different aspects of speech provides better discrimination among different sonorant classes, compared to the baseline MFCC features. The usefulness of the proposed sonority feature is demonstrated in the tasks of phoneme recognition and sonorant classification.

Submitted 1 July, 2021; originally announced July 2021.

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing (Volume: 25, Issue: 3, March 2017)

arXiv:2012.00337 [pdf, other]

NHSS: A Speech and Singing Parallel Database

Authors: Bidisha Sharma, Xiaoxue Gao, Karthika Vijayan, Xiaohai Tian, Haizhou Li

Abstract: We present a database of parallel recordings of speech and singing, collected and released by the Human Language Technology (HLT) laboratory at the National University of Singapore (NUS), called the NUS-HLT Speak-Sing (NHSS) database. We release this database to the public to support research activities that include, but are not limited to, comparative studies of acoustic attributes of speech and singing signals, cooperative synthesis of speech and singing voices, and speech-to-singing conversion. This database consists of recordings of sung vocals of English pop songs, the spoken counterpart of lyrics of the songs read by the singers in their natural reading manner, and manually prepared utterance-level and word-level annotations. The audio recordings in the NHSS database correspond to 100 songs sung and spoken by 10 singers, resulting in a total of 7 hours of audio data. There are 5 male and 5 female singers, singing and reading the lyrics of 10 songs each. In this paper, we discuss the design methodology of the database, analyse the similarities and dissimilarities in characteristics of speech and singing voices, and provide some strategies to address relationships between these characteristics for converting one to another. We develop benchmark systems, which can be used as reference for speech-to-singing alignment, spectral mapping, and conversion using the NHSS database.

Submitted 5 August, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

Comments: Accepted to Speech Communication

Showing 1–7 of 7 results for author: Sharma, B