Showing 1–50 of 56 results for author: Chowdhury, S

Searching in archive eess. Search in all archives.
  1. arXiv:2406.17337  [pdf, other]

    eess.SY

    Robust Pareto Design of GaN HEMTs for Millimeter-Wave Applications

    Authors: Rafael Perez Martinez, Stephen Boyd, Srabanti Chowdhury

    Abstract: This paper introduces a robust Pareto design approach for selecting Gallium Nitride (GaN) High Electron Mobility Transistors (HEMTs), particularly for power amplifier (PA) and low-noise amplifier (LNA) designs in 5G applications. We consider five key design variables and two settings (PAs and LNAs) where we have multiple objectives. We assess designs based on three critical objectives, evaluating…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2406.16355  [pdf, other]

    cs.LG eess.SY

    Compact Model Parameter Extraction via Derivative-Free Optimization

    Authors: Rafael Perez Martinez, Masaya Iwamoto, Kelly Woo, Zhengliang Bian, Roberto Tinti, Stephen Boyd, Srabanti Chowdhury

    Abstract: In this paper, we address the problem of compact model parameter extraction to simultaneously extract tens of parameters via derivative-free optimization. Traditionally, parameter extraction is performed manually by dividing the complete set of parameters into smaller subsets, each targeting different operational regions of the device, a process that can take several days or even weeks. Our approa…

    Submitted 24 June, 2024; originally announced June 2024.

  3. arXiv:2406.16099  [pdf, other]

    cs.SD eess.AS

    Speech Representation Analysis based on Inter- and Intra-Model Similarities

    Authors: Yassine El Kheir, Ahmed Ali, Shammur Absar Chowdhury

    Abstract: Self-supervised models have revolutionized speech processing, achieving new levels of performance in a wide variety of tasks with limited resources. However, the inner workings of these models are still opaque. In this paper, we aim to analyze the encoded contextual representation of these foundation models based on their inter- and intra-model similarity, independent of any external annotation an…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 5 pages, Accepted to appear in ICASSP XAI-SA Workshop

  4. arXiv:2406.14850  [pdf, other]

    eess.AS

    DExter: Learning and Controlling Performance Expression with Diffusion Models

    Authors: Huan Zhang, Shreyan Chowdhury, Carlos Eduardo Cancino-Chacón, Jinhua Liang, Simon Dixon, Gerhard Widmer

    Abstract: In the pursuit of developing expressive music performance models using artificial intelligence, this paper introduces DExter, a new approach leveraging diffusion probabilistic models to render Western classical piano performances. In this approach, performance parameters are represented in a continuous expression space and a diffusion model is trained to predict these continuous parameters while b…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: in submission to appsci special session

  5. arXiv:2406.13431  [pdf, other]

    cs.CL cs.SD eess.AS

    Children's Speech Recognition through Discrete Token Enhancement

    Authors: Vrunda N. Sukhadia, Shammur Absar Chowdhury

    Abstract: Children's speech recognition is considered a low-resource task mainly due to the lack of publicly available data. There are several reasons for such data scarcity, including expensive data collection and annotation processes, and data privacy, among others. Transforming speech signals into discrete tokens that do not carry sensitive information but capture both linguistic and acoustic information…

    Submitted 24 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  6. arXiv:2406.04673  [pdf, other]

    cs.CV cs.AI cs.MM eess.AS

    MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models

    Authors: Sanjoy Chowdhury, Sayan Nag, K J Joseph, Balaji Vasan Srinivasan, Dinesh Manocha

    Abstract: Music is a universal language that can communicate emotions and feelings. It forms an essential part of the whole spectrum of creative media, ranging from movies to social media posts. Machine learning models that can synthesize music are predominantly conditioned on textual descriptions of it. Inspired by how musicians compose music not just from a movie script, but also through visualizations, w…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted at CVPR 2024 as Highlight paper. Webpage: https://schowdhury671.github.io/melfusion_cvpr2024/

  7. arXiv:2405.10435  [pdf, other]

    eess.SY

    Two-Stage Stochastic Optimal Power Flow for Microgrids With Uncertain Wildfire Effects

    Authors: Sifat Chowdhury, Yu Zhang

    Abstract: Large-scale power outages caused by extreme weather events are one of the major factors weakening grid resilience. In order to prevent the critical infrastructure from cascading failure, power lines are often proactively de-energized under the threat of a progressing wildfire. In this context, the potential of microgrid (MG) functioning in islanded mode can be exploited to enhance the resiliency o…

    Submitted 16 May, 2024; originally announced May 2024.

  8. arXiv:2403.07125  [pdf, other]

    eess.SY

    Learning-Aided Control of Robotic Tether-Net with Maneuverable Nodes to Capture Large Space Debris

    Authors: Achira Boonrath, Feng Liu, Elenora M. Botta, Souma Chowdhury

    Abstract: Maneuverable tether-net systems launched from an unmanned spacecraft offer a promising solution for the active removal of large space debris. Guaranteeing the successful capture of such space debris is dependent on the ability to reliably maneuver the tether-net system -- a flexible, many-DoF (thus complex) system -- for a wide range of launch scenarios. Here, scenarios are defined by the relative…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: This paper was accepted for presentation in proceedings of IEEE International Conference on Robotics and Automation 2024

  9. arXiv:2403.01789  [pdf, other]

    cs.CR eess.SY

    DECOR: Enhancing Logic Locking Against Machine Learning-Based Attacks

    Authors: Yinghua Hu, Kaixin Yang, Subhajit Dutta Chowdhury, Pierluigi Nuzzo

    Abstract: Logic locking (LL) has gained attention as a promising intellectual property protection measure for integrated circuits. However, recent attacks, facilitated by machine learning (ML), have shown the potential to predict the correct key in multiple LL schemes by exploiting the correlation of the correct key value with the circuit structure. This paper presents a generic LL enhancement method based…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 8 pages. Accepted at the International Symposium on Quality Electronic Design (ISQED), 2024

  10. arXiv:2401.14826  [pdf, other]

    cs.SD cs.IR eess.AS

    Expressivity-aware Music Performance Retrieval using Mid-level Perceptual Features and Emotion Word Embeddings

    Authors: Shreyan Chowdhury, Gerhard Widmer

    Abstract: This paper explores a specific sub-task of cross-modal music retrieval. We consider the delicate task of retrieving a performance or rendition of a musical piece based on a description of its style, expressive character, or emotion from a set of different performances of the same piece. We observe that a general purpose cross-modal system trained to learn a common text-audio embedding space does n…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: Presented at FIRE 2023 (Forum for Information Retrieval Evaluation) conference, Goa, India

  11. arXiv:2401.04864  [pdf]

    eess.SY physics.ins-det

    Microgravity Mass Gauging with Capacitance Sensing: Sensor Design and Experiment

    Authors: M. A. Charleston, S. M. Chowdhury, Q. M. Marashdeh, B. J. Straiton, F. L. Teixeira

    Abstract: The use of capacitance sensors for fuel mass gauging has been in consideration since the early days of manned space flight. However, certain difficulties arise when considering tanks in microgravity environments. Surface tension effects lead to fluid wetting of the interior surface of the tank, leaving large interior voids, while thrust/settling effects can lead to dispersed two-phase mixtures. Wi…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 19 pages, 26 figures, 5 tables

  12. arXiv:2310.13974  [pdf, other]

    cs.CL cs.SD eess.AS

    Automatic Pronunciation Assessment -- A Review

    Authors: Yassine El Kheir, Ahmed Ali, Shammur Absar Chowdhury

    Abstract: Pronunciation assessment and its application in computer-aided pronunciation training (CAPT) have seen impressive progress in recent years. With the rapid growth in language processing and deep learning over the past few years, there is a need for an updated review. In this paper, we review methods employed in pronunciation assessment for both phonemic and prosodic. We categorize the main challeng…

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: 9 pages, accepted to EMNLP Findings

  13. arXiv:2309.15674  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Speech collage: code-switched audio generation by collaging monolingual corpora

    Authors: Amir Hussein, Dorsa Zeinali, Ondřej Klejch, Matthew Wiesner, Brian Yan, Shammur Chowdhury, Ahmed Ali, Shinji Watanabe, Sanjeev Khudanpur

    Abstract: Designing effective automatic speech recognition (ASR) systems for Code-Switching (CS) often depends on the availability of the transcribed CS resources. To address data scarcity, this paper introduces Speech Collage, a method that synthesizes CS data from monolingual corpora by splicing audio segments. We further improve the smoothness quality of audio generation using an overlap-add approach. We…

    Submitted 27 September, 2023; originally announced September 2023.

  14. arXiv:2309.07739  [pdf, other]

    cs.CL cs.SD eess.AS

    The complementary roles of non-verbal cues for Robust Pronunciation Assessment

    Authors: Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali

    Abstract: Research on pronunciation assessment systems focuses on utilizing phonetic and phonological aspects of non-native (L2) speech, often neglecting the rich layer of information hidden within the non-verbal cues. In this study, we proposed a novel pronunciation assessment framework, IntraVerbalPA. The framework innovatively incorporates both fine-grained frame- and abstract utterance-level non-verba…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: 5 pages, submitted to ICASSP 2024

  15. arXiv:2309.07719  [pdf, other]

    cs.CL cs.SD eess.AS

    L1-aware Multilingual Mispronunciation Detection Framework

    Authors: Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali

    Abstract: The phonological discrepancies between a speaker's native (L1) and the non-native language (L2) serve as a major factor for mispronunciation. This paper introduces a novel multilingual MDD architecture, L1-MultiMDD, enriched with L1-aware speech representation. An end-to-end speech encoder is trained on the input signal and its corresponding reference phoneme sequence. First, an attention mechani…

    Submitted 21 September, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: 5 pages, submitted to ICASSP 2024

  16. arXiv:2308.12370  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    AdVerb: Visually Guided Audio Dereverberation

    Authors: Sanjoy Chowdhury, Sreyan Ghosh, Subhrajyoti Dasgupta, Anton Ratnarajah, Utkarsh Tyagi, Dinesh Manocha

    Abstract: We present AdVerb, a novel audio-visual dereverberation framework that uses visual cues in addition to the reverberant sound to estimate clean audio. Although audio-only dereverberation is a well-studied problem, our approach incorporates the complementary visual modality to perform audio dereverberation. Given an image of the environment where the reverberated sound signal has been recorded, AdVe…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV 2023. For project page, see https://gamma.umd.edu/researchdirections/speech/adverb

  17. arXiv:2308.02503  [pdf, other]

    eess.AS cs.CL cs.SD

    MyVoice: Arabic Speech Resource Collaboration Platform

    Authors: Yousseif Elshahawy, Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali

    Abstract: We introduce MyVoice, a crowdsourcing platform designed to collect Arabic speech to enhance dialectal speech technologies. This platform offers an opportunity to design large dialectal speech datasets; and makes them publicly available. MyVoice allows contributors to select city/country-level fine-grained dialect and record the displayed utterances. Users can switch roles between contributors and…

    Submitted 23 July, 2023; originally announced August 2023.

    Comments: 2 pages, accepted at InterSpeech23 Show and Tell Session

  18. arXiv:2307.03061  [pdf, other]

    eess.SY

    Learning Constrained Corner Node Trajectories of a Tether Net System for Space Debris Capture

    Authors: Feng Liu, Achira Boonrath, Prajit KrisshnaKumar, Elenora M. Botta, Souma Chowdhury

    Abstract: The earth's orbit is becoming increasingly crowded with debris that poses significant safety risks to the operation of existing and new spacecraft and satellites. The active tether-net system, which consists of a flexible net with maneuverable corner nodes launched from a small autonomous spacecraft, is a promising solution for capturing and disposing of such space debris. The requirement of auton…

    Submitted 6 July, 2023; originally announced July 2023.

    Comments: This paper was presented at AIAA Aviation 2023 Forum

  19. arXiv:2306.01845  [pdf, other]

    cs.SD eess.AS

    Multi-View Multi-Task Representation Learning for Mispronunciation Detection

    Authors: Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali

    Abstract: The disparity in phonology between learner's native (L1) and target (L2) language poses a significant challenge for mispronunciation detection and diagnosis (MDD) systems. This challenge is further intensified by lack of annotated L2 data. This paper proposes a novel MDD architecture that exploits multiple 'views' of the same input data assisted by auxiliary tasks to learn more distinctive phoneti…

    Submitted 7 August, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: 5 pages, Accepted SLaTE23

  20. arXiv:2305.09688  [pdf]

    eess.AS cs.CL cs.LG

    OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking

    Authors: Fazle Rabbi Rakib, Souhardya Saha Dip, Samiul Alam, Nazia Tasnim, Md. Istiak Hossain Shihab, Md. Nazmuddoha Ansary, Syed Mobassir Hossen, Marsia Haque Meghla, Mamunur Mamun, Farig Sadeque, Sayma Sultana Chowdhury, Tahsin Reasat, Asif Sushmit, Ahmed Imtiaz Humayun

    Abstract: We present OOD-Speech, the first out-of-distribution (OOD) benchmarking dataset for Bengali automatic speech recognition (ASR). Being one of the most spoken languages globally, Bengali portrays large diversity in dialects and prosodic features, which demands ASR frameworks to be robust towards distribution shifts. For example, Islamic religious sermons in Bengali are delivered with a tonality that…

    Submitted 15 May, 2023; originally announced May 2023.

  21. arXiv:2305.07790  [pdf]

    cond-mat.mtrl-sci cs.CV eess.IV

    Automated Grain Boundary (GB) Segmentation and Microstructural Analysis in 347H Stainless Steel Using Deep Learning and Multimodal Microscopy

    Authors: Shoieb Ahmed Chowdhury, M. F. N. Taufique, Jing Wang, Marissa Masden, Madison Wenzlick, Ram Devanathan, Alan L Schemer-Kohrn, Keerti S Kappagantula

    Abstract: Austenitic 347H stainless steel offers superior mechanical properties and corrosion resistance required for extreme operating conditions such as high temperature. The change in microstructure due to composition and process variations is expected to impact material properties. Identifying microstructural features such as grain boundaries thus becomes an important task in the process-microstructure-…

    Submitted 12 May, 2023; originally announced May 2023.

  22. arXiv:2305.07445  [pdf, other]

    eess.AS cs.CL cs.SD

    QVoice: Arabic Speech Pronunciation Learning Application

    Authors: Yassine El Kheir, Fouad Khnaisser, Shammur Absar Chowdhury, Hamdy Mubarak, Shazia Afzal, Ahmed Ali

    Abstract: This paper introduces a novel Arabic pronunciation learning application QVoice, powered with end-to-end mispronunciation detection and feedback generator module. The application is designed to support non-native Arabic speakers in enhancing their pronunciation skills, while also helping native speakers mitigate any potential influence from regional dialects on their Modern Standard Arabic (MSA) pr…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: 2 pages, Accepted InterSpeech23 Show & Tell Demo Session

    Journal ref: InterSpeech 2023

  23. arXiv:2304.00649  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Multilingual Word Error Rate Estimation: e-WER3

    Authors: Shammur Absar Chowdhury, Ahmed Ali

    Abstract: The success of the multilingual automatic speech recognition systems empowered many voice-driven applications. However, measuring the performance of such systems remains a major challenge, due to its dependency on manually transcribed speech data in both mono- and multilingual scenarios. In this paper, we propose a novel multilingual framework -- eWER3 -- jointly trained on acoustic and lexical re…

    Submitted 2 April, 2023; originally announced April 2023.

    Comments: Accepted in ICASSP, Multilingual WER estimation, End-to-End systems, multilingual model, automatic word error rate estimation

  24. arXiv:2303.01875  [pdf, other]

    cs.SD eess.AS

    Decoding and Visualising Intended Emotion in an Expressive Piano Performance

    Authors: Shreyan Chowdhury, Gerhard Widmer

    Abstract: Expert musicians can mould a musical piece to convey specific emotions that they intend to communicate. In this paper, we place a mid-level features based music emotion model in this performer-to-listener communication scenario, and demonstrate via a small visualisation music emotion decoding in real time. We also extend the existing set of mid-level features using analogues of perceptual speed an…

    Submitted 3 March, 2023; originally announced March 2023.

    Comments: Extended version of Late-Breaking Demo Session paper accepted at ISMIR 2022 (23rd Int. Society for Music Information Retrieval Conf., Bengaluru, India, 2022)

  25. arXiv:2302.13921  [pdf, other]

    eess.IV eess.SP

    Autonomous Polycrystalline Material Decomposition for Hyperspectral Neutron Tomography

    Authors: Mohammad Samin Nur Chowdhury, Diyu Yang, Shimin Tang, Singanallur V. Venkatakrishnan, Hassina Z. Bilheux, Gregery T. Buzzard, Charles A. Bouman

    Abstract: Hyperspectral neutron tomography is an effective method for analyzing crystalline material samples with complex compositions in a non-destructive manner. Since the counts in the hyperspectral neutron radiographs directly depend on the neutron cross-sections, materials may exhibit contrasting neutron responses across wavelengths. Therefore, it is possible to extract the unique signatures associated…

    Submitted 21 August, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  26. arXiv:2212.00647  [pdf, other]

    eess.IV physics.med-ph

    An Edge Alignment-based Orientation Selection Method for Neutron Tomography

    Authors: Diyu Yang, Shimin Tang, Singanallur V. Venkatakrishnan, Mohammad S. N. Chowdhury, Yuxuan Zhang, Hassina Z. Bilheux, Gregery T. Buzzard, Charles A. Bouman

    Abstract: Neutron computed tomography (nCT) is a 3D characterization technique used to image the internal morphology or chemical composition of samples in biology and materials sciences. A typical workflow involves placing the sample in the path of a neutron beam, acquiring projection data at a predefined set of orientations, and processing the resulting data using an analytic reconstruction algorithm. Typi…

    Submitted 8 March, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

  27. arXiv:2211.16319  [pdf, other]

    eess.AS cs.CL cs.SD

    Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition

    Authors: Injy Hamed, Amir Hussein, Oumnia Chellah, Shammur Chowdhury, Hamdy Mubarak, Sunayana Sitaram, Nizar Habash, Ahmed Ali

    Abstract: Code-switching poses a number of challenges and opportunities for multilingual automatic speech recognition. In this paper, we focus on the question of robust and fair evaluation metrics. To that end, we develop a reference benchmark data set of code-switching speech recognition hypotheses with human judgments. We define clear guidelines for minimal editing of automatic hypotheses. We validate the…

    Submitted 22 November, 2022; originally announced November 2022.

    Comments: Accepted to SLT 2022

  28. arXiv:2211.00923  [pdf, other]

    cs.SD cs.CL eess.AS

    SpeechBlender: Speech Augmentation Framework for Mispronunciation Data Generation

    Authors: Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali, Hamdy Mubarak, Shazia Afzal

    Abstract: The lack of labeled second language (L2) speech data is a major challenge in designing mispronunciation detection models. We introduce SpeechBlender - a fine-grained data augmentation pipeline for generating mispronunciation errors to overcome such data scarcity. The SpeechBlender utilizes varieties of masks to target different regions of phonetic units, and uses the mixing factors to linearly inte…

    Submitted 12 July, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: 5 pages

  29. arXiv:2206.08835  [pdf, other]

    cs.CL cs.SD eess.AS

    What can Speech and Language Tell us About the Working Alliance in Psychotherapy

    Authors: Sebastian P. Bayerl, Gabriel Roccabruna, Shammur Absar Chowdhury, Tommaso Ciulli, Morena Danieli, Korbinian Riedhammer, Giuseppe Riccardi

    Abstract: We are interested in the problem of conversational analysis and its application to the health domain. Cognitive Behavioral Therapy is a structured approach in psychotherapy, allowing the therapist to help the patient to identify and modify the malicious thoughts, behavior, or actions. This cooperative effort can be evaluated using the Working Alliance Inventory Observer-rated Shortened - a 12 item…

    Submitted 27 June, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

    Comments: Accepted at Interspeech 2022

  30. arXiv:2205.11992  [pdf, other]

    eess.SY

    Co-optimization of Battery Routing and Load Restoration for Microgrids with Mobile Energy Storage Systems

    Authors: Shourya Bose, Sifat Chowdhury, Yu Zhang

    Abstract: Mobile energy storage systems (MESS) offer great operational flexibility to enhance the resiliency of distribution systems in an emergency condition. The optimal placement and sizing of those units are pivotal for quickly restoring the curtailed loads. In this paper, we propose a model for load restoration in a microgrid while concurrently optimizing the MESS routes required for the same. The mode…

    Submitted 24 May, 2022; originally announced May 2022.

    Comments: PES GM 2022 Conference

  31. arXiv:2201.02550  [pdf, other]

    cs.CL cs.SD eess.AS

    Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition

    Authors: Amir Hussein, Shammur Absar Chowdhury, Ahmed Abdelali, Najim Dehak, Ahmed Ali, Sanjeev Khudanpur

    Abstract: The pervasiveness of intra-utterance code-switching (CS) in spoken content requires that speech recognition (ASR) systems handle mixed language. Designing a CS-ASR system has many challenges, mainly due to data scarcity, grammatical structure complexity, and domain mismatch. The most common method for addressing CS is to train an ASR system with the available transcribed CS speech, along with mono…

    Submitted 11 January, 2023; v1 submitted 7 January, 2022; originally announced January 2022.

  32. arXiv:2110.00728  [pdf]

    eess.SY cs.LG

    Implementation of MPPT Technique of Solar Module with Supervised Machine Learning

    Authors: Ruhi Sharmin, Sayeed Shafayet Chowdhury, Farihal Abedin, Kazi Mujibur Rahman

    Abstract: In this paper, we proposed a method using supervised ML in solar PV system for MPPT analysis. For this purpose, an overall schematic diagram of a PV system is designed and simulated to create a dataset in MATLAB/Simulink. Thus, by analyzing the output characteristics of a solar cell, an improved MPPT algorithm on the basis of neural network (NN) method is put forward to track the maximum power po…

    Submitted 2 October, 2021; originally announced October 2021.

    Comments: 11 pages, 11 Figures, 5 Tables

  33. Moving Object Detection for Event-based vision using Graph Spectral Clustering

    Authors: Anindya Mondal, Shashant R, Jhony H. Giraldo, Thierry Bouwmans, Ananda S. Chowdhury

    Abstract: Moving object detection has been a central topic of discussion in computer vision for its wide range of applications like in self-driving cars, video surveillance, security, and enforcement. Neuromorphic Vision Sensors (NVS) are bio-inspired sensors that mimic the working of the human eye. Unlike conventional frame-based cameras, these sensors capture a stream of asynchronous 'events' that pose mu…

    Submitted 2 December, 2021; v1 submitted 30 September, 2021; originally announced September 2021.

    Comments: Ten pages, five figures, Published in 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, BC, Canada

  34. arXiv:2108.08952  [pdf, other]

    cs.LG eess.SP

    Mitigating Greenhouse Gas Emissions Through Generative Adversarial Networks Based Wildfire Prediction

    Authors: Sifat Chowdhury, Kai Zhu, Yu Zhang

    Abstract: Over the past decade, the number of wildfires has increased significantly around the world, especially in the State of California. The high-level concentration of greenhouse gas (GHG) emitted by wildfires aggravates global warming that further increases the risk of more fires. Therefore, an accurate prediction of wildfire occurrence greatly helps in preventing large-scale and long-lasting wildfires…

    Submitted 19 August, 2021; originally announced August 2021.

  35. arXiv:2107.13231  [pdf, other]

    cs.SD eess.AS

    On Perceived Emotion in Expressive Piano Performance: Further Experimental Evidence for the Relevance of Mid-level Perceptual Features

    Authors: Shreyan Chowdhury, Gerhard Widmer

    Abstract: Despite recent advances in audio content-based music emotion recognition, a question that remains to be explored is whether an algorithm can reliably discern emotional or expressive qualities between different performances of the same piece. In the present work, we analyze several sets of features on their effectiveness in predicting arousal and valence of six different performances (by six famous…

    Submitted 28 July, 2021; originally announced July 2021.

    Comments: In Proceedings of the 22nd International Society for Music Information Retrieval (ISMIR) Conference, Online, 2021

  36. arXiv:2107.01573  [pdf, other]

    cs.CL cs.SD eess.AS

    Arabic Code-Switching Speech Recognition using Monolingual Data

    Authors: Ahmed Ali, Shammur Chowdhury, Amir Hussein, Yasser Hifny

    Abstract: Code-switching in automatic speech recognition (ASR) is an important challenge due to globalization. Recent research in multilingual ASR shows potential improvement over monolingual systems. We study key issues related to multilingual modeling for ASR through a series of large-scale ASR experiments. Our innovative framework deploys a multi-graph approach in the weighted finite state transducers (W…

    Submitted 4 July, 2021; originally announced July 2021.

    Comments: Accepted in Interspeech 2021, speech recognition, code-switching, ASR, transformer, WFST, graph approach

  37. arXiv:2107.00439  [pdf, other]

    cs.CL cs.SD eess.AS

    What do End-to-End Speech Models Learn about Speaker, Language and Channel Information? A Layer-wise and Neuron-level Analysis

    Authors: Shammur Absar Chowdhury, Nadir Durrani, Ahmed Ali

    Abstract: Deep neural networks are inherently opaque and challenging to interpret. Unlike hand-crafted feature-based models, we struggle to comprehend the concepts learned and how they interact within these models. This understanding is crucial not only for debugging purposes but also for ensuring fairness in ethical decision-making. In our study, we conduct a post-hoc functional interpretability analysis o…

    Submitted 10 July, 2023; v1 submitted 1 July, 2021; originally announced July 2021.

    Comments: Accepted in CSL journal. Keywords: Speech, Neuron Analysis, Interpretability, Diagnostic Classifier, AI explainability, End-to-End Architecture

  38. arXiv:2106.13000  [pdf, other]

    cs.CL cs.SD eess.AS

    QASR: QCRI Aljazeera Speech Resource -- A Large Scale Annotated Arabic Speech Corpus

    Authors: Hamdy Mubarak, Amir Hussein, Shammur Absar Chowdhury, Ahmed Ali

    Abstract: We introduce the largest transcribed Arabic speech corpus, QASR, collected from the broadcast domain. This multi-dialect speech dataset contains 2,000 hours of speech sampled at 16kHz crawled from Aljazeera news channel. The dataset is released with lightly supervised transcriptions, aligned with the audio segments. Unlike previous datasets, QASR contains linguistically motivated segmentation, pun…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: Speech Corpus, Spoken Conversation, ASR, Dialect Identification, Punctuation Restoration, Speaker Verification, NER, Named Entity, Arabic, Speaker gender, Turn-taking. Accepted in ACL 2021

  39. arXiv:2106.07787  [pdf, other]

    cs.SD cs.LG eess.AS

    Tracing Back Music Emotion Predictions to Sound Sources and Intuitive Perceptual Qualities

    Authors: Shreyan Chowdhury, Verena Praher, Gerhard Widmer

    Abstract: Music emotion recognition is an important task in MIR (Music Information Retrieval) research. Owing to factors like the subjective nature of the task and the variation of emotional cues between musical genres, there are still significant challenges in developing reliable and generalizable models. One important step towards better models would be to understand what a model is actually learning from…

    Submitted 16 June, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: In Proceedings of the 18th Sound and Music Computing Conference (SMC 2021)

  40. arXiv:2106.05885  [pdf, other]

    cs.CL cs.SD eess.AS

    Balanced End-to-End Monolingual pre-training for Low-Resourced Indic Languages Code-Switching Speech Recognition

    Authors: Amir Hussein, Shammur Chowdhury, Najim Dehak, Ahmed Ali

    Abstract: The success in designing Code-Switching (CS) ASR often depends on the availability of the transcribed CS resources. Such dependency harms the development of ASR in low-resourced languages such as Bengali and Hindi. In this paper, we exploit the transfer learning approach to design End-to-End (E2E) CS ASR systems for the two low-resourced language pairs using different monolingual speech data and a…

    Submitted 15 February, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

  41. arXiv:2105.14779  [pdf, other

    cs.CL cs.HC cs.SD eess.AS

    Towards One Model to Rule All: Multilingual Strategy for Dialectal Code-Switching Arabic ASR

    Authors: Shammur Absar Chowdhury, Amir Hussein, Ahmed Abdelali, Ahmed Ali

    Abstract: With the advent of globalization, there is an increasing demand for multilingual automatic speech recognition (ASR), handling language and dialectal variation of spoken content. Recent studies show its efficacy over monolingual systems. In this study, we design a large multilingual end-to-end ASR using self-attention based conformer architecture. We trained the system using Arabic (Ar), English (E… ▽ More

    Submitted 5 July, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

    Comments: Accepted in INTERSPEECH 2021, Multilingual ASR, Multi-dialectal ASR, Code-Switching ASR, Arabic ASR, Conformer, Transformer, E2E ASR, Speech Recognition, ASR, Arabic, English, French

  42. arXiv:2104.12528  [pdf, other

    cs.LG eess.IV

    Spatio-Temporal Pruning and Quantization for Low-latency Spiking Neural Networks

    Authors: Sayeed Shafayet Chowdhury, Isha Garg, Kaushik Roy

    Abstract: Spiking Neural Networks (SNNs) are a promising alternative to traditional deep learning methods since they perform event-driven information processing. However, a major drawback of SNNs is high inference latency. The efficiency of SNNs could be enhanced using compression methods such as pruning and quantization. Notably, SNNs, unlike their non-spiking counterparts, consist of a temporal dimension,… ▽ More

    Submitted 28 April, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

  43. arXiv:2102.13479  [pdf, other

    cs.SD cs.LG eess.AS

    Towards Explaining Expressive Qualities in Piano Recordings: Transfer of Explanatory Features via Acoustic Domain Adaptation

    Authors: Shreyan Chowdhury, Gerhard Widmer

    Abstract: Emotion and expressivity in music have been topics of considerable interest in the field of music information retrieval. In recent years, mid-level perceptual features have been suggested as means to explain computational predictions of musical emotion. We find that the diversity of musical styles and genres in the available dataset for learning these features is not sufficient for models to gener… ▽ More

    Submitted 26 February, 2021; originally announced February 2021.

    Comments: 5 pages, 3 figures; accepted for IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)

  44. CovTANet: A Hybrid Tri-level Attention Based Network for Lesion Segmentation, Diagnosis, and Severity Prediction of COVID-19 Chest CT Scans

    Authors: Tanvir Mahmud, Md. Jahin Alam, Sakib Chowdhury, Shams Nafisa Ali, Md Maisoon Rahman, Shaikh Anowarul Fattah, Mohammad Saquib

    Abstract: Rapid and precise diagnosis of COVID-19 is one of the major challenges faced by the global community to control the spread of this overgrowing pandemic. In this paper, a hybrid neural network is proposed, named CovTANet, to provide an end-to-end clinical diagnostic tool for early diagnosis, lesion segmentation, and severity prediction of COVID-19 utilizing chest computer tomography (CT) scans. A m… ▽ More

    Submitted 3 January, 2021; originally announced January 2021.

    Comments: 10 Pages, 8 figures. This article has been published in IEEE Transactions on Industrial Informatics

  45. arXiv:2008.02194  [pdf, other

    cs.SD cs.IR eess.AS

    On the Characterization of Expressive Performance in Classical Music: First Results of the Con Espressione Game

    Authors: Carlos Cancino-Chacón, Silvan Peter, Shreyan Chowdhury, Anna Aljanaki, Gerhard Widmer

    Abstract: A piece of music can be expressively performed, or interpreted, in a variety of ways. With the help of an online questionnaire, the Con Espressione Game, we collected some 1,500 descriptions of expressive character relating to 45 performances of 9 excerpts from classical piano pieces, played by different famous pianists. More specifically, listeners were asked to describe, using freely chosen word… ▽ More

    Submitted 5 August, 2020; originally announced August 2020.

    Comments: 8 pages, 2 figures, accepted for the 21st International Society for Music Information Retrieval Conference (ISMIR 2020)

  46. arXiv:2007.13503  [pdf, other

    eess.AS cs.LG cs.SD

    Receptive-Field Regularized CNNs for Music Classification and Tagging

    Authors: Khaled Koutini, Hamid Eghbal-Zadeh, Verena Haunschmid, Paul Primus, Shreyan Chowdhury, Gerhard Widmer

    Abstract: Convolutional Neural Networks (CNNs) have been successfully used in various Music Information Retrieval (MIR) tasks, both as end-to-end models and as feature extractors for more complex systems. However, the MIR field is still dominated by the classical VGG-based CNN architecture variants, often in combination with more complex modules such as attention, and/or techniques such as pre-training on l… ▽ More

    Submitted 27 July, 2020; originally announced July 2020.

  47. arXiv:2007.10812  [pdf

    eess.IV cs.CV

    Anomaly Detection in Unsupervised Surveillance Setting Using Ensemble of Multimodal Data with Adversarial Defense

    Authors: Sayeed Shafayet Chowdhury, Kaji Mejbaul Islam, Rouhan Noor

    Abstract: Autonomous aerial surveillance using drone feed is an interesting and challenging research domain. To ensure safety from intruders and potential objects posing threats to the zone being protected, it is crucial to be able to distinguish between normal and abnormal states in real-time. Additionally, we also need to consider any device malfunction. However, the inherent uncertainty embedded within t… ▽ More

    Submitted 17 July, 2020; originally announced July 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:2006.03733

  48. arXiv:2007.08607  [pdf

    physics.bio-ph eess.SY physics.optics

    Optimization of Surface Plasmon Resonance Biosensor for Analysis of Lipid Molecules

    Authors: Ehsan Kabir, Syed Mohammad Ashab Uddin, Sayeed Shafayet Chowdhury

    Abstract: Surface Plasmon Resonance (SPR) is an important bio-sensing technique for real-time label-free detection. However, it is pivotal to optimize various parameters of the sensor configuration for efficient and highly sensitive sensing. To that effect, we focus on optimizing two different SPR structures -- the basic Kretschmann configuration and narrow groove grating. Our analysis aims to detect two di… ▽ More

    Submitted 1 July, 2020; originally announced July 2020.

  49. arXiv:2006.03733  [pdf

    cs.CV cs.LG eess.IV

    Unsupervised Abnormality Detection Using Heterogeneous Autonomous Systems

    Authors: Sayeed Shafayet Chowdhury, Kazi Mejbaul Islam, Rouhan Noor

    Abstract: Anomaly detection (AD) in a surveillance scenario is an emerging and challenging field of research. For autonomous vehicles like drones or cars, it is immensely important to distinguish between normal and abnormal states in real-time. Additionally, we also need to detect any device malfunction. But the nature and degree of abnormality may vary depending upon the actual environment and adversary. A… ▽ More

    Submitted 14 July, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

  50. Fast Geometric Surface based Segmentation of Point Cloud from Lidar Data

    Authors: Aritra Mukherjee, Sourya Dipta Das, Jasorsi Ghosh, Ananda S. Chowdhury, Sanjoy Kumar Saha

    Abstract: Mapping the environment has been an important task for robot navigation and Simultaneous Localization And Mapping (SLAM). LIDAR provides a fast and accurate 3D point cloud map of the environment which helps in map building. However, processing millions of points in the point cloud becomes a computationally expensive task. In this paper, a methodology is presented to generate the segmented surfaces… ▽ More

    Submitted 6 May, 2020; originally announced May 2020.

    Comments: Accepted to PReMI 2019 (Pattern Recognition and Machine Intelligence 2019). International Conference on Pattern Recognition and Machine Intelligence. Springer, Cham, 2019