Showing 1–50 of 180 results for author: Kumar, A

Searching in archive eess.

  1. arXiv:2407.00537  [pdf, other]

    eess.IV cs.CV cs.LG

    Accelerating Longitudinal MRI using Prior Informed Latent Diffusion

    Authors: Yonatan Urman, Zachary Shah, Ashwin Kumar, Bruno P. Soares, Kawin Setsompop

    Abstract: MRI is a widely used ionization-free soft-tissue imaging modality, often employed repeatedly over a patient's lifetime. However, prolonged scanning durations, among other issues, can limit availability and accessibility. In this work, we aim to substantially reduce scan times by leveraging prior scans of the same patient. These prior scans typically contain considerable shared information with the…

    Submitted 29 June, 2024; originally announced July 2024.

  2. arXiv:2406.11619  [pdf, other]

    eess.AS cs.LG

    AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling

    Authors: Vahid Ahmadi Kalkhorani, Cheng Yu, Anurag Kumar, Ke Tan, Buye Xu, DeLiang Wang

    Abstract: Adding visual cues to audio-based speech separation can improve separation performance. This paper introduces AV-CrossNet, an audiovisual (AV) system for speech enhancement, target speaker extraction, and multi-talker speaker separation. AV-CrossNet is extended from the CrossNet architecture, which is a recently proposed network that performs complex spectral mapping for speech separation by lever…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 10 pages, 4 Figures, and 4 Tables

  3. arXiv:2406.04660  [pdf, other]

    eess.AS cs.SD

    URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement

    Authors: Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Anurag Kumar, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian

    Abstract: The last decade has witnessed significant advancements in deep learning-based speech enhancement (SE). However, most existing SE research has limitations on the coverage of SE sub-tasks, data diversity and amount, and evaluation metrics. To fill this gap and promote research toward universal SE, we establish a new SE challenge, named URGENT, to focus on the universality, robustness, and generaliza…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 6 pages, 3 figures, 3 tables. Accepted by Interspeech 2024. An extended version of the accepted manuscript with appendix

  4. arXiv:2405.20402  [pdf, other]

    eess.AS cs.SD eess.SP

    Cross-Talk Reduction

    Authors: Zhong-Qiu Wang, Anurag Kumar, Shinji Watanabe

    Abstract: While far-field multi-talker mixtures are recorded, each speaker can wear a close-talk microphone so that close-talk mixtures can be recorded at the same time. Although each close-talk mixture has a high signal-to-noise ratio (SNR) of the wearer, it has a very limited range of applications, as it also contains significant cross-talk speech by other speakers and is not clean enough. In this context…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: in International Joint Conference on Artificial Intelligence (IJCAI), 2024

  5. arXiv:2405.01040  [pdf, other]

    cs.CV cs.CL eess.IV

    Few Shot Class Incremental Learning using Vision-Language models

    Authors: Anurag Kumar, Chinmay Bharti, Saikat Dutta, Srikrishna Karanam, Biplab Banerjee

    Abstract: Recent advancements in deep learning have demonstrated remarkable performance comparable to human capabilities across various supervised computer vision tasks. However, the prevalent assumption of having an extensive pool of training data encompassing all classes prior to model training often diverges from real-world scenarios, where limited data availability for novel classes is the norm. The cha…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: under review at Pattern Recognition Letters

  6. arXiv:2405.00130  [pdf, other]

    eess.IV cs.CV cs.LG

    A Flexible 2.5D Medical Image Segmentation Approach with In-Slice and Cross-Slice Attention

    Authors: Amarjeet Kumar, Hongxu Jiang, Muhammad Imran, Cyndi Valdes, Gabriela Leon, Dahyun Kang, Parvathi Nataraj, Yuyin Zhou, Michael D. Weiss, Wei Shao

    Abstract: Deep learning has become the de facto method for medical image segmentation, with 3D segmentation models excelling in capturing complex 3D structures and 2D models offering high computational efficiency. However, segmenting 2.5D images, which have high in-plane but low through-plane resolution, is a relatively unexplored challenge. While applying 2D models to individual slices of a 2.5D image is f…

    Submitted 30 April, 2024; originally announced May 2024.

  7. arXiv:2404.15371  [pdf, other]

    eess.SP cs.AI

    Efficient Verification of a RADAR SoC Using Formal and Simulation-Based Methods

    Authors: Aman Kumar, Mark Litterick, Samuele Candido

    Abstract: As the demand for Internet of Things (IoT) and Human-to-Machine Interaction (HMI) increases, modern System-on-Chips (SoCs) offering such solutions are becoming increasingly complex. This intricate design poses significant challenges for verification, particularly when time-to-market is a crucial factor for consumer electronics products. This paper presents a case study based on our work to verify…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: Published in DVCon Europe 2023

  8. arXiv:2403.18821  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark

    Authors: Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar, William Laney, Andrew Owens, Alexander Richard

    Abstract: We present a new dataset called Real Acoustic Fields (RAF) that captures real acoustic room data from multiple modalities. The dataset includes high-quality and densely captured room impulse response data paired with multi-view images, and precise 6DoF pose tracking data for sound emitters and listeners in the rooms. We used this dataset to evaluate existing methods for novel-view acoustic synthes…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024. Project site: https://facebookresearch.github.io/real-acoustic-fields/

  9. Intelligent fault diagnosis of worm gearbox based on adaptive CNN using amended gorilla troop optimization with quantum gate mutation strategy

    Authors: Govind Vashishtha, Sumika Chauhan, Surinder Kumar, Rajesh Kumar, Radoslaw Zimroz, Anil Kumar

    Abstract: The worm gearbox is a high-speed transmission system that plays a vital role in various industries. Therefore it becomes necessary to develop a robust fault diagnosis scheme for worm gearbox. Due to advancements in sensor technology, researchers from academia and industries prefer deep learning models for fault diagnosis purposes. The optimal selection of hyperparameters (HPs) of deep learning mod…

    Submitted 19 March, 2024; originally announced March 2024.

    Journal ref: Knowledge-Based Systems Volume 280, 25 November 2023, 110984

  10. arXiv:2403.08261  [pdf, other]

    cs.CV cs.AI eess.IV

    CoroNetGAN: Controlled Pruning of GANs via Hypernetworks

    Authors: Aman Kumar, Khushboo Anand, Shubham Mandloi, Ashutosh Mishra, Avinash Thakur, Neeraj Kasera, Prathosh A P

    Abstract: Generative Adversarial Networks (GANs) have proven to exhibit remarkable performance and are widely used across many generative computer vision applications. However, the unprecedented demand for the deployment of GANs on resource-constrained edge devices still poses a challenge due to huge number of parameters involved in the generation process. This has led to focused attention on the area of co…

    Submitted 13 March, 2024; originally announced March 2024.

  11. arXiv:2403.04781  [pdf]

    cs.CR cs.CV cs.LG eess.IV

    Selective Encryption using Segmentation Mask with Chaotic Henon Map for Multidimensional Medical Images

    Authors: S Arut Prakash, Aditya Ganesh Kumar, Prabhu Shankar K. C., Lithicka Anandavel, Aditya Lakshmi Narayanan

    Abstract: A user-centric design and resource optimization should be at the center of any technology or innovation. The user-centric perspective gives the developer the opportunity to develop with task-based optimization. The user in the medical image field is a medical professional who analyzes the medical images and gives their diagnosis results to the patient. This scheme, having the medical professional…

    Submitted 2 March, 2024; originally announced March 2024.

  12. arXiv:2403.04333  [pdf, other]

    eess.SP

    A Survey of Application of Machine Learning in Wireless Indoor Positioning Systems

    Authors: Amala Sonny, Abhinav Kumar, Linga Reddy Cenkeramaddi

    Abstract: Indoor human positioning has become increasingly important for applications such as health monitoring, breath monitoring, human identification, safety and rescue operations, and security surveillance. However, achieving robust indoor human positioning remains challenging due to various constraints. Numerous attempts have been made in the literature to develop efficient indoor positioning systems (…

    Submitted 7 March, 2024; originally announced March 2024.

  13. arXiv:2403.01369  [pdf, other]

    eess.AS cs.AI cs.LG

    A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement

    Authors: Ravi Shankar, Ke Tan, Buye Xu, Anurag Kumar

    Abstract: Self-supervised learned models have been found to be very effective for certain speech tasks such as automatic speech recognition, speaker identification, keyword spotting and others. While the features are undeniably useful in speech recognition and associated tasks, their utility in speech enhancement systems is yet to be firmly established, and perhaps not properly understood. In this paper, we…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: 8 pages; Shorter form accepted in ICASSP 2024

  14. arXiv:2402.18968  [pdf, other]

    eess.AS cs.SD

    Ambisonics Networks -- The Effect Of Radial Functions Regularization

    Authors: Bar Shaybet, Anurag Kumar, Vladimir Tourbabin, Boaz Rafaely

    Abstract: Ambisonics, a popular format of spatial audio, is the spherical harmonic (SH) representation of the plane wave density function of a sound field. Many algorithms operate in the SH domain and utilize the Ambisonics as their input signal. The process of encoding Ambisonics from a spherical microphone array involves dividing by the radial functions, which may amplify noise at low frequencies. This ca…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: to be published in ICASSP 2024

  15. arXiv:2401.13049  [pdf, other]

    eess.IV cs.AI cs.CV cs.GT cs.LG

    CIS-UNet: Multi-Class Segmentation of the Aorta in Computed Tomography Angiography via Context-Aware Shifted Window Self-Attention

    Authors: Muhammad Imran, Jonathan R Krebs, Veera Rajasekhar Reddy Gopu, Brian Fazzone, Vishal Balaji Sivaraman, Amarjeet Kumar, Chelsea Viscardi, Robert Evans Heithaus, Benjamin Shickel, Yuyin Zhou, Michol A Cooper, Wei Shao

    Abstract: Advancements in medical imaging and endovascular grafting have facilitated minimally invasive treatments for aortic diseases. Accurate 3D segmentation of the aorta and its branches is crucial for interventions, as inaccurate segmentation can lead to erroneous surgical planning and endograft construction. Previous methods simplified aortic segmentation as a binary image segmentation problem, overlo…

    Submitted 23 January, 2024; originally announced January 2024.

  16. arXiv:2312.17279  [pdf, other]

    cs.CL eess.AS

    Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition

    Authors: Vahid Noroozi, Somshubra Majumdar, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg

    Abstract: In this paper, we propose an efficient and accurate streaming speech recognition model based on the FastConformer architecture. We adapted the FastConformer architecture for streaming applications through: (1) constraining both the look-ahead and past contexts in the encoder, and (2) introducing an activation caching mechanism to enable the non-autoregressive encoder to operate autoregressively du…

    Submitted 2 May, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

    Comments: Shorter version accepted to ICASSP 2024

  17. arXiv:2312.05623  [pdf, other]

    cs.IT eess.SP

    Impact of Urban Street Geometry on the Detection Probability of Automotive Radars

    Authors: Mohammad Taha Shah, Ankit Kumar, Gourab Ghatak, Shobha Sundar Ram

    Abstract: Prior works have analyzed the performance of millimeter wave automotive radars in the presence of diverse clutter and interference scenarios using stochastic geometry tools instead of more time-consuming measurement studies or system-level simulations. In these works, the distributions of radars or discrete clutter scatterers were modeled as Poisson point processes in the Euclidean space. However,…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: Submitted to IEEE Radar Conference 2024 (RadarConf24)

  18. arXiv:2311.12582  [pdf, other]

    eess.IV cs.AI cs.CV

    Echocardiogram Foundation Model -- Application 1: Estimating Ejection Fraction

    Authors: Adil Dahlan, Cyril Zakka, Abhinav Kumar, Laura Tang, Rohan Shad, Robyn Fong, William Hiesinger

    Abstract: Cardiovascular diseases stand as the primary global cause of mortality. Among the various imaging techniques available for visualising the heart and evaluating its function, echocardiograms emerge as the preferred choice due to their safety and low cost. Quantifying cardiac function based on echocardiograms is very laborious, time-consuming and subject to high interoperator variability. In this wo…

    Submitted 21 November, 2023; originally announced November 2023.

  19. arXiv:2311.00482  [pdf]

    physics.med-ph eess.SP

    A Portable Ultrasound Imaging Pipeline Implementation with GPU Acceleration on Nvidia CLARA AGX

    Authors: A. N. Madhavanunni, V. Arun Kumar, Mahesh Raveendranatha Panicker

    Abstract: In this paper, we present a GPU-accelerated prototype implementation of a portable ultrasound imaging pipeline on an Nvidia CLARA AGX development kit. The raw data is acquired with nonsteered plane wave transmit using a programmable handheld open platform that supports 128-channel transmit and 64-channel receive. The received signals are transferred to the Nvidia CLARA AGX developer platform throu…

    Submitted 31 October, 2023; originally announced November 2023.

    Comments: 5 pages, 4 figures

  20. arXiv:2310.17864  [pdf, other]

    eess.AS cs.SD

    TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

    Authors: Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis

    Abstract: TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims to accelerate the research and development of audio and speech technologies by providing well-designed, easy-to-use, and performant PyTorch components. Its contributors routinely engage with users to understand their needs and fulfill them by developing impactful features. Here, we survey TorchAudio's devel…

    Submitted 26 October, 2023; originally announced October 2023.

  21. arXiv:2310.12282  [pdf, ps, other]

    cs.GT cs.MA eess.SY

    Mean-field games among teams

    Authors: Jayakumar Subramanian, Akshat Kumar, Aditya Mahajan

    Abstract: In this paper, we present a model of a game among teams. Each team consists of a homogeneous population of agents. Agents within a team are cooperative while the teams compete with other teams. The dynamics and the costs are coupled through the empirical distribution (or the mean field) of the state of agents in each team. This mean-field is assumed to be observed by all agents. Agents have asymme…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: 20 pages

  22. arXiv:2310.03884  [pdf, other]

    cs.IT cs.LG eess.SP math.DG stat.ML

    Information Geometry for the Working Information Theorist

    Authors: Kumar Vijay Mishra, M. Ashok Kumar, Ting-Kam Leonard Wong

    Abstract: Information geometry is a study of statistical manifolds, that is, spaces of probability distributions from a geometric perspective. Its classical information-theoretic applications relate to statistical concepts such as Fisher information, sufficient statistics, and efficient estimators. Today, information geometry has emerged as an interdisciplinary field that finds applications in diverse areas…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: 12 pages, 3 figures, 1 table

  23. arXiv:2309.15977  [pdf, other]

    cs.SD cs.CV eess.AS

    Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields

    Authors: Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

    Abstract: Room impulse response (RIR), which measures the sound propagation within an environment, is critical for synthesizing high-fidelity audio for a given environment. Some prior work has proposed representing RIR as a neural field function of the sound emitter and receiver positions. However, these methods do not sufficiently consider the acoustic properties of an audio scene, leading to unsatisfactor…

    Submitted 27 September, 2023; originally announced September 2023.

  24. arXiv:2309.13445  [pdf, other]

    cs.AR cs.AI eess.SP

    AxOMaP: Designing FPGA-based Approximate Arithmetic Operators using Mathematical Programming

    Authors: Siva Satyendra Sahoo, Salim Ullah, Akash Kumar

    Abstract: With the increasing application of machine learning (ML) algorithms in embedded systems, there is a rising necessity to design low-cost computer arithmetic for these resource-constrained systems. As a result, emerging models of computation, such as approximate and stochastic computing, that leverage the inherent error-resilience of such algorithms are being actively explored for implementing ML in…

    Submitted 23 September, 2023; originally announced September 2023.

    Comments: 23 pages, Under review at ACM TRETS

  25. arXiv:2309.12830  [pdf, other]

    cs.AR cs.AI cs.LG eess.SP

    AxOCS: Scaling FPGA-based Approximate Operators using Configuration Supersampling

    Authors: Siva Satyendra Sahoo, Salim Ullah, Soumyo Bhattacharjee, Akash Kumar

    Abstract: The rising usage of AI and ML-based processing across application domains has exacerbated the need for low-cost ML implementation, specifically for resource-constrained embedded systems. To this end, approximate computing, an approach that explores the power, performance, area (PPA), and behavioral accuracy (BEHAV) trade-offs, has emerged as a possible solution for implementing embedded machine le…

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: 11 pages, under review with IEEE TCAS-I

    ACM Class: B.2.4; J.6; J.7; I.2.1

  26. Non-parametric Ensemble Empirical Mode Decomposition for extracting weak features to identify bearing defects

    Authors: Anil Kumar, Yaakoub Berrouche, Radosław Zimroz, Govind Vashishtha, Sumika Chauhan, C. P. Gandhi, Hesheng Tang, Jiawei Xiang

    Abstract: A non-parametric complementary ensemble empirical mode decomposition (NPCEEMD) is proposed for identifying bearing defects using weak features. NPCEEMD is non-parametric because, unlike existing decomposition methods such as ensemble empirical mode decomposition, it does not require defining the ideal SNR of noise and the number of ensembles, every time while processing the signals. The simulation…

    Submitted 2 October, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

    Journal ref: Measurement 211, 112615 (2023)

  27. arXiv:2309.05686  [pdf, other]

    cs.LG cs.NI eess.SP

    Temporal Patience: Efficient Adaptive Deep Learning for Embedded Radar Data Processing

    Authors: Max Sponner, Julius Ott, Lorenzo Servadei, Bernd Waschneck, Robert Wille, Akash Kumar

    Abstract: Radar sensors offer power-efficient solutions for always-on smart devices, but processing the data streams on resource-constrained embedded platforms remains challenging. This paper presents novel techniques that leverage the temporal correlation present in streaming radar data to enhance the efficiency of Early Exit Neural Networks for Deep Learning inference on embedded devices. These networks a…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: CODAI 2023 Workshop Submission

  28. arXiv:2308.14645  [pdf, ps, other]

    cs.IT eess.SP

    On the Achievable Rate of MIMO Narrowband PLC with Spatio-Temporal Correlated Noise

    Authors: Mohammadreza Bakhshizadeh Mohajer, Sadaf Moaveninejad, Atul Kumar, Mahmoud Elgenedy, Naofal Al-Dhahir, Luca Barletta, Maurizio Magarini

    Abstract: Narrowband power line communication (NB-PLC) systems are an attractive solution for supporting current and future smart grids. A technology proposed to enhance data rate in NB-PLC is multiple-input multiple-output (MIMO) transmission over multiple power line phases. To achieve reliable communication over MIMO NB-PLC, a key challenge is to take into account and mitigate the effects of temporally an…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: 13 pages, 10 figures, submitted to IEEE Transactions on Communications

  29. arXiv:2308.10984  [pdf, other]

    cs.CV eess.IV

    Debiasing Counterfactuals In the Presence of Spurious Correlations

    Authors: Amar Kumar, Nima Fathi, Raghav Mehta, Brennan Nichyporuk, Jean-Pierre R. Falet, Sotirios Tsaftaris, Tal Arbel

    Abstract: Deep learning models can perform well in complex medical imaging classification tasks, even when basing their conclusions on spurious correlations (i.e. confounders), should they be prevalent in the training dataset, rather than on the causal image markers of interest. This would thereby limit their ability to generalize across the population. Explainability based on counterfactual image generatio…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted to the FAIMI (Fairness of AI in Medical Imaging) workshop at MICCAI 2023

  30. arXiv:2308.00122  [pdf, other]

    cs.CV cs.SD eess.AS

    DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models

    Authors: Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu

    Abstract: We propose DAVIS, a Diffusion model-based Audio-VIsual Separation framework that solves the audio-visual sound source separation task through a generative manner. While existing discriminative methods that perform mask regression have made remarkable progress in this field, they face limitations in capturing the complex data distribution required for high-quality separation of sounds from diverse…

    Submitted 31 July, 2023; originally announced August 2023.

  31. arXiv:2307.09425  [pdf, other]

    cs.SD eess.AS physics.pop-ph

    Musical Excellence of Mridangam: an introductory review

    Authors: Arvind Shankar Kumar

    Abstract: This is an introductory review of Musical Excellence of Mridangam by Dr. Umayalpuram K Sivaraman, Dr. T Ramasami and Dr. Naresh, which is a scientific treatise exploring the unique tonal properties of the ancient Indian classical percussive instrument -- the Mridangam. This review aims to bridge the gap between the primary intended audience of Musical Excellence of Mridangam - listeners, artistes…

    Submitted 11 July, 2023; originally announced July 2023.

  32. arXiv:2307.05324  [pdf, other]

    cs.SD eess.AS

    ShredGP: Guitarist Style-Conditioned Tablature Generation

    Authors: Pedro Sarmento, Adarsh Kumar, Dekun Xie, CJ Carr, Zack Zukowski, Mathieu Barthet

    Abstract: GuitarPro format tablatures are a type of digital music notation that encapsulates information about guitar playing techniques and fingerings. We introduce ShredGP, a GuitarPro tablature generative Transformer-based model conditioned to imitate the style of four distinct iconic electric guitarists. In order to assess the idiosyncrasies of each guitar player, we adopt a computational musicology met…

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: Accepted for publication at CMMR 2023

  33. arXiv:2307.00193  [pdf, other]

    eess.SY cs.RO

    Fast, Smooth, and Safe: Implicit Control Barrier Functions through Reach-Avoid Differential Dynamic Programming

    Authors: Athindran Ramesh Kumar, Kai-Chieh Hsu, Peter J. Ramadge, Jaime F. Fisac

    Abstract: Safety is a central requirement for autonomous system operation across domains. Hamilton-Jacobi (HJ) reachability analysis can be used to construct "least-restrictive" safety filters that result in infrequent, but often extreme, control overrides. In contrast, control barrier function (CBF) methods apply smooth control corrections to guard the system against an often conservative safety boundary.…

    Submitted 30 June, 2023; originally announced July 2023.

    Comments: Accepted in IEEE Control Systems Letters (L-CSS)

  34. arXiv:2306.06574  [pdf, other]

    cs.NI cs.LG eess.SY

    Learnable Digital Twin for Efficient Wireless Network Evaluation

    Authors: Boning Li, Timofey Efimov, Abhishek Kumar, Jose Cortes, Gunjan Verma, Ananthram Swami, Santiago Segarra

    Abstract: Network digital twins (NDTs) facilitate the estimation of key performance indicators (KPIs) before physically implementing a network, thereby enabling efficient optimization of the network configuration. In this paper, we propose a learning-based NDT for network simulators. The proposed method offers a holistic representation of information flow in a wireless network by integrating node, edge, and…

    Submitted 10 June, 2023; originally announced June 2023.

  35. arXiv:2305.05532  [pdf, other]

    eess.SP cs.AI cs.LG stat.AP stat.ML

    An ensemble of convolution-based methods for fault detection using vibration signals

    Authors: Xian Yeow Lee, Aman Kumar, Lasitha Vidyaratne, Aniruddha Rajendra Rao, Ahmed Farahat, Chetan Gupta

    Abstract: This paper focuses on solving a fault detection problem using multivariate time series of vibration signals collected from planetary gearboxes in a test rig. Various traditional machine learning and deep learning methods have been proposed for multivariate time-series classification, including distance-based, functional data-oriented, feature-driven, and convolution kernel-based methods. Recent st…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: 12 Pages, 9 Figures, 2 Tables. Accepted at ICPHM 2023

    Journal ref: 2023 IEEE International Conference on Prognostics and Health Management (ICPHM)

  36. arXiv:2305.05084  [pdf, other]

    eess.AS cs.SD

    Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

    Authors: Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg

    Abstract: Conformer-based models have become the dominant end-to-end architecture for speech processing tasks. With the objective of enhancing the conformer architecture for efficient training and inference, we carefully redesigned Conformer with a novel downsampling schema. The proposed model, named Fast Conformer (FC), is 2.8x faster than the original Conformer, supports scaling to Billion parameters witho…

    Submitted 30 September, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted at ASRU 2023

  37. arXiv:2305.01663  [pdf, other]

    q-bio.QM cs.LG eess.IV

    A Novel Deep Learning based Model for Erythrocytes Classification and Quantification in Sickle Cell Disease

    Authors: Manish Bhatia, Balram Meena, Vipin Kumar Rathi, Prayag Tiwari, Amit Kumar Jaiswal, Shagaf M Ansari, Ajay Kumar, Pekka Marttinen

    Abstract: The shape of erythrocytes or red blood cells is altered in several pathological conditions. Therefore, identifying and quantifying different erythrocyte shapes can help diagnose various diseases and assist in designing a treatment strategy. Machine Learning (ML) can be efficiently used to identify and quantify distorted erythrocyte morphologies. In this paper, we proposed a customized deep convolu…

    Submitted 2 May, 2023; originally announced May 2023.

  38. arXiv:2304.08953  [pdf, other]

    cs.SD cs.LG eess.AS

    From Words to Music: A Study of Subword Tokenization Techniques in Symbolic Music Generation

    Authors: Adarsh Kumar, Pedro Sarmento

    Abstract: Subword tokenization has been widely successful in text-based natural language processing (NLP) tasks with Transformer-based models. As Transformer models become increasingly popular in symbolic music-related studies, it is imperative to investigate the efficacy of subword tokenization in the symbolic music domain. In this paper, we explore subword tokenization techniques, such as byte-pair encodi…

    Submitted 25 April, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

  39. arXiv:2304.08766  [pdf]

    eess.IV

    Cashew dataset generation using augmentation and RaLSGAN and a transfer learning based tinyML approach towards disease detection

    Authors: Varsha Jayaprakash, Akilesh K, Ajay Kumar, Balamurugan M. S, Manoj Kumar Rajagopal

    Abstract: Cashew is one of the most extensively consumed nuts in the world, and it is also known as a cash crop. A tree may generate a substantial yield in a few months and has a lifetime of around 70 to 80 years. Yet, in addition to the benefits, there are certain constraints to its cultivation. With the exception of parasites and algae, anthracnose is the most common disease affecting trees. When it comes…

    Submitted 18 April, 2023; originally announced April 2023.

  40. arXiv:2304.01992  [pdf, other]

    eess.IV cs.CV

    Cross-modulated Few-shot Image Generation for Colorectal Tissue Classification

    Authors: Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Fahad Shahbaz Khan

    Abstract: In this work, we propose a few-shot colorectal tissue image generation method for addressing the scarcity of histopathological training data for rare cancer tissues. Our few-shot generation method, named XM-GAN, takes one base and a pair of reference tissue images as input and generates high-quality yet diverse images. Within our XM-GAN, a novel controllable fusion block densely aggregates local r…

    Submitted 4 July, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: Early Accept in MICCAI 2023

  41. arXiv:2304.01448  [pdf, other]

    eess.AS

    TorchAudio-Squim: Reference-less Speech Quality and Intelligibility measures in TorchAudio

    Authors: Anurag Kumar, Ke Tan, Zhaoheng Ni, Pranay Manocha, Xiaohui Zhang, Ethan Henderson, Buye Xu

    Abstract: Measuring the quality and intelligibility of a speech signal is usually a critical step in the development of speech processing systems. To enable this, a variety of metrics to measure quality and intelligibility under different assumptions have been developed. Through this paper, we introduce tools and a set of models to estimate such known metrics using deep neural networks. These models are made availa…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: ICASSP 2023

  42. arXiv:2303.13471  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Egocentric Audio-Visual Object Localization

    Authors: Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

    Abstract: Humans naturally perceive surrounding scenes by unifying sound and sight in a first-person view. Likewise, machines are advanced to approach human intelligence by learning with multisensory inputs from an egocentric perspective. In this paper, we explore the challenging egocentric audio-visual object localization task and observe that 1) egomotion commonly exists in first-person recordings, even w…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

  43. arXiv:2303.11330  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG eess.SY

    Legs as Manipulator: Pushing Quadrupedal Agility Beyond Locomotion

    Authors: Xuxin Cheng, Ashish Kumar, Deepak Pathak

    Abstract: Locomotion has seen dramatic progress for walking or running across challenging terrains. However, robotic quadrupeds are still far behind their biological counterparts, such as dogs, which display a variety of agile skills and can use the legs beyond locomotion to perform several basic manipulation tasks like interacting with objects and climbing. In this paper, we take a step towards bridging th…

    Submitted 22 March, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: Accepted at ICRA 2023. Videos at https://robot-skills.github.io

  44. arXiv:2303.03397  [pdf, other]

    eess.IV

    Malaria detection using Deep Convolution Neural Network

    Authors: Sumit Kumar, Harsh Vardhan, Sneha Priya, Ayush Kumar

    Abstract: The latest WHO report showed that the number of malaria cases climbed to 219 million last year, two million higher than the previous year. The global efforts to fight malaria have hit a plateau, and the most significant underlying reason is that international funding has declined. Malaria, which is spread to people through the bites of infected female mosquitoes, occurs in 91 countries but about 90% of the case…

    Submitted 6 January, 2024; v1 submitted 4 March, 2023; originally announced March 2023.

  45. arXiv:2302.13836  [pdf]

    physics.app-ph eess.SY

    Two-Dimensional Wide Dynamic Range Displacement Sensor using Dielectric Resonator Coupled Microwave Circuit

    Authors: Premsai Regalla, A. V. Praveen Kumar

    Abstract: In this paper, the authors propose a two-dimensional, wide dynamic range, linear displacement sensor using microwave methods. The microwave sensor circuit employs a cylindrical dielectric resonator proximity coupled to a pair of orthogonal microstrip lines formed on a microwave substrate. The DR rests on the substrate and is free to be displaced between the strips on the 2D plane of the substrate.…

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: This work has been submitted to IEEE for possible publication

  46. arXiv:2302.08095  [pdf, other]

    cs.SD cs.CL eess.AS

    PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement

    Authors: Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

    Abstract: Despite rapid advancement in recent years, current speech enhancement models often produce speech that differs in perceptual quality from real clean speech. We propose a learning objective that formalizes differences in perceptual quality, by using domain knowledge of acoustic-phonetics. We identify temporal acoustic parameters -- such as spectral tilt, spectral flux, shimmer, etc. -- that are non…

    Submitted 16 February, 2023; originally announced February 2023.

    Comments: Accepted at ICASSP 2023

  47. arXiv:2302.08088  [pdf, other]

    cs.CL cs.SD eess.AS

    TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement

    Authors: Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

    Abstract: Speech enhancement models have greatly progressed in recent years, but still show limits in perceptual quality of their speech outputs. We propose an objective for perceptual quality based on temporal acoustic parameters. These are fundamental speech features that play an essential role in various applications, including speaker recognition and paralinguistic analysis. We provide a differentiable…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: Accepted at ICASSP 2023

  48. arXiv:2302.05393  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    GTR-CTRL: Instrument and Genre Conditioning for Guitar-Focused Music Generation with Transformers

    Authors: Pedro Sarmento, Adarsh Kumar, Yu-Hua Chen, CJ Carr, Zack Zukowski, Mathieu Barthet

    Abstract: Recently, symbolic music generation with deep learning techniques has witnessed steady improvements. Most works on this topic focus on MIDI representations, but less attention has been paid to symbolic music generation using guitar tablatures (tabs) which can be used to encode multiple instruments. Tabs include information on expressive techniques and fingerings for fretted string instruments in a…

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: This preprint is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). The Version of Record of this contribution is published in Proceedings of EvoMUSART: International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar) 2023

    Journal ref: EvoMUSART: International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar) 2023

  49. arXiv:2302.02088  [pdf, other]

    cs.CV cs.GR cs.SD eess.AS

    AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis

    Authors: Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

    Abstract: Can machines recording an audio-visual scene produce realistic, matching audio-visual experiences at novel positions and novel view directions? We answer it by studying a new task -- real-world audio-visual scene synthesis -- and a first-of-its-kind NeRF-based approach for multimodal learning. Concretely, given a video recording of an audio-visual scene, the task is to synthesize new videos with s…

    Submitted 16 October, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023

  50. arXiv:2301.04320  [pdf, other]

    cs.SD cs.LG eess.AS

    Rethinking complex-valued deep neural networks for monaural speech enhancement

    Authors: Haibin Wu, Ke Tan, Buye Xu, Anurag Kumar, Daniel Wong

    Abstract: Despite multiple efforts made towards adopting complex-valued deep neural networks (DNNs), it remains an open question whether complex-valued DNNs are generally more effective than real-valued DNNs for monaural speech enhancement. This work is devoted to presenting a critical assessment by systematically examining complex-valued DNNs against their real-valued counterparts. Specifically, we investi…

    Submitted 11 January, 2023; originally announced January 2023.