Search | arXiv e-print repository

Enhancing Generalization in Audio Deepfake Detection: A Neural Collapse based Sampling and Training Approach

Authors: Mohammed Yousif, Jonat John Mathew, Huzaifa Pallan, Agamjeet Singh Padda, Syed Daniyal Shah, Sara Adamski, Madhu Reddiboina, Arjun Pankajakshan

Abstract: Generalization in audio deepfake detection presents a significant challenge, with models trained on specific datasets often struggling to detect deepfakes generated under varying conditions and unknown algorithms. While collectively training a model using diverse datasets can enhance its generalization ability, it comes with high computational costs. To address this, we propose a neural collapse-based sampling approach applied to pre-trained models trained on distinct datasets to create a new training database. Using ASVspoof 2019 dataset as a proof-of-concept, we implement pre-trained models with Resnet and ConvNext architectures. Our approach demonstrates comparable generalization on unseen data while being computationally efficient, requiring less training data. Evaluation is conducted using the In-the-wild dataset.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2402.05983 [pdf, other]

Capability enhancement of the X-ray micro-tomography system via ML-assisted approaches

Authors: Dhruvi Shah, Shruti Mehta, Ashish Agrawal, Shishir Purohit, Bhaskar Chaudhury

Abstract: Ring artifacts in X-ray micro-CT images are one of the primary causes of concern in their accurate visual interpretation and quantitative analysis. The geometry of X-ray micro-CT scanners is similar to the medical CT machines, except the sample is rotated with a stationary source and detector. The ring artifacts are caused by a defect or non-linear responses in detector pixels during the MicroCT data acquisition. Artifacts in MicroCT images can often be so severe that the images are no longer useful for further analysis. Therefore, it is essential to comprehend the causes of artifacts and potential solutions to maximize image quality. This article presents a convolution neural network (CNN)-based Deep Learning (DL) model inspired by UNet with a series of encoder and decoder units with skip connections for removal of ring artifacts. The proposed architecture has been evaluated using the Structural Similarity Index Measure (SSIM) and Mean Squared Error (MSE). Additionally, the results are compared with conventional filter-based non-ML techniques and are found to be better than the latter.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.03390 [pdf, other]

PixelGen: Rethinking Embedded Camera Systems

Authors: Kunjun Li, Manoj Gulati, Steven Waskito, Dhairya Shah, Shantanu Chakrabarty, Ambuj Varshney

Abstract: Embedded camera systems are ubiquitous, representing the most widely deployed example of a wireless embedded system. They capture a representation of the world - the surroundings illuminated by visible or infrared light. Despite their widespread usage, the architecture of embedded camera systems has remained unchanged, which leads to limitations. They visualize only a tiny portion of the world. Additionally, they are energy-intensive, leading to limited battery lifespan. We present PixelGen, which re-imagines embedded camera systems. Specifically, PixelGen combines sensors, transceivers, and low-resolution image and infrared vision sensors to capture a broader world representation. They are deliberately chosen for their simplicity, low bitrate, and power consumption, culminating in an energy-efficient platform. We show that despite the simplicity, the captured data can be processed using transformer-based image and language models to generate novel representations of the environment. For example, we demonstrate that it can allow the generation of high-definition images, while the camera utilises low-power, low-resolution monochrome cameras. Furthermore, the capabilities of PixelGen extend beyond traditional photography, enabling visualization of phenomena invisible to conventional cameras, such as sound waves. PixelGen can enable numerous novel applications, and we demonstrate that it enables unique visualization of the surroundings that are then projected on extended reality headsets. We believe PixelGen goes beyond conventional cameras and opens new avenues for research and photography.

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2305.16491 [pdf, other]

SAMoSSA: Multivariate Singular Spectrum Analysis with Stochastic Autoregressive Noise

Authors: Abdullah Alomar, Munther Dahleh, Sean Mann, Devavrat Shah

Abstract: The well-established practice of time series analysis involves estimating deterministic, non-stationary trend and seasonality components followed by learning the residual stochastic, stationary components. Recently, it has been shown that one can learn the deterministic non-stationary components accurately using multivariate Singular Spectrum Analysis (mSSA) in the absence of a correlated stationary component; meanwhile, in the absence of deterministic non-stationary components, the Autoregressive (AR) stationary component can also be learnt readily, e.g. via Ordinary Least Squares (OLS). However, a theoretical underpinning of multi-stage learning algorithms involving both deterministic and stationary components has been absent in the literature despite its pervasiveness. We resolve this open question by establishing desirable theoretical guarantees for a natural two-stage algorithm, where mSSA is first applied to estimate the non-stationary components despite the presence of a correlated stationary AR component, which is subsequently learned from the residual time series. We provide a finite-sample forecasting consistency bound for the proposed algorithm, SAMoSSA, which is data-driven and thus requires minimal parameter tuning. To establish theoretical guarantees, we overcome three hurdles: (i) we characterize the spectra of Page matrices of stable AR processes, thus extending the analysis of mSSA; (ii) we extend the analysis of AR process identification in the presence of arbitrary bounded perturbations; (iii) we characterize the out-of-sample or forecasting error, as opposed to solely considering model identification. Through representative empirical studies, we validate the superior performance of SAMoSSA compared to existing baselines. Notably, SAMoSSA's ability to account for AR noise structure yields improvements ranging from 5% to 37% across various benchmark datasets.

Submitted 26 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

arXiv:2303.13243 [pdf, other]

Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition

Authors: Kai Liu, Hailiang Xiong, Gangqiang Yang, Zhengfeng Du, Yewen Cao, Danyal Shah

Abstract: As one of the major branches of automatic speech recognition, attention-based models greatly improve the feature representation ability of the model. In particular, the multi-head mechanism is employed in the attention, hoping to learn speech features of more aspects in different attention subspaces. For speech recognition of complex languages, on the one hand, a small head size will lead to an obvious shortage of learnable aspects. On the other hand, we need to reduce the dimension of each subspace to keep the size of the overall feature space unchanged when we increase the number of heads, which will significantly weaken the ability to represent the feature of each subspace. Therefore, this paper explores how to use a small attention subspace to represent complete speech features while ensuring many heads. In this work we propose a novel neural network architecture, namely, pyramid multi-branch fusion DCNN with multi-head self-attention. The proposed architecture is inspired by Dilated Convolution Neural Networks (DCNN); it uses multiple branches with DCNN to extract the feature of the input speech under different receptive fields. To reduce the number of parameters, every two branches are merged until all the branches are merged into one. Thus, its shape is like a pyramid rotated 90 degrees. We demonstrate that on Aishell-1, a widely used Mandarin speech dataset, our model achieves a character error rate (CER) of 6.45% on the test sets.

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2303.00983 [pdf, other]

Using simulation to quantify the performance of automotive perception systems

Authors: Zhenyi Liu, Devesh Shah, Alireza Rahimpour, Devesh Upadhyay, Joyce Farrell, Brian A Wandell

Abstract: The design and evaluation of complex systems can benefit from a software simulation - sometimes called a digital twin. The simulation can be used to characterize system performance or to test its performance under conditions that are difficult to measure (e.g., nighttime for automotive perception systems). We describe the image system simulation software tools that we use to evaluate the performance of image systems for object (automobile) detection. We describe experiments with 13 different cameras with a variety of optics and pixel sizes. To measure the impact of camera spatial resolution, we designed a collection of driving scenes that had cars at many different distances. We quantified system performance by measuring average precision and we report a trend relating system resolution and object detection performance. We also quantified the large performance degradation under nighttime conditions, compared to daytime, for all cameras and a COCO pre-trained network.

Submitted 10 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

arXiv:2302.11768 [pdf, other]

A Framework for Unified Real-time Personalized and Non-Personalized Speech Enhancement

Authors: Zhepei Wang, Ritwik Giri, Devansh Shah, Jean-Marc Valin, Michael M. Goodwin, Paris Smaragdis

Abstract: In this study, we present an approach to train a single speech enhancement network that can perform both personalized and non-personalized speech enhancement. This is achieved by incorporating a frame-wise conditioning input that specifies the type of enhancement output. To improve the quality of the enhanced output and mitigate oversuppression, we experiment with re-weighting frames by the presence or absence of speech activity and applying augmentations to speaker embeddings. By training under a multi-task learning setting, we empirically show that the proposed unified model obtains promising results on both personalized and non-personalized speech enhancement benchmarks and reaches similar performance to models that are trained specialized for either task. The strong performance of the proposed method demonstrates that the unified model is a more economical alternative compared to keeping separate task-specific models during inference.

Submitted 22 February, 2023; originally announced February 2023.

Comments: Accepted by ICASSP 2023

arXiv:2211.09727 [pdf, other]

A Survey on Evaluation Metrics for Synthetic Material Micro-Structure Images from Generative Models

Authors: Devesh Shah, Anirudh Suresh, Alemayehu Admasu, Devesh Upadhyay, Kalyanmoy Deb

Abstract: The evaluation of synthetic micro-structure images is an emerging problem as machine learning and materials science research have evolved together. Typical state of the art methods in evaluating synthetic images from generative models have relied on the Fréchet Inception Distance. However, this and other similar methods are limited in the materials domain due to both the unique features that characterize physically accurate micro-structures and limited dataset sizes. In this study we evaluate a variety of methods on scanning electron microscope (SEM) images of graphene-reinforced polyurethane foams. The primary objective of this paper is to report our findings with regards to the shortcomings of existing methods so as to encourage the machine learning community to consider enhancements in metrics for assessing quality of synthetic images in the material science domain.

Submitted 3 November, 2022; originally announced November 2022.

Comments: Accepted in Neural Information Processing Systems (NeurIPS) 2022 Workshop on AI for Accelerated Materials Design (AI4Mat). Selected as spotlight paper for workshop

ACM Class: I.2.m; J.2

arXiv:2203.15916 [pdf, other]

Current Implicit Policies May Not Eradicate COVID-19

Authors: Ali Jadbabaie, Arnab Sarker, Devavrat Shah

Abstract: Successful predictive modeling of epidemics requires an understanding of the implicit feedback control strategies which are implemented by populations to modulate the spread of contagion. While this task of capturing endogenous behavior can be achieved through intricate modeling assumptions, we find that a population's reaction to case counts can be described through a second order affine dynamical system with linear control which fits well to the data across different regions and times throughout the COVID-19 pandemic. The model fits the data well both in and out of sample across the 50 states of the United States, with comparable $R^2$ scores to state of the art ensemble predictions. In contrast to recent models of epidemics, rather than assuming that individuals directly control the contact rate which governs the spread of disease, we assume that individuals control the rate at which they vary their number of interactions, i.e. they control the derivative of the contact rate. We propose an implicit feedback law for this control input and verify that it correlates with policies taken throughout the pandemic. A key takeaway of the dynamical model is that the "stable" point of case counts is non-zero, i.e. COVID-19 will not be eradicated under the current collection of policies and strategies, and additional policies are needed to fully eradicate it quickly. Hence, we suggest alternative implicit policies which focus on making interventions (such as vaccinations and mobility restrictions) a function of cumulative case counts, for which our results suggest a better possibility of eradicating COVID-19.

Submitted 29 March, 2022; originally announced March 2022.

arXiv:2202.11271 [pdf, other]

doi 10.15607/RSS.2022.XVIII.019

ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints

Authors: Dhruv Shah, Sergey Levine

Abstract: Robotic navigation has been approached as a problem of 3D reconstruction and planning, as well as an end-to-end learning problem. However, long-range navigation requires both planning and reasoning about local traversability, as well as being able to utilize general knowledge about global geography, in the form of a roadmap, GPS, or other side information providing important cues. In this work, we propose an approach that integrates learning and planning, and can utilize side information such as schematic roadmaps, satellite maps and GPS coordinates as a planning heuristic, without relying on them being accurate. Our method, ViKiNG, incorporates a local traversability model, which looks at the robot's current camera observation and a potential subgoal to infer how easily that subgoal can be reached, as well as a heuristic model, which looks at overhead maps for hints and attempts to evaluate the appropriateness of these subgoals in order to reach the goal. These models are used by a heuristic planner to identify the best waypoint in order to reach the final destination. Our method performs no explicit geometric reconstruction, utilizing only a topological representation of the environment. Despite having never seen trajectories longer than 80 meters in its training dataset, ViKiNG can leverage its image-based learned controller and goal-directed heuristic to navigate to goals up to 3 kilometers away in previously unseen environments, and exhibit complex behaviors such as probing potential paths and backtracking when they are found to be non-viable. ViKiNG is also robust to unreliable maps and GPS, since the low-level controller ultimately makes decisions based on egocentric image observations, using maps only as planning heuristics. For videos of our experiments, please check out our project page https://sites.google.com/view/viking-release.

Submitted 9 January, 2023; v1 submitted 22 February, 2022; originally announced February 2022.

Comments: Best Systems Paper Finalist at XVIII Robotics: Science and Systems (RSS 2022), New York City, USA. Project page https://sites.google.com/view/viking-release

arXiv:2010.02400 [pdf, ps, other]

doi 10.1088/2057-1976/abf9e6

A Generalized Framework for Analytic Regularization of Uniform Cubic B-spline Displacement Fields

Authors: Keyur D. Shah, James A. Shackleford, Nagarajan Kandasamy, Gregory C. Sharp

Abstract: Image registration is an inherently ill-posed problem that lacks the constraints needed for a unique mapping between voxels of the two images being registered. As such, one must regularize the registration to achieve physically meaningful transforms. The regularization penalty is usually a function of derivatives of the displacement-vector field, and can be calculated either analytically or numerically. The numerical approach, however, is computationally expensive depending on the image size, and therefore a computationally efficient analytical framework has been developed. Using cubic B-splines as the registration transform, we develop a generalized mathematical framework that supports five distinct regularizers: diffusion, curvature, linear elastic, third-order, and total displacement. We validate our approach by comparing each with its numerical counterpart in terms of accuracy. We also provide benchmarking results showing that the analytic solutions run significantly faster -- up to two orders of magnitude -- than finite differencing based numerical implementations.

Submitted 5 April, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

Comments: 17 pages, 5 figures

Journal ref: https://iopscience.iop.org/article/10.1088/2057-1976/abf9e6

arXiv:2009.08645 [pdf]

Low Density Parity Check Code (LDPC Codes) Overview

Authors: Saumya Borwankar, Dhruv Shah

Abstract: This paper basically expresses the core fundamentals and brief overview of the research of R. G. GALLAGER [1] on Low-Density Parity-Check (LDPC) codes and various parameters related to LDPC codes like, encoding and decoding of LDPC codes, code rate, parity check matrix, tanner graph. We also discuss advantages and applications as well as the usage of LDPC codes in 5G technology. We have simulated encoding and decoding of LDPC codes and have acquired results in terms of BER vs SNR graph in MATLAB software. This report was submitted as an assignment in Nirma University.

Submitted 18 September, 2020; originally announced September 2020.

arXiv:2009.08317 [pdf]

Effect Of Weather Conditions On FSO Link

Authors: Saumya Borwankar, Dhruv Shah

Abstract: Free Space Optics (FSO) is a developing technology for Line of Sight communication that uses light propagation in free space that provides various advantages like high bandwidth, high data rate, ease of installation, free licensing and secure communication. Thus, FSO is a developing technology that can be used in numerous applications for Line of Sight Communication. But the diverse effects like attenuation on FSO communication link due to environmental factors and weather conditions like fog, rain, dust, sand storms, clouds, temperature and the other factors like range, effects of physical obstructions are an essential topic for study which is discussed in this paper. We have done the simulation for the effects of fog and rain on the FSO communication link in Opti system software [1]. This is submitted in lieu of FOC assignment at Nirma University.

Submitted 17 September, 2020; originally announced September 2020.

arXiv:1911.09645 [pdf, other]

Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features

Authors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto

Abstract: This paper presents a simple yet effective method to achieve prosody transfer from a reference speech signal to synthesized speech. The main idea is to incorporate well-known acoustic correlates of prosody such as pitch and loudness contours of the reference speech into a modern neural text-to-speech (TTS) synthesizer such as Tacotron2 (TC2). More specifically, a small set of acoustic features are extracted from reference audio and then used to condition a TC2 synthesizer. The trained model is evaluated using subjective listening tests and a novel objective evaluation of prosody transfer is proposed. Listening tests show that the synthesized speech is rated as highly natural and that prosody is successfully transferred from the reference speech signal to the synthesized signal.

Submitted 15 May, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

Comments: 5 pages, in review for conference publication

arXiv:1911.00344 [pdf, other]

Short and Wide Network Paths

Authors: Lavanya Marla, Lav R. Varshney, Devavrat Shah, Nirmal A. Prakash, Michael E. Gale

Abstract: Network flow is a powerful mathematical framework to systematically explore the relationship between structure and function in biological, social, and technological networks. We introduce a new pipelining model of flow through networks where commodities must be transported over single paths rather than split over several paths and recombined. We show this notion of pipelined network flow is optimized using network paths that are both short and wide, and develop efficient algorithms to compute such paths for given pairs of nodes and for all-pairs. Short and wide paths are characterized for many real-world networks. To further demonstrate the utility of this network characterization, we develop novel information-theoretic lower bounds on computation speed in nervous systems due to limitations from anatomical connectivity and physical noise. For the nematode Caenorhabditis elegans, we find these bounds are predictive of biological timescales of behavior. Further, we find the particular C. elegans connectome is globally less efficient for information flow than random networks, but the hub-and-spoke architecture of functional subcircuits is optimal under constraint on number of synapses. This suggests functional subcircuits are a primary organizational principle of this small invertebrate nervous system.

Submitted 1 November, 2019; originally announced November 2019.

arXiv:1402.3654 [pdf]

Temperature Control using Fuzzy Logic

Authors: Piyush Singhala, Dhrumil Shah, Bhavikkumar Patel

Abstract: The aim of the temperature control is to heat the system up to delimitated temperature, afterward hold it at that temperature in insured manner. Fuzzy Logic Controller (FLC) is best way in which this type of precision control can be accomplished by controller. During past twenty years significant amount of research using fuzzy logic has been done in this field of control of non-linear dynamical system. Here we have developed temperature control system using fuzzy logic. Control theory techniques are the root from which convention controllers are deducted. The desired response of the output can be guaranteed by the feedback controller.

Submitted 15 February, 2014; originally announced February 2014.

Comments: 10 pages

Showing 1–16 of 16 results for author: Shah, D