Search | arXiv e-print repository

doi 10.11916/j.issn.1005-9113.2019044

Improving Deep Attractor Network by BGRU and GMM for Speech Separation

Authors: Rawad Melhem, Assef Jafar, Riad Hamadeh

Abstract: Deep Attractor Network (DANet) is the state-of-the-art technique in speech separation field, which uses Bidirectional Long Short-Term Memory (BLSTM), but the complexity of the DANet model is very high. In this paper, a simplified and powerful DANet model is proposed using Bidirectional Gated neural network (BGRU) instead of BLSTM. The Gaussian Mixture Model (GMM) other than the k-means was applied… ▽ More Deep Attractor Network (DANet) is the state-of-the-art technique in speech separation field, which uses Bidirectional Long Short-Term Memory (BLSTM), but the complexity of the DANet model is very high. In this paper, a simplified and powerful DANet model is proposed using Bidirectional Gated neural network (BGRU) instead of BLSTM. The Gaussian Mixture Model (GMM) other than the k-means was applied in DANet as a clustering algorithm to reduce the complexity and increase the learning speed and accuracy. The metrics used in this paper are Signal to Distortion Ratio (SDR), Signal to Interference Ratio (SIR), Signal to Artifact Ratio (SAR), and Perceptual Evaluation Speech Quality (PESQ) score. Two speaker mixture datasets from TIMIT corpus were prepared to evaluate the proposed model, and the system achieved 12.3 dB and 2.94 for SDR and PESQ scores respectively, which were better than the original DANet model. Other improvements were 20.7% and 17.9% in the number of parameters and time training, respectively. The model was applied on mixed Arabic speech signals and the results were better than that in English. △ Less

Submitted 7 August, 2023; originally announced August 2023.

Journal ref: Journal of Harbin Institute of Technology (New Series), vol. 28, no. 3, pp. 90-96, 2021

arXiv:2305.15758 [pdf]

Towards Solving Cocktail-Party: The First Method to Build a Realistic Dataset with Ground Truths for Speech Separation

Authors: Rawad Melhem, Assef Jafar, Oumayma Al Dakkak

Abstract: Speech separation is very important in real-world applications such as human-machine interaction, hearing aids devices, and automatic meeting transcription. In recent years, a significant improvement occurred towards the solution based on deep learning. In fact, much attention has been drawn to supervised learning methods using synthetic mixtures datasets despite their being not representative of… ▽ More Speech separation is very important in real-world applications such as human-machine interaction, hearing aids devices, and automatic meeting transcription. In recent years, a significant improvement occurred towards the solution based on deep learning. In fact, much attention has been drawn to supervised learning methods using synthetic mixtures datasets despite their being not representative of real-world mixtures. The difficulty in building a realistic dataset led researchers to use unsupervised learning methods, because of their ability to handle realistic mixtures directly. The results of unsupervised learning methods are still unconvincing. In this paper, a method is introduced to create a realistic dataset with ground truth sources for speech separation. The main challenge in designing a realistic dataset is the unavailability of ground truths for speakers signals. To address this, we propose a method for simultaneously recording two speakers and obtaining the ground truth for each. We present a methodology for benchmarking our realistic dataset using a deep learning model based on Bidirectional Gated Recurrent Units (BGRU) and clustering algorithm. The experiments show that our proposed dataset improved SI-SDR (Scale Invariant Signal to Distortion Ratio) by 1.65 dB and PESQ (Perceptual Evaluation of Speech Quality) by approximately 0.5. We also evaluated the effectiveness of our method at different distances between the microphone and the speakers and found that it improved the stability of the learned model. △ Less

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2301.10772 [pdf]

Gene-SGAN: a method for discovering disease subtypes with imaging and genetic signatures via multi-view weakly-supervised deep clustering

Authors: Zhijian Yang, Junhao Wen, Ahmed Abdulkadir, Yuhan Cui, Guray Erus, Elizabeth Mamourian, Randa Melhem, Dhivya Srinivasan, Sindhuja T. Govindarajan, Jiong Chen, Mohamad Habes, Colin L. Masters, Paul Maruff, Jurgen Fripp, Luigi Ferrucci, Marilyn S. Albert, Sterling C. Johnson, John C. Morris, Pamela LaMontagne, Daniel S. Marcus, Tammie L. S. Benzinger, David A. Wolk, Li Shen, **gxuan Bao, Susan M. Resnick , et al. (3 additional authors not shown)

Abstract: Disease heterogeneity has been a critical challenge for precision diagnosis and treatment, especially in neurologic and neuropsychiatric diseases. Many diseases can display multiple distinct brain phenotypes across individuals, potentially reflecting disease subtypes that can be captured using MRI and machine learning methods. However, biological interpretability and treatment relevance are limite… ▽ More Disease heterogeneity has been a critical challenge for precision diagnosis and treatment, especially in neurologic and neuropsychiatric diseases. Many diseases can display multiple distinct brain phenotypes across individuals, potentially reflecting disease subtypes that can be captured using MRI and machine learning methods. However, biological interpretability and treatment relevance are limited if the derived subtypes are not associated with genetic drivers or susceptibility factors. Herein, we describe Gene-SGAN - a multi-view, weakly-supervised deep clustering method - which dissects disease heterogeneity by jointly considering phenotypic and genetic data, thereby conferring genetic correlations to the disease subtypes and associated endophenotypic signatures. We first validate the generalizability, interpretability, and robustness of Gene-SGAN in semi-synthetic experiments. We then demonstrate its application to real multi-site datasets from 28,858 individuals, deriving subtypes of Alzheimer's disease and brain endophenotypes associated with hypertension, from MRI and SNP data. Derived brain phenotypes displayed significant differences in neuroanatomical patterns, genetic determinants, biological and clinical biomarkers, indicating potentially distinct underlying neuropathologic processes, genetic drivers, and susceptibility factors. Overall, Gene-SGAN is broadly applicable to disease subty** and endophenotype discovery, and is herein tested on disease-related, genetically-driven neuroimaging phenotypes. △ Less

Submitted 25 January, 2023; originally announced January 2023.

arXiv:1907.06768 [pdf, other]

Partitioning Graphs for the Cloud using Reinforcement Learning

Authors: Mohammad Hasanzadeh Mofrad, Rami Melhem, Mohammad Hammoud

Abstract: In this paper, we propose Revolver, a parallel graph partitioning algorithm capable of partitioning large-scale graphs on a single shared-memory machine. Revolver employs an asynchronous processing framework, which leverages reinforcement learning and label propagation to adaptively partition a graph. In addition, it adopts a vertex-centric view of the graph where each vertex is assigned an autono… ▽ More In this paper, we propose Revolver, a parallel graph partitioning algorithm capable of partitioning large-scale graphs on a single shared-memory machine. Revolver employs an asynchronous processing framework, which leverages reinforcement learning and label propagation to adaptively partition a graph. In addition, it adopts a vertex-centric view of the graph where each vertex is assigned an autonomous agent responsible for selecting a suitable partition for it, distributing thereby the computation across all vertices. The intuition behind using a vertex-centric view is that it naturally fits the graph partitioning problem, which entails that a graph can be partitioned using local information provided by each vertex's neighborhood. We fully implemented and comprehensively tested Revolver using nine real-world graphs. Our results show that Revolver is scalable and can outperform three popular and state-of-the-art graph partitioners via producing comparable localized partitions, yet without sacrificing the load balance across partitions. △ Less

Submitted 17 July, 2019; v1 submitted 15 July, 2019; originally announced July 2019.

arXiv:1806.02498 [pdf, other]

Mitigating Wordline Crosstalk using Adaptive Trees of Counters

Authors: Seyed Mohammad Seyedzadeh, Alex K. Jones, Rami Melhem

Abstract: High access frequency of certain rows in the DRAM may cause data loss in cells of physically adjacent rows due to crosstalk. The malicious exploit of this crosstalk by repeatedly accessing a row to induce this effect is known as row hammering. Additionally, inadvertent row hammering may also occur due to the natural weighted nature of applications' access patterns. In this paper, we analyze the… ▽ More High access frequency of certain rows in the DRAM may cause data loss in cells of physically adjacent rows due to crosstalk. The malicious exploit of this crosstalk by repeatedly accessing a row to induce this effect is known as row hammering. Additionally, inadvertent row hammering may also occur due to the natural weighted nature of applications' access patterns. In this paper, we analyze the efficiency of existing approaches for mitigating wordline crosstalk and demonstrate that they have been conservatively designed. Given the unbalanced nature of DRAM accesses, a small group of dynamically allocated counters in banks can deterministically detect hot rows and mitigate crosstalk. Based on our findings, we propose a Counter-based Adaptive Tree (CAT) approach to mitigate wordline crosstalk using adaptive trees of counters to guide appropriate refreshing of vulnerable rows. The key idea is to tune the distribution of the counters to the rows in a bank based on the memory reference patterns. In contrast to deterministic solutions, CAT utilizes fewer counters, making it practically feasible to be implemented on-chip. Compared to existing probabilistic approaches, CAT more precisely refreshes rows vulnerable to crosstalk based on their access frequency. Experimental results on workloads from four benchmark suites show that CAT reduces the Crosstalk Mitigation Refresh Power Overhead in quad-core systems to 7%, which is an improvement over the 21% and 18% incurred in the leading deterministic and probabilistic approaches, respectively. Moreover, CAT incurs very low performance overhead (0.5%). Hardware synthesis evaluation shows that CAT can be implemented on-chip with only a nominal area overhead. △ Less

Submitted 6 June, 2018; originally announced June 2018.

Comments: 12 pages

arXiv:1711.08572 [pdf, other]

Enabling Fine-Grain Restricted Coset Coding Through Word-Level Compression for PCM

Authors: Seyed Mohammad Seyedzadeh, Alex K. Jones, Rami Melhem

Abstract: Phase change memory (PCM) has recently emerged as a promising technology to meet the fast growing demand for large capacity memory in computer systems, replacing DRAM that is impeded by physical limitations. Multi-level cell (MLC) PCM offers high density with low per-byte fabrication cost. However, despite many advantages, such as scalability and low leakage, the energy for programming intermediat… ▽ More Phase change memory (PCM) has recently emerged as a promising technology to meet the fast growing demand for large capacity memory in computer systems, replacing DRAM that is impeded by physical limitations. Multi-level cell (MLC) PCM offers high density with low per-byte fabrication cost. However, despite many advantages, such as scalability and low leakage, the energy for programming intermediate states is considerably larger than programing single-level cell PCM. In this paper, we study encoding techniques to reduce write energy for MLC PCM when the encoding granularity is lowered below the typical cache line size. We observe that encoding data blocks at small granularity to reduce write energy actually increases the write energy because of the auxiliary encoding bits. We mitigate this adverse effect by 1) designing suitable codeword map**s that use fewer auxiliary bits and 2) proposing a new Word-Level Compression (WLC) which compresses more than 91% of the memory lines and provides enough room to store the auxiliary data using a novel restricted coset encoding applied at small data block granularities. Experimental results show that the proposed encoding at 16-bit data granularity reduces the write energy by 39%, on average, versus the leading encoding approach for write energy reduction. Furthermore, it improves endurance by 20% and is more reliable than the leading approach. Hardware synthesis evaluation shows that the proposed encoding can be implemented on-chip with only a nominal area overhead. △ Less

Submitted 22 November, 2017; originally announced November 2017.

Comments: 12 pages

arXiv:1710.08940 [pdf, other]

A Variable Length Coding Framework for Cost Function Reduction in Non-Volatile Memory Systems

Authors: Seyed Mohammad Seyedzadeh, Alex K. Jones, Rami Melhem

Abstract: Variable length coding for Non-Volatile Memory (NVM) technologies is a promising method to improve memory capacity and system performance through compressing memory blocks. However, compression techniques used to improve capacity or bandwidth utilization do not take into consideration the asymmetric costs of writing 1's and 0's in NVMs. Taking into account this asymmetry, we propose a variable len… ▽ More Variable length coding for Non-Volatile Memory (NVM) technologies is a promising method to improve memory capacity and system performance through compressing memory blocks. However, compression techniques used to improve capacity or bandwidth utilization do not take into consideration the asymmetric costs of writing 1's and 0's in NVMs. Taking into account this asymmetry, we propose a variable length encoding framework that reduces the cost of writing data into NVM. Our experimental results on 12 workloads of the SPEC CPU2006 benchmark suite show that, when the cost asymmetry is 1:2, the proposed framework is capable of reducing the NVM programming cost by up to 24% more than leading compression approaches and by 12.5% more than the flip-and-write approach which selects between the data and its complement based on the programming cost. △ Less

Submitted 24 October, 2017; originally announced October 2017.

Comments: NVMW2017

arXiv:1302.3720 [pdf, other]

Energy-aware checkpointing of divisible tasks with soft or hard deadlines

Authors: Guillaume Aupy, Anne Benoit, Rami Melhem, Paul Renaud-Goud, Yves Robert

Abstract: In this paper, we aim at minimizing the energy consumption when executing a divisible workload under a bound on the total execution time, while resilience is provided through checkpointing. We discuss several variants of this multi-criteria problem. Given the workload, we need to decide how many chunks to use, what are the sizes of these chunks, and at which speed each chunk is executed. Furthermo… ▽ More In this paper, we aim at minimizing the energy consumption when executing a divisible workload under a bound on the total execution time, while resilience is provided through checkpointing. We discuss several variants of this multi-criteria problem. Given the workload, we need to decide how many chunks to use, what are the sizes of these chunks, and at which speed each chunk is executed. Furthermore, since a failure may occur during the execution of a chunk, we also need to decide at which speed a chunk should be re-executed in the event of a failure. The goal is to minimize the expectation of the total energy consumption, while enforcing a deadline on the execution time, that should be met either in expectation (soft deadline), or in the worst case (hard deadline). For each problem instance, we propose either an exact solution, or a function that can be optimized numerically. The different models are then compared through an extensive set of experiments. △ Less

Submitted 15 February, 2013; originally announced February 2013.

Comments: This work was supported by ANR Rescue

Report number: INRIA RR-8238

Showing 1–8 of 8 results for author: Melhem, R