
Showing 1–20 of 20 results for author: Humayun, A I

Searching in archive cs.
  1. arXiv:2406.09657

    cs.LG stat.ML

    ScaLES: Scalable Latent Exploration Score for Pre-Trained Generative Networks

    Authors: Omer Ronen, Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk, Bin Yu

    Abstract: We develop Scalable Latent Exploration Score (ScaLES) to mitigate over-exploration in Latent Space Optimization (LSO), a popular method for solving black-box discrete optimization problems. LSO utilizes continuous optimization within the latent space of a Variational Autoencoder (VAE) and is known to be susceptible to over-exploration, which manifests in unrealistic solutions that reduce its pract…

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2402.15555

    cs.LG cs.AI cs.CV

    Deep Networks Always Grok and Here is Why

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

    Abstract: Grokking, or delayed generalization, is a phenomenon where generalization in a deep neural network (DNN) occurs long after achieving near zero training error. Previous studies have reported the occurrence of grokking in specific controlled settings, such as DNNs initialized with large-norm parameters or transformers trained on algorithmic datasets. We demonstrate that grokking is actually much mor…

    Submitted 6 June, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: ICML 2024. Website: https://bit.ly/grok-adversarial. 24 pages, 36 figures

  3. arXiv:2310.12977

    cs.LG cs.AI cs.CV

    Training Dynamics of Deep Network Linear Regions

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

    Abstract: The study of Deep Network (DN) training dynamics has largely focused on the evolution of the loss function, evaluated on or around train and test set data points. In fact, many DN phenomena, e.g., double descent and grokking, were first introduced in the literature in that respect. In this study, we look at the training dynamics of the input space partition or linear regions formed by continuous piecew…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: 14 pages, 14 figures

  4. arXiv:2307.01850

    cs.LG cs.AI cs.CV

    Self-Consuming Generative Models Go MAD

    Authors: Sina Alemohammad, Josue Casco-Rodriguez, Lorenzo Luzi, Ahmed Imtiaz Humayun, Hossein Babaei, Daniel LeJeune, Ali Siahkoohi, Richard G. Baraniuk

    Abstract: Seismic advances in generative AI algorithms for imagery, text, and other data types have led to the temptation to use synthetic data to train next-generation models. Repeating this process creates an autophagous (self-consuming) loop whose properties are poorly understood. We conduct a thorough analytical and empirical analysis using state-of-the-art generative image models of three families of au…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: 31 pages, 31 figures, pre-print

  5. arXiv:2306.01743

    cs.CL

    Unicode Normalization and Grapheme Parsing of Indic Languages

    Authors: Nazmuddoha Ansary, Quazi Adibur Rahman Adib, Tahsin Reasat, Asif Shahriyar Sushmit, Ahmed Imtiaz Humayun, Sazia Mehnaz, Kanij Fatema, Mohammad Mamun Or Rashid, Farig Sadeque

    Abstract: Writing systems of Indic languages have orthographic syllables, also known as complex graphemes, as unique horizontal units. A prominent feature of these languages is their complex grapheme units, which comprise consonants/consonant conjuncts, vowel diacritics, and consonant diacritics that together make a unique language. Unicode-based writing schemes of these languages often disregard this feat…

    Submitted 27 May, 2024; v1 submitted 11 May, 2023; originally announced June 2023.

    Comments: Published at LREC-COLING 2024

  6. arXiv:2305.09688

    eess.AS cs.CL cs.LG

    OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking

    Authors: Fazle Rabbi Rakib, Souhardya Saha Dip, Samiul Alam, Nazia Tasnim, Md. Istiak Hossain Shihab, Md. Nazmuddoha Ansary, Syed Mobassir Hossen, Marsia Haque Meghla, Mamunur Mamun, Farig Sadeque, Sayma Sultana Chowdhury, Tahsin Reasat, Asif Sushmit, Ahmed Imtiaz Humayun

    Abstract: We present OOD-Speech, the first out-of-distribution (OOD) benchmarking dataset for Bengali automatic speech recognition (ASR). As one of the most widely spoken languages globally, Bengali exhibits large diversity in dialects and prosodic features, which requires ASR frameworks to be robust to distribution shifts. For example, Islamic religious sermons in Bengali are delivered with a tonality that…

    Submitted 15 May, 2023; originally announced May 2023.

  7. arXiv:2303.05325

    cs.CV

    BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset

    Authors: Md. Istiak Hossain Shihab, Md. Rakibul Hasan, Mahfuzur Rahman Emon, Syed Mobassir Hossen, Md. Nazmuddoha Ansary, Intesur Ahmed, Fazle Rabbi Rakib, Shahriar Elahi Dhruvo, Souhardya Saha Dip, Akib Hasan Pavel, Marsia Haque Meghla, Md. Rezwanul Haque, Sayma Sultana Chowdhury, Farig Sadeque, Tahsin Reasat, Ahmed Imtiaz Humayun, Asif Shahriyar Sushmit

    Abstract: While strides have been made in deep-learning-based Bengali Optical Character Recognition (OCR) in the past decade, the absence of large Document Layout Analysis (DLA) datasets has hindered the application of OCR in document transcription, e.g., transcribing historical documents and newspapers. Moreover, rule-based DLA systems that are currently being employed in practice are not robust to domain…

    Submitted 5 May, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  8. arXiv:2302.12828

    cs.CV cs.LG

    SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Guha Balakrishnan, Richard Baraniuk

    Abstract: Current Deep Network (DN) visualization and interpretability methods rely heavily on data space visualizations such as scoring which dimensions of the data are responsible for their associated prediction or generating new data features or samples that best match a given DN unit or representation. In this paper, we go one step further by developing the first provably exact method for computing the…

    Submitted 6 June, 2024; v1 submitted 24 February, 2023; originally announced February 2023.

    Comments: 11 pages, 20 figures

  9. arXiv:2206.14053

    cs.CL cs.SD eess.AS

    Bengali Common Voice Speech Dataset for Automatic Speech Recognition

    Authors: Samiul Alam, Asif Sushmit, Zaowad Abdullah, Shahrin Nakkhatra, MD. Nazmuddoha Ansary, Syed Mobassir Hossen, Sazia Morshed Mehnaz, Tahsin Reasat, Ahmed Imtiaz Humayun

    Abstract: Bengali is one of the most spoken languages in the world with over 300 million speakers globally. Despite its popularity, research into the development of Bengali speech recognition systems is hindered by the lack of diverse open-source datasets. As a way forward, we have crowdsourced the Bengali Common Voice Speech Dataset, which is a sentence-level automatic speech recognition corpus. Collec…

    Submitted 29 June, 2022; v1 submitted 28 June, 2022; originally announced June 2022.

  10. arXiv:2203.02502

    cs.LG cs.AI

    No More Than 6ft Apart: Robust K-Means via Radius Upper Bounds

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Anastasios Kyrillidis, Richard Baraniuk

    Abstract: Centroid-based clustering methods such as k-means, k-medoids and k-centers are heavily applied as a go-to tool in exploratory data analysis. In many cases, those methods are used to obtain representative centroids of the data manifold for visualization or summarization of a dataset. Real-world datasets often contain inherent abnormalities, e.g., repeated samples and sampling bias, that manifest im…

    Submitted 15 June, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

    Comments: Accepted for ICASSP 2022, 8 figures, 1 table

  11. arXiv:2203.01993

    cs.CV

    Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

    Abstract: We present Polarity Sampling, a theoretically justified plug-and-play method for controlling the generation quality and diversity of pre-trained deep generative networks (DGNs). Leveraging the fact that DGNs are, or can be approximated by, continuous piecewise affine splines, we derive the analytical DGN output space distribution as a function of the product of the DGN's Jacobian singular values ra…

    Submitted 6 May, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: 20 pages, 16 figures, CVPR 2022 Oral, Camera Ready

  12. arXiv:2110.08009

    cs.LG cs.CV

    MaGNET: Uniform Sampling from Deep Generative Network Manifolds Without Retraining

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

    Abstract: Deep Generative Networks (DGNs) are extensively employed in Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and their variants to approximate the data manifold and distribution. However, training samples are often distributed in a non-uniform fashion on the manifold, due to costs or convenience of collection. For example, the CelebA dataset contains a large fraction of smi…

    Submitted 20 January, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: ICLR Accepted version, 28 pages, 23 figures

  13. arXiv:2010.13975

    eess.SP cs.LG

    Wearing a MASK: Compressed Representations of Variable-Length Sequences Using Recurrent Neural Tangent Kernels

    Authors: Sina Alemohammad, Hossein Babaei, Randall Balestriero, Matt Y. Cheung, Ahmed Imtiaz Humayun, Daniel LeJeune, Naiming Liu, Lorenzo Luzi, Jasper Tan, Zichao Wang, Richard G. Baraniuk

    Abstract: High dimensionality poses many challenges to the use of data, from visualization and interpretation, to prediction and storage for historical preservation. Techniques abound to reduce the dimensionality of fixed-length sequences, yet these methods rarely generalize to variable-length sequences. To address this gap, we extend existing methods that rely on the use of kernels to variable-length seque…

    Submitted 17 April, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

  14. A Large Multi-Target Dataset of Common Bengali Handwritten Graphemes

    Authors: Samiul Alam, Tahsin Reasat, Asif Shahriyar Sushmit, Sadi Mohammad Siddiquee, Fuad Rahman, Mahady Hasan, Ahmed Imtiaz Humayun

    Abstract: Latin has historically led the state-of-the-art in handwritten optical character recognition (OCR) research. Adapting existing systems from Latin to alpha-syllabary languages is particularly challenging due to a sharp contrast between their orthographies. The segmentation of graphical constituents corresponding to characters becomes particularly difficult due to a cursive writing system and frequent u…

    Submitted 13 January, 2021; v1 submitted 30 September, 2020; originally announced October 2020.

    Comments: 15 pages, 12 figures, 6 Tables, Submitted to CVPR-21

  15. arXiv:1904.12271

    cs.CV eess.IV

    X-Ray Image Compression Using Convolutional Recurrent Neural Networks

    Authors: Asif Shahriyar Sushmit, Shakib Uz Zaman, Ahmed Imtiaz Humayun, Taufiq Hasan, Mohammed Imamul Hassan Bhuiyan

    Abstract: With the advent of a digital health revolution, vast amounts of clinical data are being generated, stored and processed on a daily basis. This has made the storage and retrieval of large volumes of health-care data, especially high-resolution medical images, particularly challenging. Effective image compression for medical images thus plays a vital role in today's healthcare information system, par…

    Submitted 9 May, 2019; v1 submitted 28 April, 2019; originally announced April 2019.

    Comments: 4 pages, 2 figures, IEEE BHI 2019

  16. arXiv:1904.10255

    cs.LG cs.CV eess.SP stat.ML

    End-to-end Sleep Staging with Raw Single Channel EEG using Deep Residual ConvNets

    Authors: Ahmed Imtiaz Humayun, Asif Shahriyar Sushmit, Taufiq Hasan, Mohammed Imamul Hassan Bhuiyan

    Abstract: Humans spend approximately a third of their lives sleeping, which makes monitoring sleep an integral part of well-being. In this paper, a 34-layer deep residual ConvNet architecture for end-to-end sleep staging is proposed. The network takes a raw single-channel electroencephalogram (Fpz-Cz) signal as input and yields a hypnogram annotation for each 30 s segment as output. Experiments are carried out…

    Submitted 23 April, 2019; originally announced April 2019.

    Comments: 5 pages, 3 Figures, Appendix, IEEE BHI 2019

  17. arXiv:1810.04452

    cs.CV

    AI Learns to Recognize Bengali Handwritten Digits: Bengali.AI Computer Vision Challenge 2018

    Authors: Sharif Amit Kamran, Ahmed Imtiaz Humayun, Samiul Alam, Rashed Mohammad Doha, Manash Kumar Mandal, Tahsin Reasat, Fuad Rahman

    Abstract: Solving problems with artificial intelligence in a competitive manner has long been absent in Bangladesh and the Bengali-speaking community. Moreover, there has not been a well-structured database of Bengali handwritten digits for mass public use. To bring out the best minds working in machine learning and use their expertise to create a model which can easily recognize Bengali handwritten d…

    Submitted 10 October, 2018; originally announced October 2018.

    Comments: 5 pages, 3 figures

  18. An Ensemble of Transfer, Semi-supervised and Supervised Learning Methods for Pathological Heart Sound Classification

    Authors: Ahmed Imtiaz Humayun, Md. Tauhiduzzaman Khan, Shabnam Ghaffarzadegan, Zhe Feng, Taufiq Hasan

    Abstract: In this work, we propose an ensemble of classifiers to distinguish between various degrees of abnormalities of the heart using Phonocardiogram (PCG) signals acquired using digital stethoscopes in a clinical setting, for the INTERSPEECH 2018 Computational Paralinguistics (ComParE) Heart Beats SubChallenge. Our primary classification framework consists of a convolutional neural network with 1D-CNN t…

    Submitted 7 October, 2018; v1 submitted 18 June, 2018; originally announced June 2018.

    Comments: 5 pages, 5 figures, Interspeech 2018 accepted manuscript

  19. arXiv:1806.05892

    cs.CV cs.LG eess.SP stat.ML

    Learning Front-end Filter-bank Parameters using Convolutional Neural Networks for Abnormal Heart Sound Detection

    Authors: Ahmed Imtiaz Humayun, Shabnam Ghaffarzadegan, Zhe Feng, Taufiq Hasan

    Abstract: Automatic heart sound abnormality detection can play a vital role in the early diagnosis of heart diseases, particularly in low-resource settings. The state-of-the-art algorithms for this task utilize a set of Finite Impulse Response (FIR) band-pass filters as a front-end followed by a Convolutional Neural Network (CNN) model. In this work, we propound a novel CNN architecture that integrates the…

    Submitted 15 June, 2018; originally announced June 2018.

    Comments: 4 pages, 6 figures, IEEE International Engineering in Medicine and Biology Conference (EMBC)

  20. arXiv:1806.02452

    cs.CV

    NumtaDB - Assembled Bengali Handwritten Digits

    Authors: Samiul Alam, Tahsin Reasat, Rashed Mohammad Doha, Ahmed Imtiaz Humayun

    Abstract: To benchmark Bengali digit recognition algorithms, a large publicly available dataset is required which is free from biases originating from geographical location, gender, and age. With this aim in mind, NumtaDB, a dataset consisting of more than 85,000 images of hand-written Bengali digits, has been assembled. This paper documents the collection and curation process of numerals along with the sal…

    Submitted 6 June, 2018; originally announced June 2018.

    Comments: 6 pages, 12 figures

    MSC Class: 68T10; ACM Class: I.5.1; I.5.4
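Entry 5 (Unicode Normalization and Grapheme Parsing of Indic Languages) says that Unicode-based schemes let one visible Bengali grapheme be encoded as different code point sequences. The snippet below is a minimal sketch of that problem. It uses only Python's standard `unicodedata` module and standard Unicode NFC normalization, not the normalizer proposed in that paper. It compares two spellings of the vowel sign O and two spellings of the letter YYA. The helper name `same_after_nfc` is hypothetical.

```python
import unicodedata

def same_after_nfc(a: str, b: str) -> bool:
    """Hypothetical helper: True if two strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# KA + BENGALI VOWEL SIGN O (U+09CB), versus KA + the canonical decomposition
# of that vowel sign: VOWEL SIGN E (U+09C7) followed by VOWEL SIGN AA (U+09BE).
precomposed_o = "\u0995\u09CB"
two_part_o = "\u0995\u09C7\u09BE"

# BENGALI LETTER YYA (U+09DF), versus YA (U+09AF) + NUKTA (U+09BC).
# U+09DF is a Unicode composition exclusion, so NFC yields the two-code-point form.
yya = "\u09DF"
ya_nukta = "\u09AF\u09BC"

for a, b in [(precomposed_o, two_part_o), (yya, ya_nukta)]:
    nfc = unicodedata.normalize("NFC", a)
    print([f"U+{ord(c):04X}" for c in nfc], same_after_nfc(a, b))
```

NFC does not always choose the single-code-point form. Because U+09DF is excluded from composition, NFC maps it to YA + NUKTA. Standard normalization also says nothing about where one grapheme ends and the next begins, which is the separate parsing problem named in that paper's title.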
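Entry 10 (No More Than 6ft Apart) builds on centroid-based clustering such as k-means. For reference, below is a minimal sketch of plain Lloyd's k-means in NumPy. This is the standard, non-robust baseline, not the radius-upper-bound method that paper proposes. The function name, initialization scheme and defaults are illustrative assumptions.

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's k-means (illustrative sketch). Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean distance).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment so the returned labels match the returned centroids.
    labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
    return centroids, labels

# Two well-separated synthetic blobs centered at (0, 0) and (10, 10).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(100, 2)), rng.normal(10.0, 0.5, size=(100, 2))])
centroids, labels = kmeans(X, k=2)
print(np.round(centroids[np.argsort(centroids[:, 0])], 1))
```

The update step is an unweighted mean, so every repeated copy of a sample counts again toward its cluster's centroid. This is one way the abnormalities mentioned in that abstract, such as repeated samples, can pull centroids away from the rest of the data.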
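Entry 16 (End-to-end Sleep Staging with Raw Single Channel EEG) describes a network that maps a raw single-channel (Fpz-Cz) EEG signal to one hypnogram label per 30 s segment. The sketch below covers only the segmentation step: it splits a 1-D recording into non-overlapping 30 s epochs with NumPy. The 100 Hz sampling rate and the synthetic signal are assumptions for illustration. The residual ConvNet itself is not shown.

```python
import numpy as np

def split_into_epochs(signal: np.ndarray, fs: float, epoch_sec: float = 30.0) -> np.ndarray:
    """Split a 1-D signal into non-overlapping epochs of `epoch_sec` seconds.

    Trailing samples that do not fill a whole epoch are dropped.
    Returns an array of shape (n_epochs, samples_per_epoch).
    """
    samples_per_epoch = int(round(fs * epoch_sec))
    n_epochs = len(signal) // samples_per_epoch
    return signal[: n_epochs * samples_per_epoch].reshape(n_epochs, samples_per_epoch)

fs = 100.0  # assumed sampling rate in Hz; replace with the recording's actual rate
# About 8 minutes of synthetic "EEG" plus a few leftover samples that get dropped.
eeg = np.random.default_rng(0).standard_normal(int(fs * 60 * 8) + 123)
epochs = split_into_epochs(eeg, fs)
print(epochs.shape)  # one row per 30 s segment; a sleep-stage classifier would label each row
```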