
Cognitively Inspired Energy-Based World Models

Authors: Alexi Gladstone, Ganesh Nanduru, Md Mofijul Islam, Aman Chadha, Jundong Li, Tariq Iqbal

Abstract: One of the predominant methods for training world models is autoregressive prediction in the output space of the next element of a sequence. In Natural Language Processing (NLP), this takes the form of Large Language Models (LLMs) predicting the next token; in Computer Vision (CV), this takes the form of autoregressive models predicting the next frame/token/pixel. However, this approach differs from human cognition in several respects. First, human predictions about the future actively influence internal cognitive processes. Second, humans naturally evaluate the plausibility of predictions regarding future states. Based on this capability, and third, by assessing when predictions are sufficient, humans allocate a dynamic amount of time to make a prediction. This adaptive process is analogous to System 2 thinking in psychology. All these capabilities are fundamental to the success of humans at high-level reasoning and planning. Therefore, to address the limitations of traditional autoregressive models lacking these human-like capabilities, we introduce Energy-Based World Models (EBWM). EBWM involves training an Energy-Based Model (EBM) to predict the compatibility of a given context and a predicted future state. In doing so, EBWM enables models to achieve all three facets of human cognition described. Moreover, we developed a variant of the traditional autoregressive transformer tailored for Energy-Based models, termed the Energy-Based Transformer (EBT). Our results demonstrate that EBWM scales better with data and GPU Hours than traditional autoregressive transformers in CV, and that EBWM offers promising early scaling in NLP. Consequently, this approach offers an exciting path toward training future models capable of System 2 thinking and intelligently searching across state spaces.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 23 pages, 6 figures

arXiv:2402.19318 [pdf, other]

doi 10.1145/3613904.3642685

DISCERN: Designing Decision Support Interfaces to Investigate the Complexities of Workplace Social Decision-Making With Line Managers

Authors: Pranav Khadpe, Lindy Le, Kate Nowak, Shamsi T. Iqbal, Jina Suh

Abstract: Line managers form the first level of management in organizations, and must make complex decisions, while maintaining relationships with those impacted by their decisions. Amidst growing interest in technology-supported decision-making at work, their needs remain understudied. Further, most existing design knowledge for supporting social decision-making comes from domains where decision-makers are more socially detached from those they decide for. We conducted iterative design research with line managers within a technology organization, investigating decision-making practices, and opportunities for technological support. Through formative research, development of a decision-representation tool -- DISCERN -- and user enactments, we identify their communication and analysis needs that lack adequate support. We found they preferred tools for externalizing reasoning rather than tools that replace interpersonal interactions, and they wanted tools to support a range of intuitive and calculative decision-making. We discuss how design of social decision-making supports, especially in the workplace, can more explicitly support highly interactional social decision-making.

Submitted 29 February, 2024; originally announced February 2024.

Comments: CHI 2024

arXiv:2401.08960 [pdf, other]

From User Surveys to Telemetry-Driven Agents: Exploring the Potential of Personalized Productivity Solutions

Authors: Subigya Nepal, Javier Hernandez, Talie Massachi, Kael Rowan, Judith Amores, Jina Suh, Gonzalo Ramos, Brian Houck, Shamsi T. Iqbal, Mary Czerwinski

Abstract: We present a comprehensive, user-centric approach to understand preferences in AI-based productivity agents and develop personalized solutions tailored to users' needs. Utilizing a two-phase method, we first conducted a survey with 363 participants, exploring various aspects of productivity, communication style, agent approach, personality traits, personalization, and privacy. Drawing on the survey insights, we developed a GPT-4 powered personalized productivity agent that utilizes telemetry data gathered via Viva Insights from information workers to provide tailored assistance. We compared its performance with alternative productivity-assistive tools, such as dashboard and narrative, in a study involving 40 participants. Our findings highlight the importance of user-centric design, adaptability, and the balance between personalization and privacy in AI-assisted productivity tools. By building on the insights distilled from our study, we believe that our work can enable and guide future research to further enhance productivity solutions, ultimately leading to optimized efficiency and user experiences for information workers.

Submitted 16 January, 2024; originally announced January 2024.

ACM Class: H.5.0; H.5.3; H.5.m; J.0

arXiv:2305.16741 [pdf, other]

Identifying human values from goal models: An industrial case study

Authors: Tahira Iqbal, Kuldar Taveter, Tarmo Strenze, Waqar Hussain, Omar Haggag, John Alphonsus Matthews, Anu Piirisild

Abstract: Human values are principles that guide human actions and behaviour in personal and social life. Ignoring human values during requirements engineering introduces a negative impact on software uptake and continued use. Embedding human values into software is admittedly challenging; however, early elicitation of stakeholder values increases the chances of their inclusion into the developed system. Using Pharaon, a research and innovation project of the European Union's Horizon 2020 program, as a case study, we analysed stakeholder requirements expressed as motivational goal models consisting of functional, quality, and emotional goals in three large-scale trial applications of the project. We were able to elicit 9 of 10 human values according to the theory of human values by Schwartz from the motivational goal models that represent the requirements for the three applications. Our findings highlight the dominant trend of stakeholder values being embedded in emotional goals and show that almost 45% of the identified values belong to the value categories of Security and Self-direction. Our research extends prior work in emotional goal modelling in requirements engineering by linking emotional goals to various stakeholder roles and identifying their values based on the Schwartz theory of human values.

Submitted 26 May, 2023; originally announced May 2023.

arXiv:2305.16091 [pdf, other]

Emotions in Requirements Engineering: A Systematic Mapping Study

Authors: Tahira Iqbal, Hina Anwar, Syazwanie Filzah, Mohammad Gharib, Kerli Moose, Kuldar Taveter

Abstract: The purpose of requirements engineering (RE) is to make sure that the expectations and needs of the stakeholders of a software system are met. Emotional needs can be captured as emotional requirements that represent how the end user should feel when using the system. Unlike functional and quality (non-functional) requirements, emotional requirements have received relatively little attention from the RE community. This study is motivated by the need to explore and map the literature on emotional requirements. The study applies the systematic mapping study technique for surveying and analyzing the available literature to identify the most relevant publications on emotional requirements. We identified 34 publications that address a wide spectrum of practices concerned with engineering emotional requirements. The identified publications were analyzed with respect to the application domains, instruments used for eliciting and artefacts used for representing emotional requirements, and the state of the practice in emotion-related requirements engineering. This analysis serves to identify research gaps and research directions in engineering emotional requirements. To the best of the authors' knowledge, no other similar study has been conducted on emotional requirements.

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2303.09262 [pdf, other]

Elastoviscoplastic fluid flow past a circular cylinder

Authors: S. Parvar, K. T. Iqbal, M. N. Ardekani, L. Brandt, O. Tammisola

Abstract: The combined effect of fluid elasticity and yield-stress on the flow past a circular cylinder is studied by two-dimensional direct numerical simulation. We analyze the effects of yield-stress, elasticity, shear-thinning, and shear-thickening on the wake characteristics using the Saramito constitutive model. The elastoviscoplastic (EVP) wake flow is studied at a moderate Reynolds number (Re = 100) where two-dimensional vortex shedding occurs in the Newtonian case. We find that in the shear-thinning elastoviscoplastic flow, when yield stress increases, the drag coefficient and root mean square of the lift coefficient both decrease, while the length of the recirculation bubble $L_{RB}$ increases. These changes indicate that the wake oscillation amplitude decreases with an increasing yield stress. For shear-thickening, however, the drag coefficient $C_D$ increases at a large Bingham number, and the wake becomes chaotic. The comparison of viscoelastic fluid and EVP fluid reveals that the polymer stresses, tr($τ^p$), decay considerably less downstream of the cylinder in the EVP case, indicating that significant stresses persist at large distances. We observe that shear-thinning competes with elastic and yield stresses and counteracts their effect, while shear-thickening enhances elastic and yield stress effects, so that the flow pattern can change from periodic to chaotic.

Submitted 16 March, 2023; originally announced March 2023.

arXiv:2303.06794 [pdf, other]

Sensing Wellbeing in the Workplace, Why and For Whom? Envisioning Impacts with Organizational Stakeholders

Authors: Anna Kawakami, Shreya Chowdhary, Shamsi T. Iqbal, Q. Vera Liao, Alexandra Olteanu, Jina Suh, Koustuv Saha

Abstract: With the heightened digitization of the workplace, alongside the rise of remote and hybrid work prompted by the pandemic, there is growing corporate interest in using passive sensing technologies for workplace wellbeing. Existing research on these technologies often focuses on understanding or improving interactions between an individual user and the technology. Workplace settings can, however, introduce a range of complexities that challenge the potential impact and in-practice desirability of wellbeing sensing technologies. Today, there is an inadequate empirical understanding of how everyday workers -- including those who are impacted by, and impact the deployment of workplace technologies -- envision its broader socio-ecological impacts. In this study, we conduct storyboard-driven interviews with 33 participants across three stakeholder groups: organizational governors, AI builders, and worker data subjects. Overall, our findings surface how workers envisioned wellbeing sensing technologies may lead to cascading impacts on their broader organizational culture, interpersonal relationships with colleagues, and individual day-to-day lives. Participants anticipated harms arising from ambiguity and misalignment around scaled notions of "worker wellbeing," underlying technical limitations to workplace-situated sensing, and assumptions regarding how social structures and relationships may shape the impacts and use of these technologies. Based on our findings, we discuss implications for designing worker-centered data-driven wellbeing technologies.

Submitted 6 June, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

arXiv:2212.13835 [pdf, other]

Representation Learning in Deep RL via Discrete Information Bottleneck

Authors: Riashat Islam, Hongyu Zang, Manan Tomar, Aniket Didolkar, Md Mofijul Islam, Samin Yeasar Arnob, Tariq Iqbal, Xin Li, Anirudh Goyal, Nicolas Heess, Alex Lamb

Abstract: Several self-supervised representation learning methods have been proposed for reinforcement learning (RL) with rich observations. For real-world applications of RL, recovering underlying latent states is crucial, particularly when sensory inputs contain irrelevant and exogenous information. In this work, we study how information bottlenecks can be used to construct latent states efficiently in the presence of task-irrelevant information. We propose architectures that utilize variational and discrete information bottlenecks, coined as RepDIB, to learn structured factorized representations. Exploiting the expressiveness brought by factorized representations, we introduce a simple, yet effective, bottleneck that can be integrated with any existing self-supervised objective for RL. We demonstrate this across several online and offline RL benchmarks, along with a real robot arm task, where we find that compressed representations with RepDIB can lead to strong performance improvements, as the learned bottlenecks help predict only the relevant state while ignoring irrelevant information.

Submitted 30 May, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

Comments: AISTATS 2023

arXiv:2210.16040 [pdf, other]

Review on Classification Techniques used in Biophysiological Stress Monitoring

Authors: Talha Iqbal, Adnan Elahi, Atif Shahzad, William Wijns

Abstract: Cardiovascular activities are directly related to the response of a body in a stressed condition. Stress, based on its intensity, can be divided into two types, i.e., acute stress (short-term stress) and chronic stress (long-term stress). Repeated acute stress and continuous chronic stress may play a vital role in inflammation in the circulatory system and thus lead to a heart attack or to a stroke. In this study, we have reviewed commonly used machine learning classification techniques applied to different stress-indicating parameters used in stress monitoring devices. These parameters include Photoplethysmograph (PPG), Electrocardiographs (ECG), Electromyograph (EMG), Galvanic Skin Response (GSR), Heart Rate Variation (HRV), skin temperature, respiratory rate, Electroencephalograph (EEG) and salivary cortisol. This study also provides a discussion on choosing a classifier, which depends upon a number of factors other than accuracy, like the number of subjects involved in an experiment, the type of signal processing, and computational limitations.

Submitted 28 October, 2022; originally announced October 2022.

Comments: 17 pages, 17 figures, 1 table

ACM Class: I.2.6

arXiv:2205.06575 [pdf, other]

Experimental and numerical investigation of bubble migration in shear flow: deformability-driven chaining and repulsion

Authors: Blandine Feneuil, Kazi Tassawar Iqbal, Atle Jensen, Luca Brandt, Outi Tammisola, Andreas Carlson

Abstract: We study the interaction-induced migration of bubbles in shear flow and observe that bubbles suspended in elastoviscoplastic emulsions organise into chains aligned in the flow direction, similarly to particles in viscoelastic fluids. To investigate the driving mechanism, we perform experiments and simulations on bubble pairs, using suspending fluids with different rheological properties. First, we notice that, for all fluids, the interaction type depends on the relative position of the bubbles. If they are aligned in the vorticity direction, they repel; if not, they attract each other. The simulations show a similar behavior in Newtonian fluids as in viscoelastic and elastoviscoplastic fluids, as long as the capillary number is sufficiently large. This shows that the interaction-related migration of the bubbles is strongly affected by the bubble deformation. We suggest that the cause of migration is the interaction between the heterogeneous pressure fields around the deformed bubbles, due to capillary pressure.

Submitted 21 April, 2023; v1 submitted 13 May, 2022; originally announced May 2022.

Comments: 20 pages, 19 figures

arXiv:2109.09227 [pdf, other]

ARCA23K: An audio dataset for investigating open-set label noise

Authors: Turab Iqbal, Yin Cao, Andrew Bailey, Mark D. Plumbley, Wenwu Wang

Abstract: The availability of audio data on sound sharing platforms such as Freesound gives users access to large amounts of annotated audio. Utilising such data for training is becoming increasingly popular, but the problem of label noise that is often prevalent in such datasets requires further investigation. This paper introduces ARCA23K, an Automatically Retrieved and Curated Audio dataset comprised of over 23000 labelled Freesound clips. Unlike past datasets such as FSDKaggle2018 and FSDnoisy18K, ARCA23K facilitates the study of label noise in a more controlled manner. We describe the entire process of creating the dataset such that it is fully reproducible, meaning researchers can extend our work with little effort. We show that the majority of labelling errors in ARCA23K are due to out-of-vocabulary audio clips, and we refer to this type of label noise as open-set label noise. Experiments are carried out in which we study the impact of label noise in terms of classification performance and representation learning.

Submitted 27 February, 2022; v1 submitted 19 September, 2021; originally announced September 2021.

Comments: Accepted to the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021)

arXiv:2107.09998 [pdf, other]

Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning

Authors: Xubo Liu, Turab Iqbal, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

Abstract: Deep generative models have recently achieved impressive performance in speech and music synthesis. However, compared to the generation of those domain-specific sounds, generating general sounds (such as sirens and gunshots) has received less attention, despite their wide applications. In previous work, the SampleRNN method was considered for sound generation in the time domain. However, SampleRNN is potentially limited in capturing long-range dependencies within sounds as it only back-propagates through a limited number of samples. In this work, we propose a method for generating sounds via neural discrete time-frequency representation learning, conditioned on sound classes. This offers an advantage in efficiently modelling long-range dependencies and retaining local fine-grained structures within sound clips. We evaluate our approach on the UrbanSound8K dataset, compared to SampleRNN, with the performance metrics measuring the quality and diversity of generated sounds. Experimental results show that our method offers comparable performance in quality and significantly better performance in diversity.

Submitted 6 October, 2021; v1 submitted 21 July, 2021; originally announced July 2021.

Comments: Accepted by IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP) 2021, 6 pages, 1 figure

arXiv:2107.00544 [pdf, other]

Improving Human Motion Prediction Through Continual Learning

Authors: Mohammad Samin Yasar, Tariq Iqbal

Abstract: Human motion prediction is an essential component for enabling closer human-robot collaboration. The task of accurately predicting human motion is non-trivial. It is compounded by the variability of human motion, both at a skeletal level due to the varying size of humans and at a motion level due to individual movement's idiosyncrasies. These variables make it challenging for learning algorithms to obtain a general representation that is robust to the diverse spatio-temporal patterns of human motion. In this work, we propose a modular sequence learning approach that allows end-to-end training while also having the flexibility of being fine-tuned. Our approach relies on the diversity of training samples to first learn a robust representation, which can then be fine-tuned in a continual learning setup to predict the motion of new subjects. We evaluated the proposed approach by comparing its performance against state-of-the-art baselines. The results suggest that our approach outperforms other methods over all the evaluated temporal horizons, using a small amount of data for fine-tuning. The improved performance of our approach opens up the possibility of using continual learning for personalized and reliable motion prediction.

Submitted 1 July, 2021; originally announced July 2021.

arXiv:2105.04356 [pdf, other]

doi 10.1049/cvi2.12028

Coconut trees detection and segmentation in aerial imagery using mask region-based convolution neural network

Authors: Muhammad Shakaib Iqbal, Hazrat Ali, Son N. Tran, Talha Iqbal

Abstract: Food resources face severe damage under extraordinary situations of catastrophes such as earthquakes, cyclones, and tsunamis. Under such scenarios, speedy assessment of food resources from agricultural land is critical as it supports aid activity in the disaster-hit areas. In this article, a deep learning approach is presented for the detection and segmentation of coconut trees in aerial imagery provided through the AI competition organized by the World Bank in collaboration with OpenAerialMap and WeRobotics. A Mask Region-based Convolutional Neural Network (Mask R-CNN) approach was used for identification and segmentation of coconut trees. For the segmentation task, the Mask R-CNN model with ResNet50 and ResNet101 based architectures was used. Several experiments with different configuration parameters were performed and the best configuration for the detection of coconut trees with more than 90% confidence factor was reported. For the purpose of evaluation, the Microsoft COCO dataset evaluation metric, namely mean average precision (mAP), was used. An overall 91% mean average precision for coconut tree detection was achieved.

Submitted 10 May, 2021; originally announced May 2021.

Comments: Published in IET Computer Vision, 09 April 2021

arXiv:2103.01223 [pdf]

doi 10.30534/ijatcse/2021/151012021

Offshore Software Maintenance Outsourcing Predicting Clients Proposal using Supervised Learning

Authors: Atif Ikram, Masita Abdul Jalil, Amir Bin Ngah, Ahmad Salman Khan, Tahir Iqbal

Abstract: In software engineering, software maintenance is the process of correction, updating, and improvement of software products after they are handed over to the customer. Through offshore software maintenance outsourcing (OSMO), clients can get advantages like reduced cost, saved time, and improved quality. In most cases, the OSMO vendor generates considerable revenue. However, the selection of an appropriate proposal among multiple clients is one of the critical problems for OSMO vendors. The purpose of this paper is to suggest an effective machine learning technique that can be used by OSMO vendors to assess or predict the OSMO client proposal. The dataset is generated through a survey of OSMO vendors working in a developing country. The results showed that supervised learning-based classifiers like Naïve Bayesian, SMO, and Logistic achieved 69.75, 81.81, and 87.27 percent testing accuracy, respectively. This study concludes that supervised learning is the most suitable technique to predict the OSMO client's proposal.

Submitted 1 March, 2021; originally announced March 2021.

Comments: 10 pages, 2 figures

Journal ref: International Journal of Advanced Trends in Computer Science and Engineering, 2021

arXiv:2102.05151 [pdf, other]

Enhancing Audio Augmentation Methods with Consistency Learning

Authors: Turab Iqbal, Karim Helwani, Arvindh Krishnaswamy, Wenwu Wang

Abstract: Data augmentation is an inexpensive way to increase training data diversity and is commonly achieved via transformations of existing data. For tasks such as classification, there is a good case for learning representations of the data that are invariant to such transformations, yet this is not explicitly enforced by classification losses such as the cross-entropy loss. This paper investigates the… ▽ More Data augmentation is an inexpensive way to increase training data diversity and is commonly achieved via transformations of existing data. For tasks such as classification, there is a good case for learning representations of the data that are invariant to such transformations, yet this is not explicitly enforced by classification losses such as the cross-entropy loss. This paper investigates the use of training objectives that explicitly impose this consistency constraint and how it can impact downstream audio classification tasks. In the context of deep convolutional neural networks in the supervised setting, we show empirically that certain measures of consistency are not implicitly captured by the cross-entropy loss and that incorporating such measures into the loss function can improve the performance of audio classification systems. Put another way, we demonstrate how existing augmentation methods can further improve learning by enforcing consistency. △ Less

Submitted 19 April, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

Comments: Accepted to 46th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)

arXiv:2012.11911 [pdf]

A Hybrid VDV Model for Automatic Diagnosis of Pneumothorax using Class-Imbalanced Chest X-rays Dataset

Authors: Tahira Iqbal, Arslan Shaukat, Usman Akram, Zartasha Mustansar, Yung-Cheol Byun

Abstract: Pneumothorax, a life-threatening disease, needs to be diagnosed immediately and efficiently. The prognosis in this case is not only time consuming but also prone to human errors. So an automatic way of accurate diagnosis using chest X-rays is the utmost requirement. To date, most of the available medical image datasets have a class-imbalance issue. The main theme of this study is to solve this problem along with proposing an automated way of detecting pneumothorax. We first compare the existing approaches to tackle the class-imbalance issue and find that data-level-ensemble (i.e. ensemble of subsets of dataset) outperforms other approaches. Thus, we propose a novel framework named the VDV model, which is a complex model-level-ensemble of data-level-ensembles and uses three convolutional neural networks (CNN), namely VGG16, VGG-19 and DenseNet-121, as fixed feature extractors. In each data-level-ensemble, features extracted from one of the pre-defined CNNs are fed to a support vector machine (SVM) classifier, and the output from each data-level-ensemble is calculated using the voting method. Once outputs from the three data-level-ensembles with three different CNN architectures are obtained, the voting method is used again to calculate the final prediction. Our proposed framework is tested on the SIIM ACR Pneumothorax dataset and the Random Sample of NIH Chest X-ray dataset (RS-NIH). For the first dataset, 85.17% Recall with 86.0% Area under the Receiver Operating Characteristic curve (AUC) is attained. For the second dataset, 90.9% Recall with 95.0% AUC is achieved with random split of data while 85.45% recall with 77.06% AUC is obtained with patient-wise split of data. For RS-NIH, the obtained results are higher as compared to previous results from the literature. However, for the first dataset, direct comparison cannot be made, since this dataset has not been used earlier for pneumothorax classification.

Submitted 22 December, 2020; originally announced December 2020.

Comments: 21 pages, 4 figures, 12 Tables, TO BE PUBLISHED
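
The two-level voting described in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `majority_vote` and `vdv_predict`, the 0/1 label encoding, and the tie-breaking rule are all hypothetical, since the abstract says only that a "voting method" is applied within each data-level ensemble and again across the three CNN feature extractors.

```python
from collections import Counter
from typing import Sequence


def majority_vote(labels: Sequence[int]) -> int:
    """Return the most frequent label; ties go to the larger label (assumed rule)."""
    counts = Counter(labels)
    best = max(counts.values())
    return max(label for label, n in counts.items() if n == best)


def vdv_predict(svm_votes_per_cnn: Sequence[Sequence[int]]) -> int:
    """Vote inside each data-level ensemble, then vote across the ensembles."""
    ensemble_outputs = [majority_vote(votes) for votes in svm_votes_per_cnn]
    return majority_vote(ensemble_outputs)


# Hypothetical SVM outputs for one X-ray (1 = pneumothorax, 0 = normal),
# one inner list per feature extractor (VGG16, VGG-19, DenseNet-121).
votes = [[1, 1, 0], [0, 0, 1], [1, 0, 1]]
print(vdv_predict(votes))
```

With an odd number of SVMs per ensemble and three ensembles, binary votes cannot tie, so the tie-breaking rule only matters for other configurations.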

arXiv:2012.11214 [pdf]

doi 10.1109/ACCESS.2021.3122998

Automatic Diagnosis of Pneumothorax from Chest Radiographs: A Systematic Literature Review

Authors: Tahira Iqbal, Arslan Shaukat, Usman Akram, Zartasha Mustansar

Abstract: Among various medical imaging tools, chest radiographs are the most important and widely used diagnostic tool for detection of thoracic pathologies. Research is being carried out in order to propose robust automatic diagnostic tools for detection of pathologies from chest radiographs. Artificial intelligence techniques, especially deep learning methodologies, have been found to give promising results in automating the field of medicine. A lot of research has been done on automatic and fast detection of pneumothorax from chest radiographs, proposing several frameworks based on artificial intelligence and machine learning techniques. This study summarizes the existing literature on the automatic detection of pneumothorax from chest X-rays and describes the available chest radiograph datasets. A comparative analysis of the literature is also provided in terms of goodness. Limitations of the existing literature, along with research gaps, are also given for further investigation. The paper provides a brief overview of the present work on pneumothorax detection to help researchers select an optimal approach for future research.

Submitted 16 April, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

Comments: 30 pages, 5 figures, 4 Tables to be published in journal

Journal ref: IEEE Access, vol. 9, pp. 145817-145839, 2021

arXiv:2010.13092 [pdf, other]

An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection

Authors: Yin Cao, Turab Iqbal, Qiuqiang Kong, Fengyan An, Wenwu Wang, Mark D. Plumbley

Abstract: Polyphonic sound event localization and detection (SELD), which jointly performs sound event detection (SED) and direction-of-arrival (DoA) estimation, detects the type and occurrence time of sound events as well as their corresponding DoA angles simultaneously. We study the SELD task from a multi-task learning perspective. Two open problems are addressed in this paper. Firstly, to detect overlapping sound events of the same type but with different DoAs, we propose to use a trackwise output format and solve the accompanying track permutation problem with permutation-invariant training. Multi-head self-attention is further used to separate tracks. Secondly, a previous finding is that, by using hard parameter-sharing, SELD suffers from a performance loss compared with learning the subtasks separately. This is solved by a soft parameter-sharing scheme. We term the proposed method Event Independent Network V2 (EINV2), which is an improved version of our previously proposed method and an end-to-end network for SELD. We show that our proposed EINV2 for joint SED and DoA estimation outperforms previous methods by a large margin, and has comparable performance to state-of-the-art ensemble models.

Submitted 10 February, 2021; v1 submitted 25 October, 2020; originally announced October 2020.

Comments: 5 pages, 2021 IEEE International Conference on Acoustics, Speech and Signal Processing

arXiv:2010.00140 [pdf, other]

Event-Independent Network for Polyphonic Sound Event Localization and Detection

Authors: Yin Cao, Turab Iqbal, Qiuqiang Kong, Yue Zhong, Wenwu Wang, Mark D. Plumbley

Abstract: Polyphonic sound event localization and detection involves not only detecting what sound events are happening but also localizing the corresponding sound sources. This series of tasks was first introduced in DCASE 2019 Task 3. In 2020, the sound event localization and detection task introduces additional challenges in moving sound sources and overlapping-event cases, which include two events of the same type with two different direction-of-arrival (DoA) angles. In this paper, a novel event-independent network for polyphonic sound event localization and detection is proposed. Unlike the two-stage method we proposed in DCASE 2019 Task 3, this new network is fully end-to-end. Inputs to the network are first-order Ambisonics (FOA) time-domain signals, which are then fed into a 1-D convolutional layer to extract acoustic features. The network is then split into two parallel branches. The first branch is for sound event detection (SED), and the second branch is for DoA estimation. There are three types of predictions from the network: SED predictions, DoA predictions, and event activity detection (EAD) predictions that are used to combine the SED and DoA features for onset and offset estimation. All of these predictions have the format of two tracks, indicating that there are at most two overlapping events. Within each track, there could be at most one event happening. This architecture introduces a problem of track permutation. To address this problem, a frame-level permutation invariant training method is used. Experimental results show that the proposed method can detect polyphonic sound events and their corresponding DoAs. Its performance on the Task 3 dataset is greatly increased as compared with that of the baseline method.

Submitted 30 September, 2020; originally announced October 2020.

Comments: conference
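
The frame-level permutation invariant training mentioned in the abstract above can be illustrated with a small sketch. It is only an assumption-laden toy, not the paper's implementation: the function names `mse` and `frame_pit_loss` are hypothetical, and mean squared error stands in for whatever per-track loss the authors actually use. The idea shown is that, for each frame, the loss is evaluated under every assignment of predicted tracks to target tracks and the smallest value is kept.

```python
from itertools import permutations
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def frame_pit_loss(
    pred_tracks: Sequence[Sequence[float]],
    target_tracks: Sequence[Sequence[float]],
) -> float:
    """Smallest mean per-track loss over all track permutations, for one frame."""
    n = len(pred_tracks)
    return min(
        sum(mse(pred_tracks[i], target_tracks[p[i]]) for i in range(n)) / n
        for p in permutations(range(n))
    )


# Two tracks whose order is swapped relative to the targets:
# the swapped assignment matches exactly, so the loss is zero.
pred = [[0.0, 1.0], [1.0, 0.0]]
target = [[1.0, 0.0], [0.0, 1.0]]
print(frame_pit_loss(pred, target))
```

With only two tracks there are just two permutations per frame, so the exhaustive search over assignments is cheap.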

arXiv:2009.05752 [pdf]

doi 10.1109/ACCESS.2020.3017915

Segmentation of Lungs in Chest X-Ray Image Using Generative Adversarial Networks

Authors: Faizan Munawar, Shoaib Azmat, Talha Iqbal, Christer Grönlund, Hazrat Ali

Abstract: Chest X-ray (CXR) is a low-cost medical imaging technique. It is a common procedure for the identification of many respiratory diseases compared to MRI, CT, and PET scans. This paper presents the use of generative adversarial networks (GAN) to perform the task of lung segmentation on a given CXR. GANs are popular for generating realistic data by learning the mapping from one domain to another. In our work, the generator of the GAN is trained to generate a segmented mask of a given input CXR. The discriminator distinguishes between a ground truth and the generated mask, and updates the generator through the adversarial loss measure. The objective is to generate masks for the input CXR, which are as realistic as possible compared to the ground truth masks. The model is trained and evaluated using four different discriminators referred to as D1, D2, D3, and D4, respectively. Experimental results on three different CXR datasets reveal that the proposed model is able to achieve a dice-score of 0.9740, and IOU score of 0.943, which are better than other reported state-of-the-art results.

Submitted 12 September, 2020; originally announced September 2020.

Comments: Volume 8, August 2020, Pages 153535 - 153545

Journal ref: in IEEE Access, vol. 8, pp. 153535-153545, 2020

arXiv:2008.01148 [pdf, other]

HAMLET: A Hierarchical Multimodal Attention-based Human Activity Recognition Algorithm

Authors: Md Mofijul Islam, Tariq Iqbal

Abstract: To fluently collaborate with people, robots need the ability to recognize human activities accurately. Although modern robots are equipped with various sensors, robust human activity recognition (HAR) still remains a challenging task for robots due to difficulties related to multimodal data fusion. To address these challenges, in this work, we introduce a deep neural network-based multimodal HAR algorithm, HAMLET. HAMLET incorporates a hierarchical architecture, where the lower layer encodes spatio-temporal features from unimodal data by adopting a multi-head self-attention mechanism. We develop a novel multimodal attention mechanism for disentangling and fusing the salient unimodal features to compute the multimodal features in the upper layer. Finally, multimodal features are used in a fully connected neural network to recognize human activities. We evaluated our algorithm by comparing its performance to several state-of-the-art activity recognition algorithms on three human activity datasets. The results suggest that HAMLET outperformed all other evaluated baselines across all datasets and metrics tested, with the highest top-1 accuracy of 95.12% and 97.45% on the UTD-MHAD [1] and the UT-Kinect [2] datasets respectively, and F1-score of 81.52% on the UCSD-MIT [3] dataset. We further visualize the unimodal and multimodal attention maps, which provide us with a tool to interpret the impact of attention mechanisms concerning HAR.

Submitted 3 August, 2020; originally announced August 2020.

Comments: To be published in the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2020

Journal ref: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2020

arXiv:2002.04683 [pdf, other]

Learning with Out-of-Distribution Data for Audio Classification

Authors: Turab Iqbal, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

Abstract: In supervised machine learning, the assumption that training data is labelled correctly is not always satisfied. In this paper, we investigate an instance of labelling error for classification tasks in which the dataset is corrupted with out-of-distribution (OOD) instances: data that does not belong to any of the target classes, but is labelled as such. We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning. The proposed method uses an auxiliary classifier, trained on data that is known to be in-distribution, for detection and relabelling. The amount of data required for this is shown to be small. Experiments are carried out on the FSDnoisy18k audio dataset, where OOD instances are very prevalent. The proposed method is shown to improve the performance of convolutional neural networks by a significant margin. Comparisons with other noise-robust techniques are similarly encouraging.

Submitted 11 February, 2020; originally announced February 2020.

Comments: Paper accepted for 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)

arXiv:1912.10211 [pdf, other]

PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

Authors: Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, Mark D. Plumbley

Abstract: Audio pattern recognition is an important research topic in the machine learning area, and includes several tasks such as audio tagging, acoustic scene classification, music classification, speech emotion classification and sound event detection. Recently, neural networks have been applied to tackle audio pattern recognition problems. However, previous systems are built on specific datasets with limited durations. Recently, in computer vision and natural language processing, systems pretrained on large-scale datasets have generalized well to several tasks. However, there is limited research on pretraining systems on large-scale datasets for audio pattern recognition. In this paper, we propose pretrained audio neural networks (PANNs) trained on the large-scale AudioSet dataset. These PANNs are transferred to other audio related tasks. We investigate the performance and computational complexity of PANNs modeled by a variety of convolutional neural networks. We propose an architecture called Wavegram-Logmel-CNN using both log-mel spectrogram and waveform as input feature. Our best PANN system achieves a state-of-the-art mean average precision (mAP) of 0.439 on AudioSet tagging, outperforming the best previous system of 0.392. We transfer PANNs to six audio pattern recognition tasks, and demonstrate state-of-the-art performance in several of those tasks. We have released the source code and pretrained models of PANNs: https://github.com/qiuqiangkong/audioset_tagging_cnn.

Submitted 23 August, 2020; v1 submitted 21 December, 2019; originally announced December 2019.

Comments: 14 pages

arXiv:1912.07943 [pdf]

doi 10.1007/s42452-019-1914-1

Pioneer dataset and automatic recognition of Urdu handwritten characters using a deep autoencoder and convolutional neural network

Authors: Hazrat Ali, Ahsan Ullah, Talha Iqbal, Shahid Khattak

Abstract: Automatic recognition of Urdu handwritten digits and characters is a challenging task. It has applications in postal address reading, bank cheque processing, and digitization and preservation of handwritten manuscripts from old ages. While there exists significant work on automatic recognition of handwritten English characters and other major languages of the world, the work done for the Urdu language is extremely insufficient. This paper has two goals. Firstly, we introduce a pioneer dataset for handwritten digits and characters of Urdu, containing samples from more than 900 individuals. Secondly, we report results for automatic recognition of handwritten digits and characters as achieved by using a deep autoencoder network and a convolutional neural network. More specifically, we use a two-layer and a three-layer deep autoencoder network and a convolutional neural network and evaluate the two frameworks in terms of recognition accuracy. The proposed deep autoencoder framework can successfully recognize digits and characters with an accuracy of 97% for digits only, 81% for characters only and 82% for both digits and characters simultaneously. In comparison, the convolutional neural network framework has an accuracy of 96.7% for digits only, 86.5% for characters only and 82.7% for both digits and characters simultaneously. These frameworks can serve as baselines for future research on Urdu handwritten text.

Submitted 17 December, 2019; originally announced December 2019.

Comments: SN Applied Sciences, December 2019

arXiv:1909.11302 [pdf, other]

Generating Requirements Out of Thin Air: Towards Automated Feature Identification for New Apps

Authors: Tahira Iqbal, Norbert Seyff, Daniel Mendez Fernández

Abstract: App store mining has proven to be a promising technique for requirements elicitation as companies can gain valuable knowledge to maintain and evolve existing apps. However, despite first advancements in using mining techniques for requirements elicitation, little is yet known about how to distill requirements for new apps based on existing (similar) solutions and how exactly practitioners would benefit from such a technique. In the proposed work, we focus on exploring information (e.g. app store data) provided by the crowd about existing solutions to identify key features of applications in a particular domain. We argue that these discovered features and other related influential aspects (e.g. ratings) can help practitioners (e.g. software developers) to identify potential key features for new applications. To support this argument, we first conducted an interview study with practitioners to understand the extent to which such an approach would find champions in practice. In this paper, we present the first results of our ongoing research in the context of a larger road-map. Our interview study confirms that practitioners see the need for our envisioned approach. Furthermore, we present an early conceptual solution to discuss the feasibility of our approach. However, this manuscript is also intended to foster discussions on the extent to which machine learning can and should be applied to elicit automated requirements on crowd generated data on different forums and to identify further collaborations in this endeavor.

Submitted 25 September, 2019; originally announced September 2019.

Comments: Preprint of manuscript accepted at the 3rd International Workshop on Crowd-Based Requirements Engineering

arXiv:1907.05514 [pdf, other]

doi 10.1109/ACCESS.2019.2942346

Hybrid Residual Attention Network for Single Image Super Resolution

Authors: Abdul Muqeet, Md Tauhid Bin Iqbal, Sung-Ho Bae

Abstract: The extraction and proper utilization of convolutional neural network (CNN) features have a significant impact on the performance of image super-resolution (SR). Although CNN features contain both the spatial and channel information, current deep techniques on SR often fail to maximize performance because they use either the spatial or the channel information. Moreover, they integrate such information within a deep or wide network rather than exploiting all the available features, eventually resulting in high computational complexity. To address these issues, we present a binarized feature fusion (BFF) structure that utilizes the extracted features from residual groups (RG) in an effective way. Each residual group (RG) consists of multiple hybrid residual attention blocks (HRAB) that effectively integrate the multiscale feature extraction module and channel attention mechanism in a single block. Furthermore, we use dilated convolutions with different dilation factors to extract multiscale features. We also propose to adopt global, short and long skip connections and a residual group (RG) structure to ease the flow of information without losing important feature details. In the paper, we call this overall network architecture the hybrid residual attention network (HRAN). In the experiments, we have observed the efficacy of our method against the state-of-the-art methods in both quantitative and qualitative comparisons.

Submitted 11 July, 2019; originally announced July 2019.

Comments: 12 pages, 5 figures

arXiv:1907.01210 [pdf, ps, other]

Paired domination and 2-distance Paired domination of the flower graph $f_{n\times m}$

Authors: Tanveer Iqbal, Syed Ahtsham Ul Haq Bokhary

Abstract: Let $G = (V, E)$ be a graph without an isolated vertex. A set $D\subseteq V(G)$ is a $k$-distance paired domination set of $G$ if $D$ is a $k$-distance dominating set of $G$ and the induced subgraph $\langle D \rangle$ has a perfect matching. The minimum cardinality of a $k$-distance paired dominating set for graph $G$ is the $k$-distance paired domination number, denoted by $\gamma_{p}^{k}(G)$. In this paper, the $k$-distance paired domination of the flower graph $f_{n\times m}$ is discussed. For $m,n\geq 3$, the exact values for the paired domination number and the $2$-distance paired domination number of the flower graph $f_{n\times m}$ are determined.

Submitted 2 July, 2019; originally announced July 2019.

Comments: 17 pages, 2 figures

MSC Class: 05C15; 05C65

arXiv:1905.00268 [pdf, ps, other]

doi 10.33682/4jhy-bj81

Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy

Authors: Yin Cao, Qiuqiang Kong, Turab Iqbal, Fengyan An, Wenwu Wang, Mark D. Plumbley

Abstract: Sound event detection (SED) and localization refer to recognizing sound events and estimating their spatial and temporal locations. Using neural networks has become the prevailing method for SED. In the area of sound localization, which is usually performed by estimating the direction of arrival (DOA), learning-based methods have recently been developed. In this paper, it is experimentally shown that the trained SED model is able to contribute to the direction of arrival estimation (DOAE). However, joint training of SED and DOAE degrades the performance of both. Based on these results, a two-stage polyphonic sound event detection and localization method is proposed. The method learns SED first, after which the learned feature layers are transferred for DOAE. It then uses the SED ground truth as a mask to train DOAE. The proposed method is evaluated on the DCASE 2019 Task 3 dataset, which contains different overlapping sound events in different environments. Experimental results show that the proposed method is able to improve the performance of both SED and DOAE, and also performs significantly better than the baseline method.

Submitted 5 November, 2019; v1 submitted 1 May, 2019; originally announced May 2019.

Comments: 6 pages, 2 figures, conference

arXiv:1904.05635

Cross-task learning for audio tagging, sound event detection spatial localization: DCASE 2019 baseline systems

Authors: Qiuqiang Kong, Yin Cao, Turab Iqbal, Yong Xu, Wenwu Wang, Mark D. Plumbley

Abstract: The Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge focuses on audio tagging, sound event detection and spatial localisation. DCASE 2019 consists of five tasks: 1) acoustic scene classification, 2) audio tagging with noisy labels and minimal supervision, 3) sound event localisation and detection, 4) sound event detection in domestic environments, and 5) urban sound tagging. In this paper, we propose generic cross-task baseline systems based on convolutional neural networks (CNNs). The motivation is to investigate the performance of a variety of models across several tasks without exploiting the specific characteristics of the tasks. We looked at CNNs with 5, 9, and 13 layers, and found that the optimal architecture is task-dependent. For the systems we considered, we found that the 9-layer CNN with average pooling is a good model for a majority of the DCASE 2019 tasks.

Submitted 14 April, 2019; v1 submitted 11 April, 2019; originally announced April 2019.

Comments: We intended to replace an existing submission but created this one by mistake. See arXiv:1904.03476 instead

arXiv:1904.03476 [pdf, other]

Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems

Authors: Qiuqiang Kong, Yin Cao, Turab Iqbal, Yong Xu, Wenwu Wang, Mark D. Plumbley

Abstract: The Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge focuses on audio tagging, sound event detection and spatial localisation. DCASE 2019 consists of five tasks: 1) acoustic scene classification, 2) audio tagging with noisy labels and minimal supervision, 3) sound event localisation and detection, 4) sound event detection in domestic environments, and 5) urban sound tagging. In this paper, we propose generic cross-task baseline systems based on convolutional neural networks (CNNs). The motivation is to investigate the performance of a variety of models across several audio recognition tasks without exploiting the specific characteristics of the tasks. We looked at CNNs with 5, 9, and 13 layers, and found that the optimal architecture is task-dependent. For the systems we considered, we found that the 9-layer CNN with average pooling after convolutional layers is a good model for a majority of the DCASE 2019 tasks.

Submitted 9 June, 2019; v1 submitted 6 April, 2019; originally announced April 2019.

Comments: 5 pages

arXiv:1904.01157 [pdf]

Sizing and Dynamic modeling of a Power System for the MUN Explorer Autonomous Underwater Vehicle using a Fuel Cell and Batteries

Authors: Mohamed M. Albarghot, M. Tariq Iqbal, Kevin Pope, Luc Rolland

Abstract: The combination of a fuel cell and batteries has promising potential for powering autonomous vehicles. The MUN Explorer Autonomous Underwater Vehicle (AUV) is built to do mapping-type missions of seabeds as well as survey missions. These missions require a great deal of power to reach underwater depths (i.e. 3000 meters). The MUN Explorer uses 11 rechargeable Lithium-ion (Li-ion) batteries as the main power source with a total capacity of 14.6 kWh to 17.952 kWh, and the vehicle can run for 10 hours. The drawbacks of operating the existing power system of the MUN Explorer, which was done by the researcher at the Holyrood management facility, include mobilization costs, logistics and transport, and facility access, all of which should be taken into consideration. Recharging the batteries for at least 8 hours is also very challenging and time-consuming. To overcome these challenges and run the MUN Explorer for a long time, it is essential to integrate a fuel cell into the existing power system (i.e. battery bank). The integration of the fuel cell will not only increase the system power, but also reduce the number of batteries needed, as suggested by HOMER software. In this paper, an integrated fuel cell is designed to be added into the MUN Explorer AUV along with a battery bank system to increase its power system. The system sizing is performed using HOMER software. The results from HOMER software show that a 1 kW fuel cell and 8 Li-ion batteries can increase the power system capacity to 68 kWh. The dynamic model is then built in the MATLAB/Simulink environment to provide a better understanding of the system behavior. The 1 kW fuel cell is connected to a DC/DC boost converter to increase the output voltage from 24 V to 48 V as required by the battery and DC motor.

Submitted 1 April, 2019; originally announced April 2019.

Comments: A hydrogen gas tank is also included in the model. The advantage of installing the hydrogen and oxygen tanks beside the batteries is that it helps the buoyancy force in underwater depths. The design of this system is based on MUN Explorer data sheets and system dynamic simulation results

arXiv:1903.00765 [pdf, other]

doi 10.1109/TASLP.2019.2930913

Weakly Labelled AudioSet Tagging with Attention Neural Networks

Authors: Qiuqiang Kong, Changsong Yu, Turab Iqbal, Yong Xu, Wenwu Wang, Mark D. Plumbley

Abstract: Audio tagging is the task of predicting the presence or absence of sound classes within an audio clip. Previous work in audio tagging focused on relatively small datasets limited to recognising a small number of sound classes. We investigate audio tagging on AudioSet, which is a dataset consisting of over 2 million audio clips and 527 classes. AudioSet is weakly labelled, in that only the presence or absence of sound classes is known for each clip, while the onset and offset times are unknown. To address the weakly-labelled audio tagging problem, we propose attention neural networks as a way to attend the most salient parts of an audio clip. We bridge the connection between attention neural networks and multiple instance learning (MIL) methods, and propose decision-level and feature-level attention neural networks for audio tagging. We investigate attention neural networks modeled by different functions, depths and widths. Experiments on AudioSet show that the feature-level attention neural network achieves a state-of-the-art mean average precision (mAP) of 0.369, outperforming the best multiple instance learning (MIL) method of 0.317 and Google's deep neural network baseline of 0.314. In addition, we discover that the audio tagging performance on AudioSet embedding features has a weak correlation with the number of training samples and the quality of labels of each sound class.

Submitted 10 December, 2019; v1 submitted 2 March, 2019; originally announced March 2019.

Comments: 13 pages

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 11, pp. 1791-1802, Nov. 2019

arXiv:1810.00551 [pdf, other]

doi 10.1007/s10916-018-1072-9

Generative Adversarial Network for Medical Images (MI-GAN)

Authors: Talha Iqbal, Hazrat Ali

Abstract: Deep learning algorithms produce state-of-the-art results for different machine learning and computer vision tasks. To perform well on a given task, these algorithms require large datasets for training. However, deep learning algorithms lack generalization and suffer from over-fitting whenever trained on small datasets, especially when one is dealing with medical images. For supervised image analysis in medical imaging, having image data along with their corresponding annotated ground-truths is costly as well as time-consuming, since annotation of the data is done manually by medical experts. In this paper, we propose a new Generative Adversarial Network for Medical Imaging (MI-GAN). The MI-GAN generates synthetic medical images and their segmented masks, which can then be used for the application of supervised analysis of medical images. In particular, we present MI-GAN for synthesis of retinal images. The proposed method generates precise segmented images better than the existing techniques. The proposed model achieves a dice coefficient of 0.837 on the STARE dataset and 0.832 on the DRIVE dataset, which is state-of-the-art performance on both datasets.

Submitted 1 October, 2018; originally announced October 2018.

Comments: Journal of Medical Systems

Journal ref: Med Syst (2018) 42: 231

arXiv:1808.02358 [pdf]

Optimal voltage control using singular value decomposition of fast decoupled load flow jacobian

Authors: Talha Iqbal, Ali Dehghan Banadaki, Ali Feliachi

Abstract: The problem of regulating voltages within the required limits is complicated by the fact that the power system supplies power to a vast number of loads and is fed from many generating units. As loads vary, reactive power requirements of the transmission system vary. Moreover, voltage magnitude is relatively less sensitive to active power compared to reactive power due to the high X/R ratio of transmission lines. Therefore, separating voltage control from active power is not only justified but also the common and practical way in power transmission systems. Considering these facts, the fast decoupled power flow Jacobian can be used to control voltage magnitudes by reactive power compensation. In this paper, an optimal voltage control is presented to obtain new voltage set-points for PV buses by maximizing the effect of input change on output change using the Fast Decoupled Load Flow (FDLF) Jacobian matrix. The proposed algorithm was tested on three IEEE systems: 9 bus, 14 bus and 30 bus systems.

Submitted 6 August, 2018; originally announced August 2018.

arXiv:1808.00773 [pdf, other]

DCASE 2018 Challenge Surrey Cross-Task convolutional neural network baseline

Authors: Qiuqiang Kong, Turab Iqbal, Yong Xu, Wenwu Wang, Mark D. Plumbley

Abstract: The Detection and Classification of Acoustic Scenes and Events (DCASE) consists of five audio classification and sound event detection tasks: 1) Acoustic scene classification, 2) General-purpose audio tagging of Freesound, 3) Bird audio detection, 4) Weakly-labeled semi-supervised sound event detection and 5) Multi-channel audio classification. In this paper, we create a cross-task baseline system for all five tasks based on a convolutional neural network (CNN): a "CNN Baseline" system. We implemented CNNs with 4 layers and 8 layers originating from AlexNet and VGG from computer vision. We investigated how the performance varies from task to task with the same configuration of neural networks. Experiments show that the deeper CNN with 8 layers performs better than the CNN with 4 layers on all tasks except Task 1. Using the CNN with 8 layers, we achieve an accuracy of 0.680 on Task 1, an accuracy of 0.895 and a mean average precision (MAP) of 0.928 on Task 2, an accuracy of 0.751 and an area under the curve (AUC) of 0.854 on Task 3, a sound event detection F1 score of 20.8% on Task 4, and an F1 score of 87.75% on Task 5. We released the Python source code of the baseline systems under the MIT license for further research.

Submitted 29 September, 2018; v1 submitted 2 August, 2018; originally announced August 2018.

Comments: Accepted by DCASE 2018 Workshop. 4 pages. Source code available

Journal ref: Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2018, pp. 217-221

arXiv:1806.04699 [pdf, other]

Capsule Routing for Sound Event Detection

Authors: Turab Iqbal, Yong Xu, Qiuqiang Kong, Wenwu Wang

Abstract: The detection of acoustic scenes is a challenging problem in which environmental sound events must be detected from a given audio signal. This includes classifying the events as well as estimating their onset and offset times. We approach this problem with a neural network architecture that uses the recently-proposed capsule routing mechanism. A capsule is a group of activation units representing a set of properties for an entity of interest, and the purpose of routing is to identify part-whole relationships between capsules. That is, a capsule in one layer is assumed to belong to a capsule in the layer above in terms of the entity being represented. Using capsule routing, we wish to train a network that can learn global coherence implicitly, thereby improving generalization performance. Our proposed method is evaluated on Task 4 of the DCASE 2017 challenge. Results show that classification performance is state-of-the-art, achieving an F-score of 58.6%. In addition, overfitting is reduced considerably compared to other architectures.

Submitted 12 June, 2018; originally announced June 2018.

Comments: Paper accepted for 26th European Signal Processing Conference (EUSIPCO 2018)

arXiv:1802.00380 [pdf, other]

Approximate Message Passing for Underdetermined Audio Source Separation

Authors: Turab Iqbal, Wenwu Wang

Abstract: Approximate message passing (AMP) algorithms have shown great promise in sparse signal reconstruction due to their low computational requirements and fast convergence to an exact solution. Moreover, they provide a probabilistic framework that is often more intuitive than alternatives such as convex optimisation. In this paper, AMP is used for audio source separation from underdetermined instantaneous mixtures. In the time-frequency domain, it is typical to assume a priori that the sources are sparse, so we solve the corresponding sparse linear inverse problem using AMP. We present a block-based approach that uses AMP to process multiple time-frequency points simultaneously. Two algorithms known as AMP and vector AMP (VAMP) are evaluated in particular. Results show that they are promising in terms of artefact suppression.

Submitted 1 February, 2018; originally announced February 2018.

Comments: Paper accepted for 3rd International Conference on Intelligent Signal Processing (ISP 2017)

arXiv:1605.01459 [pdf, other]

doi 10.1109/TRO.2016.2570240

Movement Coordination in Human-Robot Teams: A Dynamical Systems Approach

Authors: Tariq Iqbal, Samantha Rack, Laurel D. Riek

Abstract: In order to be effective teammates, robots need to be able to understand high-level human behavior to recognize, anticipate, and adapt to human motion. We have designed a new approach to enable robots to perceive human group motion in real-time, anticipate future actions, and synthesize their own motion accordingly. We explore this within the context of joint action, where humans and robots move together synchronously. In this paper, we present an anticipation method which takes high-level group behavior into account. We validate the method within a human-robot interaction scenario, where an autonomous mobile robot observes a team of human dancers, and then successfully and contingently coordinates its movements to "join the dance". We compared the results of our anticipation method to move the robot with another method which did not rely on high-level group behavior, and found our method performed better both in terms of more closely synchronizing the robot's motion to the team, and also exhibiting more contingent and fluent motion. These findings suggest that the robot performs better when it has an understanding of high-level group behavior than when it does not. This work will help enable others in the robotics community to build more fluent and adaptable robots in the future.

Submitted 4 May, 2016; originally announced May 2016.

Comments: 11 pages, 7 figures, IEEE Transactions on Robotics 2016 preprint

ACM Class: I.2.9; I.2.11; H.5.3; I.5.5; J.5

arXiv:1601.00754 [pdf, ps, other]

doi 10.1088/1674-1137/41/4/043104

Some New Symmetric Relations and the Prediction of Left and Right Handed Neutrino Masses using Koide's Relation

Authors: Yong-Chang Huang, Syeda Tehreem Iqbal, Zhen Lei, Wen-Yu Wang

Abstract: The masses of the three generations of charged leptons are known to satisfy Koide's mass relation. But the question remains whether such a relation exists for neutrinos. In this paper, by considering the seesaw mechanism as the mechanism generating tiny neutrino masses, we show how neutrinos satisfy Koide's mass relation, on the basis of which we systematically give exact values of not only left but also right handed neutrino masses.

Submitted 5 April, 2016; v1 submitted 5 January, 2016; originally announced January 2016.

Journal ref: Chinese Physics C, (2016)

arXiv:gr-qc/0401079 [pdf, ps, other]

Non-Static Spherically Symmetric Perfect Fluid Solutions

Authors: M. Sharif, T. Iqbal

Abstract: We investigate solutions of Einstein field equations for the non-static spherically symmetric perfect fluid case using different equations of state. The properties of exact spherically symmetric perfect fluid solutions that contain shear are obtained. We obtain three different solutions; of these, one turns out to be an incoherent dust solution and the other two are stiff matter solutions.

Submitted 19 January, 2004; originally announced January 2004.

Journal ref: Chin.J.Phys. 40 (2002) 242-250

Showing 1–41 of 41 results for author: Iqbal, T