-
Exploring Spatial Generalized Functional Linear Models: A Comparative Simulation Study and Analysis of COVID-19
Authors:
Sooran Kim,
Mark S. Kaiser,
Xiongtao Dai
Abstract:
Implementation of spatial generalized linear models with a functional covariate can be accomplished through the use of a truncated basis expansion of the covariate process. In practice, one must select a truncation level for use. We compare five criteria for the selection of an appropriate truncation level, including AIC and BIC based on a log composite likelihood, a fraction of variance explained criterion, a fitted mean squared error, and a prediction error with one standard error rule. Based on the use of extensive simulation studies, we propose that BIC constitutes a reasonable default criterion for the selection of the truncation level for use in a spatial functional generalized linear model. In addition, we demonstrate that the spatial model with a functional covariate outperforms other models when the data contain spatial structure and response variables are in fact influenced by a functional covariate process. We apply the spatial functional generalized linear model to a problem in which the objective is to relate COVID-19 vaccination rates in counties of states in the Midwestern United States to the number of new cases from previous weeks in those same geographic regions.
Submitted 26 March, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
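As a rough illustration of one of the five criteria named in the abstract above, the fraction of variance explained rule can be sketched as follows. The function name, the 0.95 default threshold, and the assumption that eigenvalues of the covariate covariance have already been estimated are illustrative choices, not details taken from the paper.

```python
def truncation_by_fve(eigenvalues, threshold=0.95):
    """Return the smallest truncation level whose leading eigenvalues
    explain at least `threshold` of the total variance.

    `eigenvalues` are assumed to be non-negative estimates from a
    functional principal component analysis (hypothetical input).
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for level, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return level
    # Reached only if rounding keeps the ratio just below the threshold.
    return len(ordered)
```

For example, with eigenvalues 5, 3, 1, 1 and a threshold of 0.75, the first two components explain 80% of the variance, so the sketch returns a truncation level of 2.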
-
Generalized linear models with spatial dependence and a functional covariate
Authors:
Sooran Kim,
Mark S. Kaiser,
Xiongtao Dai
Abstract:
We extend generalized functional linear models under independence to a situation in which a functional covariate is related to a scalar response variable that exhibits spatial dependence. For estimation, we apply basis expansion and truncation for dimension reduction of the covariate process followed by a composite likelihood estimating equation to handle the spatial dependency. We develop asymptotic results for the proposed model under a repeating lattice asymptotic context, allowing us to construct a confidence interval for the spatial dependence parameter and a confidence band for the parameter function. A binary conditionals model is presented as a concrete illustration and is used in simulation studies to verify the applicability of the asymptotic inferential results.
Submitted 20 February, 2024;
originally announced February 2024.
-
Properties of Test Statistics for Nonparametric Cointegrating Regression Functions Based on Subsamples
Authors:
Sepideh Mosaferi,
Mark S. Kaiser,
Daniel J. Nordman
Abstract:
Nonparametric cointegrating regression models have been extensively used in financial markets, stock prices, heavy traffic, climate data sets, and energy markets. Models with parametric regression functions can be more appealing in practice compared to non-parametric forms, but do result in potential functional misspecification. Thus, there exists a vast literature on developing a model specification test for parametric forms of regression functions. In this paper, we develop two test statistics which are applicable for the endogenous regressors driven by long memory and semi-long memory input shocks in the regression model. The limit distributions of the test statistics under these two scenarios are complicated and cannot be effectively used in practice. To overcome this difficulty, we use the subsampling method and compute the test statistics on smaller blocks of the data to construct their empirical distributions. Throughout, Monte Carlo simulation studies are used to illustrate the properties of test statistics. We also provide an empirical example of relating gross domestic product to total output of carbon dioxide in two European countries.
Submitted 26 December, 2023;
originally announced December 2023.
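The subsampling idea mentioned in the abstract above (computing a statistic on smaller blocks of the data and using the resulting values as an empirical distribution) can be sketched generically as below. This is a minimal sketch under stated assumptions: overlapping blocks of a fixed length, a user-supplied `statistic` function, and a simple order-statistic quantile. None of these names or choices come from the paper, and the paper's actual test statistics and normalizations are not reproduced here.

```python
import math

def subsample_statistics(series, block_length, statistic):
    """Evaluate `statistic` on every overlapping block of consecutive
    observations of length `block_length` (hypothetical helper)."""
    n = len(series)
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and len(series)")
    return [
        statistic(series[start:start + block_length])
        for start in range(n - block_length + 1)
    ]

def empirical_quantile(values, q):
    """Return the order statistic at position ceil(q * m) among m values."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
    return ordered[index]

def block_mean(block):
    """Placeholder statistic; a real application would use the test statistic."""
    return sum(block) / len(block)
```

A critical value could then be approximated by calling `empirical_quantile(subsample_statistics(data, b, stat), 0.95)`, where the block length `b` is a tuning choice left to the user.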
-
Nonparametric Cointegrating Regression Functions with Endogeneity and Semi-Long Memory
Authors:
Sepideh Mosaferi,
Mark S. Kaiser
Abstract:
This article develops nonparametric cointegrating regression models with endogeneity and semi-long memory. We assume semi-long memory is produced in the regressor process by tempering of random shock coefficients. The fundamental properties of long memory processes are thus retained in the regressor process. Nonparametric nonlinear cointegrating regressions with serially dependent errors and endogenous regressors that are driven by long memory innovations have been considered in Wang and Phillips (2016). That work also implemented a statistical specification test for testing whether the regression function follows a parametric form. The convergence rate of the proposed test is parameter dependent, and its limit theory involves the local time of fractional Brownian motion. The present paper modifies the test statistic proposed for the long memory case by Wang and Phillips (2016) to be suitable for the semi-long memory case. With this modification, the limit theory for the test involves the local time of standard Brownian motion. Through simulation studies, we investigate properties of nonparametric regression function estimation with semi-long memory regressors as well as long memory regressors.
Submitted 26 August, 2022; v1 submitted 1 November, 2021;
originally announced November 2021.
-
Deep Learning in Mining Biological Data
Authors:
Mufti Mahmud,
M Shamim Kaiser,
Amir Hussain
Abstract:
Recent technological advancements in data acquisition tools have allowed life scientists to acquire multimodal data from different biological application domains. Broadly categorized in three types (i.e., sequences, images, and signals), these data are huge in amount and complex in nature. Mining such an enormous amount of data for pattern recognition is a big challenge and requires sophisticated data-intensive machine learning techniques. Artificial neural network-based learning systems are well known for their pattern recognition capabilities, and lately their deep architectures, known as deep learning (DL), have been successfully applied to solve many complex pattern recognition problems. Highlighting the role of DL in recognizing patterns in biological data, this article provides: applications of DL to biological sequence, image, and signal data; an overview of open access sources of these data; a description of open source DL tools applicable to these data; and a comparison of these tools from qualitative and quantitative perspectives. At the end, it outlines some open research challenges in mining biological data and puts forward a number of possible future perspectives.
Submitted 28 February, 2020;
originally announced March 2020.
-
Simulating Markov random fields with a conclique-based Gibbs sampler
Authors:
Andee Kaplan,
Mark S. Kaiser,
Soumendra N. Lahiri,
Daniel J. Nordman
Abstract:
For spatial and network data, we consider models formed from a Markov random field (MRF) structure and the specification of a conditional distribution for each observation. Fast simulation from such MRF models is often an important consideration, particularly when repeated generation of large numbers of data sets is required. However, a standard Gibbs strategy for simulating from MRF models involves single-site updates, performed with the conditional univariate distribution of each observation in a sequential manner, whereby a complete Gibbs iteration may become computationally involved even for moderate samples. As an alternative, we describe a general way to simulate from MRF models using Gibbs sampling with "concliques" (i.e., groups of non-neighboring observations). Compared to standard Gibbs sampling, this simulation scheme can be much faster by reducing Gibbs steps and independently updating all observations per conclique at once. The speed improvement depends on the number of concliques relative to the sample size for simulation, and order-of-magnitude speed increases are possible with many MRF models (e.g., having appropriately bounded neighborhoods). We detail the simulation method, establish its validity, and assess its computational performance through numerical studies, where speed advantages are shown for several spatial and network examples.
Submitted 10 September, 2019; v1 submitted 14 August, 2018;
originally announced August 2018.
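The conclique idea described in the abstract above can be illustrated with a small sketch. It assumes a binary autologistic-style conditional model on a rectangular lattice with four-nearest neighbors, where the two "checkerboard" colors form the concliques. The parameterization (`alpha`, `eta`), the function names, and the free boundary handling are assumptions made for illustration and are not taken from the paper.

```python
import math
import random

def concliques_4nn(n_rows, n_cols):
    """Split lattice sites into two concliques (checkerboard colouring):
    under four-nearest neighbourhoods, no two sites in a group are neighbours."""
    groups = ([], [])
    for r in range(n_rows):
        for c in range(n_cols):
            groups[(r + c) % 2].append((r, c))
    return groups

def neighbor_sum(y, r, c):
    """Sum of the four nearest neighbours of site (r, c), free boundary."""
    total = 0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(y) and 0 <= cc < len(y[0]):
            total += y[rr][cc]
    return total

def conclique_gibbs(n_rows, n_cols, alpha, eta, n_iter, seed=0):
    """Run `n_iter` Gibbs iterations, updating one whole conclique at a time."""
    rng = random.Random(seed)
    y = [[rng.randint(0, 1) for _ in range(n_cols)] for _ in range(n_rows)]
    groups = concliques_4nn(n_rows, n_cols)
    for _ in range(n_iter):
        for group in groups:
            # All conditional probabilities in a conclique depend only on
            # sites outside it, so they can be computed before any update.
            probs = [
                1.0 / (1.0 + math.exp(-(alpha + eta * neighbor_sum(y, r, c))))
                for r, c in group
            ]
            for (r, c), p in zip(group, probs):
                y[r][c] = 1 if rng.random() < p else 0
    return y
```

Each Gibbs iteration here consists of two block updates rather than one update per site. In a vectorized implementation the per-conclique step could be carried out as a single array operation, which is where the speed gain described in the abstract would come from.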
-
A Brain-Inspired Trust Management Model to Assure Security in a Cloud based IoT Framework for Neuroscience Applications
Authors:
Mufti Mahmud,
M. Shamim Kaiser,
M. Mostafizur Rahman,
M. Arifur Rahman,
Antesar Shabut,
Shamim Al-Mamun,
Amir Hussain
Abstract:
The growing popularity of the Internet of Things (IoT) and cloud computing permits neuroscientists to collect multilevel and multichannel brain data to better understand brain functions, diagnose diseases, and devise treatments. To ensure secure and reliable data communication between end-to-end (E2E) devices supported by current IoT and cloud infrastructure, trust management is needed at the IoT and user ends. This paper introduces a Neuro-Fuzzy based Brain-inspired trust management model (TMM) to secure IoT devices and relay nodes, and to ensure data reliability. The proposed TMM utilizes node behavioral trust and data trust, estimated using an Adaptive Neuro-Fuzzy Inference System and weighted-additive methods, respectively, to assess the nodes' trustworthiness. In contrast to the existing fuzzy based TMMs, the NS2 simulation results confirm the robustness and accuracy of the proposed TMM in identifying malicious nodes in the communication network. With the growing usage of cloud based IoT frameworks in Neuroscience research, integrating the proposed TMM into the existing infrastructure will assure secure and reliable data communication among the E2E devices.
Submitted 11 January, 2018;
originally announced January 2018.
-
An Energy Conserving Routing Scheme for Wireless Body Sensor Nanonetwork Communication
Authors:
Fariha Afsana,
Md. Asif-Ur-Rahman,
Muhammad R. Ahmed,
Mufti Mahmud,
M. Shamim Kaiser
Abstract:
Current developments in nanotechnology make electromagnetic communication (EC) possible at the nanoscale for applications involving Wireless [Body] Sensor Networks (W[B]SNs). This specialized branch of WSN has emerged as an important research area contributing to medical treatment, social welfare, and sports. The concept is based on the interaction of integrated nanoscale machines by means of wireless communications. One key hurdle for advancing nanocommunications is the lack of an apposite networking protocol to address the upcoming needs of the nanonetworks. Recently, some key challenges have been identified, such as nanonodes with extreme energy constraints, limited computational capabilities, Terahertz frequency bands with limited transmission range, etc., in designing protocols for wireless nanosensor networks (WNN). This work proposes an improved performance scheme of nanocommunication over Terahertz bands for wireless BSNs, making it suitable for smart e-health applications. The scheme contains a new energy-efficient forwarding routine for EC in WNN consisting of hybrid clusters with centralized scheduling, a model designed for channel behavior taking into account the aggregated impact of molecular absorption, spreading loss, and shadowing, and an energy model for energy harvesting and consumption. The outage probability is derived for both single and multilinks and extended to determine the outage capacity. The outage probability for a multilink is derived using a cooperative fusion technique at a predefined fusion node. Simulated using a Nano-Sim simulator, performance of the proposed model has been evaluated for energy efficiency, outage capacity, and outage probability. The results demonstrate the efficiency of the proposed scheme through maximized energy utilization in both single and multihop communication; multisensor fusion enhances the link quality of the transmission.
Submitted 7 January, 2018;
originally announced January 2018.
-
Applications of Deep Learning and Reinforcement Learning to Biological Data
Authors:
Mufti Mahmud,
M. Shamim Kaiser,
Amir Hussain,
Stefano Vassanelli
Abstract:
Rapid advances of hardware-based technologies during the past decades have opened up new possibilities for life scientists to gather multimodal data in various application domains (e.g., Omics, Bioimaging, Medical Imaging, and [Brain/Body]-Machine Interfaces), thus generating novel opportunities for development of dedicated data intensive machine learning techniques. Overall, recent research in Deep learning (DL), Reinforcement learning (RL), and their combination (Deep RL) promises to revolutionize Artificial Intelligence. The growth in computational power accompanied by faster and increased data storage and declining computing costs have already allowed scientists in various fields to apply these techniques to datasets that were previously intractable for their size and complexity. This review article provides a comprehensive survey on the application of DL, RL, and Deep RL techniques in mining Biological data. In addition, we compare performances of DL techniques when applied to different datasets across various application domains. Finally, we outline open issues in this challenging research area and discuss future development perspectives.
Submitted 7 January, 2018; v1 submitted 10 November, 2017;
originally announced November 2017.
-
Goodness of fit tests for a class of Markov random field models
Authors:
Mark S. Kaiser,
Soumendra N. Lahiri,
Daniel J. Nordman
Abstract:
This paper develops goodness of fit statistics that can be used to formally assess Markov random field models for spatial data, when the model distributions are discrete or continuous and potentially parametric. Test statistics are formed from generalized spatial residuals which are collected over groups of nonneighboring spatial observations, called concliques. Under a hypothesized Markov model structure, spatial residuals within each conclique are shown to be independent and identically distributed as uniform variables. The information from a series of concliques can be then pooled into goodness of fit statistics. Under some conditions, large sample distributions of these statistics are explicitly derived for testing both simple and composite hypotheses, where the latter involves additional parametric estimation steps. The distributional results are verified through simulation, and a data example illustrates the method for model assessment.
Submitted 28 May, 2012;
originally announced May 2012.