-
Principal Component Regression to Study the Impact of Economic Factors on Disadvantaged Communities
Authors:
Narmadha M. Mohankumar,
Milan Jain,
Heng Wan,
Sumitrra Ganguli,
Kyle D. Wilson,
David M. Anderson
Abstract:
The Council on Environmental Quality's Climate and Economic Justice Screening Tool defines "disadvantaged communities" (DAC) in the USA, highlighting census tracts where the benefits of climate and energy investments are not accruing. We use a principal component generalized linear model, which addresses the intertwined nature of two economic factors, income and employment, to model their relationship to DAC status. Our study 1) identifies the income groups and employment industries that most significantly impact DAC status, 2) estimates the probability of DAC status across census tracts and compares its predictive accuracy with that of widely used machine learning approaches, 3) obtains historical predictions of the probability of DAC status, and 4) spatially downscales DAC status to block groups. Our study provides valuable insights for policymakers and stakeholders developing strategies that promote sustainable development and address inequities in climate and energy investments in the USA.
Submitted 24 January, 2024;
originally announced January 2024.
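To illustrate the principal component idea, here is a minimal sketch with one retained component. It assumes standardized covariates, a logistic link, and synthetic data. The data, the hypothetical helper names (`standardize`, `leading_loading`, `fit_logit`), and the use of power iteration and Newton-Raphson are all illustrative assumptions, not the authors' implementation. The sketch replaces correlated covariates with their leading principal component score and fits a logistic GLM to that score. It then maps the component coefficient back through the loadings so that each original covariate receives an effect estimate.

```python
import math
import random

def standardize(col):
    """Center a covariate column and scale it to unit sample variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]

def covariance(cols):
    """Sample covariance matrix of already-centered columns."""
    n, p = len(cols[0]), len(cols)
    return [[sum(cols[i][k] * cols[j][k] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]

def leading_loading(mat, iters=200):
    """Power iteration: loading vector of the first principal component."""
    v = [1.0] * len(mat)
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in mat]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def fit_logit(z, y, iters=25):
    """Newton-Raphson for the GLM logit P(y = 1) = a + b * z."""
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for zi, yi in zip(z, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * zi)))
            w = p * (1.0 - p)
            ga += yi - p            # score for the intercept
            gb += (yi - p) * zi     # score for the slope
            haa += w
            hab += w * zi
            hbb += w * zi * zi      # entries of the information matrix
        det = haa * hbb - hab * hab
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b

# Hypothetical data: two strongly correlated covariates (think of two
# adjacent income brackets) and a binary outcome (think of DAC status).
rng = random.Random(7)
x1 = [rng.gauss(0.0, 1.0) for _ in range(300)]
x2 = [v + rng.gauss(0.0, 0.2) for v in x1]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(a1 + a2))) else 0
     for a1, a2 in zip(x1, x2)]

cols = [standardize(x1), standardize(x2)]
loading = leading_loading(covariance(cols))
scores = [sum(l * c[k] for l, c in zip(loading, cols)) for k in range(len(y))]
intercept, gamma = fit_logit(scores, y)
# Map the component coefficient back to the standardized covariates.
beta = [l * gamma for l in loading]
print("loadings:", loading)
print("beta (standardized scale):", beta)
```

The two covariates are nearly collinear, so fitting them directly would give unstable coefficients. Regressing on the component score and back-transforming spreads the shared effect across both covariates. A real analysis would retain several components and attach standard errors before judging which covariates matter.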
-
Data fusion of distance sampling and capture-recapture data
Authors:
Narmadha M. Mohankumar,
Trevor J. Hefley,
Katy Silber,
W. Alice Boyle
Abstract:
Species distribution models (SDMs) are increasingly used in ecology, biogeography, and wildlife management to learn about species-habitat relationships and abundance across space and time. Distance sampling (DS) and capture-recapture (CR) are two widely collected data types for learning about species-habitat relationships and abundance; however, they are seldom used in SDMs because of their limited spatial coverage. Fusing the two data sources can increase spatial coverage, which can reduce parameter uncertainty and make predictions more accurate; the fused data can therefore be used for species distribution modeling. We developed a model-based approach for data fusion of DS and CR data. Our modeling approach accounts for two common missing data issues: 1) missing individuals that are missing not at random (MNAR) and 2) partially missing location information. Using a simulation experiment, we evaluated the performance of our modeling approach and compared it to existing approaches that use ad hoc methods to account for missing data issues. Our results show that our approach provides unbiased parameter estimates with greater efficiency than the existing approaches. We demonstrated our approach using data collected for Grasshopper Sparrows (Ammodramus savannarum) in northeastern Kansas, USA.
Submitted 8 March, 2022;
originally announced March 2022.
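The following toy example illustrates why fusion can reduce parameter uncertainty. It assumes that both surveys produce Poisson counts driven by one shared, constant intensity. It also assumes a known effective effort for each survey (area times detection or capture probability). All counts and efforts are hypothetical. The model deliberately omits the MNAR and partial-location issues the paper addresses, and `pooled_intensity` is a hypothetical helper. It only shows how a joint likelihood pools information across data sources.

```python
import math

def pooled_intensity(counts, efforts):
    """MLE of a shared intensity lam when counts[k] ~ Poisson(lam * efforts[k]).

    efforts[k] = surveyed area times detection (or capture) probability,
    treated as known here (a simplifying assumption).
    Returns (lam_hat, standard error of log(lam_hat)).
    """
    total_effort = sum(efforts)
    lam_hat = sum(counts) / total_effort
    # Fisher information for log(lam) is lam * total_effort.
    se_log = 1.0 / math.sqrt(lam_hat * total_effort)
    return lam_hat, se_log

# Hypothetical surveys sharing one (constant) intensity.
ds_count, ds_effort = 42, 10.0 * 0.6   # DS: area 10, mean detection probability 0.6
cr_count, cr_effort = 30, 8.0 * 0.5    # CR: area 8, capture probability 0.5

lam_ds, se_ds = pooled_intensity([ds_count], [ds_effort])
lam_cr, se_cr = pooled_intensity([cr_count], [cr_effort])
lam_joint, se_joint = pooled_intensity([ds_count, cr_count], [ds_effort, cr_effort])
print(lam_ds, lam_cr, lam_joint)
print(se_ds, se_cr, se_joint)
```

Each source alone gives an intensity estimate of 7.0 or 7.5, while the joint estimate is 7.2. The joint standard error on the log scale is smaller than either single-source value, because the Fisher information for log λ is λ times the total effort.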
-
Using machine learning to identify nontraditional spatial dependence in occupancy data
Authors:
Narmadha M. Mohankumar,
Trevor J. Hefley
Abstract:
Spatial models for occupancy data are used to estimate and map the true presence of a species, which may depend on biotic and abiotic factors as well as spatial autocorrelation. Traditionally, researchers have accounted for spatial autocorrelation in occupancy data by using a correlated, normally distributed site-level random effect, which might be incapable of identifying nontraditional spatial dependence such as discontinuities and abrupt transitions. Machine learning approaches have the potential to identify and model nontraditional spatial dependence, but these approaches do not account for observer errors such as false absences. By combining the flexibility of Bayesian hierarchical modeling and machine learning approaches, we present a general framework for modeling occupancy data that accounts for both traditional and nontraditional spatial dependence as well as false absences. We demonstrate our framework using six synthetic occupancy data sets and two real data sets. Our results demonstrate how to identify and model both traditional and nontraditional spatial dependence in occupancy data, which enables a broader class of spatial occupancy models that can be used to improve predictive accuracy and model adequacy.
Submitted 4 May, 2021; v1 submitted 17 June, 2020;
originally announced June 2020.
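False absences enter through the standard site-occupancy observation model, in which an occupied site can still produce an all-zero detection history. The minimal sketch below assumes a constant occupancy probability `psi` and a constant per-visit detection probability `p`. It omits the spatial random effects and machine learning components of the framework described above. The function names are hypothetical.

```python
import math

def site_likelihood(history, psi, p):
    """Likelihood of one site's 0/1 detection history under a basic occupancy model.

    psi: probability the site is occupied; p: per-visit detection probability.
    """
    # Probability of this exact pattern of detections if the site is occupied.
    pattern = math.prod(p if y else 1.0 - p for y in history)
    # An unoccupied site can only produce an all-zero history.
    unoccupied = 0.0 if any(history) else 1.0
    return psi * pattern + (1.0 - psi) * unoccupied

def prob_occupied_given_history(history, psi, p):
    """Posterior probability that a site is occupied given its history."""
    if any(history):
        return 1.0
    return psi * (1.0 - p) ** len(history) / site_likelihood(history, psi, p)

def log_likelihood(histories, psi, p):
    """Joint log-likelihood across independent sites."""
    return sum(math.log(site_likelihood(h, psi, p)) for h in histories)

print(site_likelihood([0, 0, 0], 0.6, 0.3))
print(prob_occupied_given_history([0, 0, 0], 0.6, 0.3))
print(log_likelihood([[0, 0, 0], [1, 0, 1]], 0.6, 0.3))
```

With `psi = 0.6` and `p = 0.3`, a site with three non-detections still has a posterior occupancy probability of 0.2058 / 0.6058 ≈ 0.34 rather than 0. This is the false-absence correction that a naive presence/absence model would miss. In a spatial version, `psi` would vary by site through covariates and a spatial process.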
-
Accounting for location uncertainty in distance sampling data
Authors:
Trevor J. Hefley,
W. Alice Boyle,
Narmadha M. Mohankumar
Abstract:
Ecologists use distance sampling to estimate the abundance of plants and animals while correcting for undetected individuals. By design, data collection is simplified by requiring that only the distances from a transect to the detected individuals be recorded. Compared to traditional design-based methods, which require restrictive assumptions and limit the use of distance sampling data, model-based approaches enable broader applications such as spatial prediction, inferring species-habitat relationships, unbiased estimation from preferentially sampled transects, and integration into multi-type data models. Unfortunately, model-based approaches require the exact location of each detected individual in order to incorporate environmental and habitat characteristics as predictor variables. We modified model-based methods for distance sampling data by including a probability distribution that accounts for the location uncertainty generated when only the distances are recorded. We tested and demonstrated our method using a simulation experiment and by modeling the habitat use of Dickcissels (Spiza americana) using distance sampling data collected from the Konza Prairie in Kansas, USA. Our results showed that ignoring location uncertainty can result in biased coefficient estimates and predictions. However, accounting for location uncertainty remedies the issue and results in reliable inference and prediction. As with other types of measurement error, hierarchical models can accommodate the data collection process, thereby enabling reliable inference. Our approach is a significant advancement for the analysis of distance sampling data because it remedies the deleterious effects of location uncertainty and requires only that distances be recorded. In turn, this enables historical distance sampling data sets to be compatible with modern data collection and modeling practices.
Submitted 28 May, 2020;
originally announced May 2020.
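The idea of marginalizing over unknown locations can be sketched for a straight transect where only perpendicular distances are recorded. Each recorded distance d is consistent with every point at offset +d or -d anywhere along the transect. The sketch makes several assumptions:

- an inhomogeneous Poisson point process with a hypothetical log-linear intensity whose only covariate is the signed offset from the transect;
- a half-normal detection function;
- midpoint-rule quadrature for the integrals.

All function names are hypothetical. This illustrates the principle and is not the authors' exact likelihood.

```python
import math

def half_normal(d, sigma):
    """Detection function g(d): probability of detecting an individual at distance d."""
    return math.exp(-d * d / (2.0 * sigma * sigma))

def midpoint(f, a, b, n):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def intensity(u, v, beta):
    """Hypothetical log-linear intensity at along-transect position u and signed
    offset v; the only habitat covariate here is v itself (a cross-transect gradient)."""
    return math.exp(beta[0] + beta[1] * v)

def expected_detections(beta, sigma, length, half_width, n=60):
    """Intensity weighted by detection, integrated over the whole surveyed strip."""
    return midpoint(
        lambda u: midpoint(
            lambda v: intensity(u, v, beta) * half_normal(abs(v), sigma),
            -half_width, half_width, n),
        0.0, length, n)

def detection_numerator(d, beta, sigma, length, n=60):
    """Sum over every location consistent with a recorded distance d:
    both sides of the transect (v = +d and v = -d) at any position u."""
    along = midpoint(lambda u: intensity(u, d, beta) + intensity(u, -d, beta),
                     0.0, length, n)
    return half_normal(d, sigma) * along

def distance_log_likelihood(distances, beta, sigma, length, half_width, n=60):
    """Log-likelihood of recorded distances with the exact locations marginalized out."""
    denom = expected_detections(beta, sigma, length, half_width, n)
    return sum(math.log(detection_numerator(d, beta, sigma, length, n) / denom)
               for d in distances)

print(distance_log_likelihood([0.3, 1.1, 0.7], (0.2, 0.8), 1.0, 5.0, 3.0))
```

Each distance's likelihood contribution averages the intensity over all candidate locations rather than assigning the individual to a single guessed point. That single guessed point is where ignoring location uncertainty would bias the habitat coefficient.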