-
Comparison of machine learning models applied on anonymized data with different techniques
Authors:
Judith Sáinz-Pardo Díaz,
Álvaro López García
Abstract:
Anonymization techniques based on obfuscating the quasi-identifiers by means of value generalization hierarchies are widely used to achieve preset levels of privacy. To prevent different types of attacks against database privacy it is necessary to apply several anonymization techniques beyond the classical k-anonymity or $\ell$-diversity. However, the application of these methods is directly conne…
▽ More
Anonymization techniques based on obfuscating the quasi-identifiers by means of value generalization hierarchies are widely used to achieve preset levels of privacy. To prevent different types of attacks against database privacy it is necessary to apply several anonymization techniques beyond the classical k-anonymity or $\ell$-diversity. However, the application of these methods is directly connected to a reduction of their utility in prediction and decision making tasks. In this work we study four classical machine learning methods currently used for classification purposes in order to analyze the results as a function of the anonymization techniques applied and the parameters selected for each of them. The performance of these models is studied when varying the value of k for k-anonymity and additional tools such as $\ell$-diversity, t-closeness and $δ$-disclosure privacy are also deployed on the well-known adult dataset.
△ Less
Submitted 12 May, 2023;
originally announced May 2023.
-
pyCANON: A Python library to check the level of anonymity of a dataset
Authors:
Judith Sáinz-Pardo Díaz,
Álvaro López García
Abstract:
Openly sharing data with sensitive attributes and privacy restrictions is a challenging task. In this document we present the implementation of pyCANON, a Python library and command line interface (CLI) to check and assess the level of anonymity of a dataset through some of the most common anonymization techniques: k-anonymity, ($α$,k)-anonymity, $\ell$-diversity, entropy $\ell$-diversity, recursi…
▽ More
Openly sharing data with sensitive attributes and privacy restrictions is a challenging task. In this document we present the implementation of pyCANON, a Python library and command line interface (CLI) to check and assess the level of anonymity of a dataset through some of the most common anonymization techniques: k-anonymity, ($α$,k)-anonymity, $\ell$-diversity, entropy $\ell$-diversity, recursive (c,$\ell$)-diversity, basic $β$-likeness, enhanced $β$-likeness, t-closeness and $δ$-disclosure privacy. For the case of more than one sensitive attributes, two approaches are proposed for evaluating this techniques. The main strength of this library is to obtain a full report of the parameters that are fulfilled for each of the techniques mentioned above, with the unique requirement of the set of quasi-identifiers and that of sensitive attributes. We present the methods implemented together with the attacks they prevent, the description of the library, use examples of the different functions, as well as the impact and the possible applications that can be developed. Finally, some possible aspects to be incorporated in future updates are proposed.
△ Less
Submitted 16 August, 2022;
originally announced August 2022.
-
Assessing The Performance of YOLOv5 Algorithm for Detecting Volunteer Cotton Plants in Corn Fields at Three Different Growth Stages
Authors:
Pappu Kumar Yadav,
J. Alex Thomasson,
Stephen W. Searcy,
Robert G. Hardin,
Ulisses Braga-Neto,
Sorin C. Popescu,
Daniel E. Martin,
Roberto Rodriguez,
Karem Meza,
Juan Enciso,
Jorge Solorzano Diaz,
Tianyi Wang
Abstract:
The boll weevil (Anthonomus grandis L.) is a serious pest that primarily feeds on cotton plants. In places like Lower Rio Grande Valley of Texas, due to sub-tropical climatic conditions, cotton plants can grow year-round and therefore the left-over seeds from the previous season during harvest can continue to grow in the middle of rotation crops like corn (Zea mays L.) and sorghum (Sorghum bicolor…
▽ More
The boll weevil (Anthonomus grandis L.) is a serious pest that primarily feeds on cotton plants. In places like Lower Rio Grande Valley of Texas, due to sub-tropical climatic conditions, cotton plants can grow year-round and therefore the left-over seeds from the previous season during harvest can continue to grow in the middle of rotation crops like corn (Zea mays L.) and sorghum (Sorghum bicolor L.). These feral or volunteer cotton (VC) plants when reach the pinhead squaring phase (5-6 leaf stage) can act as hosts for the boll weevil pest. The Texas Boll Weevil Eradication Program (TBWEP) employs people to locate and eliminate VC plants growing by the side of roads or fields with rotation crops but the ones growing in the middle of fields remain undetected. In this paper, we demonstrate the application of computer vision (CV) algorithm based on You Only Look Once version 5 (YOLOv5) for detecting VC plants growing in the middle of corn fields at three different growth stages (V3, V6, and VT) using unmanned aircraft systems (UAS) remote sensing imagery. All the four variants of YOLOv5 (s, m, l, and x) were used and their performances were compared based on classification accuracy, mean average precision (mAP), and F1-score. It was found that YOLOv5s could detect VC plants with a maximum classification accuracy of 98% and mAP of 96.3 % at the V6 stage of corn while YOLOv5s and YOLOv5m resulted in the lowest classification accuracy of 85% and YOLOv5m and YOLOv5l had the least mAP of 86.5% at the VT stage on images of size 416 x 416 pixels. The developed CV algorithm has the potential to effectively detect and locate VC plants growing in the middle of corn fields as well as expedite the management aspects of TBWEP.
△ Less
Submitted 31 July, 2022;
originally announced August 2022.
-
Study of the performance and scalability of federated learning for medical imaging with intermittent clients
Authors:
Judith Sáinz-Pardo Díaz,
Álvaro López García
Abstract:
Federated learning is a data decentralization privacy-preserving technique used to perform machine or deep learning in a secure way. In this paper we present theoretical aspects about federated learning, such as the presentation of an aggregation operator, different types of federated learning, and issues to be taken into account in relation to the distribution of data from the clients, together w…
▽ More
Federated learning is a data decentralization privacy-preserving technique used to perform machine or deep learning in a secure way. In this paper we present theoretical aspects about federated learning, such as the presentation of an aggregation operator, different types of federated learning, and issues to be taken into account in relation to the distribution of data from the clients, together with the exhaustive analysis of a use case where the number of clients varies. Specifically, a use case of medical image analysis is proposed, using chest X-Ray images obtained from an open data repository. In addition to the advantages related to privacy, improvements in predictions (in terms of accuracy, loss and area under the curve) and reduction of execution times will be studied with respect to the classical case (the centralized approach). Different clients will be simulated from the training data, selected in an unbalanced manner. The results of considering three or ten clients are exposed and compared between them and against the centralized case. Two different problems related to intermittent clients are discussed, together with two approaches to be followed for each of them. Specifically, this type of problems may occur because in a real scenario some clients may leave the training, and others enter it, and on the other hand because of client technical or connectivity problems. Finally, improvements and future work in the field are proposed.
△ Less
Submitted 3 November, 2022; v1 submitted 18 July, 2022;
originally announced July 2022.
-
Computer Vision for Volunteer Cotton Detection in a Corn Field with UAS Remote Sensing Imagery and Spot Spray Applications
Authors:
Pappu Kumar Yadav,
J. Alex Thomasson,
Stephen W. Searcy,
Robert G. Hardin,
Ulisses Braga-Neto,
Sorin C. Popescu,
Daniel E. Martin,
Roberto Rodriguez,
Karem Meza,
Juan Enciso,
Jorge Solorzano Diaz,
Tianyi Wang
Abstract:
To control boll weevil (Anthonomus grandis L.) pest re-infestation in cotton fields, the current practices of volunteer cotton (VC) (Gossypium hirsutum L.) plant detection in fields of rotation crops like corn (Zea mays L.) and sorghum (Sorghum bicolor L.) involve manual field scouting at the edges of fields. This leads to many VC plants growing in the middle of fields remain undetected that conti…
▽ More
To control boll weevil (Anthonomus grandis L.) pest re-infestation in cotton fields, the current practices of volunteer cotton (VC) (Gossypium hirsutum L.) plant detection in fields of rotation crops like corn (Zea mays L.) and sorghum (Sorghum bicolor L.) involve manual field scouting at the edges of fields. This leads to many VC plants growing in the middle of fields remain undetected that continue to grow side by side along with corn and sorghum. When they reach pinhead squaring stage (5-6 leaves), they can serve as hosts for the boll weevil pests. Therefore, it is required to detect, locate and then precisely spot-spray them with chemicals. In this paper, we present the application of YOLOv5m on radiometrically and gamma-corrected low resolution (1.2 Megapixel) multispectral imagery for detecting and locating VC plants growing in the middle of tasseling (VT) growth stage of cornfield. Our results show that VC plants can be detected with a mean average precision (mAP) of 79% and classification accuracy of 78% on images of size 1207 x 923 pixels at an average inference speed of nearly 47 frames per second (FPS) on NVIDIA Tesla P100 GPU-16GB and 0.4 FPS on NVIDIA Jetson TX2 GPU. We also demonstrate the application of a customized unmanned aircraft systems (UAS) for spot-spray applications based on the developed computer vision (CV) algorithm and how it can be used for near real-time detection and mitigation of VC plants growing in corn fields for efficient management of the boll weevil pests.
△ Less
Submitted 15 July, 2022;
originally announced July 2022.
-
Detecting Volunteer Cotton Plants in a Corn Field with Deep Learning on UAV Remote-Sensing Imagery
Authors:
Pappu Kumar Yadav,
J. Alex Thomasson,
Robert Hardin,
Stephen W. Searcy,
Ulisses Braga-Neto,
Sorin C. Popescu,
Daniel E. Martin,
Roberto Rodriguez,
Karem Meza,
Juan Enciso,
Jorge Solorzano Diaz,
Tianyi Wang
Abstract:
The cotton boll weevil, Anthonomus grandis Boheman is a serious pest to the U.S. cotton industry that has cost more than 16 billion USD in damages since it entered the United States from Mexico in the late 1800s. This pest has been nearly eradicated; however, southern part of Texas still faces this issue and is always prone to the pest reinfestation each year due to its sub-tropical climate where…
▽ More
The cotton boll weevil, Anthonomus grandis Boheman is a serious pest to the U.S. cotton industry that has cost more than 16 billion USD in damages since it entered the United States from Mexico in the late 1800s. This pest has been nearly eradicated; however, southern part of Texas still faces this issue and is always prone to the pest reinfestation each year due to its sub-tropical climate where cotton plants can grow year-round. Volunteer cotton (VC) plants growing in the fields of inter-seasonal crops, like corn, can serve as hosts to these pests once they reach pin-head square stage (5-6 leaf stage) and therefore need to be detected, located, and destroyed or sprayed . In this paper, we present a study to detect VC plants in a corn field using YOLOv3 on three band aerial images collected by unmanned aircraft system (UAS). The two-fold objectives of this paper were : (i) to determine whether YOLOv3 can be used for VC detection in a corn field using RGB (red, green, and blue) aerial images collected by UAS and (ii) to investigate the behavior of YOLOv3 on images at three different scales (320 x 320, S1; 416 x 416, S2; and 512 x 512, S3 pixels) based on average precision (AP), mean average precision (mAP) and F1-score at 95% confidence level. No significant differences existed for mAP among the three scales, while a significant difference was found for AP between S1 and S3 (p = 0.04) and S2 and S3 (p = 0.02). A significant difference was also found for F1-score between S2 and S3 (p = 0.02). The lack of significant differences of mAP at all the three scales indicated that the trained YOLOv3 model can be used on a computer vision-based remotely piloted aerial application system (RPAAS) for VC detection and spray application in near real-time.
△ Less
Submitted 14 July, 2022;
originally announced July 2022.
-
Forecasting COVID-19 spreading trough an ensemble of classical and machine learning models: Spain's case study
Authors:
Ignacio Heredia Cacha,
Judith Sainz-Pardo Díaz,
María Castrillo Melguizo,
Álvaro López García
Abstract:
In this work we evaluate the applicability of an ensemble of population models and machine learning models to predict the near future evolution of the COVID-19 pandemic, with a particular use case in Spain. We rely solely in open and public datasets, fusing incidence, vaccination, human mobility and weather data to feed our machine learning models (Random Forest, Gradient Boosting, k-Nearest Neigh…
▽ More
In this work we evaluate the applicability of an ensemble of population models and machine learning models to predict the near future evolution of the COVID-19 pandemic, with a particular use case in Spain. We rely solely in open and public datasets, fusing incidence, vaccination, human mobility and weather data to feed our machine learning models (Random Forest, Gradient Boosting, k-Nearest Neighbours and Kernel Ridge Regression). We use the incidence data to adjust classic population models (Gompertz, Logistic, Richards, Bertalanffy) in order to be able to better capture the trend of the data. We then ensemble these two families of models in order to obtain a more robust and accurate prediction. Furthermore, we have observed an improvement in the predictions obtained with machine learning models as we add new features (vaccines, mobility, climatic conditions), analyzing the importance of each of them using Shapley Additive Explanation values. As in any other modelling work, data and predictions quality have several limitations and therefore they must be seen from a critical standpoint, as we discuss in the text. Our work concludes that the ensemble use of these models improves the individual predictions (using only machine learning models or only population models) and can be applied, with caution, in cases when compartmental models cannot be utilized due to the lack of relevant data.
△ Less
Submitted 12 August, 2022; v1 submitted 12 July, 2022;
originally announced July 2022.