-
Naïve Bayes and Random Forest for Crop Yield Prediction
Authors:
Abbas Maazallahi,
Sreehari Thota,
Naga Prasad Kondaboina,
Vineetha Muktineni,
Deepthi Annem,
Abhi Stephen Rokkam,
Mohammad Hossein Amini,
Mohammad Amir Salari,
Payam Norouzzadeh,
Eli Snir,
Bahareh Rahmani
Abstract:
This study analyzes crop yield prediction in India from 1997 to 2020, focusing on various crops and key environmental factors. It aims to predict agricultural yields using machine learning techniques such as Linear Regression, Decision Tree, KNN, Naïve Bayes, K-Means Clustering, and Random Forest. The models, particularly Naïve Bayes and Random Forest, demonstrate high effectiveness, as shown through data visualizations. The research concludes that integrating these analytical methods significantly enhances the accuracy and reliability of crop yield predictions, offering vital contributions to agricultural data science.
Submitted 23 April, 2024;
originally announced April 2024.
-
Rate-Convergence Tradeoff of Federated Learning over Wireless Channel
Authors:
Ayoob Salari,
Mahyar Shirvanimoghaddam,
Branka Vucetic,
Sarah Johnson
Abstract:
In this paper, we consider a federated learning problem over a wireless channel that takes into account the coding rate and packet transmission errors. Communication channels are modelled as packet erasure channels (PEC), where the erasure probability is determined by the block length, code rate, and signal-to-noise ratio (SNR). To lessen the effect of packet erasure on the FL performance, we propose two schemes in which the central node (CN) reuses either the past local updates or the previous global parameters in case of packet erasure. We investigate the impact of coding rate on the convergence of federated learning (FL) for both short packet and long packet communications considering erroneous transmissions. Our simulation results show that even one unit of memory has a considerable impact on the performance of FL in erroneous communication.
Submitted 10 May, 2022;
originally announced May 2022.
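The abstract above states that the erasure probability depends on block length, code rate, and SNR, but does not give the expression. As a rough illustration only, the sketch below uses the normal approximation from finite-blocklength information theory for a complex AWGN channel; this choice of formula, the function names, and the parameter values are assumptions and are not taken from the paper.

```python
import math


def q_function(x: float) -> float:
    """Tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def erasure_probability(block_length: int, code_rate: float, snr_db: float) -> float:
    """Approximate packet erasure probability (assumed normal approximation).

    block_length: number of channel uses n
    code_rate:    information bits per channel use R
    snr_db:       signal-to-noise ratio in dB
    """
    snr = 10.0 ** (snr_db / 10.0)
    # Shannon capacity of the complex AWGN channel, in bits per channel use.
    capacity = math.log2(1.0 + snr)
    # Channel dispersion of the complex AWGN channel.
    dispersion = (1.0 - 1.0 / (1.0 + snr) ** 2) * math.log2(math.e) ** 2
    return q_function(math.sqrt(block_length / dispersion) * (capacity - code_rate))


if __name__ == "__main__":
    # Hypothetical operating points: short versus long packets at 0 dB SNR.
    for n in (100, 1000):
        for rate in (0.6, 0.8, 0.95):
            print(n, rate, erasure_probability(n, rate, 0.0))
```

Under this approximation, a higher code rate raises the erasure probability, while a longer block lowers it whenever the rate is below capacity, which is the kind of rate versus reliability tension the abstract describes.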
-
NOMA Joint Channel Estimation and Signal Detection using Rotational Invariant Codes and GMM-based Clustering
Authors:
Ayoob Salari,
Mahyar Shirvanimoghaddam,
Muhammad Basit Shahab,
Yonghui Li,
Sarah Johnson
Abstract:
This paper studies joint channel estimation and signal detection for uplink power-domain non-orthogonal multiple access. The proposed technique performs both detection and estimation without the need for pilot symbols by using a clustering technique. We apply rotational-invariant coding to assist signal detection at the receiver without sending pilot symbols. We utilize a Gaussian mixture model (GMM) to automatically cluster the received signals without supervision and optimize decision boundaries to improve the bit error rate (BER) performance. Simulation results show that the proposed scheme, without using any pilot symbols, achieves almost the same BER performance as that of the conventional maximum likelihood receiver with full channel state information.
Submitted 7 July, 2022; v1 submitted 25 February, 2022;
originally announced February 2022.
-
Federated Learning with Erroneous Communication Links
Authors:
Mahyar Shirvanimoghaddam,
Ayoob Salari,
Yifeng Gao,
Aradhika Guha
Abstract:
In this paper, we consider the federated learning (FL) problem in the presence of communication errors. We model the link between the devices and the central node (CN) by a packet erasure channel, where the local parameters from devices are either erased or received correctly by the CN with probability $ε$ and $1-ε$, respectively. We prove that the FL algorithm in the presence of communication errors, where the CN uses the past local update if the fresh one is not received from a device, converges to the same global parameter as the FL algorithm without any communication error. We provide several simulation results to validate our theoretical analysis. We also show that when the dataset is uniformly distributed among devices, the FL algorithm that only uses fresh updates and discards missing updates might converge faster than the FL algorithm that uses past local updates.
Submitted 11 April, 2022; v1 submitted 30 January, 2022;
originally announced January 2022.
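The scheme in the abstract above, where the CN falls back on a device's past local update when the fresh one is erased, can be sketched on a toy problem. The quadratic local losses, the single gradient step per round, the error-free first round, and all names below are assumptions made for illustration; they are not the paper's setup.

```python
import random


def local_update(w: float, target: float, lr: float) -> float:
    """One gradient step on the toy local loss 0.5 * (w - target) ** 2."""
    return w - lr * (w - target)


def run_fl_with_memory(targets, erasure_prob, rounds, lr=0.5, seed=0):
    """Federated averaging over a packet erasure channel with one unit of memory.

    Each device i holds a scalar `targets[i]`; the global optimum of the
    averaged toy loss is the mean of the targets.
    """
    rng = random.Random(seed)
    w = 0.0
    # Assumption: the first round is error-free, so the CN starts with one
    # stored update per device.
    memory = [local_update(w, c, lr) for c in targets]
    w = sum(memory) / len(memory)
    for _ in range(rounds):
        for i, c in enumerate(targets):
            fresh = local_update(w, c, lr)
            # The packet is received with probability 1 - erasure_prob;
            # otherwise the CN keeps the last update it received from device i.
            if rng.random() >= erasure_prob:
                memory[i] = fresh
        w = sum(memory) / len(memory)
    return w


if __name__ == "__main__":
    targets = [1.0, 2.0, 6.0]  # hypothetical device data, mean 3.0
    print(run_fl_with_memory(targets, erasure_prob=0.0, rounds=300))
    print(run_fl_with_memory(targets, erasure_prob=0.3, rounds=300))
```

In this toy setting both runs approach the mean of the targets, which mirrors the convergence claim in the abstract without reproducing its analysis.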
-
Design and Analysis of Clustering-based Joint Channel Estimation and Signal Detection for NOMA
Authors:
Ayoob Salari,
Mahyar Shirvanimoghaddam,
Muhammad Basit Shahab,
Reza Arablouei,
Sarah Johnson
Abstract:
We propose a joint channel estimation and signal detection approach for the uplink non-orthogonal multiple access (NOMA) using unsupervised machine learning. We apply a Gaussian mixture model (GMM) to cluster the received signals, and accordingly optimize the decision regions to enhance the symbol error rate (SER) performance. We show that, when the received powers of the users are sufficiently different, the proposed clustering-based approach achieves an SER performance on a par with that of the conventional maximum-likelihood detector (MLD) with full channel state information (CSI). We study the tradeoff between the accuracy of the proposed approach and the blocklength, as the accuracy of the utilized clustering algorithm depends on the number of symbols available at the receiver. We provide a comprehensive performance analysis of the proposed approach and derive a theoretical bound on its SER performance. Our simulation results corroborate the effectiveness of the proposed approach and verify that the calculated theoretical bound can predict the SER performance of the proposed approach well. We further explore the application of the proposed approach to a practical grant-free NOMA scenario, and show that its performance is very close to that of the optimal MLD with full CSI, which usually requires long pilot sequences.
Submitted 22 December, 2022; v1 submitted 17 January, 2022;
originally announced January 2022.
-
Data Augmentation Through Monte Carlo Arithmetic Leads to More Generalizable Classification in Connectomics
Authors:
Gregory Kiar,
Yohan Chatelain,
Ali Salari,
Alan C. Evans,
Tristan Glatard
Abstract:
Machine learning models are commonly applied to human brain imaging datasets in an effort to associate function or structure with behaviour, health, or other individual phenotypes. Such models often rely on low-dimensional maps generated by complex processing pipelines. However, the numerical instabilities inherent to pipelines limit the fidelity of these maps and introduce computational bias. Monte Carlo Arithmetic, a technique for introducing controlled amounts of numerical noise, was used to perturb a structural connectome estimation pipeline, ultimately producing a range of plausible networks for each sample. The variability in the perturbed networks was captured in an augmented dataset, which was then used for an age classification task. We found that resampling brain networks across a series of such numerically perturbed outcomes led to improved performance in all tested classifiers, preprocessing strategies, and dimensionality reduction techniques. Importantly, we find that this benefit does not hinge on a large number of perturbations, suggesting that even minimally perturbing a dataset adds meaningful variance which can be captured in the subsequently designed models.
Submitted 20 September, 2021;
originally announced September 2021.
-
Accurate simulation of operating system updates in neuroimaging using Monte-Carlo arithmetic
Authors:
Ali Salari,
Yohan Chatelain,
Gregory Kiar,
Tristan Glatard
Abstract:
Operating system (OS) updates introduce numerical perturbations that impact the reproducibility of computational pipelines. In neuroimaging, this has important practical implications for the validity of computational results, particularly when obtained in systems such as high-performance computing clusters where the experimenter does not control software updates. We present a framework to reproduce the variability induced by OS updates in controlled conditions. We hypothesize that OS updates impact computational pipelines mainly through numerical perturbations originating in mathematical libraries, which we simulate using Monte-Carlo arithmetic in a framework called "fuzzy libmath" (FL). We applied this methodology to pre-processing pipelines of the Human Connectome Project, a flagship open-data project in neuroimaging. We found that FL-perturbed pipelines accurately reproduce the variability induced by OS updates and that this similarity is only mildly dependent on simulation parameters. Importantly, we also found that between-subject differences were preserved in both cases, though the between-run variability was of comparable magnitude for both FL and OS perturbations. We found the numerical precision in the HCP pre-processed images to be relatively low, with less than 8 significant bits among the 24 available, which motivates further investigation of the numerical stability of components in the tested pipeline. Overall, our results establish that FL accurately simulates the variability in results due to OS updates, and is a practical framework to quantify numerical uncertainty in neuroimaging.
Submitted 6 August, 2021;
originally announced August 2021.
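To make the idea of Monte-Carlo arithmetic and "significant bits" in the abstract above more concrete, the sketch below perturbs a floating-point value with uniform noise at a chosen virtual precision and estimates the number of significant bits from repeated samples. The perturbation model `x + 2**(e_x - t) * xi`, the estimator `-log2(sigma / |mu|)`, and the function names are assumptions for illustration; this is not the "fuzzy libmath" implementation.

```python
import math
import random
import statistics


def mca_perturb(x: float, virtual_precision: int, rng: random.Random) -> float:
    """Add uniform noise of relative magnitude about 2**-virtual_precision to x."""
    if x == 0.0:
        return x
    # frexp gives x = m * 2**exponent with 0.5 <= |m| < 1.
    _, exponent = math.frexp(x)
    xi = rng.random() - 0.5  # uniform in [-0.5, 0.5)
    return x + math.ldexp(xi, exponent - virtual_precision)


def significant_bits(samples) -> float:
    """Estimate significant bits as -log2(sigma / |mu|) over repeated runs."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    if sigma == 0.0:
        return math.inf
    return -math.log2(sigma / abs(mu))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical experiment: perturb the same value many times at a
    # virtual precision of 24 bits and measure the resulting precision.
    samples = [mca_perturb(math.pi, 24, rng) for _ in range(1000)]
    print(significant_bits(samples))
```

In a real pipeline the perturbation would be applied inside each operation or library call, so errors can accumulate across steps and the measured precision can fall well below the virtual precision.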
-
Clustering-based Joint Channel Estimation and Signal Detection for Grant-free NOMA
Authors:
Ayoob Salari,
Mahyar Shirvanimoghaddam,
Muhammad Basit Shahab,
Reza Arablouei,
Sarah Johnson
Abstract:
We propose a joint channel estimation and signal detection technique for uplink non-orthogonal multiple access using an unsupervised clustering approach. We apply the Gaussian mixture model to cluster received signals and accordingly optimize the decision regions to improve the symbol error rate (SER). We show that when the received powers of the users are sufficiently different, the proposed clustering-based approach with no channel state information (CSI) at the receiver achieves an SER performance similar to that of the conventional maximum likelihood detector with full CSI. Since the accuracy of the utilized clustering algorithm depends on the number of data points available at the receiver, the proposed technique delivers a tradeoff between accuracy and block length.
Submitted 6 October, 2020;
originally announced October 2020.
-
File-based localization of numerical perturbations in data analysis pipelines
Authors:
Ali Salari,
Gregory Kiar,
Lindsay Lewis,
Alan C. Evans,
Tristan Glatard
Abstract:
Data analysis pipelines are known to be impacted by computational conditions, presumably due to the creation and propagation of numerical errors. While this process could play a major role in the current reproducibility crisis, the precise causes of such instabilities and the path along which they propagate in pipelines are unclear. We present Spot, a tool to identify which processes in a pipeline create numerical differences when executed in different computational conditions. Spot leverages system-call interception through ReproZip to reconstruct and compare provenance graphs without pipeline instrumentation. By applying Spot to the structural pre-processing pipelines of the Human Connectome Project, we found that linear and non-linear registration are the cause of most numerical instabilities in these pipelines, which confirms previous findings.
Submitted 28 September, 2020; v1 submitted 3 June, 2020;
originally announced June 2020.
-
Deep Modulation Embedding
Authors:
Amin Abbasloo,
Alan Salari
Abstract:
Deep neural networks have recently shown very promising applications in different research directions and have attracted industry attention as well. Although the idea was introduced in the past, the main limitation of this class of algorithms has only recently been overcome by enabling parallel computing on GPU hardware. The possibility of hardware prototyping, together with the proven superiority of this class of algorithms, has triggered several research directions in communication systems too. Among them, cognitive radio, modulation recognition, and learning-based receivers and transceivers have already given very interesting results in simulation and in real experimental evaluations implemented on software-defined radio. Specifically, modulation recognition is mostly approached as a classification problem, which is a supervised learning framework. Here, however, it is addressed as an unsupervised problem by introducing new features for training and a new loss function, and by investigating the robustness of the pipeline against several mismatch conditions.
Submitted 15 April, 2019; v1 submitted 17 February, 2019;
originally announced February 2019.
-
Predicting computational reproducibility of data analysis pipelines in large population studies using collaborative filtering
Authors:
Soudabeh Barghi,
Lalet Scaria,
Ali Salari,
Tristan Glatard
Abstract:
Evaluating the computational reproducibility of data analysis pipelines has become a critical issue. It is, however, a cumbersome process for analyses that involve data from large populations of subjects, due to their computational and storage requirements. We present a method to predict the computational reproducibility of data analysis pipelines in large population studies. We formulate the problem as a collaborative filtering process, with constraints on the construction of the training set. We propose 6 different strategies to build the training set, which we evaluate on 2 datasets: a synthetic one modeling a population with a growing number of subject types, and a real one obtained with neuroinformatics pipelines. Results show that one sampling method, "Random File Numbers (Uniform)", is able to predict computational reproducibility with good accuracy. We also analyze the relevance of including file and subject biases in the collaborative filtering model. We conclude that the proposed method is able to speed up reproducibility evaluations substantially, with a reduced accuracy loss.
Submitted 26 September, 2018;
originally announced September 2018.