-
The Gamma Generalized Normal Distribution: A Descriptor of SAR Imagery
Authors:
G. M. Cordeiro,
R. J. Cintra,
L. C. Rêgo,
A. D. C. Nascimento
Abstract:
We propose a new four-parameter distribution for modeling synthetic aperture radar (SAR) imagery named the gamma generalized normal (GGN) by combining the gamma and generalized normal distributions. A mathematical characterization of the new distribution is provided by identifying the limit behavior and by calculating the density and moment expansions. The GGN model performance is evaluated on both synthetic and actual data and, for that, maximum likelihood estimation and random number generation are discussed. The proposed distribution is compared with the beta generalized normal distribution (BGN), which has already been shown to appropriately represent SAR imagery. The performance of these two distributions is measured by means of statistics which provide evidence that the GGN can outperform the BGN distribution in some contexts.
Submitted 3 June, 2022;
originally announced June 2022.
-
Inline Detection of DGA Domains Using Side Information
Authors:
Raaghavi Sivaguru,
Jonathan Peck,
Femi Olumofin,
Anderson Nascimento,
Martine De Cock
Abstract:
Malware applications typically use a command and control (C&C) server to manage bots to perform malicious activities. Domain Generation Algorithms (DGAs) are popular methods for generating pseudo-random domain names that can be used to establish communication between an infected bot and the C&C server. In recent years, machine-learning-based systems have been widely used to detect DGAs. There are several well-known state-of-the-art classifiers in the literature that can detect DGA domain names in real-time applications with high predictive performance. However, these DGA classifiers are highly vulnerable to adversarial attacks in which adversaries purposely craft domain names to evade DGA detection classifiers. In our work, we focus on hardening DGA classifiers against adversarial attacks. To this end, we train and evaluate state-of-the-art deep learning and random forest (RF) classifiers for DGA detection using side information that is harder for adversaries to manipulate than the domain name itself. Additionally, the side information features are selected such that they are easily obtainable in practice to perform inline DGA detection. The performance and robustness of these models are assessed by exposing them to one day of real-traffic data as well as domains generated by adversarial attack algorithms. We found that the DGA classifiers that rely on both the domain name and side information have high performance and are more robust against adversaries.
Submitted 12 March, 2020;
originally announced March 2020.
-
CharBot: A Simple and Effective Method for Evading DGA Classifiers
Authors:
Jonathan Peck,
Claire Nie,
Raaghavi Sivaguru,
Charles Grumer,
Femi Olumofin,
Bin Yu,
Anderson Nascimento,
Martine De Cock
Abstract:
Domain generation algorithms (DGAs) are commonly leveraged by malware to create lists of domain names which can be used for command and control (C&C) purposes. Approaches based on machine learning have recently been developed to automatically detect generated domain names in real-time. In this work, we present a novel DGA called CharBot which is capable of producing large numbers of unregistered domain names that are not detected by state-of-the-art classifiers for real-time detection of DGAs, including the recently published methods FANCI (a random forest based on human-engineered features) and LSTM.MI (a deep learning approach). CharBot is very simple, effective and requires no knowledge of the targeted DGA classifiers. We show that retraining the classifiers on CharBot samples is not a viable defense strategy. We believe these findings show that DGA classifiers are inherently vulnerable to adversarial attacks if they rely only on the domain name string to make a decision. Designing a robust DGA classifier may, therefore, necessitate the use of additional information besides the domain name alone. To the best of our knowledge, CharBot is the simplest and most efficient black-box adversarial attack against DGA classifiers proposed to date.
Submitted 30 May, 2019; v1 submitted 3 May, 2019;
originally announced May 2019.
-
A new extended Cardioid model: an application to wind data
Authors:
Fernanda V. Paula,
Abraão D. C. Nascimento,
Getúlio J. A. Amaral
Abstract:
The Cardioid distribution is a relevant model for circular data. However, this model is not suitable for scenarios where there is asymmetry or multimodality. In order to overcome this gap, an extended Cardioid model is proposed, which is called the Exponentiated Cardioid (EC) distribution. Besides, some of its properties are derived, such as trigonometric moments, kurtosis and skewness. A discussion about the modality and expressions for the quantiles through approximations of the studied model are also presented. To fit the EC model, two estimation methods are presented based on maximum likelihood and quantile least squares procedures. The performance of the proposed estimators is evaluated in a Monte Carlo simulation study, adopting both bias and mean square error as comparison criteria. Finally, the proposed model is applied to a dataset in the wind direction context. Results indicate that the EC distribution may outperform the Cardioid and von Mises distributions.
Submitted 5 December, 2017;
originally announced December 2017.
-
Bias Correction and Modified Profile Likelihood under the Wishart Complex Distribution
Authors:
Abraão D. C. Nascimento,
Alejandro C. Frery,
Renato J. Cintra
Abstract:
This paper proposes improved methods for the maximum likelihood (ML) estimation of the equivalent number of looks $L$. This parameter has a meaningful interpretation in the context of polarimetric synthetic aperture radar (PolSAR) images. Due to the presence of coherent illumination in their processing, PolSAR systems generate images which present a granular noise called speckle. As a potential solution for reducing such interference, the parameter $L$ controls the signal-to-noise ratio. Thus, the proposal of efficient estimation methodologies for $L$ has been sought. To that end, we first consider that a PolSAR image is well described by the scaled complex Wishart distribution. In recent years, Anfinsen et al. derived and analyzed estimation methods based on the ML and on trace statistical moments for obtaining the parameter $L$ of the unscaled version of such probability law. This paper generalizes that approach. We present the second-order bias expression proposed by Cox and Snell for the ML estimator of this parameter. Moreover, the formula of the profile likelihood modified by Barndorff-Nielsen in terms of $L$ is discussed. Such derivations yield two new ML estimators for the parameter $L$, which are compared to the estimators proposed by Anfinsen et al. The performance of these estimators is assessed by means of Monte Carlo experiments, adopting three statistical measures as comparison criteria: the mean square error, the bias, and the coefficient of variation. In agreement with the simulation study, an application to actual PolSAR data concludes that the proposed estimators outperform all the others in homogeneous scenarios.
Submitted 18 April, 2014;
originally announced April 2014.
-
Information Theory and Image Understanding: An Application to Polarimetric SAR Imagery
Authors:
A. C. Frery,
A. D. C. Nascimento,
R. J. Cintra
Abstract:
This work presents a comprehensive examination of the use of information theory for understanding Polarimetric Synthetic Aperture Radar (PolSAR) images by means of contrast measures that can be used as test statistics. Due to the phenomenon called "speckle", common to all images obtained with coherent illumination such as PolSAR imagery, accurate modelling is required in their processing and analysis. The scaled multilook complex Wishart distribution has proven to be a successful approach for modelling radar backscatter from forest and pasture areas. Classification, segmentation, and image analysis techniques which depend on this model have been devised, and many of them employ some kind of dissimilarity measure. Specifically, we introduce statistical tests for analyzing contrast in such images. These tests are based on the chi-square, Kullback-Leibler, Rényi, Bhattacharyya, and Hellinger distances. Results obtained by Monte Carlo experiments reveal the Kullback-Leibler distance as the best one with respect to the empirical test sizes under several situations which include pure and contaminated data. The proposed methodology was applied to actual data, obtained by an E-SAR sensor over surroundings of Weßling, Bavaria, Germany.
Submitted 8 February, 2014;
originally announced February 2014.
-
Analytic Expressions for Stochastic Distances Between Relaxed Complex Wishart Distributions
Authors:
Alejandro C. Frery,
Abraão D. C. Nascimento,
Renato J. Cintra
Abstract:
The scaled complex Wishart distribution is a widely used model for multilook full polarimetric SAR data whose adequacy has been attested in the literature. Classification, segmentation, and image analysis techniques which depend on this model have been devised, and many of them employ some type of dissimilarity measure. In this paper we derive analytic expressions for four stochastic distances between relaxed scaled complex Wishart distributions in their most general form and in important particular cases. Using these distances, inequalities are obtained which lead to new ways of deriving the Bartlett and revised Wishart distances. The expressiveness of the four analytic distances is assessed with respect to the variation of parameters. Such distances are then used for deriving new test statistics, which are proved to have an asymptotic chi-square distribution. Adopting the test size as a comparison criterion, a sensitivity study is performed by means of Monte Carlo experiments suggesting that the Bhattacharyya statistic outperforms all the others. The power of the tests is also assessed. Applications to actual data illustrate the discrimination and homogeneity identification capabilities of these distances.
Submitted 19 April, 2013;
originally announced April 2013.
-
Entropy-based Statistical Analysis of PolSAR Data
Authors:
Alejandro C. Frery,
Renato J. Cintra,
Abraão D. C. Nascimento
Abstract:
Images obtained from coherent illumination processes are contaminated with speckle noise, with polarimetric synthetic aperture radar (PolSAR) imagery as a prominent example. With an adequacy widely attested in the literature, the scaled complex Wishart distribution is an acceptable model for PolSAR data. In this perspective, we derive analytic expressions for the Shannon, Rényi, and restricted Tsallis entropies under this model. Relationships between the derived measures and the parameters of the scaled Wishart law (i.e., the equivalent number of looks and the covariance matrix) are discussed. In addition, we obtain the asymptotic variances of the Shannon and Rényi entropies when replacing distribution parameters by maximum likelihood estimators. As a consequence, confidence intervals based on these two entropies are also derived and proposed as new ways of capturing contrast. New hypothesis tests are additionally proposed using these results, and their performance is assessed using simulated and real data. In general terms, the test based on the Shannon entropy outperforms those based on Rényi's.
Submitted 15 October, 2012;
originally announced October 2012.
-
Hypothesis Testing in Speckled Data with Stochastic Distances
Authors:
Abraão D. C. Nascimento,
Renato J. Cintra,
Alejandro C. Frery
Abstract:
Images obtained with coherent illumination, as is the case of sonar, ultrasound-B, laser and synthetic aperture radar (SAR), are affected by speckle noise which reduces the ability to extract information from the data. Specialized techniques are required to deal with such imagery, which has been modeled by the G0 distribution and under which regions with different degrees of roughness and mean brightness can be characterized by two parameters; a third parameter, the number of looks, is related to the overall signal-to-noise ratio. Assessing distances between samples is an important step in image analysis; they provide grounds for assessing the separability and, therefore, the performance of classification procedures. This work derives and compares eight stochastic distances and assesses the performance of hypothesis tests that employ them and maximum likelihood estimation. We conclude that tests based on the triangular distance have the closest empirical size to the theoretical one, while those based on the arithmetic-geometric distances have the best power. Since the power of tests based on the triangular distance is close to optimum, we conclude that the safest choice is using this distance for hypothesis testing, even when compared with classical distances such as Kullback-Leibler and Bhattacharyya.
Submitted 12 July, 2012;
originally announced July 2012.
-
Parametric and Nonparametric Tests for Speckled Imagery
Authors:
Renato J. Cintra,
Abraão D. C. Nascimento,
Alejandro C. Frery
Abstract:
Synthetic aperture radar (SAR) has a pivotal role as a remote imaging method. Obtained by means of coherent illumination, SAR images are contaminated with speckle noise. The statistical modeling of such contamination is well described according to the multiplicative model and its implied G0 distribution. The understanding of SAR imagery and scene element identification is an important objective in the field. In particular, reliable image contrast tools are sought. Aiming to propose new tools for evaluating SAR image contrast, we investigated new methods based on stochastic divergence. We propose several divergence measures specifically tailored for G0 distributed data. We also introduce a nonparametric approach based on the Kolmogorov-Smirnov distance for G0 data. We devised and assessed tests based on such measures, and their performances were quantified according to their test sizes and powers. Using Monte Carlo simulation, we present a robustness analysis of test statistics and of maximum likelihood estimators for several degrees of innovative contamination. It was identified that the proposed tests based on triangular and arithmetic-geometric measures outperformed the Kolmogorov-Smirnov methodology.
Submitted 10 July, 2012;
originally announced July 2012.