-
Enhancing detection of labor violations in the agricultural sector: A multilevel generalized linear regression model of H-2A violation counts
Authors:
Arezoo Jafari,
Priscila De Azevedo Drummond,
Dominic Nishigaya,
Shawn Bhimani,
Aidong Adam Ding,
Amy Farrell,
Kayse Lee Maass
Abstract:
Agricultural workers are essential to the supply chain for our daily food, yet many face harmful work conditions, including garnished wages and other labor violations. Workers on H-2A visas are particularly vulnerable because of the precarity of having their immigration status tied to their employer. Although worksite inspections are one mechanism to detect such violations, many labor violations affecting agricultural workers go undetected due to limited inspection resources. In this study, we use a multilevel zero-inflated negative binomial model to identify multiple state- and industry-level factors that correlate with H-2A violations identified by the U.S. Department of Labor Wage and Hour Division. We find that three state-level factors (average farm acreage size, the number of agricultural establishments with fewer than 20 employees, and higher poverty rates) are correlated with H-2A violations. These findings provide guidance to inspection agencies on how to prioritize their limited resources to inspect agricultural workplaces more effectively, thereby improving workplace conditions for H-2A workers.
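As a minimal sketch of the count model named above, assuming the common mean/size parameterization of the negative binomial and a log link with a state-level random intercept, the zero-inflated negative binomial (ZINB) probability mass function can be written as follows. The function names, covariates, and numbers are hypothetical illustrations, not the paper's fitted model.

```python
import math

def nb_pmf(k, mu, r):
    """Negative binomial P(Y = k) with mean mu and size r (Var = mu + mu**2 / r)."""
    p = r / (r + mu)
    log_prob = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_prob)

def zinb_pmf(k, mu, r, pi):
    """Zero-inflated NB: with probability pi the count is a structural zero."""
    count_part = (1.0 - pi) * nb_pmf(k, mu, r)
    return pi + count_part if k == 0 else count_part

def count_mean(x, beta, state_intercept):
    """Hypothetical log link: log(mu) = x . beta + u_state (state random intercept)."""
    return math.exp(sum(xi * bi for xi, bi in zip(x, beta)) + state_intercept)

# Hypothetical covariates and coefficients for one employer in one state.
mu = count_mean([1.0, 0.5], [0.4, 0.8], state_intercept=0.2)
probs = [zinb_pmf(k, mu, r=2.0, pi=0.3) for k in range(6)]
print(probs)
```

The weight `pi` places extra mass at zero beyond what the negative binomial already assigns, and `r` lets the variance exceed the mean. In a full multilevel fit, `pi` typically gets its own logit model and the state intercepts are estimated jointly with the fixed effects.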
Submitted 6 June, 2023;
originally announced June 2023.
-
Statistical Methods for Selective Biomarker Testing
Authors:
A. Adam Ding,
Natalie DelRocco,
Samuel Wu
Abstract:
Biomarkers are critically important tools in modern clinical diagnosis, prognosis, and classification/prediction. However, there are fiscal and analytical barriers to biomarker research. Selective genotyping is an approach to increasing study power and efficiency in which individuals with the most extreme phenotype (response) are chosen for genotyping (exposure) in order to maximize the information in the sample. In this article, we describe an analogous procedure in the biomarker testing landscape where both the response and the biomarker (exposure) are continuous. We propose an intuitive reverse-regression least squares estimator for the parameters relating biomarker value to response. Monte Carlo simulations show that this method is unbiased and efficient relative to estimates from random sampling when the joint normal distribution assumption is met. We illustrate the application of the proposed methods on data from a chronic pain clinical trial.
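The reason reverse regression suits this design can be sketched under joint normality: selection depends only on the response Y, so regressing the biomarker X on Y in the selected sample still estimates E[X | Y] without bias, and the response variance can be taken from everyone screened. The slope of Y on X then follows from Var(X) = Var(X | Y) + γ² Var(Y). This is a minimal illustration under those assumptions, not necessarily the authors' exact estimator; the simulation settings (population size, true slope, 10% tails) are hypothetical.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs, plus the residual variance."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    resid = [y - my - slope * (x - mx) for x, y in zip(xs, ys)]
    return slope, sum(e * e for e in resid) / (len(xs) - 2)

def reverse_regression_beta(y_all, x_sel, y_sel):
    """Estimate the slope of Y on X from an extreme-Y sample (bivariate normal assumed)."""
    gamma, resid_var = ols_slope(y_sel, x_sel)  # X regressed on Y: valid under Y-selection
    var_y = statistics.variance(y_all)          # response is observed for everyone screened
    var_x = resid_var + gamma ** 2 * var_y      # Var(X) = Var(X|Y) + gamma^2 Var(Y)
    return gamma * var_y / var_x                # beta = Cov(X, Y) / Var(X)

def simulate(n=20000, beta=0.5, frac=0.10, seed=1):
    """Hypothetical study: measure the biomarker only for the most extreme responses."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [beta * x + rng.gauss(0.0, 1.0) for x in xs]
    order = sorted(range(n), key=lambda i: ys[i])
    k = int(frac * n)
    chosen = order[:k] + order[-k:]             # lowest and highest responses
    x_sel = [xs[i] for i in chosen]
    y_sel = [ys[i] for i in chosen]
    return reverse_regression_beta(ys, x_sel, y_sel)

print(simulate())
```

Only 20% of the simulated individuals have their biomarker "measured", yet the estimate should land close to the true slope of 0.5 used to generate the data.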
Submitted 30 July, 2022;
originally announced August 2022.
-
Finding Dynamics Preserving Adversarial Winning Tickets
Authors:
Xupeng Shi,
Pengfei Zheng,
A. Adam Ding,
Yuan Gao,
Weizhong Zhang
Abstract:
Modern deep neural networks (DNNs) are vulnerable to adversarial attacks, and adversarial training has been shown to be a promising method for improving the adversarial robustness of DNNs. Pruning methods have been considered in the adversarial context to reduce model capacity and improve adversarial robustness simultaneously during training. Existing adversarial pruning methods generally mimic the classical pruning methods for natural training, which follow the three-stage 'training-pruning-fine-tuning' pipeline. We observe that such pruning methods do not necessarily preserve the dynamics of dense networks, potentially making the pruned networks hard to fine-tune to compensate for the accuracy degradation caused by pruning. Building on recent work on the \textit{Neural Tangent Kernel} (NTK), we systematically study the dynamics of adversarial training and prove the existence of a trainable sparse sub-network at initialization that can be trained from scratch to be adversarially robust. This theoretically verifies the \textit{lottery ticket hypothesis} in the adversarial context, and we refer to such a sub-network structure as an \textit{Adversarial Winning Ticket} (AWT). We also provide empirical evidence that AWT preserves the dynamics of adversarial training and achieves performance equal to that of dense adversarial training.
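The abstract links "preserving the dynamics" to the NTK, since in the NTK regime training dynamics are governed by the kernel K(x, x') = ∇θf(x) · ∇θf(x'). A minimal sketch, assuming a toy two-layer ReLU network f(x) = (1/√m) Σⱼ aⱼ relu((maskⱼ ⊙ Wⱼ) · x) that we chose for illustration, computes the empirical NTK Gram matrix of the dense network and of a masked sub-network and reports their relative Frobenius distance. A small distance is one way to read "dynamics preserving". The random mask is only a placeholder; no AWT search is implemented, and adversarial training is not modeled.

```python
import math
import random

def ntk_gram(xs, W, a, mask):
    """Empirical NTK Gram matrix of f(x) = (1/sqrt(m)) * sum_j a_j * relu((mask_j * W_j) . x).

    Pruned weights (mask = 0) are not trainable, so they contribute zero gradient."""
    m = len(a)
    feats = []
    for x in xs:
        g = []
        for j in range(m):
            h = sum(mk * wi * xi for mk, wi, xi in zip(mask[j], W[j], x))
            g.append(max(h, 0.0) / math.sqrt(m))                   # df / da_j
            gate = 1.0 if h > 0 else 0.0
            g.extend(a[j] * gate * xi * mk / math.sqrt(m)           # df / dW_ji
                     for mk, xi in zip(mask[j], x))
        feats.append(g)
    return [[sum(u * v for u, v in zip(f1, f2)) for f2 in feats] for f1 in feats]

def rel_distance(K1, K2):
    """Relative Frobenius distance ||K1 - K2|| / ||K1||."""
    num = math.sqrt(sum((p - q) ** 2 for r1, r2 in zip(K1, K2) for p, q in zip(r1, r2)))
    den = math.sqrt(sum(p * p for r in K1 for p in r))
    return num / den

rng = random.Random(0)
d, m, n = 4, 64, 5
xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(m)]
a = [rng.choice([-1.0, 1.0]) for _ in range(m)]
dense = [[1.0] * d for _ in range(m)]
sparse = [[1.0 if rng.random() < 0.5 else 0.0 for _ in range(d)] for _ in range(m)]

K_dense = ntk_gram(xs, W, a, dense)
K_sparse = ntk_gram(xs, W, a, sparse)
print(rel_distance(K_dense, K_sparse))
```

A search for a dynamics-preserving ticket would look for masks that keep this distance small at a given sparsity, rather than drawing them at random as done here.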
Submitted 6 March, 2022; v1 submitted 14 February, 2022;
originally announced February 2022.
-
Real-time privacy preserving disease diagnosis using ECG signal
Authors:
Guanhong Miao,
A. Adam Ding,
Samuel S. Wu
Abstract:
The rapid development of the Internet of Medical Things (IoMT) creates opportunities for real-time health monitoring using various data types such as electroencephalography (EEG) and electrocardiography (ECG). Security issues have significantly impeded the implementation of e-healthcare systems. Three important challenges for a privacy-preserving system need to be addressed: accurate diagnosis, privacy protection without compromising accuracy, and computational efficiency. It is essential to guarantee prediction accuracy, since disease diagnosis is strongly related to health and life. Using a matrix encryption method, we propose a real-time disease diagnosis scheme based on a support vector machine (SVM). A biomedical signal provided by the client is diagnosed such that the server obtains no information about either the signal or the final diagnosis, while the proposed scheme also preserves the confidentiality of the SVM classifier and the server's medical data. The proposed scheme has no accuracy degradation. Experiments on real-world data illustrate the high efficiency of the proposed scheme. It takes less than 1 second to derive the diagnosis result on a device with 4GB of RAM, suggesting that real-time privacy-preserving health monitoring is feasible to implement.
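Why a matrix transformation can cost no accuracy is easy to see for a linear SVM: if the signal is transformed as x' = Mx and the weights as w' = M⁻ᵀw, then w'·x' = wᵀM⁻¹Mx = w·x, so the decision value is unchanged. The sketch below assumes a deliberately simple hypothetical key, M = (permutation) × (nonzero diagonal scaling), so the inverse is trivial. It only demonstrates exactness; it is not the paper's protocol and does not model who holds the key or any security property.

```python
import random

def make_key(d, rng):
    """Hypothetical secret key: a permutation plus nonzero per-coordinate scales."""
    perm = list(range(d))
    rng.shuffle(perm)
    scales = [rng.uniform(0.5, 2.0) * rng.choice([-1, 1]) for _ in range(d)]
    return perm, scales

def mask_signal(x, key):
    """x' = M x: permute the coordinates and scale each one."""
    perm, scales = key
    return [scales[perm[i]] * x[perm[i]] for i in range(len(x))]

def mask_weights(w, key):
    """w' = M^{-T} w: same permutation, inverse scales, so w' . x' == w . x."""
    perm, scales = key
    return [w[perm[i]] / scales[perm[i]] for i in range(len(w))]

def svm_decision(w, b, x):
    """Linear SVM decision value w . x + b; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

rng = random.Random(42)
d = 6
w = [rng.gauss(0, 1) for _ in range(d)]   # stand-in for a trained linear SVM
b = 0.3
x = [rng.gauss(0, 1) for _ in range(d)]   # stand-in for ECG-derived features
key = make_key(d, rng)

plain = svm_decision(w, b, x)
masked = svm_decision(mask_weights(w, key), b, mask_signal(x, key))
print(plain, masked)
```

A permuted diagonal scaling is far weaker than a dense random matrix; it is used here only because its inverse keeps the example short.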
Submitted 22 March, 2023; v1 submitted 8 February, 2022;
originally announced February 2022.
-
Linear Model Against Malicious Adversaries with Local Differential Privacy
Authors:
Guanhong Miao,
A. Adam Ding,
Samuel S. Wu
Abstract:
Scientific collaborations benefit from collaborative learning over distributed data sources, but such learning remains difficult to achieve when data are sensitive. In recent years, privacy-preserving techniques have been widely studied for analyzing data distributed across different agencies while protecting sensitive information. Most existing privacy-preserving techniques are designed to resist semi-honest adversaries and require intensive computation to perform data analysis. Secure collaborative learning is significantly more difficult in the presence of malicious adversaries who may deviate from the secure protocol. Another challenge is to maintain high computational efficiency with privacy protection. In this paper, matrix encryption is applied to encrypt data such that the secure schemes are resistant to malicious adversaries, including chosen-plaintext, known-plaintext, and collusion attacks. The encryption scheme also achieves local differential privacy. Moreover, cross-validation is studied to prevent overfitting without additional communication cost. Empirical experiments on real-world datasets demonstrate that the proposed schemes are computationally efficient compared with existing techniques under both the malicious-adversary and semi-honest models.
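The reason matrix masking leaves a linear model intact can be sketched for ordinary least squares: if the rows of (X, y) are left-multiplied by an n × n orthogonal matrix A, then (AX)ᵀ(AX) = XᵀX and (AX)ᵀ(Ay) = Xᵀy, so the fitted coefficients are identical. This minimal sketch assumes a two-column design (intercept plus one attribute) and a random orthogonal A built by Gram-Schmidt; the local differential privacy noise and the defenses against chosen-plaintext, known-plaintext, and collusion attacks described in the abstract are not modeled.

```python
import math
import random

def random_orthogonal(n, rng):
    """n x n orthogonal matrix from Gram-Schmidt on Gaussian rows."""
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for q in rows:
            c = sum(a * b for a, b in zip(v, q))
            v = [a - c * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:
            rows.append([a / norm for a in v])
    return rows

def matmul(A, B):
    """Plain-Python matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def ols_two_columns(X, y):
    """Solve the 2 x 2 normal equations (X^T X) beta = X^T y by Cramer's rule."""
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    c = sum(r[1] * r[1] for r in X)
    u = sum(r[0] * t for r, t in zip(X, y))
    v = sum(r[1] * t for r, t in zip(X, y))
    det = a * c - b * b
    return [(c * u - b * v) / det, (a * v - b * u) / det]

rng = random.Random(7)
n = 10
X = [[1.0, rng.gauss(0, 1)] for _ in range(n)]          # intercept + one attribute
y = [2.0 - 0.5 * r[1] + rng.gauss(0, 0.1) for r in X]
A = random_orthogonal(n, rng)                            # hypothetical masking key
X_masked = matmul(A, X)
y_masked = [row[0] for row in matmul(A, [[t] for t in y])]

print(ols_two_columns(X, y), ols_two_columns(X_masked, y_masked))
```

The masked rows look nothing like the originals, yet both fits return the same coefficients up to floating-point rounding.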
Submitted 28 June, 2022; v1 submitted 4 February, 2022;
originally announced February 2022.
-
Reducing Noise Level in Differential Privacy through Matrix Masking
Authors:
A. Adam Ding,
Samuel S. Wu,
Guanhong Miao,
Shigang Chen
Abstract:
Differential privacy schemes have been widely adopted in recent years to address issues of data privacy protection. We propose a new Gaussian scheme combined with another data protection technique, called random orthogonal matrix masking, to achieve $(\varepsilon, \delta)$-differential privacy (DP) more efficiently. We prove that the additional matrix masking significantly reduces the rate of noise variance required by the Gaussian scheme to achieve $(\varepsilon, \delta)$-DP in big data settings. Specifically, when $\varepsilon \to 0$, $\delta \to 0$, and the sample size $n$ exceeds the number $p$ of attributes by $(n-p)=O(\ln(1/\delta))$, the additive noise variance required to achieve $(\varepsilon, \delta)$-DP is reduced from $O(\ln(1/\delta)/\varepsilon^2)$ to $O(1/\varepsilon)$. With much less noise added, the resulting differentially private pseudo data sets allow much more accurate inference and thus can significantly broaden the scope of application of differential privacy.
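To put the two rates in perspective, a quick worked ratio helps. It ignores the constants hidden by $O(\cdot)$, so it compares growth rates only, not actual variances, and the values $\varepsilon = 0.1$ and $\delta = 10^{-6}$ are illustrative choices:

$$\frac{\ln(1/\delta)/\varepsilon^{2}}{1/\varepsilon} = \frac{\ln(1/\delta)}{\varepsilon}, \qquad \varepsilon = 0.1,\ \delta = 10^{-6}:\quad \frac{\ln(10^{6})}{0.1} \approx \frac{13.82}{0.1} \approx 138.$$

The ratio grows without bound as either $\varepsilon$ or $\delta$ shrinks, which is the regime ($\varepsilon \to 0$, $\delta \to 0$) in which the stated reduction applies.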
Submitted 11 April, 2023; v1 submitted 11 January, 2022;
originally announced January 2022.
-
Understanding and Quantifying Adversarial Examples Existence in Linear Classification
Authors:
Xupeng Shi,
A. Adam Ding
Abstract:
State-of-the-art deep neural networks (DNNs) are vulnerable to attacks by adversarial examples: a carefully designed small perturbation to the input, imperceptible to humans, can mislead a DNN. To understand the root cause of adversarial examples, we quantify the probability of adversarial example existence for linear classifiers. Previous mathematical definitions of adversarial examples involve only the overall amount of perturbation; we propose a more practically relevant definition of strong adversarial examples that also separately limits the perturbation along the signal direction. We show that linear classifiers can be made robust to strong adversarial example attacks in cases where no adversarially robust linear classifier exists under the previous definition. The quantitative formulas are confirmed by numerical experiments using a linear support vector machine (SVM) classifier. The results suggest that designing general strong-adversarial-robust learning systems is feasible, but only by incorporating human knowledge of the underlying classification problem.
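The contrast between the two definitions can be sketched for a linear classifier sign(w·x + b). Under an overall L2 budget, the smallest perturbation reaching the decision boundary has norm |w·x + b| / ‖w‖, which is finite for every w. If, as one reading of the strong definition, movement along a unit signal direction s is capped at η, then after spending that budget the remaining margin must be covered by a perturbation orthogonal to s, which is impossible when w is aligned with s. The signal direction, the budget η, and the function names are hypothetical illustrations, not the paper's exact formulas.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def min_l2_attack(w, b, x):
    """Smallest L2 perturbation moving x onto the boundary w . x + b = 0."""
    return abs(dot(w, x) + b) / norm(w)

def min_orthogonal_attack(w, b, x, s, eta):
    """Smallest perturbation orthogonal to unit signal direction s that reaches the
    boundary after spending at most eta along s; math.inf if that is impossible."""
    margin = abs(dot(w, x) + b)
    ws = dot(w, s)
    remaining = margin - eta * abs(ws)        # margin left after the capped move along s
    if remaining <= 0:
        return 0.0
    w_perp = [wi - ws * si for wi, si in zip(w, s)]
    n = norm(w_perp)
    return math.inf if n < 1e-12 else remaining / n

s = [1.0, 0.0]                                # hypothetical signal direction
x, b, eta = [2.0, 0.0], 0.0, 1.0
aligned, tilted = [1.0, 0.0], [1.0, 1.0]
print(min_l2_attack(aligned, b, x), min_orthogonal_attack(aligned, b, x, s, eta))
print(min_l2_attack(tilted, b, x), min_orthogonal_attack(tilted, b, x, s, eta))
```

In this toy setting the classifier aligned with the signal direction cannot be attacked under the capped definition, while the tilted one can, echoing the abstract's point that robustness depends on knowing which direction carries the signal.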
Submitted 26 October, 2019;
originally announced October 2019.
-
Copula Correlation: An Equitable Dependence Measure and Extension of Pearson's Correlation
Authors:
A. Adam Ding,
Yi Li
Abstract:
In Science, Reshef et al. (2011) proposed the concept of equitability for measures of dependence between two random variables. To this end, they proposed a novel measure, the maximal information coefficient (MIC). Recently, a PNAS paper (Kinney and Atwal, 2014) gave a mathematical definition of equitability. They proved that MIC is in fact not equitable, while a fundamental information-theoretic measure, the mutual information (MI), is self-equitable. In this paper, we show that MI also does not correctly reflect the proportion of deterministic signals hidden in noisy data. We propose a new equitability definition based on this scenario. The copula correlation (Ccor), based on the $L_1$-distance of the copula density, is shown to be equitable under both definitions. We also prove theoretically that Ccor is much easier to estimate than MI. Numerical studies illustrate the properties of the measures.
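A minimal plug-in sketch of the copula correlation, assuming the normalization Ccor = ½ ∫∫ |c(u, v) − 1| du dv (the $L_1$ distance of the copula density from the independence copula, scaled to lie in [0, 1]): rank-transform both variables to pseudo-observations, estimate the copula density with a k × k histogram, and sum the absolute deviations. The histogram estimator and the grid size are our illustrative choices, not necessarily the estimator studied in the paper, and this estimator is biased upward under independence for finite samples.

```python
import random

def pseudo_obs(values):
    """Rank-transform values to (rank - 0.5) / n, i.e. pseudo-observations in (0, 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = (rank - 0.5) / n
    return u

def ccor_histogram(xs, ys, k=10):
    """Histogram plug-in for Ccor = 1/2 * integral of |c(u, v) - 1| over a k x k grid."""
    u, v = pseudo_obs(xs), pseudo_obs(ys)
    n = len(xs)
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(u, v):
        counts[min(int(a * k), k - 1)][min(int(b * k), k - 1)] += 1
    # Cell density estimate = (cell proportion) * k^2; each cell has area 1 / k^2.
    return 0.5 * sum(abs(c / n * k * k - 1.0) for row in counts for c in row) / (k * k)

rng = random.Random(3)
xs = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
print(ccor_histogram(xs, noise))                 # independent: close to 0
print(ccor_histogram(xs, [x * x for x in xs]))   # y = x^2: large
```

For y = x² with x symmetric about zero, Pearson's correlation is zero in the population, while this Ccor estimate is large, which illustrates why a copula-based measure can capture nonmonotone dependence that Pearson's correlation misses.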
Submitted 13 August, 2015; v1 submitted 27 December, 2013;
originally announced December 2013.