
Quantifying and Mitigating Privacy Risks for Tabular Generative Models

Authors: Chaoyi Zhu, Jiayi Tang, Hans Brouwer, Juan F. Pérez, Marten van Dijk, Lydia Y. Chen

Abstract: Synthetic data from generative models emerges as the privacy-preserving data-sharing solution. Such a synthetic data set shall resemble the original data without revealing identifiable private information. The backbone technology of tabular synthesizers is rooted in image generative models, ranging from Generative Adversarial Networks (GANs) to recent diffusion models. Recent prior work sheds light on the utility-privacy tradeoff on tabular data, revealing and quantifying privacy risks on synthetic data. We first conduct an exhaustive empirical analysis, highlighting the utility-privacy tradeoff of five state-of-the-art tabular synthesizers, against eight privacy attacks, with a special focus on membership inference attacks. Motivated by the observation of high data quality but also high privacy risk in tabular diffusion, we propose DP-TLDM, Differentially Private Tabular Latent Diffusion Model, which is composed of an autoencoder network to encode the tabular data and a latent diffusion model to synthesize the latent tables. Following the emerging f-DP framework, we apply DP-SGD to train the auto-encoder in combination with batch clipping and use the separation value as the privacy metric to better capture the privacy gain from DP algorithms. Our empirical evaluation demonstrates that DP-TLDM is capable of achieving a meaningful theoretical privacy guarantee while also significantly enhancing the utility of synthetic data. Specifically, compared to other DP-protected tabular generative models, DP-TLDM improves the synthetic quality by an average of 35% in data resemblance, 15% in the utility for downstream tasks, and 50% in data discriminability, all while preserving a comparable level of privacy risk.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2312.08086 [pdf, other]

Recursive Augmented Fernet (RAF) Token: Alleviating the Pain of Stolen Tokens

Authors: Reza Rahaeimehr, Marten van Dijk

Abstract: A robust authentication and authorization mechanism is imperative in modular system development, where modularity and modular thinking are pivotal. Traditional systems often employ identity modules responsible for authentication and token issuance. Tokens, representing user credentials, offer advantages such as reduced reliance on passwords, limited lifespan, and scoped access. Despite these benefits, the "bearer token" problem persists, leaving systems vulnerable to abuse if tokens are compromised. We propose a token-based authentication mechanism addressing modular systems' critical bearer token problem. The proposed mechanism includes a novel RAF (Recursive Augmented Fernet) token, a blacklist component, and a policy enforcer component. RAF tokens are one-time-use tokens, like tickets. They carry commands, and the receiver of an RAF token can issue new tokens using the received RAF token. The blacklist component guarantees an RAF token cannot be approved more than once, and the policy enforcer checks the compatibility of commands carried by an RAF token. We introduce two variations of RAF tokens: User-tied RAF, offering simplicity and compatibility, and Fully-tied RAF, providing enhanced security through service-specific secret keys. We thoroughly discuss the security guarantees, technical definitions, and construction of RAF tokens backed by game-based proofs. We demonstrate a proof of concept in the context of OpenStack, involving modifications to Keystone and creating an RAFT library. The experimental results reveal minimal overhead in typical scenarios, establishing the practicality and effectiveness of RAF. Our experiments show that the RAF mechanism beats the idea of using short-life Fernet tokens while providing much better security.

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.01256 [pdf, other]

Breaking XOR Arbiter PUFs without Reliability Information

Authors: Niloufar Sayadi, Phuong Ha Nguyen, Marten van Dijk, Chenglu Jin

Abstract: Unreliable XOR Arbiter PUFs were broken by a machine learning attack, which targets the underlying Arbiter PUFs individually. However, reliability information from the PUF was required for this attack. We show that, for the first time, a perfectly reliable XOR Arbiter PUF, where no reliability information is accessible, can be efficiently attacked in the same divide-and-conquer manner. Our key insight is that the responses of correlated challenges also reveal their distance to the decision boundary. This leads to a chosen challenge attack on XOR Arbiter PUFs. The effectiveness of our attack is confirmed through PUF simulation and FPGA implementation.

Submitted 2 December, 2023; originally announced December 2023.

arXiv:2310.20328 [pdf, other]

ChiSCor: A Corpus of Freely Told Fantasy Stories by Dutch Children for Computational Linguistics and Cognitive Science

Authors: Bram M. A. van Dijk, Max J. van Duijn, Suzan Verberne, Marco R. Spruit

Abstract: In this resource paper we release ChiSCor, a new corpus containing 619 fantasy stories, told freely by 442 Dutch children aged 4-12. ChiSCor was compiled for studying how children render character perspectives, and unravelling language and cognition in development, with computational tools. Unlike existing resources, ChiSCor's stories were produced in natural contexts, in line with recent calls for more ecologically valid datasets. ChiSCor hosts text, audio, and annotations for character complexity and linguistic complexity. Additional metadata (e.g. education of caregivers) is available for one third of the Dutch children. ChiSCor also includes a small set of 62 English stories. This paper details how ChiSCor was compiled and shows its potential for future work with three brief case studies: i) we show that the syntactic complexity of stories is strikingly stable across children's ages; ii) we extend work on Zipfian distributions in free speech and show that ChiSCor obeys Zipf's law closely, reflecting its social context; iii) we show that even though ChiSCor is relatively small, the corpus is rich enough to train informative lemma vectors that allow us to analyse children's language use. We end with a reflection on the value of narrative datasets in computational linguistics.

Submitted 31 October, 2023; originally announced October 2023.

Comments: 12 pages, 5 figures, forthcoming in Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)

arXiv:2310.20320 [pdf, other]

Theory of Mind in Large Language Models: Examining Performance of 11 State-of-the-Art models vs. Children Aged 7-10 on Advanced Tests

Authors: Max J. van Duijn, Bram M. A. van Dijk, Tom Kouwenhoven, Werner de Valk, Marco R. Spruit, Peter van der Putten

Abstract: To what degree should we ascribe cognitive capacities to Large Language Models (LLMs), such as the ability to reason about intentions and beliefs known as Theory of Mind (ToM)? Here we add to this emerging debate by (i) testing 11 base- and instruction-tuned LLMs on capabilities relevant to ToM beyond the dominant false-belief paradigm, including non-literal language usage and recursive intentionality; (ii) using newly rewritten versions of standardized tests to gauge LLMs' robustness; (iii) prompting and scoring for open besides closed questions; and (iv) benchmarking LLM performance against that of children aged 7-10 on the same tasks. We find that instruction-tuned LLMs from the GPT family outperform other models, and often also children. Base-LLMs are mostly unable to solve ToM tasks, even with specialized prompting. We suggest that the interlinked evolution and development of language and ToM may help explain what instruction-tuning adds: rewarding cooperative communication that takes into account interlocutor and context. We conclude by arguing for a nuanced perspective on ToM in LLMs.

Submitted 31 October, 2023; originally announced October 2023.

Comments: 14 pages, 4 figures, Forthcoming in Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)

arXiv:2310.19671 [pdf, other]

Large Language Models: The Need for Nuance in Current Debates and a Pragmatic Perspective on Understanding

Authors: Bram M. A. van Dijk, Tom Kouwenhoven, Marco R. Spruit, Max J. van Duijn

Abstract: Current Large Language Models (LLMs) are unparalleled in their ability to generate grammatically correct, fluent text. LLMs are appearing rapidly, and debates on LLM capacities have taken off, but reflection is lagging behind. Thus, in this position paper, we first zoom in on the debate and critically assess three points recurring in critiques of LLM capacities: i) that LLMs only parrot statistical patterns in the training data; ii) that LLMs master formal but not functional language competence; and iii) that language learning in LLMs cannot inform human language learning. Drawing on empirical and theoretical arguments, we show that these points need more nuance. Second, we outline a pragmatic perspective on the issue of 'real' understanding and intentionality in LLMs. Understanding and intentionality pertain to unobservable mental states we attribute to other humans because they have pragmatic value: they allow us to abstract away from complex underlying mechanics and predict behaviour effectively. We reflect on the circumstances under which it would make sense for humans to similarly attribute mental states to LLMs, thereby outlining a pragmatic philosophical context for LLMs as an increasingly prominent technology in society.

Submitted 31 October, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: 15 pages, 0 figures, Forthcoming in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

arXiv:2307.11939 [pdf, other]

Batch Clipping and Adaptive Layerwise Clipping for Differential Private Stochastic Gradient Descent

Authors: Toan N. Nguyen, Phuong Ha Nguyen, Lam M. Nguyen, Marten Van Dijk

Abstract: Each round in Differential Private Stochastic Gradient Descent (DPSGD) transmits a sum of clipped gradients obfuscated with Gaussian noise to a central server which uses this to update a global model which often represents a deep neural network. Since the clipped gradients are computed separately, which we call Individual Clipping (IC), deep neural networks like resnet-18 cannot use Batch Normalization Layers (BNL) which is a crucial component in deep neural networks for achieving a high accuracy. To utilize BNL, we introduce Batch Clipping (BC) where, instead of clipping single gradients as in the original DPSGD, we average and clip batches of gradients. Moreover, the model entries of different layers have different sensitivities to the added Gaussian noise. Therefore, Adaptive Layerwise Clipping methods (ALC), where each layer has its own adaptively finetuned clipping constant, have been introduced and studied, but so far without rigorous DP proofs. In this paper, we propose {\em a new ALC and provide rigorous DP proofs for both BC and ALC}. Experiments show that our modified DPSGD with BC and ALC for CIFAR-$10$ with resnet-$18$ converges while DPSGD with IC and ALC does not.
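
The difference between individual clipping and batch clipping described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the plain-list gradient representation, and the `noise_std` parameter are hypothetical, and the calibration of the Gaussian noise to a privacy budget is deliberately left out.

```python
import math
import random


def l2_norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def clip(v, c):
    """Scale v down so that its L2 norm is at most the clipping constant c."""
    n = l2_norm(v)
    scale = min(1.0, c / n) if n > 0 else 1.0
    return [x * scale for x in v]


def individual_clipping(grads, c):
    """IC: clip each per-example gradient separately, then sum the results."""
    total = [0.0] * len(grads[0])
    for g in grads:
        total = [t + x for t, x in zip(total, clip(g, c))]
    return total


def batch_clipping(grads, c):
    """BC: average the gradients of the batch first, then clip once."""
    m = len(grads)
    avg = [sum(g[i] for g in grads) / m for i in range(len(grads[0]))]
    return clip(avg, c)


def add_gaussian_noise(v, noise_std, rng):
    """Obfuscate an update with Gaussian noise (noise_std is an assumption,
    not a value derived from a privacy analysis)."""
    return [x + rng.gauss(0.0, noise_std) for x in v]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 1.0]]  # two toy per-example gradients
    print("IC:", add_gaussian_noise(individual_clipping(grads, 1.0), 0.1, rng))
    print("BC:", add_gaussian_noise(batch_clipping(grads, 1.0), 0.1, rng))
```

In this sketch BC needs only the aggregate of the batch, which is why it is compatible with layers that mix examples within a batch, whereas IC requires each per-example gradient on its own.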

Submitted 21 July, 2023; originally announced July 2023.

Comments: 20 pages, 18 Figures

arXiv:2303.04676 [pdf, ps, other]

Considerations on the Theory of Training Models with Differential Privacy

Authors: Marten van Dijk, Phuong Ha Nguyen

Abstract: In federated learning collaborative learning takes place by a set of clients who each want to remain in control of how their local training data is used, in particular, how can each client's local training data remain private? Differential privacy is one method to limit privacy leakage. We provide a general overview of its framework and provable properties, adopt the more recent hypothesis based definition called Gaussian DP or $f$-DP, and discuss Differentially Private Stochastic Gradient Descent (DP-SGD). We stay at a meta level and attempt intuitive explanations and insights \textit{in this book chapter}.

Submitted 16 July, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: 18 pages, a book chapter. arXiv admin note: text overlap with arXiv:2212.05796

arXiv:2303.00064 [pdf]

doi 10.5334/jors.454

WEARDA: Recording Wearable Sensor Data for Human Activity Monitoring

Authors: Richard M. K. van Dijk, Daniela Gawehns, Matthijs van Leeuwen

Abstract: We present WEARDA, the open source WEARable sensor Data Acquisition software package. WEARDA facilitates the acquisition of human activity data with smartwatches and is primarily aimed at researchers who require transparency, full control, and access to raw sensor data. It provides functionality to simultaneously record raw data from four sensors -- tri-axis accelerometer, tri-axis gyroscope, barometer, and GPS -- which should enable researchers to, for example, estimate energy expenditure and mine movement trajectories. A Samsung smartwatch running the Tizen OS was chosen because of 1) the required functionalities of the smartwatch software API, 2) the availability of software development tools and accessible documentation, 3) having the required sensors, and 4) the requirements on case design for acceptance by the target user group. WEARDA addresses five practical challenges concerning preparation, measurement, logistics, privacy preservation, and reproducibility to ensure efficient and errorless data collection. The software package was initially created for the project "Dementia back at the heart of the community", and has been successfully used in that context.

Submitted 30 October, 2023; v1 submitted 28 February, 2023; originally announced March 2023.

Comments: Submitted 20 January 2023; Accepted 6 July 2023; Published 26 October 2023 by the Journal of Open Research Software JORS, 11 pages, 5 figures, 3 tables

Report number: van Dijk RMK, Gawehns D, van Leeuwen M 2023 WEARDA: Recording Wearable Sensor Data for Human Activity Monitoring. Journal of Open Research Software, 11:13

Journal ref: van Dijk RMK, Gawehns D, van Leeuwen M 2023 WEARDA: Recording Wearable Sensor Data for Human Activity Monitoring. Journal of Open Research Software, 11:13

arXiv:2212.05796 [pdf, other]

Generalizing DP-SGD with Shuffling and Batch Clipping

Authors: Marten van Dijk, Phuong Ha Nguyen, Toan N. Nguyen, Lam M. Nguyen

Abstract: Classical differential private DP-SGD implements individual clipping with random subsampling, which forces a mini-batch SGD approach. We provide a general differential private algorithmic framework that goes beyond DP-SGD and allows any possible first order optimizers (e.g., classical SGD and momentum based SGD approaches) in combination with batch clipping, which clips an aggregate of computed gradients rather than summing clipped gradients (as is done in individual clipping). The framework also admits sampling techniques beyond random subsampling such as shuffling. Our DP analysis follows the $f$-DP approach and introduces a new proof technique which allows us to derive simple closed form expressions and to also analyse group privacy. In particular, for $E$ epochs work and groups of size $g$, we show a $\sqrt{g E}$ DP dependency for batch clipping with shuffling.

Submitted 25 July, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

Comments: Update disclaimers

arXiv:2211.14669 [pdf, other]

Game Theoretic Mixed Experts for Combinational Adversarial Machine Learning

Authors: Ethan Rathbun, Kaleel Mahmood, Sohaib Ahmad, Caiwen Ding, Marten van Dijk

Abstract: Recent advances in adversarial machine learning have shown that defenses considered to be robust are actually susceptible to adversarial attacks which are specifically customized to target their weaknesses. These defenses include Barrage of Random Transforms (BaRT), Friendly Adversarial Training (FAT), Trash is Treasure (TiT) and ensemble models made up of Vision Transformers (ViTs), Big Transfer models and Spiking Neural Networks (SNNs). We first conduct a transferability analysis, to demonstrate the adversarial examples generated by customized attacks on one defense, are not often misclassified by another defense. This finding leads to two important questions. First, how can the low transferability between defenses be utilized in a game theoretic framework to improve the robustness? Second, how can an adversary within this framework develop effective multi-model attacks? In this paper, we provide a game-theoretic framework for ensemble adversarial attacks and defenses. Our framework is called Game theoretic Mixed Experts (GaME). It is designed to find the Mixed-Nash strategy for both a detector based and standard defender, when facing an attacker employing compositional adversarial attacks. We further propose three new attack algorithms, specifically designed to target defenses with randomized transformations, multi-model voting schemes, and adversarial detector architectures. These attacks serve to both strengthen defenses generated by the GaME framework and verify their robustness against unforeseen attacks. Overall, our framework and analyses advance the field of adversarial machine learning by yielding new insights into compositional attack and defense formulations.

Submitted 29 April, 2023; v1 submitted 26 November, 2022; originally announced November 2022.

Comments: 17 pages, 10 figures

ACM Class: I.2; I.4

arXiv:2207.06193 [pdf, other]

Domain adaptation strategies for cancer-independent detection of lymph node metastases

Authors: Péter Bándi, Maschenka Balkenhol, Marcory van Dijk, Bram van Ginneken, Jeroen van der Laak, Geert Litjens

Abstract: Recently, large, high-quality public datasets have led to the development of convolutional neural networks that can detect lymph node metastases of breast cancer at the level of expert pathologists. Many cancers, regardless of the site of origin, can metastasize to lymph nodes. However, collecting and annotating high-volume, high-quality datasets for every cancer type is challenging. In this paper we investigate how to leverage existing high-quality datasets most efficiently in multi-task settings for closely related tasks. Specifically, we will explore different training and domain adaptation strategies, including prevention of catastrophic forgetting, for colon and head-and-neck cancer metastasis detection in lymph nodes. Our results show state-of-the-art performance on both cancer metastasis detection tasks. Furthermore, we show the effectiveness of repeated adaptation of networks from one cancer type to another to obtain multi-task metastasis detection networks. Last, we show that leveraging existing high-quality datasets can significantly boost performance on new target tasks and that catastrophic forgetting can be effectively mitigated using regularization.

Submitted 13 July, 2022; originally announced July 2022.

arXiv:2202.03524 [pdf, ps, other]

Finite-Sum Optimization: A New Perspective for Convergence to a Global Solution

Authors: Lam M. Nguyen, Trang H. Tran, Marten van Dijk

Abstract: Deep neural networks (DNNs) have shown great success in many machine learning tasks. Their training is challenging since the loss surface of the network architecture is generally non-convex, or even non-smooth. How and under what assumptions is guaranteed convergence to a \textit{global} minimum possible? We propose a reformulation of the minimization problem allowing for a new recursive algorithmic framework. By using bounded style assumptions, we prove convergence to an $\varepsilon$-(global) minimum using $\mathcal{\tilde{O}}(1/\varepsilon^3)$ gradient computations. Our theoretical foundation motivates further study, implementation, and optimization of the new algorithmic framework and further investigation of its non-standard bounded style assumptions. This new direction broadens our understanding of why and under what circumstances training of a DNN converges to a global minimum.

Submitted 7 February, 2022; originally announced February 2022.

arXiv:2201.01834 [pdf, other]

Secure Remote Attestation with Strong Key Insulation Guarantees

Authors: Deniz Gurevin, Chenglu Jin, Phuong Ha Nguyen, Omer Khan, Marten van Dijk

Abstract: Recent years have witnessed a trend of secure processor design in both academia and industry. Secure processors with hardware-enforced isolation can be a solid foundation of cloud computation in the future. However, due to recent side-channel attacks, the commercial secure processors failed to deliver the promises of a secure isolated execution environment. Sensitive information inside the secure execution environment always gets leaked via side channels. This work considers the most powerful software-based side-channel attackers, i.e., an All Digital State Observing (ADSO) adversary who can observe all digital states, including all digital states in secure enclaves. Traditional signature schemes are not secure in the ADSO adversarial model. We introduce a new cryptographic primitive called One-Time Signature with Secret Key Exposure (OTS-SKE), which ensures no one can forge a valid signature of a new message or nonce even if all secret session keys are leaked. OTS-SKE enables us to sign attestation reports securely under the ADSO adversary. We also minimize the trusted computing base by introducing a secure co-processor into the system, and the interaction between the secure co-processor and the attestation processor is unidirectional. That is, the co-processor takes no inputs from the processor and only generates secret keys for the processor to fetch. Our experimental results show that the signing of OTS-SKE is faster than that of Elliptic Curve Digital Signature Algorithm (ECDSA) used in Intel SGX.

Submitted 5 January, 2022; originally announced January 2022.

arXiv:2109.15031 [pdf, other]

Back in Black: A Comparative Evaluation of Recent State-Of-The-Art Black-Box Attacks

Authors: Kaleel Mahmood, Rigel Mahmood, Ethan Rathbun, Marten van Dijk

Abstract: The field of adversarial machine learning has experienced a near exponential growth in the number of papers being produced since 2018. This massive information output has yet to be properly processed and categorized. In this paper, we seek to help alleviate this problem by systematizing the recent advances in adversarial machine learning black-box attacks since 2019. Our survey summarizes and categorizes 20 recent black-box attacks. We also present a new analysis for understanding the attack success rate with respect to the adversarial model used in each paper. Overall, our paper surveys a wide body of literature to highlight recent attack developments and organizes them into four attack categories: score based attacks, decision based attacks, transfer attacks and non-traditional attacks. Further, we provide a new mathematical framework to show exactly how attack results can fairly be compared.

Submitted 29 September, 2021; originally announced September 2021.

arXiv:2104.02610 [pdf, other]

On the Robustness of Vision Transformers to Adversarial Examples

Authors: Kaleel Mahmood, Rigel Mahmood, Marten van Dijk

Abstract: Recent advances in attention-based networks have shown that Vision Transformers can achieve state-of-the-art or near state-of-the-art results on many image classification tasks. This puts transformers in the unique position of being a promising alternative to traditional convolutional neural networks (CNNs). While CNNs have been carefully studied with respect to adversarial attacks, the same cannot be said of Vision Transformers. In this paper, we study the robustness of Vision Transformers to adversarial examples. Our analysis of transformer security is divided into three parts. First, we test the transformer under standard white-box and black-box attacks. Second, we study the transferability of adversarial examples between CNNs and transformers. We show that adversarial examples do not readily transfer between CNNs and transformers. Based on this finding, we analyze the security of a simple ensemble defense of CNNs and transformers. By creating a new attack, the self-attention blended gradient attack, we show that such an ensemble is not secure under a white-box adversary. However, under a black-box adversary, we show that an ensemble can achieve unprecedented robustness without sacrificing clean accuracy. Our analysis for this work is done using six types of white-box attacks and two types of black-box attacks. Our study encompasses multiple Vision Transformers, Big Transfer Models and CNN architectures trained on CIFAR-10, CIFAR-100 and ImageNet.

Submitted 4 June, 2021; v1 submitted 30 March, 2021; originally announced April 2021.

arXiv:2102.09030 [pdf, other]

Proactive DP: A Multple Target Optimization Framework for DP-SGD

Authors: Marten van Dijk, Nhuong V. Nguyen, Toan N. Nguyen, Lam M. Nguyen, Phuong Ha Nguyen

Abstract: We introduce a multiple target optimization framework for DP-SGD referred to as pro-active DP. In contrast to traditional DP accountants, which are used to track the expenditure of privacy budgets, the pro-active DP scheme allows one to a-priori select parameters of DP-SGD based on a fixed privacy budget (in terms of $ε$ and $δ$) in such a way as to maximize the anticipated utility (test accuracy). To achieve this objective, we first propose significant improvements to the moment account method, presenting a closed-form $(ε,δ)$-DP guarantee that connects all parameters in the DP-SGD setup. We show that DP-SGD is $(ε<0.5,δ=1/N)$-DP if $σ=\sqrt{2(ε+\ln(1/δ))/ε}$ with $T$ at least $\approx 2k^2/ε$ and $(2/e)^2k^2-1/2\geq \ln(N)$, where $T$ is the total number of rounds, and $K=kN$ is the total number of gradient computations where $k$ measures $K$ in number of epochs of size $N$ of the local data set. We prove that our expression is close to tight in that if $T$ is more than a constant factor $\approx 4$ smaller than the lower bound $\approx 2k^2/ε$, then the $(ε,δ)$-DP guarantee is violated. The above DP guarantee can be enhanced in that DP-SGD is $(ε, δ)$-DP if $σ= \sqrt{2(ε+\ln(1/δ))/ε}$ with $T$ at least $\approx 2k^2/ε$ together with two additional, less intuitive, conditions that allow larger $ε\geq 0.5$. Our DP theory allows us to create a utility graph and DP calculator. These tools link privacy and utility objectives and search for optimal experiment setups, efficiently taking into account both accuracy and privacy objectives, as well as implementation goals. We furnish a comprehensive implementation flow of our proactive DP, with rigorous experiments to showcase the proof-of-concept.

Submitted 4 June, 2024; v1 submitted 17 February, 2021; originally announced February 2021.

Comments: arXiv admin note: text overlap with arXiv:2007.09208, changes in contents and title
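
As a rough illustration of the closed-form relation quoted in the Proactive DP abstract above, the following Python sketch evaluates $σ$, the approximate lower bound on the number of rounds $T$, and the side condition on $k$ for the $ε<0.5$, $δ=1/N$ case. The function name and the dictionary keys are hypothetical, and the sketch only restates the arithmetic from the abstract; it should not be read as the authors' DP calculator.

```python
import math


def proactive_dp_sketch(epsilon, n_samples, k_epochs):
    """Evaluate the closed-form quantities quoted in the abstract.

    Assumes the simple regime stated there: epsilon < 0.5 and delta = 1/N.
    All names are illustrative and not taken from the paper's code.
    """
    if not 0.0 < epsilon < 0.5:
        raise ValueError("this sketch only covers 0 < epsilon < 0.5")
    delta = 1.0 / n_samples
    # sigma = sqrt(2 * (epsilon + ln(1/delta)) / epsilon)
    sigma = math.sqrt(2.0 * (epsilon + math.log(1.0 / delta)) / epsilon)
    # T should be at least approximately 2 * k^2 / epsilon rounds
    min_rounds = 2.0 * k_epochs ** 2 / epsilon
    # Side condition from the abstract: (2/e)^2 * k^2 - 1/2 >= ln(N)
    lhs = (2.0 / math.e) ** 2 * k_epochs ** 2 - 0.5
    epoch_condition = lhs >= math.log(n_samples)
    return {
        "delta": delta,
        "sigma": sigma,
        "min_rounds": min_rounds,
        "epoch_condition": epoch_condition,
    }


if __name__ == "__main__":
    print(proactive_dp_sketch(epsilon=0.4, n_samples=10_000, k_epochs=5))
```

The additional conditions that the abstract mentions for $ε\geq 0.5$ are not stated in this listing, so the sketch rejects that range rather than guessing them.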

arXiv:2010.14763 [pdf, other]

Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes

Authors: Marten van Dijk, Nhuong V. Nguyen, Toan N. Nguyen, Lam M. Nguyen, Quoc Tran-Dinh, Phuong Ha Nguyen

Abstract: Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) where multiple threads in parallel access a common repository containing training data, perform SGD iterations and update shared state that represents a jointly learned (global) model. We consider big data analysis where training data is distributed among local data sets in a heterogeneous way -- and we wish to move SGD computations to local compute nodes where local data resides. The results of these local SGD computations are aggregated by a central "aggregator" which mimics Hogwild!. We show how local compute nodes can start choosing small mini-batch sizes which increase to larger ones in order to reduce communication cost (round interaction with the aggregator). We improve state-of-the-art literature and show $O(\sqrt{K})$ communication rounds for heterogeneous data for strongly convex problems, where $K$ is the total number of gradient computations across all local compute nodes. For our scheme, we prove a \textit{tight} and novel non-trivial convergence analysis for strongly convex problems for {\em heterogeneous} data which does not use the bounded gradient assumption as seen in many existing publications. The tightness is a consequence of our proofs for lower and upper bounds of the convergence rate, which show a constant factor difference. We show experimental results for plain convex and non-convex problems for biased (i.e., heterogeneous) and unbiased local data sets.

Submitted 26 February, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2007.09208 AISTATS 2021
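
The $O(\sqrt{K})$ round count in the abstract above can be made plausible with simple arithmetic: if round $i$ processes a mini-batch of size proportional to $i$, then $R$ rounds cover about $R^2/2$ gradient computations, so $R$ grows like $\sqrt{2K}$. The Python sketch below counts rounds under that assumption. The function name and the `growth` parameter are hypothetical, and the sketch models a single sequence of batches, not the paper's full distributed scheme.

```python
import math


def rounds_with_linear_batches(total_gradients, growth=1):
    """Count rounds when round i uses a mini-batch of size growth * i."""
    rounds = 0
    used = 0
    while used < total_gradients:
        rounds += 1
        used += growth * rounds  # batch size increases linearly per round
    return rounds


if __name__ == "__main__":
    for k in (10_000, 1_000_000):
        r = rounds_with_linear_batches(k)
        # Compare the counted rounds against sqrt(2K) for growth = 1
        print(k, r, round(math.sqrt(2 * k), 1))
```

With a constant mini-batch size the number of rounds would instead grow linearly in $K$, which is the communication cost that the increasing schedule is meant to reduce.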

arXiv:2007.09208 [pdf, other]

Asynchronous Federated Learning with Reduced Number of Rounds and with Differential Privacy from Less Aggregated Gaussian Noise

Authors: Marten van Dijk, Nhuong V. Nguyen, Toan N. Nguyen, Lam M. Nguyen, Quoc Tran-Dinh, Phuong Ha Nguyen

Abstract: The feasibility of federated learning is highly constrained by the server-clients infrastructure in terms of network communication. Most newly launched smartphones and IoT devices are equipped with GPUs or sufficient computing hardware to run powerful AI models. However, in case of the original synchronous federated learning, client devices suffer waiting times and regular communication between clients and server is required. This implies more sensitivity to local model training times and to irregular or missed updates; hence scalability to large numbers of clients is limited and convergence rates measured in real time suffer. We propose a new algorithm for asynchronous federated learning which eliminates waiting times and reduces overall network communication - we provide rigorous theoretical analysis for strongly convex objective functions and provide simulation results. By adding Gaussian noise we show how our algorithm can be made differentially private -- new theorems show how the aggregated added Gaussian noise is significantly reduced.

Submitted 17 July, 2020; originally announced July 2020.

arXiv:2006.10876 [pdf, other]

doi 10.3390/e23101359

Beware the Black-Box: on the Robustness of Recent Defenses to Adversarial Examples

Authors: Kaleel Mahmood, Deniz Gurevin, Marten van Dijk, Phuong Ha Nguyen

Abstract: Many defenses have recently been proposed at venues like NIPS, ICML, ICLR and CVPR. These defenses are mainly focused on mitigating white-box attacks. They do not properly examine black-box attacks. In this paper, we expand upon the analysis of these defenses to include adaptive black-box adversaries. Our evaluation is done on nine defenses including Barrage of Random Transforms, ComDefend, Ensemble Diversity, Feature Distillation, The Odds are Odd, Error Correcting Codes, Distribution Classifier Defense, K-Winner Take All and Buffer Zones. Our investigation is done using two black-box adversarial models and six widely studied adversarial attacks for CIFAR-10 and Fashion-MNIST datasets. Our analyses show most recent defenses (7 out of 9) provide only marginal improvements in security ($<25\%$), as compared to undefended networks. For every defense, we also show the relationship between the amount of data the adversary has at their disposal, and the effectiveness of adaptive black-box attacks. Overall, our results paint a clear picture: defenses need both thorough white-box and black-box analyses to be considered secure. We provide this large scale study and analyses to motivate the field to move towards the development of more robust black-box defenses.

Submitted 20 May, 2021; v1 submitted 18 June, 2020; originally announced June 2020.

arXiv:2003.00430 [pdf, other]

A Hybrid Stochastic Policy Gradient Algorithm for Reinforcement Learning

Authors: Nhan H. Pham, Lam M. Nguyen, Dzung T. Phan, Phuong Ha Nguyen, Marten van Dijk, Quoc Tran-Dinh

Abstract: We propose a novel hybrid stochastic policy gradient estimator by combining an unbiased policy gradient estimator, the REINFORCE estimator, with another biased one, an adapted SARAH estimator for policy optimization. The hybrid policy gradient estimator is shown to be biased, but has a variance-reduced property. Using this estimator, we develop a new Proximal Hybrid Stochastic Policy Gradient Algorithm (ProxHSPGA) to solve a composite policy optimization problem that allows us to handle constraints or regularizers on the policy parameters. We first propose a single-looped algorithm then introduce a more practical restarting variant. We prove that both algorithms can achieve the best-known trajectory complexity $\mathcal{O}\left(\varepsilon^{-3}\right)$ to attain a first-order stationary point for the composite problem which is better than existing REINFORCE/GPOMDP $\mathcal{O}\left(\varepsilon^{-4}\right)$ and SVRPG $\mathcal{O}\left(\varepsilon^{-10/3}\right)$ in the non-composite setting. We evaluate the performance of our algorithm on several well-known examples in reinforcement learning. Numerical results show that our algorithm outperforms two existing methods on these examples. Moreover, the composite settings indeed have some advantages compared to the non-composite ones on certain problems.

Submitted 21 September, 2020; v1 submitted 1 March, 2020; originally announced March 2020.

Comments: Accepted for publication at the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020)

Journal ref: Proceedings of the International Conference on Artificial Intelligence and Statistics, PMLR 108:374-385, 2020

arXiv:2002.08246 [pdf, other]

A Unified Convergence Analysis for Shuffling-Type Gradient Methods

Authors: Lam M. Nguyen, Quoc Tran-Dinh, Dzung T. Phan, Phuong Ha Nguyen, Marten van Dijk

Abstract: In this paper, we propose a unified convergence analysis for a class of generic shuffling-type gradient methods for solving finite-sum optimization problems. Our analysis works with any sampling without replacement strategy and covers many known variants such as randomized reshuffling, deterministic or randomized single permutation, and cyclic and incremental gradient schemes. We focus on two different settings: strongly convex and nonconvex problems, but also discuss the non-strongly convex case. Our main contribution consists of new non-asymptotic and asymptotic convergence rates for a wide class of shuffling-type gradient methods in both nonconvex and convex settings. We also study uniformly randomized shuffling variants with different learning rates and model assumptions. While our rate in the nonconvex case is new and significantly improved over existing works under standard assumptions, the rate on the strongly convex one matches the existing best-known rates prior to this paper up to a constant factor without imposing a bounded gradient condition. Finally, we empirically illustrate our theoretical results via two numerical examples: nonconvex logistic regression and neural network training examples. As byproducts, our results suggest some appropriate choices for diminishing learning rates in certain shuffling variants.

Submitted 19 September, 2021; v1 submitted 19 February, 2020; originally announced February 2020.

Comments: Journal of Machine Learning Research, 2021

arXiv:2002.07161 [pdf, other]

doi 10.1088/1361-6560/ab9fcc

Surrogate-free machine learning-based organ dose reconstruction for pediatric abdominal radiotherapy

Authors: M. Virgolin, Z. Wang, B. V. Balgobind, I. W. E. M. van Dijk, J. Wiersma, P. S. Kroon, G. O. Janssens, M. van Herk, D. C. Hodgson, L. Zadravec Zaletel, C. R. N. Rasch, A. Bel, P. A. N. Bosman, T. Alderliesten

Abstract: To study radiotherapy-related adverse effects, detailed dose information (3D distribution) is needed for accurate dose-effect modeling. For childhood cancer survivors who underwent radiotherapy in the pre-CT era, only 2D radiographs were acquired, thus 3D dose distributions must be reconstructed from limited information. State-of-the-art methods achieve this by using 3D surrogate anatomies. These can lack personalization and lead to coarse reconstructions. We present and validate a surrogate-free dose reconstruction method based on Machine Learning (ML). Abdominal planning CTs ($n$=142) of recently-treated childhood cancer patients were gathered, their organs at risk were segmented, and 300 artificial Wilms' tumor plans were sampled automatically. Each artificial plan was automatically emulated on the 142 CTs, resulting in 42,600 3D dose distributions from which dose-volume metrics were derived. Anatomical features were extracted from digitally reconstructed radiographs simulated from the CTs to resemble historical radiographs. Further, patient and radiotherapy plan features typically available from historical treatment records were collected. An evolutionary ML algorithm was then used to link features to dose-volume metrics. Besides 5-fold cross-validation, a further evaluation was done on an independent dataset of five CTs each associated with two clinical plans. Cross-validation resulted in Mean Absolute Errors (MAEs) $\leq$0.6 Gy for organs completely inside or outside the field. For organs positioned at the edge of the field, MAEs $\leq$1.7 Gy for D$_{mean}$, $\leq$2.9 Gy for D$_{2cc}$, and $\leq$13% for V$_{5Gy}$ and V$_{10Gy}$, were obtained, without systematic bias. Similar results were found for the independent dataset. Our novel, ML-based organ dose reconstruction method is not only accurate but also efficient, as the setup of a surrogate is no longer needed.

Submitted 10 February, 2021; v1 submitted 16 February, 2020; originally announced February 2020.

Comments: M. Virgolin and Z. Wang share first authorship

Journal ref: Physics in Medicine & Biology. 2020 Dec 8;65(24):245021

arXiv:1910.02785 [pdf, other]

BUZz: BUffer Zones for defending adversarial examples in image classification

Authors: Kaleel Mahmood, Phuong Ha Nguyen, Lam M. Nguyen, Thanh Nguyen, Marten van Dijk

Abstract: We propose a novel defense against all existing gradient based adversarial attacks on deep neural networks for image classification problems. Our defense is based on a combination of deep neural networks and simple image transformations. While straightforward in implementation, this defense yields a unique security property which we term buffer zones. We argue that our defense based on buffer zones offers significant improvements over state-of-the-art defenses. We are able to achieve this improvement even when the adversary has access to the {\em entire} original training data set and unlimited query access to the defense. We verify our claim through experimentation using Fashion-MNIST and CIFAR-10: We demonstrate $<11\%$ attack success rate -- significantly lower than what other well-known state-of-the-art defenses offer -- at only a price of an $11-18\%$ drop in clean accuracy. By using a new intuitive metric, we explain why this trade-off offers a significant improvement over prior work.

Submitted 16 June, 2020; v1 submitted 3 October, 2019; originally announced October 2019.

arXiv:1905.00154 [pdf, ps, other]

On the Convergence Rates of Learning-based Signature Generation Schemes to Contain Self-propagating Malware

Authors: Saeed Valizadeh, Marten van Dijk

Abstract: In this paper, we investigate the importance of a defense system's learning rates to fight against the self-propagating class of malware such as worms and bots. To this end, we introduce a new propagation model based on the interactions between an adversary (and its agents) who wishes to construct a zombie army of a specific size, and a defender taking advantage of standard security tools and technologies such as honeypots (HPs) and intrusion detection and prevention systems (IDPSes) in the network environment. As time goes on, the defender can incrementally learn from the collected/observed attack samples (e.g., malware payloads), and is therefore able to generate attack signatures. The generated signatures are then used to filter subsequent attack traffic and thus contain the attacker's progress in its malware propagation mission. Using simulation and numerical analysis, we evaluate the efficacy of signature generation algorithms and in general any learning-based scheme in bringing an adversary's maneuvering in the environment to a halt as an adversarial containment strategy.

Submitted 30 April, 2019; originally announced May 2019.

Comments: This work was funded by NSF grant CNS-1413996 "MACS: A Modular Approach to Cloud Security."

arXiv:1901.07648 [pdf, other]

Finite-Sum Smooth Optimization with SARAH

Authors: Lam M. Nguyen, Marten van Dijk, Dzung T. Phan, Phuong Ha Nguyen, Tsui-Wei Weng, Jayant R. Kalagnanam

Abstract: The total complexity (measured as the total number of gradient computations) of a stochastic first-order optimization algorithm that finds a first-order stationary point of a finite-sum smooth nonconvex objective function $F(w)=\frac{1}{n} \sum_{i=1}^n f_i(w)$ has been proven to be at least $Ω(\sqrt{n}/ε)$ for $n \leq \mathcal{O}(ε^{-2})$ where $ε$ denotes the attained accuracy $\mathbb{E}[ \|\nabla F(\tilde{w})\|^2] \leq ε$ for the outputted approximation $\tilde{w}$ (Fang et al., 2018). In this paper, we provide a convergence analysis for a slightly modified version of the SARAH algorithm (Nguyen et al., 2017a;b) and achieve total complexity that matches the lower-bound worst case complexity in (Fang et al., 2018) up to a constant factor when $n \leq \mathcal{O}(ε^{-2})$ for nonconvex problems. For convex optimization, we propose SARAH++ with sublinear convergence for general convex and linear convergence for strongly convex problems; and we provide a practical version for which numerical experiments on various datasets show an improved performance.

Submitted 22 April, 2019; v1 submitted 22 January, 2019; originally announced January 2019.

arXiv:1901.07634

DTN: A Learning Rate Scheme with Convergence Rate of $\mathcal{O}(1/t)$ for SGD

Authors: Lam M. Nguyen, Phuong Ha Nguyen, Dzung T. Phan, Jayant R. Kalagnanam, Marten van Dijk

Abstract: This paper has some inconsistent results, i.e., we made some incorrect claims because we made mistakes in using the test criterion for a series. Precisely, our claims on the convergence rate of $\mathcal{O}(1/t)$ of SGD presented in Theorem 1, Corollary 1, Theorem 2 and Corollary 2 are wrongly derived because they are based on Lemma 5. In Lemma 5, we do not correctly use the test criterion for a series. Hence, the result of Lemma 5 is not valid. We would like to thank the community for pointing out this mistake!

Submitted 27 February, 2019; v1 submitted 22 January, 2019; originally announced January 2019.

Comments: This paper has inconsistent results, i.e., we made some incorrect claims because we made mistakes in using the test criterion for a series

arXiv:1901.01598 [pdf, other]

Toward a Theory of Cyber Attacks

Authors: Saeed Valizadeh, Marten van Dijk

Abstract: We provide a general methodology for analyzing defender-attacker based "games" in which we model such games as Markov models and introduce a capacity region to analyze how defensive and adversarial strategies impact security. Such a framework allows us to analyze under what kind of conditions we can prove statements (about an attack objective $k$) of the form "if the attacker has a time budget $T_{bud}$, then the probability that the attacker can reach an attack objective $\geq k$ is at most $poly(T_{bud})negl(k)$". We are interested in such rigorous cryptographic security guarantees (that describe worst-case guarantees) as these shed light on the requirements of a defender's strategy for preventing more and more the progress of an attack, in terms of the "learning rate" of a defender's strategy. We explain the damage an attacker can achieve by a "containment parameter" describing the maximally reached attack objective within a specific time window.

Submitted 6 January, 2019; originally announced January 2019.

Comments: This work was funded by NSF grant CNS-1413996 "MACS: A Modular Approach to Cloud Security"

arXiv:1811.12403 [pdf, other]

New Convergence Aspects of Stochastic Gradient Algorithms

Authors: Lam M. Nguyen, Phuong Ha Nguyen, Peter Richtárik, Katya Scheinberg, Martin Takáč, Marten van Dijk

Abstract: The classical convergence analysis of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is violated for cases where the objective function is strongly convex. In Bottou et al. (2018), a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. We show that for stochastic problems arising in machine learning such a bound always holds; and we also propose an alternative convergence analysis of SGD with diminishing learning rate regime. We then move on to the asynchronous parallel setting, and prove convergence of the Hogwild! algorithm in the same regime in the case of diminished learning rate. It is well-known that SGD converges if a sequence of learning rates $\{η_t\}$ satisfies $\sum_{t=0}^\infty η_t \rightarrow \infty$ and $\sum_{t=0}^\infty η^2_t < \infty$. We show the convergence of SGD for a strongly convex objective function without using the bounded gradient assumption when $\{η_t\}$ is a diminishing sequence and $\sum_{t=0}^\infty η_t \rightarrow \infty$. In other words, we extend the current state-of-the-art class of learning rates satisfying the convergence of SGD.

Submitted 7 November, 2019; v1 submitted 9 November, 2018; originally announced November 2018.

Comments: Journal of Machine Learning Research. arXiv admin note: substantial text overlap with arXiv:1802.03801
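
To make the learning-rate conditions in the abstract above concrete, the following Python sketch runs SGD on a toy strongly convex objective with a diminishing step size. Everything in it is an assumption chosen for illustration and not taken from the paper: the objective $F(w)=\frac{1}{2}w^2$, the additive Gaussian gradient noise, the schedule $η_t = 1/(t+1)$, and the function name. Note that the stochastic gradient here is not uniformly bounded, since it grows with $|w|$.

```python
import random


def sgd_diminishing(steps, seed=0):
    """Run SGD on F(w) = 0.5 * w**2 with noisy gradients and eta_t = 1/(t+1)."""
    rng = random.Random(seed)
    w = 5.0  # arbitrary starting point
    for t in range(steps):
        eta = 1.0 / (t + 1)             # diminishing step size; its sum diverges
        grad = w + rng.gauss(0.0, 1.0)  # unbiased stochastic gradient of F at w
        w -= eta * grad
    return w


if __name__ == "__main__":
    for steps in (10, 1_000, 100_000):
        print(steps, sgd_diminishing(steps))
```

For this particular schedule the iterate after $T$ steps equals the negated average of the $T$ noise samples, so it concentrates around the minimizer $w^*=0$ as $T$ grows. This schedule also satisfies the classical square-summability condition, so it illustrates the setting without exercising the wider class of learning rates that the paper covers.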

arXiv:1810.04723 [pdf, other]

Tight Dimension Independent Lower Bound on the Expected Convergence Rate for Diminishing Step Sizes in SGD

Authors: Phuong Ha Nguyen, Lam M. Nguyen, Marten van Dijk

Abstract: We study the convergence of Stochastic Gradient Descent (SGD) for strongly convex objective functions. We prove for all $t$ a lower bound on the expected convergence rate after the $t$-th SGD iteration; the lower bound is over all possible sequences of diminishing step sizes. It implies that recently proposed sequences of step sizes at ICML 2018 and ICML 2019 are {\em universally} close to optimal in that the expected convergence rate after {\em each} iteration is within a factor $32$ of our lower bound. This factor is independent of dimension $d$. We offer a framework for comparing with lower bounds in state-of-the-art literature and when applied to SGD for strongly convex objective functions our lower bound is a significant factor $775\cdot d$ larger compared to existing work.

Submitted 7 November, 2019; v1 submitted 10 October, 2018; originally announced October 2018.

Comments: The 33rd Annual Conference on Neural Information Processing Systems (NeurIPS 2019)

arXiv:1810.04100 [pdf, other]

Characterization of Convex Objective Functions and Optimal Expected Convergence Rates for SGD

Authors: Marten van Dijk, Lam M. Nguyen, Phuong Ha Nguyen, Dzung T. Phan

Abstract: We study Stochastic Gradient Descent (SGD) with diminishing step sizes for convex objective functions. We introduce a definitional framework and theory that defines and characterizes a core property, called curvature, of convex objective functions. In terms of curvature we can derive a new inequality that can be used to compute an optimal sequence of diminishing step sizes by solving a differential equation. Our exact solutions confirm known results in the literature and allow us to fully characterize a new regularizer with its corresponding expected convergence rates.

Submitted 13 May, 2019; v1 submitted 9 October, 2018; originally announced October 2018.

Journal ref: Proceedings of the 36th International Conference on Machine Learning, PMLR 97, 2019

arXiv:1807.11046 [pdf, other]

TREVERSE: Trial-and-Error Lightweight Secure Reverse Authentication with Simulatable PUFs

Authors: Yansong Gao, Marten van Dijk, Lei Xu, Wei Yang, Surya Nepal, Damith C. Ranasinghe

Abstract: A physical unclonable function (PUF) generates hardware intrinsic volatile secrets by exploiting uncontrollable manufacturing randomness. Although PUFs provide the potential for lightweight and secure authentication for increasing numbers of low-end Internet of Things devices, practical and secure mechanisms remain elusive. We aim to explore simulatable PUFs (SimPUFs) that are physically unclonable but efficiently modeled mathematically through privileged one-time PUF access to address the above problem. Given a challenge, a securely stored SimPUF in possession of a trusted server computes the corresponding response and its bit-specific reliability. Consequently, naturally noisy PUF responses generated by a resource limited prover can be immediately processed by a one-way function (OWF) and transmitted to the server, because the resourceful server can exploit the SimPUF to perform a trial-and-error search over likely error patterns to recover the noisy response to authenticate the prover. Security of trial-and-error reverse (TREVERSE) authentication under the random oracle model is guaranteed by the hardness of inverting the OWF. We formally evaluate the TREVERSE authentication capability with two SimPUFs experimentally derived from popular silicon PUFs.

Submitted 3 May, 2020; v1 submitted 29 July, 2018; originally announced July 2018.

Comments: 23 pages, 16 figures

Journal ref: IEEE Transactions on Dependable and Secure Computing, 2020

arXiv:1804.04783 [pdf, ps, other]

Comments on "Defeating HaTCh: Building Malicious IP Cores"

Authors: Syed Kamran Haider, Chenglu Jin, Marten van Dijk

Abstract: Recently, Haider et al. introduced the first rigorous hardware Trojan detection algorithm called HaTCh. The foundation of HaTCh is a formal framework of hardware Trojan design, which formally characterizes all hardware Trojans based on their properties. However, Bhardwaj et al. recently published a paper, "Defeating HaTCh: Building Malicious IP Cores", which incorrectly claims that their newly designed hardware Trojan can evade the detection by HaTCh. In this paper, we explain why the claim of "defeating HaTCh" is incorrect, and we clarify several common misunderstandings about HaTCh.

Submitted 4 October, 2018; v1 submitted 13 April, 2018; originally announced April 2018.

arXiv:1802.03801 [pdf, other]

SGD and Hogwild! Convergence Without the Bounded Gradients Assumption

Authors: Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk, Peter Richtárik, Katya Scheinberg, Martin Takáč

Abstract: Stochastic gradient descent (SGD) is the optimization algorithm of choice in many machine learning applications such as regularized empirical risk minimization and training deep neural networks. The classical convergence analysis of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is always violated for cases where the objective function is strongly convex. In (Bottou et al.,2016), a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. Here we show that for stochastic problems arising in machine learning such a bound always holds; and we also propose an alternative convergence analysis of SGD with diminishing learning rate regime, which results in more relaxed conditions than those in (Bottou et al.,2016). We then move on to the asynchronous parallel setting, and prove convergence of the Hogwild! algorithm in the same regime, obtaining the first convergence results for this method in the case of diminished learning rate.

Submitted 8 June, 2018; v1 submitted 11 February, 2018; originally announced February 2018.

Journal ref: Proceedings of the 35th International Conference on Machine Learning, PMLR 80:3747-3755, 2018

arXiv:1706.03852 [pdf, other]

Revisiting Definitional Foundations of Oblivious RAM for Secure Processor Implementations

Authors: Syed Kamran Haider, Omer Khan, Marten van Dijk

Abstract: Oblivious RAM (ORAM) is a renowned technique to hide the access patterns of an application to an untrusted memory. According to the standard ORAM definition presented by Goldreich and Ostrovsky, two ORAM access sequences must be computationally indistinguishable if the lengths of these sequences are identically distributed. An artifact of this definition is that it does not apply to modern ORAM implementations adapted in current secure processor technology because of their arbitrary lengths of memory access sequences depending on programs' behaviors (their termination times). As a result, the ORAM definition does not directly apply; the theoretical foundations of ORAM do not clearly argue about the timing and termination channels. This paper conducts a first rigorous study of the standard Goldreich-Ostrovsky ORAM definition in view of modern practical ORAMs (e.g., Path ORAM) and demonstrates the gap between theoretical foundations and real implementations. A new ORAM formulation which clearly separates out termination channel leakage is proposed. It is shown how this definition implies the standard ORAM definition (for finite length input access sequences) and better fits the modern practical ORAM implementations. The proposed definition relaxes the constraints around the stash size and overflow probability for Path ORAM, and essentially transforms its security argument into a performance consideration problem.
Finally, a 'strong' ORAM formulation which clearly includes obfuscation of termination leakage is shown to imply our new ORAM formulation and applies to ORAM for outsourced disk storage. In this strong formulation constraints are not relaxed and the security argument for Path ORAM remains complex as one needs to prove that the stash overflows with negligible probability.

Submitted 21 October, 2017; v1 submitted 12 June, 2017; originally announced June 2017.

arXiv:1703.07427 [pdf, other]

Intrinsically Reliable and Lightweight Physical Obfuscated Keys

Authors: Raihan Sayeed Khan, Nadim Kanan, Chenglu Jin, Jake Scoggin, Nafisa Noor, Sadid Muneer, Faruk Dirisaglik, Phuong Ha Nguyen, Helena Silva, Marten van Dijk, Ali Gokirmak

Abstract: Physical Obfuscated Keys (POKs) allow tamper-resistant storage of random keys based on physical disorder. The output bits of current POK designs need to be first corrected due to measurement noise and next de-correlated since the original output bits may not be i.i.d. (independent and identically distributed) and also public helper information for error correction necessarily correlates the corrected output bits. For this reason, current designs include an interface for error correction and/or output reinforcement, and privacy amplification for compressing the corrected output to a uniform random bit string. We propose two intrinsically reliable POK designs with only XOR circuitry for privacy amplification (without need for reliability enhancement) by exploiting variability of lithographic process and variability of granularity in phase change memory (PCM) materials. The two designs are demonstrated through experiments and simulations.

Submitted 21 March, 2017; originally announced March 2017.

arXiv:1702.03965 [pdf, other]

Connecting the Dots: Privacy Leakage via Write-Access Patterns to the Main Memory

Authors: Tara Merin John, Syed Kamran Haider, Hamza Omar, Marten van Dijk

Abstract: Data-dependent access patterns of an application to an untrusted storage system are notorious for leaking sensitive information about the user's data. Previous research has shown how an adversary capable of monitoring both read and write requests issued to the memory can correlate them with the application to learn its sensitive data. However, information leakage through only the write access patterns is less obvious and not well studied in the current literature. In this work, we demonstrate an actual attack on the power-side-channel resistant Montgomery's ladder based modular exponentiation algorithm commonly used in public key cryptography. We infer the complete 512-bit secret exponent in $\sim3.5$ minutes by virtue of just the write access patterns of the algorithm to the main memory. In order to learn the victim algorithm's write access patterns under realistic settings, we exploit a compromised DMA device to take frequent snapshots of the application's address space, and then run a simple differential analysis on these snapshots to find the write access sequence. The attack has been shown on an Intel Core(TM) i7-4790 3.60GHz processor based system. We further discuss a possible attack on the McEliece public-key cryptosystem that also exploits the write-access patterns to learn the secret key.

Submitted 17 June, 2017; v1 submitted 13 February, 2017; originally announced February 2017.

Comments: A 250 word preliminary abstract of this work has been accepted for publication and a poster presentation at Hardware Oriented Security and Trust (HOST) 2017. Added Section 5: Leakage under Caching Effects

arXiv:1611.01571 [pdf, other]

Flat ORAM: A Simplified Write-Only Oblivious RAM Construction for Secure Processors

Authors: Syed Kamran Haider, Marten van Dijk

Abstract: Oblivious RAM (ORAM) is a cryptographic primitive which obfuscates the access patterns to a storage, thereby preventing privacy leakage. So far in the current literature, only 'fully functional' ORAMs are widely studied, which can protect, at the cost of a considerable performance penalty, against strong adversaries who can monitor all read and write operations. However, recent research has shown that information can still be leaked even if only the write access pattern (not reads) is visible to the adversary. For such weaker adversaries, a fully functional ORAM turns out to be overkill, causing unnecessary overheads. Instead, a simple 'write-only' ORAM is sufficient, and, more interestingly, is preferred as it can offer far more performance and energy efficiency than a fully functional ORAM. In this work, we present Flat ORAM: an efficient write-only ORAM scheme which outperforms the closest existing write-only ORAM called HIVE. HIVE suffers from performance bottlenecks while managing the memory occupancy information vital for correctness of the protocol. Flat ORAM resolves these bottlenecks by introducing a simple idea of Occupancy Map (OccMap) which efficiently manages the memory occupancy information, resulting in far better performance. Our simulation results show that, on average, Flat ORAM only incurs a moderate slowdown of $3\times$ over the insecure DRAM for memory intensive benchmarks among Splash2 and $1.6\times$ for SPEC06. Compared to HIVE, Flat ORAM offers $50\%$ performance gain on average and up to $80\%$ energy savings.

Submitted 10 September, 2017; v1 submitted 4 November, 2016; originally announced November 2016.

arXiv:1605.08413 [pdf, other]

Advancing the State-of-the-Art in Hardware Trojans Design

Authors: Syed Kamran Haider, Chenglu Jin, Marten van Dijk

Abstract: The Electronic Design Automation (EDA) industry heavily reuses third party IP cores. These IP cores are vulnerable to insertion of Hardware Trojans (HTs) at design time by third party IP core providers or by malicious insiders in the design team. State of the art research has shown that existing HT detection techniques, which claim to detect all publicly available HT benchmarks, can still be defeated by carefully designing new sophisticated HTs. The reason is that these techniques consider the HT landscape to be limited only to the publicly known HT benchmarks, or other similar (simple) HTs. However, the adversary is not limited to these HTs and may devise new HT design principles to bypass these countermeasures. In this paper, we discover certain crucial properties of HTs which lead to the definition of an exponentially large class of Deterministic Hardware Trojans $H_D$ that an adversary can (but is not limited to) design. The discovered properties serve as HT design principles, based on which we design a new HT called 'XOR-LFSR' and present it as a 'proof-of-concept' example from the class $H_D$. These design principles help us understand the tremendous number of ways an adversary has to design an HT, and show that the existing publicly known HT benchmarks are just the tip of the iceberg on this huge landscape.
This work, therefore, stresses that instead of guaranteeing a certain (low) false negative rate for a small constant set of publicly known HTs, a rigorous HT detection tool should take into account these newly discovered HT design principles and hence guarantee the detection of an exponentially large class (exponential in the number of wires in the IP core) of HTs with negligible false negative rate.

Submitted 12 April, 2017; v1 submitted 26 May, 2016; originally announced May 2016.

Comments: Updated Definition 10. Invited to 60th IEEE International Midwest Symposium on Circuits and Systems

arXiv:1202.5150 [pdf, other]

Path ORAM: An Extremely Simple Oblivious RAM Protocol

Authors: Emil Stefanov, Marten van Dijk, Elaine Shi, T-H. Hubert Chan, Christopher Fletcher, Ling Ren, Xiangyao Yu, Srinivas Devadas

Abstract: We present Path ORAM, an extremely simple Oblivious RAM protocol with a small amount of client storage. Partly due to its simplicity, Path ORAM is the most practical ORAM scheme known to date with small client storage. We formally prove that Path ORAM has an O(log N) bandwidth cost for blocks of size B = Omega(log^2 N) bits. For such block sizes, Path ORAM is asymptotically better than the best known ORAM schemes with small client storage. Due to its practicality, Path ORAM has been adopted in the design of secure processors since its proposal.

Submitted 13 January, 2014; v1 submitted 23 February, 2012; originally announced February 2012.

arXiv:cs/0605109 [pdf, ps, other]

Knowledge Flow Analysis for Security Protocols

Authors: Emina Torlak, Marten van Dijk, Blaise Gassend, Daniel Jackson, Srinivas Devadas

Abstract: Knowledge flow analysis offers a simple and flexible way to find flaws in security protocols. A protocol is described by a collection of rules constraining the propagation of knowledge amongst principals. Because this characterization corresponds closely to informal descriptions of protocols, it allows a succinct and natural formalization; because it abstracts away message ordering, and handles communications between principals and applications of cryptographic primitives uniformly, it is readily represented in a standard logic. A generic framework in the Alloy modelling language is presented, and instantiated for two standard protocols, and a new key management scheme.

Submitted 24 May, 2006; originally announced May 2006.

Comments: 20 pages

Report number: MIT-CSAIL-TR-2005-066

arXiv:cs/0605097 [pdf, ps, other]

A Generalized Two-Phase Analysis of Knowledge Flows in Security Protocols

Authors: Marten van Dijk, Emina Torlak, Blaise Gassend, Srinivas Devadas

Abstract: We introduce knowledge flow analysis, a simple and flexible formalism for checking cryptographic protocols. Knowledge flows provide a uniform language for expressing the actions of principals, assumptions about intruders, and the properties of cryptographic primitives. Our approach enables a generalized two-phase analysis: we extend the two-phase theory by identifying the necessary and sufficient properties of a broad class of cryptographic primitives for which the theory holds. We also contribute a library of standard primitives and show that they satisfy our criteria.

Submitted 22 May, 2006; originally announced May 2006.

Comments: 16 pages

Showing 1–42 of 42 results for author: van Dijk, M