Search | arXiv e-print repository

Why is parameter averaging beneficial in SGD? An objective smoothing perspective

Authors: Atsushi Nitanda, Ryuhei Kikuchi, Shugo Maeda, Denny Wu

Abstract: It is often observed that stochastic gradient descent (SGD) and its variants implicitly select a solution with good generalization performance; such implicit bias is often characterized in terms of the sharpness of the minima. Kleinberg et al. (2018) connected this bias with the smoothing effect of SGD which eliminates sharp local minima by the convolution using the stochastic gradient noise. We follow this line of research and study the commonly-used averaged SGD algorithm, which has been empirically observed in Izmailov et al. (2018) to prefer a flat minimum and therefore achieves better generalization. We prove that in certain problem settings, averaged SGD can efficiently optimize the smoothed objective which avoids sharp local minima. In experiments, we verify our theory and show that parameter averaging with an appropriate step size indeed leads to significant improvement in the performance of SGD.

Submitted 26 May, 2024; v1 submitted 18 February, 2023; originally announced February 2023.

Comments: 27 pages, AISTATS 2024
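
The parameter averaging mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's algorithm or problem setting: it assumes plain SGD with additive Gaussian gradient noise on a one-dimensional objective and a uniform running average of all iterates. The function name `averaged_sgd` and its parameters are hypothetical.

```python
import random


def averaged_sgd(grad, x0, lr, steps, noise_std=0.0, seed=0):
    """Run noisy SGD and return (last iterate, uniform average of iterates).

    Assumption: the stochastic gradient is the true gradient plus
    zero-mean Gaussian noise with standard deviation `noise_std`.
    """
    rng = random.Random(seed)
    x = x0
    avg = 0.0
    for t in range(1, steps + 1):
        noisy_grad = grad(x) + rng.gauss(0.0, noise_std)
        x = x - lr * noisy_grad
        # Incremental update of the mean of x_1, ..., x_t.
        avg += (x - avg) / t
    return x, avg


if __name__ == "__main__":
    # Toy objective f(x) = x^2 / 2, whose gradient is x.
    last, avg = averaged_sgd(lambda x: x, x0=1.0, lr=0.5, steps=200, noise_std=1.0)
    print("last iterate:", last)
    print("averaged iterate:", avg)
```

The sketch returns the last iterate alongside the average so the two can be compared on the same noisy run.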

arXiv:2112.12906 [pdf, ps, other]

Efficient decision tree training with new data structure for secure multi-party computation

Authors: Koki Hamada, Dai Ikarashi, Ryo Kikuchi, Koji Chida

Abstract: We propose a secure multi-party computation (MPC) protocol that constructs a secret-shared decision tree for a given secret-shared dataset. The previous MPC-based decision tree training protocol (Abspoel et al. 2021) requires $O(2^hmn\log n)$ comparisons, being exponential in the tree height $h$ and with $n$ and $m$ being the number of rows and that of attributes in the dataset, respectively. The cause of the exponential number of comparisons in $h$ is that the decision tree training algorithm is based on the divide-and-conquer paradigm, where dummy rows are added after each split in order to hide the number of rows in the dataset. We resolve this issue via a secure data structure that enables us to compute an aggregate value for every group while hiding the grouping information. By using this data structure, we can train a decision tree without adding dummy rows while hiding the size of the intermediate data. Specifically, we describe a decision tree training protocol that requires only $O(hmn\log n)$ comparisons when the input attributes are continuous and the output attribute is binary. Note that the order is now \emph{linear} in the tree height $h$. To demonstrate the practicality of our protocol, we implement it in an MPC framework based on a three-party secret sharing scheme. Our implementation results show that our protocol trains a decision tree with a height of 5 in 33 seconds for a dataset of 100,000 rows and 10 attributes.

Submitted 23 December, 2021; originally announced December 2021.

Comments: 19 pages
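
To make the gap between the two comparison counts in the abstract above concrete, the following sketch evaluates only the leading terms $2^h m n \log n$ and $h m n \log n$. It assumes a base-2 logarithm and ignores every constant hidden by the big-O notation, so it is not a prediction of actual protocol cost. The function names are hypothetical.

```python
import math


def comparisons_exponential(h, m, n):
    """Leading term 2^h * m * n * log2(n) (constants ignored, n > 1)."""
    return (2 ** h) * m * n * math.log2(n)


def comparisons_linear(h, m, n):
    """Leading term h * m * n * log2(n) (constants ignored, n > 1)."""
    return h * m * n * math.log2(n)


def leading_term_ratio(h, m, n):
    """Ratio of the two leading terms; algebraically this is 2^h / h."""
    return comparisons_exponential(h, m, n) / comparisons_linear(h, m, n)


if __name__ == "__main__":
    # Height, attribute count, and row count quoted in the abstract.
    print(leading_term_ratio(h=5, m=10, n=100_000))
```

The ratio depends only on the height $h$, because the factors $m$, $n$, and $\log n$ cancel.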

arXiv:2106.02203 [pdf, other]

Adam in Private: Secure and Fast Training of Deep Neural Networks with Adaptive Moment Estimation

Authors: Nuttapong Attrapadung, Koki Hamada, Dai Ikarashi, Ryo Kikuchi, Takahiro Matsuda, Ibuki Mishina, Hiraku Morita, Jacob C. N. Schuldt

Abstract: Privacy-preserving machine learning (PPML) aims at enabling machine learning (ML) algorithms to be used on sensitive data. We contribute to this line of research by proposing a framework that allows efficient and secure evaluation of full-fledged state-of-the-art ML algorithms via secure multi-party computation (MPC). This is in contrast to most prior works, which substitute ML algorithms with approximated "MPC-friendly" variants. A drawback of the latter approach is that fine-tuning of the combined ML and MPC algorithms is required, which might lead to less efficient algorithms or inferior quality ML. This is an issue for secure deep neural networks (DNN) training in particular, as this involves arithmetic algorithms thought to be "MPC-unfriendly", namely, integer division, exponentiation, inversion, and square root. In this work, we propose secure and efficient protocols for the above seemingly MPC-unfriendly computations. Our protocols are three-party protocols in the honest-majority setting, and we propose both passively secure and actively secure with abort variants. A notable feature of our protocols is that they simultaneously provide high accuracy and efficiency. This framework enables us to efficiently and securely compute modern ML algorithms such as Adam and the softmax function "as is", without resorting to approximations. As a result, we obtain secure DNN training that outperforms state-of-the-art three-party systems; our full training is up to 6.7 times faster than just the online phase of the recently proposed FALCON@PETS'21 on a standard benchmark network. We further perform measurements on real-world DNNs, AlexNet and VGG16. The performance of our framework is up to a factor of about 12-14 faster for AlexNet and 46-48 faster for VGG16 to achieve an accuracy of 70% and 75%, respectively, when compared to FALCON.

Submitted 3 June, 2021; originally announced June 2021.

Comments: 24 pages, 13 tables
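
For readers unfamiliar with why Adam involves the division, inversion, and square root operations named in the abstract above, here is a plaintext sketch of one standard Adam update for a single scalar parameter. It contains no secret sharing or MPC and is not the paper's protocol. The function name `adam_step` is hypothetical, and the default hyperparameters are the commonly used ones, assumed here for illustration.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One plaintext Adam update for a scalar parameter.

    `t` is the 1-based step count; `m` and `v` are the running first and
    second moment estimates. Returns the updated (theta, m, v).
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction uses exponentiation and division.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # The parameter update uses a square root and an inversion.
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


if __name__ == "__main__":
    theta, m, v = 1.0, 0.0, 0.0
    for t in range(1, 4):
        grad = 2.0 * theta  # gradient of the toy objective theta^2
        theta, m, v = adam_step(theta, grad, m, v, t)
    print(theta)
```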

arXiv:1504.05353 [pdf, ps, other]

k-anonymous Microdata Release via Post Randomisation Method

Authors: Dai Ikarashi, Ryo Kikuchi, Koji Chida, Katsumi Takahashi

Abstract: The problem of the release of anonymized microdata is an important topic in the fields of statistical disclosure control (SDC) and privacy preserving data publishing (PPDP), and yet it remains largely unsolved. In these research fields, k-anonymity has been widely studied as an anonymity notion for mainly deterministic anonymization algorithms, and some probabilistic relaxations have been developed. However, they are not sufficient due to their limitations, i.e., being weaker than the original k-anonymity or requiring strong parametric assumptions. First, we propose Pk-anonymity, a new probabilistic k-anonymity, and prove that Pk-anonymity is a mathematical extension of k-anonymity rather than a relaxation. Furthermore, Pk-anonymity requires no parametric assumptions. This property is significant in that it enables us to compare privacy levels of probabilistic microdata release algorithms with deterministic ones. Second, we apply Pk-anonymity to the post randomization method (PRAM), which is an SDC algorithm based on randomization. PRAM is proven to satisfy Pk-anonymity in a controlled way, i.e., one can control PRAM's parameter so that Pk-anonymity is satisfied. On the other hand, PRAM is also known to satisfy $\varepsilon$-differential privacy, a recent popular and strong privacy notion. This fact means that our results significantly enhance PRAM since it implies the satisfaction of both important notions: k-anonymity and $\varepsilon$-differential privacy.

Submitted 21 April, 2015; originally announced April 2015.

Comments: 22 pages, 4 figures
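
The randomization step that the abstract above refers to as PRAM can be sketched for a single categorical attribute. The transition rule here is an assumption chosen for simplicity and may differ from the variant analysed in the paper: each value is kept with probability `p_keep` and otherwise replaced by a uniformly chosen different category. Under that assumed rule, the per-value $\varepsilon$ is the log of the largest ratio between two transition probabilities. The names `pram` and `pram_epsilon` are hypothetical, and the sketch says nothing about Pk-anonymity.

```python
import math
import random


def pram(values, categories, p_keep, seed=0):
    """Randomize each value: keep it with probability p_keep, otherwise
    replace it with a uniformly chosen *different* category (assumed rule)."""
    rng = random.Random(seed)
    out = []
    for v in values:
        if rng.random() < p_keep:
            out.append(v)
        else:
            others = [c for c in categories if c != v]
            out.append(rng.choice(others))
    return out


def pram_epsilon(num_categories, p_keep):
    """Epsilon for the assumed transition rule, for 0 < p_keep < 1 and at
    least two categories: |ln(p_keep / q)|, where q is the probability of
    moving to any one specific other category."""
    q = (1.0 - p_keep) / (num_categories - 1)
    return abs(math.log(p_keep / q))


if __name__ == "__main__":
    cats = ["a", "b", "c"]
    print(pram(["a", "b", "c", "a"], cats, p_keep=0.5))
    print(pram_epsilon(len(cats), 0.5))
```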

Showing 1–4 of 4 results for author: Kikuchi, R