-
Precision and Adaptability of YOLOv5 and YOLOv8 in Dynamic Robotic Environments
Authors:
Victor A. Kich,
Muhammad A. Muttaqien,
Junya Toyama,
Ryutaro Miyoshi,
Yosuke Ida,
Akihisa Ohya,
Hisashi Date
Abstract:
Recent advancements in real-time object detection frameworks have spurred extensive research into their application in robotic systems. This study provides a comparative analysis of YOLOv5 and YOLOv8 models, challenging the prevailing assumption of the latter's superiority in performance metrics. Contrary to initial expectations, YOLOv5 models demonstrated comparable, and in some cases superior, precision in object detection tasks. Our analysis delves into the underlying factors contributing to these findings, examining aspects such as model architecture complexity, training dataset variances, and real-world applicability. Through rigorous testing and an ablation study, we present a nuanced understanding of each model's capabilities, offering insights into the selection and optimization of object detection frameworks for robotic applications. Implications of this research extend to the design of more efficient and contextually adaptive systems, emphasizing the necessity for a holistic approach to evaluating model performance.
Submitted 1 June, 2024;
originally announced June 2024.
-
Fast Regularized Discrete Optimal Transport with Group-Sparse Regularizers
Authors:
Yasutoshi Ida,
Sekitoshi Kanai,
Kazuki Adachi,
Atsutoshi Kumagai,
Yasuhiro Fujiwara
Abstract:
Regularized discrete optimal transport (OT) is a powerful tool to measure the distance between two discrete distributions that have been constructed from data samples on two different domains. While it has a wide range of applications in machine learning, in some cases the sampled data from only one of the domains will have class labels such as unsupervised domain adaptation. In this kind of problem setting, a group-sparse regularizer is frequently leveraged as a regularization term to handle class labels. In particular, it can preserve the label structure on the data samples by corresponding the data samples with the same class label to one group-sparse regularization term. As a result, we can measure the distance while utilizing label information by solving the regularized optimization problem with gradient-based algorithms. However, the gradient computation is expensive when the number of classes or data samples is large because the number of regularization terms and their respective sizes also turn out to be large. This paper proposes fast discrete OT with group-sparse regularizers. Our method is based on two ideas. The first is to safely skip the computations of the gradients that must be zero. The second is to efficiently extract the gradients that are expected to be nonzero. Our method is guaranteed to return the same value of the objective function as that of the original method. Experiments show that our method is up to 8.6 times faster than the original method without degrading accuracy.
Submitted 13 March, 2023;
originally announced March 2023.
-
Fast Saturating Gate for Learning Long Time Scales with Recurrent Neural Networks
Authors:
Kentaro Ohno,
Sekitoshi Kanai,
Yasutoshi Ida
Abstract:
Gate functions in recurrent models, such as an LSTM and GRU, play a central role in learning various time scales in modeling time series data by using a bounded activation function. However, it is difficult to train gates to capture extremely long time scales due to gradient vanishing of the bounded function for large inputs, which is known as the saturation problem. We closely analyze the relation between saturation of the gate function and efficiency of the training. We prove that the gradient vanishing of the gate function can be mitigated by accelerating the convergence of the saturating function, i.e., making the output of the function converge to 0 or 1 faster. Based on the analysis results, we propose a gate function called fast gate that has a doubly exponential convergence rate with respect to inputs by simple function composition. We empirically show that our method outperforms previous methods in accuracy and computational efficiency on benchmark tasks involving extremely long time scales.
Submitted 3 October, 2022;
originally announced October 2022.
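The abstract above describes a gate whose output approaches 0 or 1 doubly exponentially fast, obtained by simple function composition. A minimal sketch of one such composition follows, assuming the gate is the sigmoid applied to sinh(x); that specific choice and the names `fast_gate` and `gap_to_one` are assumptions for illustration, not details stated in this listing.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fast_gate(x: float) -> float:
    """Assumed composition: sigmoid(sinh(x)). Hypothetical form, see lead-in."""
    return sigmoid(math.sinh(x))


def gap_to_one(gate_input: float) -> float:
    """Return 1 - sigmoid(gate_input), computed as sigmoid(-gate_input)
    to avoid cancellation when the gate is close to saturation."""
    return sigmoid(-gate_input)


# Compare how quickly each gate saturates toward 1 as the input grows.
for x in (1.0, 2.0, 3.0, 4.0, 5.0):
    plain = gap_to_one(x)             # decays like exp(-x)
    fast = gap_to_one(math.sinh(x))   # decays like exp(-exp(x) / 2)
    print(f"x={x:.0f}  1-sigmoid={plain:.3e}  1-fast_gate={fast:.3e}")
```

Since sinh(x) grows like exp(x)/2, the gap 1 - sigmoid(sinh(x)) shrinks roughly like exp(-exp(x)/2), whereas the plain sigmoid's gap shrinks only like exp(-x).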
-
One-vs-the-Rest Loss to Focus on Important Samples in Adversarial Training
Authors:
Sekitoshi Kanai,
Shin'ya Yamaguchi,
Masanori Yamada,
Hiroshi Takahashi,
Kentaro Ohno,
Yasutoshi Ida
Abstract:
This paper proposes a new loss function for adversarial training. Since adversarial training has difficulties, e.g., necessity of high model capacity, focusing on important data points by weighting cross-entropy loss has attracted much attention. However, these importance-aware methods are vulnerable to sophisticated attacks, e.g., Auto-Attack. This paper experimentally reveals that the cause of their vulnerability is their small margins between logits for the true label and the other labels. Since neural networks classify the data points based on the logits, logit margins should be large enough to avoid flipping the largest logit by the attacks. Importance-aware methods do not increase logit margins of important samples but decrease those of less-important samples compared with cross-entropy loss. To increase logit margins of important samples, we propose switching one-vs-the-rest loss (SOVR), which switches from cross-entropy to one-vs-the-rest loss for important samples that have small logit margins. We prove that one-vs-the-rest loss increases logit margins two times larger than the weighted cross-entropy loss for a simple problem. We experimentally confirm that SOVR increases logit margins of important samples unlike existing methods and achieves better robustness against Auto-Attack than importance-aware methods.
Submitted 26 April, 2023; v1 submitted 20 July, 2022;
originally announced July 2022.
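The switching idea in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the one-vs-the-rest loss is written in its common sum-of-binary-logistic-losses form, and the fixed `threshold` used to decide which samples count as "important" is hypothetical (the abstract does not say how such samples are selected).

```python
import math


def softplus(x: float) -> float:
    """Stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logit_margin(logits, true_label):
    """True-class logit minus the largest logit among the other classes."""
    others = [z for k, z in enumerate(logits) if k != true_label]
    return logits[true_label] - max(others)


def cross_entropy(logits, true_label):
    """Softmax cross-entropy via a stable log-sum-exp."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[true_label]


def one_vs_rest(logits, true_label):
    """Assumed OVR form: -log sigmoid(z_y) - sum_{k != y} log(1 - sigmoid(z_k))."""
    loss = softplus(-logits[true_label])
    loss += sum(softplus(z) for k, z in enumerate(logits) if k != true_label)
    return loss


def switching_loss(logits, true_label, threshold):
    """Use OVR when the margin is small, cross-entropy otherwise.
    `threshold` is a hypothetical selection rule for 'important' samples."""
    if logit_margin(logits, true_label) < threshold:
        return one_vs_rest(logits, true_label)
    return cross_entropy(logits, true_label)


logits = [2.0, 0.5, -1.0]
print(logit_margin(logits, 0))  # 1.5
print(switching_loss(logits, 0, threshold=2.0))  # small margin: OVR branch
print(switching_loss(logits, 0, threshold=1.0))  # large margin: CE branch
```

A negative margin means the sample is already misclassified, so under this hypothetical rule it always falls into the one-vs-the-rest branch for any positive threshold.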
-
Meta-ticket: Finding optimal subnetworks for few-shot learning within randomly initialized neural networks
Authors:
Daiki Chijiwa,
Shin'ya Yamaguchi,
Atsutoshi Kumagai,
Yasutoshi Ida
Abstract:
Few-shot learning for neural networks (NNs) is an important problem that aims to train NNs with a few data. The main challenge is how to avoid overfitting since over-parameterized NNs can easily overfit to such small datasets. Previous work (e.g. MAML by Finn et al. 2017) tackles this challenge by meta-learning, which learns how to learn from a few data by using various tasks. On the other hand, one conventional approach to avoid overfitting is restricting hypothesis spaces by endowing sparse NN structures like convolution layers in computer vision. However, although such manually-designed sparse structures are sample-efficient for sufficiently large datasets, they are still insufficient for few-shot learning. Then the following questions naturally arise: (1) Can we find sparse structures effective for few-shot learning by meta-learning? (2) What benefits will it bring in terms of meta-generalization? In this work, we propose a novel meta-learning approach, called Meta-ticket, to find optimal sparse subnetworks for few-shot learning within randomly initialized NNs. We empirically validated that Meta-ticket successfully discovers sparse subnetworks that can learn specialized features for each given task. Due to this task-wise adaptation ability, Meta-ticket achieves superior meta-generalization compared to MAML-based methods especially with large NNs. The code is available at: https://github.com/dchiji-ntt/meta-ticket
Submitted 9 February, 2023; v1 submitted 31 May, 2022;
originally announced May 2022.
-
Pruning Randomly Initialized Neural Networks with Iterative Randomization
Authors:
Daiki Chijiwa,
Shin'ya Yamaguchi,
Yasutoshi Ida,
Kenji Umakoshi,
Tomohiro Inoue
Abstract:
Pruning the weights of randomly initialized neural networks plays an important role in the context of lottery ticket hypothesis. Ramanujan et al. (2020) empirically showed that only pruning the weights can achieve remarkable performance instead of optimizing the weight values. However, to achieve the same level of performance as the weight optimization, the pruning approach requires more parameters in the networks before pruning and thus more memory space. To overcome this parameter inefficiency, we introduce a novel framework to prune randomly initialized neural networks with iteratively randomizing weight values (IteRand). Theoretically, we prove an approximation theorem in our framework, which indicates that the randomizing operations are provably effective to reduce the required number of the parameters. We also empirically demonstrate the parameter efficiency in multiple experiments on CIFAR-10 and ImageNet. The code is available at: https://github.com/dchiji-ntt/iterand
Submitted 5 April, 2022; v1 submitted 17 June, 2021;
originally announced June 2021.
-
Smoothness Analysis of Adversarial Training
Authors:
Sekitoshi Kanai,
Masanori Yamada,
Hiroshi Takahashi,
Yuki Yamanaka,
Yasutoshi Ida
Abstract:
Deep neural networks are vulnerable to adversarial attacks. Recent studies about adversarial robustness focus on the loss landscape in the parameter space since it is related to optimization and generalization performance. These studies conclude that the difficulty of adversarial training is caused by the non-smoothness of the loss function: i.e., its gradient is not Lipschitz continuous. However, this analysis ignores the dependence of adversarial attacks on model parameters. Since adversarial attacks are optimized for models, they should depend on the parameters. Considering this dependence, we analyze the smoothness of the loss function of adversarial training using the optimal attacks for the model parameter in more detail. We reveal that the constraint of adversarial attacks is one cause of the non-smoothness and that the smoothness depends on the types of the constraints. Specifically, the $L_\infty$ constraint can cause non-smoothness more than the $L_2$ constraint. Moreover, our analysis implies that if we flatten the loss function with respect to input data, the Lipschitz constant of the gradient of adversarial loss tends to increase. To address the non-smoothness, we show that EntropySGD smoothens the non-smooth loss and improves the performance of adversarial training.
Submitted 15 June, 2021; v1 submitted 1 March, 2021;
originally announced March 2021.
-
The Thermodynamic Approach to Whole-Life Insurance: A Method for Evaluation of Surrender Risk
Authors:
Jirô Akahori,
Yuuki Ida,
Maho Nishida,
Shuji Tamada
Abstract:
We introduce a collective model for life insurance where the heterogeneity of each insured, including the health state, is modeled by a diffusion process. This model is influenced by concepts in statistical mechanics. Using the proposed framework, one can describe the total pay-off as a functional of the diffusion process, which can be used to derive a level premium that evaluates the risk of lapses due to the so-called adverse selection. Two numerically tractable models are presented to exemplify the flexibility of the proposed framework.
Submitted 10 December, 2020;
originally announced December 2020.
-
Constraining Logits by Bounded Function for Adversarial Robustness
Authors:
Sekitoshi Kanai,
Masanori Yamada,
Shin'ya Yamaguchi,
Hiroshi Takahashi,
Yasutoshi Ida
Abstract:
We propose a method for improving adversarial robustness by addition of a new bounded function just before softmax. Recent studies hypothesize that small logits (inputs of softmax) by logit regularization can improve adversarial robustness of deep learning. Following this hypothesis, we analyze norms of logit vectors at the optimal point under the assumption of universal approximation and explore new methods for constraining logits by addition of a bounded function before softmax. We theoretically and empirically reveal that small logits by addition of a common activation function, e.g., hyperbolic tangent, do not improve adversarial robustness since input vectors of the function (pre-logit vectors) can have large norms. From the theoretical findings, we develop the new bounded function. The addition of our function improves adversarial robustness because it makes logit and pre-logit vectors have small norms. Since our method only adds one activation function before softmax, it is easy to combine our method with adversarial training. Our experiments demonstrate that our method is comparable to logit regularization methods in terms of accuracies on adversarially perturbed datasets without adversarial training. Furthermore, it is superior or comparable to logit regularization methods and a recent defense method (TRADES) when using adversarial training.
Submitted 6 October, 2020;
originally announced October 2020.
-
Absum: Simple Regularization Method for Reducing Structural Sensitivity of Convolutional Neural Networks
Authors:
Sekitoshi Kanai,
Yasutoshi Ida,
Yasuhiro Fujiwara,
Masanori Yamada,
Shuichi Adachi
Abstract:
We propose Absum, which is a regularization method for improving adversarial robustness of convolutional neural networks (CNNs). Although CNNs can accurately recognize images, recent studies have shown that the convolution operations in CNNs commonly have structural sensitivity to specific noise composed of Fourier basis functions. By exploiting this sensitivity, they proposed a simple black-box adversarial attack: Single Fourier attack. To reduce structural sensitivity, we can use regularization of convolution filter weights since the sensitivity of linear transform can be assessed by the norm of the weights. However, standard regularization methods can prevent minimization of the loss function because they impose a tight constraint for obtaining high robustness. To solve this problem, Absum imposes a loose constraint; it penalizes the absolute values of the summation of the parameters in the convolution layers. Absum can improve robustness against single Fourier attack while being as simple and efficient as standard regularization methods (e.g., weight decay and L1 regularization). Our experiments demonstrate that Absum improves robustness against single Fourier attack more than standard regularization methods. Furthermore, we reveal that robust CNNs with Absum are more robust against transferred attacks due to decreasing the common sensitivity and against high-frequency noise than standard regularization methods. We also reveal that Absum can improve robustness against gradient-based attacks (projected gradient descent) when used with adversarial training.
Submitted 19 September, 2019;
originally announced September 2019.
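The Absum abstract above contrasts a loose constraint (penalizing the absolute value of the summed parameters) with standard regularizers such as L1. A minimal sketch of that contrast follows, assuming the sum is taken over each 2-D kernel; the grouping and the function names are assumptions for illustration, not taken from the paper.

```python
def absum_penalty(kernels):
    """Sum over kernels of |sum of all weights in the kernel|.
    Grouping by 2-D kernel is an assumption, see lead-in."""
    return sum(abs(sum(sum(row) for row in kernel)) for kernel in kernels)


def l1_penalty(kernels):
    """Standard L1 penalty: sum of |w| over every individual weight."""
    return sum(abs(w) for kernel in kernels for row in kernel for w in row)


# A kernel whose weights cancel out: free under Absum, costly under L1.
cancelling = [[1.0, -1.0],
              [-1.0, 1.0]]
# A kernel with same-sign weights: both penalties coincide.
uniform = [[0.5, 0.5],
           [0.5, 0.5]]

print(absum_penalty([cancelling]), l1_penalty([cancelling]))  # 0.0 4.0
print(absum_penalty([uniform]), l1_penalty([uniform]))        # 2.0 2.0
```

The cancelling kernel shows why the constraint is looser: individual weights can stay large as long as their sum is small, so the penalty interferes less with minimizing the training loss.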
-
Network Implosion: Effective Model Compression for ResNets via Static Layer Pruning and Retraining
Authors:
Yasutoshi Ida,
Yasuhiro Fujiwara
Abstract:
Residual Networks with convolutional layers are widely used in the field of machine learning. Since they effectively extract features from input data by stacking multiple layers, they can achieve high accuracy in many applications. However, the stacking of many layers raises their computation costs. To address this problem, we propose Network Implosion, which erases multiple layers from Residual Networks without degrading accuracy. Our key idea is to introduce a priority term that identifies the importance of a layer; we can select unimportant layers according to the priority and erase them after the training. In addition, we retrain the networks to avoid critical drops in accuracy after layer erasure. A theoretical assessment reveals that our erasure and retraining scheme can erase layers without accuracy drop, and achieve higher accuracy than is possible with training from scratch. Our experiments show that Network Implosion can, for classification on CIFAR-10/100 and ImageNet, reduce the number of layers by 24.00 to 42.86 percent without any drop in accuracy.
Submitted 10 June, 2019;
originally announced June 2019.
-
Towards the Exact Simulation Using Hyperbolic Brownian Motion
Authors:
Yuuki Ida,
Yuri Imamura
Abstract:
In the present paper, an expansion of the transition density of Hyperbolic Brownian motion with drift is given, which is potentially useful for pricing and hedging of options under stochastic volatility models. We work on a condition on the drift which dramatically simplifies the proof.
Submitted 2 May, 2017;
originally announced May 2017.
-
Adaptive Learning Rate via Covariance Matrix Based Preconditioning for Deep Neural Networks
Authors:
Yasutoshi Ida,
Yasuhiro Fujiwara,
Sotetsu Iwamura
Abstract:
Adaptive learning rate algorithms such as RMSProp are widely used for training deep neural networks. RMSProp offers efficient training since it uses first order gradients to approximate Hessian-based preconditioning. However, since the first order gradients include noise caused by stochastic optimization, the approximation may be inaccurate. In this paper, we propose a novel adaptive learning rate algorithm called SDProp. Its key idea is effective handling of the noise by preconditioning based on covariance matrix. For various neural networks, our approach is more efficient and effective than RMSProp and its variant.
Submitted 28 September, 2017; v1 submitted 31 May, 2016;
originally announced May 2016.
-
Polar Antiferromagnets Produced with Orbital-Order
Authors:
Naoki Ogawa,
Yasushi Ogimoto,
Yoshiaki Ida,
Yusuke Nomura,
Ryotaro Arita,
Kenjiro Miyano
Abstract:
Polar magnetic states are realized in pseudocubic manganite thin films fabricated on high-index substrates, in which a Jahn-Teller (JT) distortion remains an active variable. Several types of orbital-orders were found to develop large optical second harmonic generation, signaling broken-inversion-symmetry distinct from their bulk forms and films on (100) substrates. The observed symmetry-lifting and first-principles calculation both indicate that the modified JT q2 mode drives Mn-site off-centering upon orbital order, leading to the possible cooperation of "Mn-site polarization" and magnetism.
Submitted 5 December, 2011; v1 submitted 5 December, 2011;
originally announced December 2011.