Search | arXiv e-print repository

Test-time Adaptation Meets Image Enhancement: Improving Accuracy via Uncertainty-aware Logit Switching

Authors: Shohei Enomoto, Naoya Hasegawa, Kazuki Adachi, Taku Sasaki, Shin'ya Yamaguchi, Satoshi Suzuki, Takeharu Eda

Abstract: Deep neural networks have achieved remarkable success in a variety of computer vision applications. However, there is a problem of degrading accuracy when the data distribution shifts between training and testing. As a solution of this problem, Test-time Adaptation~(TTA) has been well studied because of its practicality. Although TTA methods increase accuracy under distribution shift by updating t… ▽ More Deep neural networks have achieved remarkable success in a variety of computer vision applications. However, there is a problem of degrading accuracy when the data distribution shifts between training and testing. As a solution of this problem, Test-time Adaptation~(TTA) has been well studied because of its practicality. Although TTA methods increase accuracy under distribution shift by updating the model at test time, using high-uncertainty predictions is known to degrade accuracy. Since the input image is the root of the distribution shift, we incorporate a new perspective on enhancing the input image into TTA methods to reduce the prediction's uncertainty. We hypothesize that enhancing the input image reduces prediction's uncertainty and increase the accuracy of TTA methods. On the basis of our hypothesis, we propose a novel method: Test-time Enhancer and Classifier Adaptation~(TECA). In TECA, the classification model is combined with the image enhancement model that transforms input images into recognition-friendly ones, and these models are updated by existing TTA methods. Furthermore, we found that the prediction from the enhanced image does not always have lower uncertainty than the prediction from the original image. Thus, we propose logit switching, which compares the uncertainty measure of these predictions and outputs the lower one. In our experiments, we evaluate TECA with various TTA methods and show that TECA reduces prediction's uncertainty and increases accuracy of TTA methods despite having no hyperparameters and little parameter overhead. △ Less

Submitted 26 March, 2024; originally announced March 2024.

Comments: Accepted to IJCNN2024

arXiv:2403.14114 [pdf, other]

Test-time Similarity Modification for Person Re-identification toward Temporal Distribution Shift

Authors: Kazuki Adachi, Shohei Enomoto, Taku Sasaki, Shin'ya Yamaguchi

Abstract: Person re-identification (re-id), which aims to retrieve images of the same person in a given image from a database, is one of the most practical image recognition applications. In the real world, however, the environments that the images are taken from change over time. This causes a distribution shift between training and testing and degrades the performance of re-id. To maintain re-id performan… ▽ More Person re-identification (re-id), which aims to retrieve images of the same person in a given image from a database, is one of the most practical image recognition applications. In the real world, however, the environments that the images are taken from change over time. This causes a distribution shift between training and testing and degrades the performance of re-id. To maintain re-id performance, models should continue adapting to the test environment's temporal changes. Test-time adaptation (TTA), which aims to adapt models to the test environment with only unlabeled test data, is a promising way to handle this problem because TTA can adapt models instantly in the test environment. However, the previous TTA methods are designed for classification and cannot be directly applied to re-id. This is because the set of people's identities in the dataset differs between training and testing in re-id, whereas the set of classes is fixed in the current TTA methods designed for classification. To improve re-id performance in changing test environments, we propose TEst-time similarity Modification for Person re-identification (TEMP), a novel TTA method for re-id. TEMP is the first fully TTA method for re-id, which does not require any modification to pre-training. Inspired by TTA methods that refine the prediction uncertainty in classification, we aim to refine the uncertainty in re-id. However, the uncertainty cannot be computed in the same way as classification in re-id since it is an open-set task, which does not share person labels between training and testing. Hence, we propose re-id entropy, an alternative uncertainty measure for re-id computed based on the similarity between the feature vectors. Experiments show that the re-id entropy can measure the uncertainty on re-id and TEMP improves the performance of re-id in online settings where the distribution changes over time. △ Less

Submitted 20 March, 2024; originally announced March 2024.

Comments: Accepted to IJCNN2024

arXiv:2403.10097 [pdf, other]

Adaptive Random Feature Regularization on Fine-tuning Deep Neural Networks

Authors: Shin'ya Yamaguchi, Sekitoshi Kanai, Kazuki Adachi, Daiki Chijiwa

Abstract: While fine-tuning is a de facto standard method for training deep neural networks, it still suffers from overfitting when using small target datasets. Previous methods improve fine-tuning performance by maintaining knowledge of the source datasets or introducing regularization terms such as contrastive loss. However, these methods require auxiliary source information (e.g., source labels or datase… ▽ More While fine-tuning is a de facto standard method for training deep neural networks, it still suffers from overfitting when using small target datasets. Previous methods improve fine-tuning performance by maintaining knowledge of the source datasets or introducing regularization terms such as contrastive loss. However, these methods require auxiliary source information (e.g., source labels or datasets) or heavy additional computations. In this paper, we propose a simple method called adaptive random feature regularization (AdaRand). AdaRand helps the feature extractors of training models to adaptively change the distribution of feature vectors for downstream classification tasks without auxiliary source information and with reasonable computation costs. To this end, AdaRand minimizes the gap between feature vectors and random reference vectors that are sampled from class conditional Gaussian distributions. Furthermore, AdaRand dynamically updates the conditional distribution to follow the currently updated feature extractors and balance the distance between classes in feature spaces. Our experiments show that AdaRand outperforms the other fine-tuning regularization, which requires auxiliary source information and heavy computation costs. △ Less

Submitted 15 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2311.13090 [pdf, other]

On the Limitation of Diffusion Models for Synthesizing Training Datasets

Authors: Shin'ya Yamaguchi, Takuma Fukuda

Abstract: Synthetic samples from diffusion models are promising for leveraging in training discriminative models as replications of real training datasets. However, we found that the synthetic datasets degrade classification performance over real datasets even when using state-of-the-art diffusion models. This means that modern diffusion models do not perfectly represent the data distribution for the purpos… ▽ More Synthetic samples from diffusion models are promising for leveraging in training discriminative models as replications of real training datasets. However, we found that the synthetic datasets degrade classification performance over real datasets even when using state-of-the-art diffusion models. This means that modern diffusion models do not perfectly represent the data distribution for the purpose of replicating datasets for training discriminative tasks. This paper investigates the gap between synthetic and real samples by analyzing the synthetic samples reconstructed from real samples through the diffusion and reverse process. By varying the time steps starting the reverse process in the reconstruction, we can control the trade-off between the information in the original real data and the information added by diffusion models. Through assessing the reconstructed samples and trained models, we found that the synthetic data are concentrated in modes of the training data distribution as the reverse step increases, and thus, they are difficult to cover the outer edges of the distribution. Our findings imply that modern diffusion models are insufficient to replicate training data distribution perfectly, and there is room for the improvement of generative modeling in the replication of training datasets. △ Less

Submitted 21 November, 2023; originally announced November 2023.

Comments: NeurIPS 2023 SyntheticData4ML Workshop

arXiv:2310.03913 [pdf, other]

TRAIL Team Description Paper for RoboCup@Home 2023

Authors: Chikaha Tsuji, Dai Komukai, Mimo Shirasaka, Hikaru Wada, Tsunekazu Omija, Aoi Horo, Daiki Furuta, Saki Yamaguchi, So Ikoma, Soshi Tsunashima, Masato Kobayashi, Koki Ishimoto, Yuya Ikeda, Tatsuya Matsushima, Yusuke Iwasawa, Yutaka Matsuo

Abstract: Our team, TRAIL, consists of AI/ML laboratory members from The University of Tokyo. We leverage our extensive research experience in state-of-the-art machine learning to build general-purpose in-home service robots. We previously participated in two competitions using Human Support Robot (HSR): RoboCup@Home Japan Open 2020 (DSPL) and World Robot Summit 2020, equivalent to RoboCup World Tournament.… ▽ More Our team, TRAIL, consists of AI/ML laboratory members from The University of Tokyo. We leverage our extensive research experience in state-of-the-art machine learning to build general-purpose in-home service robots. We previously participated in two competitions using Human Support Robot (HSR): RoboCup@Home Japan Open 2020 (DSPL) and World Robot Summit 2020, equivalent to RoboCup World Tournament. Throughout the competitions, we showed that a data-driven approach is effective for performing in-home tasks. Aiming for further development of building a versatile and fast-adaptable system, in RoboCup @Home 2023, we unify three technologies that have recently been evaluated as components in the fields of deep learning and robot learning into a real household robot system. In addition, to stimulate research all over the RoboCup@Home community, we build a platform that manages data collected from each site belonging to the community around the world, taking advantage of the characteristics of the community. △ Less

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2309.16143 [pdf, other]

Generative Semi-supervised Learning with Meta-Optimized Synthetic Samples

Authors: Shin'ya Yamaguchi

Abstract: Semi-supervised learning (SSL) is a promising approach for training deep classification models using labeled and unlabeled datasets. However, existing SSL methods rely on a large unlabeled dataset, which may not always be available in many real-world applications due to legal constraints (e.g., GDPR). In this paper, we investigate the research question: Can we train SSL models without real unlabel… ▽ More Semi-supervised learning (SSL) is a promising approach for training deep classification models using labeled and unlabeled datasets. However, existing SSL methods rely on a large unlabeled dataset, which may not always be available in many real-world applications due to legal constraints (e.g., GDPR). In this paper, we investigate the research question: Can we train SSL models without real unlabeled datasets? Instead of using real unlabeled datasets, we propose an SSL method using synthetic datasets generated from generative foundation models trained on datasets containing millions of samples in diverse domains (e.g., ImageNet). Our main concepts are identifying synthetic samples that emulate unlabeled samples from generative foundation models and training classifiers using these synthetic samples. To achieve this, our method is formulated as an alternating optimization problem: (i) meta-learning of generative foundation models and (ii) SSL of classifiers using real labeled and synthetic unlabeled samples. For (i), we propose a meta-learning objective that optimizes latent variables to generate samples that resemble real labeled samples and minimize the validation loss. For (ii), we propose a simple unsupervised loss function that regularizes the feature extractors of classifiers to maximize the performance improvement obtained from synthetic samples. We confirm that our method outperforms baselines using generative foundation models on SSL. We also demonstrate that our methods outperform SSL using real unlabeled datasets in scenarios with extremely small amounts of labeled datasets. This suggests that synthetic samples have the potential to provide improvement gains more efficiently than real unlabeled data. △ Less

Submitted 27 September, 2023; originally announced September 2023.

Comments: Accepted to the 15th Asian Conference on Machine Learning (ACML2023); a preprint of the camera-ready version

arXiv:2308.16454 [pdf, other]

Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff

Authors: Satoshi Suzuki, Shin'ya Yamaguchi, Shoichiro Takeda, Sekitoshi Kanai, Naoki Makishima, Atsushi Ando, Ryo Masumura

Abstract: This paper addresses the tradeoff between standard accuracy on clean examples and robustness against adversarial examples in deep neural networks (DNNs). Although adversarial training (AT) improves robustness, it degrades the standard accuracy, thus yielding the tradeoff. To mitigate this tradeoff, we propose a novel AT method called ARREST, which comprises three components: (i) adversarial finetu… ▽ More This paper addresses the tradeoff between standard accuracy on clean examples and robustness against adversarial examples in deep neural networks (DNNs). Although adversarial training (AT) improves robustness, it degrades the standard accuracy, thus yielding the tradeoff. To mitigate this tradeoff, we propose a novel AT method called ARREST, which comprises three components: (i) adversarial finetuning (AFT), (ii) representation-guided knowledge distillation (RGKD), and (iii) noisy replay (NR). AFT trains a DNN on adversarial examples by initializing its parameters with a DNN that is standardly pretrained on clean examples. RGKD and NR respectively entail a regularization term and an algorithm to preserve latent representations of clean examples during AFT. RGKD penalizes the distance between the representations of the standardly pretrained and AFT DNNs. NR switches input adversarial examples to nonadversarial ones when the representation changes significantly during AFT. By combining these components, ARREST achieves both high standard accuracy and robustness. Experimental results demonstrate that ARREST mitigates the tradeoff more effectively than previous AT-based methods do. △ Less

Submitted 31 August, 2023; originally announced August 2023.

Comments: Accepted by International Conference on Computer Vision (ICCV) 2023

arXiv:2307.13899 [pdf, other]

Regularizing Neural Networks with Meta-Learning Generative Models

Authors: Shin'ya Yamaguchi, Daiki Chijiwa, Sekitoshi Kanai, Atsutoshi Kumagai, Hisashi Kashima

Abstract: This paper investigates methods for improving generative data augmentation for deep learning. Generative data augmentation leverages the synthetic samples produced by generative models as an additional dataset for classification with small dataset settings. A key challenge of generative data augmentation is that the synthetic data contain uninformative samples that degrade accuracy. This is becaus… ▽ More This paper investigates methods for improving generative data augmentation for deep learning. Generative data augmentation leverages the synthetic samples produced by generative models as an additional dataset for classification with small dataset settings. A key challenge of generative data augmentation is that the synthetic data contain uninformative samples that degrade accuracy. This is because the synthetic samples do not perfectly represent class categories in real data and uniform sampling does not necessarily provide useful samples for tasks. In this paper, we present a novel strategy for generative data augmentation called meta generative regularization (MGR). To avoid the degradation of generative data augmentation, MGR utilizes synthetic samples in the regularization term for feature extractors instead of in the loss function, e.g., cross-entropy. These synthetic samples are dynamically determined to minimize the validation losses through meta-learning. We observed that MGR can avoid the performance degradation of naïve generative data augmentation and boost the baselines. Experiments on six datasets showed that MGR is effective particularly when datasets are smaller and stably outperforms baselines. △ Less

Submitted 23 October, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2306.10656 [pdf, other]

Virtual Human Generative Model: Masked Modeling Approach for Learning Human Characteristics

Authors: Kenta Oono, Nontawat Charoenphakdee, Kotatsu Bito, Zhengyan Gao, Yoshiaki Ota, Shoichiro Yamaguchi, Yohei Sugawara, Shin-ichi Maeda, Kunihiko Miyoshi, Yuki Saito, Koki Tsuda, Hiroshi Maruyama, Kohei Hayashi

Abstract: Identifying the relationship between healthcare attributes, lifestyles, and personality is vital for understanding and improving physical and mental conditions. Machine learning approaches are promising for modeling their relationships and offering actionable suggestions. In this paper, we propose Virtual Human Generative Model (VHGM), a machine learning model for estimating attributes about healt… ▽ More Identifying the relationship between healthcare attributes, lifestyles, and personality is vital for understanding and improving physical and mental conditions. Machine learning approaches are promising for modeling their relationships and offering actionable suggestions. In this paper, we propose Virtual Human Generative Model (VHGM), a machine learning model for estimating attributes about healthcare, lifestyles, and personalities. VHGM is a deep generative model trained with masked modeling to learn the joint distribution of attributes conditioned on known ones. Using heterogeneous tabular datasets, VHGM learns more than 1,800 attributes efficiently. We numerically evaluate the performance of VHGM and its training techniques. As a proof-of-concept of VHGM, we present several applications demonstrating user scenarios, such as virtual measurements of healthcare attributes and hypothesis verifications of lifestyles. △ Less

Submitted 14 August, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

Comments: 14 pages, 4 figures

arXiv:2306.05641 [pdf, other]

Revisiting Permutation Symmetry for Merging Models between Different Datasets

Authors: Masanori Yamada, Tomoya Yamashita, Shin'ya Yamaguchi, Daiki Chijiwa

Abstract: Model merging is a new approach to creating a new model by combining the weights of different trained models. Previous studies report that model merging works well for models trained on a single dataset with different random seeds, while model merging between different datasets is difficult. Merging knowledge from different datasets has practical significance, but it has not been well investigated… ▽ More Model merging is a new approach to creating a new model by combining the weights of different trained models. Previous studies report that model merging works well for models trained on a single dataset with different random seeds, while model merging between different datasets is difficult. Merging knowledge from different datasets has practical significance, but it has not been well investigated. In this paper, we investigate the properties of merging models between different datasets. Through theoretical and empirical analyses, we find that the accuracy of the merged model decreases more significantly as the datasets diverge more and that the different loss landscapes for each dataset make model merging between different datasets difficult. We also show that merged models require datasets for merging in order to achieve a high accuracy. Furthermore, we show that condensed datasets created by dataset condensation can be used as substitutes for the original datasets when merging models. We conduct experiments for model merging between different datasets. When merging between MNIST and Fashion- MNIST models, the accuracy significantly improves by 28% using the dataset and 25% using the condensed dataset compared with not using the dataset. △ Less

Submitted 8 June, 2023; originally announced June 2023.

Comments: 18 pages; comments are welcome

arXiv:2304.12304 [pdf]

A Survey on Multi-Resident Activity Recognition in Smart Environments

Authors: Farhad MortezaPour Shiri, Thinagaran Perumal, Norwati Mustapha, Raihani Mohamed, Mohd Anuaruddin Bin Ahmadon, Shingo Yamaguchi

Abstract: Human activity recognition (HAR) is a rapidly growing field that utilizes smart devices, sensors, and algorithms to automatically classify and identify the actions of individuals within a given environment. These systems have a wide range of applications, including assisting with caring tasks, increasing security, and improving energy efficiency. However, there are several challenges that must be… ▽ More Human activity recognition (HAR) is a rapidly growing field that utilizes smart devices, sensors, and algorithms to automatically classify and identify the actions of individuals within a given environment. These systems have a wide range of applications, including assisting with caring tasks, increasing security, and improving energy efficiency. However, there are several challenges that must be addressed in order to effectively utilize HAR systems in multi-resident environments. One of the key challenges is accurately associating sensor observations with the identities of the individuals involved, which can be particularly difficult when residents are engaging in complex and collaborative activities. This paper provides a brief overview of the design and implementation of HAR systems, including a summary of the various data collection devices and approaches used for human activity identification. It also reviews previous research on the use of these systems in multi-resident environments and offers conclusions on the current state of the art in the field. △ Less

Submitted 24 April, 2023; originally announced April 2023.

Comments: 16 pages, to appear in Evolution of Information, Communication and Computing Systems (EICCS) Book Series

arXiv:2210.05268 [pdf, other]

Component-Wise Natural Gradient Descent -- An Efficient Neural Network Optimization

Authors: Tran Van Sang, Mhd Irvan, Rie Shigetomi Yamaguchi, Toshiyuki Nakata

Abstract: Natural Gradient Descent (NGD) is a second-order neural network training that preconditions the gradient descent with the inverse of the Fisher Information Matrix (FIM). Although NGD provides an efficient preconditioner, it is not practicable due to the expensive computation required when inverting the FIM. This paper proposes a new NGD variant algorithm named Component-Wise Natural Gradient Desce… ▽ More Natural Gradient Descent (NGD) is a second-order neural network training that preconditions the gradient descent with the inverse of the Fisher Information Matrix (FIM). Although NGD provides an efficient preconditioner, it is not practicable due to the expensive computation required when inverting the FIM. This paper proposes a new NGD variant algorithm named Component-Wise Natural Gradient Descent (CW-NGD). CW-NGD is composed of 2 steps. Similar to several existing works, the first step is to consider the FIM matrix as a block-diagonal matrix whose diagonal blocks correspond to the FIM of each layer's weights. In the second step, unique to CW-NGD, we analyze the layer's structure and further decompose the layer's FIM into smaller segments whose derivatives are approximately independent. As a result, individual layers' FIMs are approximated in a block-diagonal form that trivially supports the inversion. The segment decomposition strategy is varied by layer structure. Specifically, we analyze the dense and convolutional layers and design their decomposition strategies appropriately. In an experiment of training a network containing these 2 types of layers, we empirically prove that CW-NGD requires fewer iterations to converge compared to the state-of-the-art first-order and second-order methods. △ Less

Submitted 11 October, 2022; originally announced October 2022.

arXiv:2207.10283 [pdf, other]

One-vs-the-Rest Loss to Focus on Important Samples in Adversarial Training

Authors: Sekitoshi Kanai, Shin'ya Yamaguchi, Masanori Yamada, Hiroshi Takahashi, Kentaro Ohno, Yasutoshi Ida

Abstract: This paper proposes a new loss function for adversarial training. Since adversarial training has difficulties, e.g., necessity of high model capacity, focusing on important data points by weighting cross-entropy loss has attracted much attention. However, they are vulnerable to sophisticated attacks, e.g., Auto-Attack. This paper experimentally reveals that the cause of their vulnerability is thei… ▽ More This paper proposes a new loss function for adversarial training. Since adversarial training has difficulties, e.g., necessity of high model capacity, focusing on important data points by weighting cross-entropy loss has attracted much attention. However, they are vulnerable to sophisticated attacks, e.g., Auto-Attack. This paper experimentally reveals that the cause of their vulnerability is their small margins between logits for the true label and the other labels. Since neural networks classify the data points based on the logits, logit margins should be large enough to avoid flip** the largest logit by the attacks. Importance-aware methods do not increase logit margins of important samples but decrease those of less-important samples compared with cross-entropy loss. To increase logit margins of important samples, we propose switching one-vs-the-rest loss (SOVR), which switches from cross-entropy to one-vs-the-rest loss for important samples that have small logit margins. We prove that one-vs-the-rest loss increases logit margins two times larger than the weighted cross-entropy loss for a simple problem. We experimentally confirm that SOVR increases logit margins of important samples unlike existing methods and achieves better robustness against Auto-Attack than importance-aware methods. △ Less

Submitted 26 April, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

Comments: ICML2023, 26 pages, 19 figures

arXiv:2205.15619 [pdf, other]

Meta-ticket: Finding optimal subnetworks for few-shot learning within randomly initialized neural networks

Authors: Daiki Chijiwa, Shin'ya Yamaguchi, Atsutoshi Kumagai, Yasutoshi Ida

Abstract: Few-shot learning for neural networks (NNs) is an important problem that aims to train NNs with a few data. The main challenge is how to avoid overfitting since over-parameterized NNs can easily overfit to such small dataset. Previous work (e.g. MAML by Finn et al. 2017) tackles this challenge by meta-learning, which learns how to learn from a few data by using various tasks. On the other hand, on… ▽ More Few-shot learning for neural networks (NNs) is an important problem that aims to train NNs with a few data. The main challenge is how to avoid overfitting since over-parameterized NNs can easily overfit to such small dataset. Previous work (e.g. MAML by Finn et al. 2017) tackles this challenge by meta-learning, which learns how to learn from a few data by using various tasks. On the other hand, one conventional approach to avoid overfitting is restricting hypothesis spaces by endowing sparse NN structures like convolution layers in computer vision. However, although such manually-designed sparse structures are sample-efficient for sufficiently large datasets, they are still insufficient for few-shot learning. Then the following questions naturally arise: (1) Can we find sparse structures effective for few-shot learning by meta-learning? (2) What benefits will it bring in terms of meta-generalization? In this work, we propose a novel meta-learning approach, called Meta-ticket, to find optimal sparse subnetworks for few-shot learning within randomly initialized NNs. We empirically validated that Meta-ticket successfully discover sparse subnetworks that can learn specialized features for each given task. Due to this task-wise adaptation ability, Meta-ticket achieves superior meta-generalization compared to MAML-based methods especially with large NNs. The code is available at: https://github.com/dchiji-ntt/meta-ticket △ Less

Submitted 9 February, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

arXiv:2204.13263 [pdf, other]

Covariance-aware Feature Alignment with Pre-computed Source Statistics for Test-time Adaptation to Multiple Image Corruptions

Authors: Kazuki Adachi, Shin'ya Yamaguchi, Atsutoshi Kumagai

Abstract: Real-world image recognition systems often face corrupted input images, which cause distribution shifts and degrade the performance of models. These systems often use a single prediction model in a central server and process images sent from various environments, such as cameras distributed in cities or cars. Such single models face images corrupted in heterogeneous ways in test time. Thus, they r… ▽ More Real-world image recognition systems often face corrupted input images, which cause distribution shifts and degrade the performance of models. These systems often use a single prediction model in a central server and process images sent from various environments, such as cameras distributed in cities or cars. Such single models face images corrupted in heterogeneous ways in test time. Thus, they require to instantly adapt to the multiple corruptions during testing rather than being re-trained at a high cost. Test-time adaptation (TTA), which aims to adapt models without accessing the training dataset, is one of the settings that can address this problem. Existing TTA methods indeed work well on a single corruption. However, the adaptation ability is limited when multiple types of corruption occur, which is more realistic. We hypothesize this is because the distribution shift is more complicated, and the adaptation becomes more difficult in case of multiple corruptions. In fact, we experimentally found that a larger distribution gap remains after TTA. To address the distribution gap during testing, we propose a novel TTA method named Covariance-Aware Feature alignment (CAFe). We empirically show that CAFe outperforms prior TTA methods on image corruptions, including multiple types of corruptions. △ Less

Submitted 29 June, 2023; v1 submitted 27 April, 2022; originally announced April 2022.

Comments: Extended version of the paper accepted to ICIP 2023

arXiv:2204.12833 [pdf, other]

Transfer Learning with Pre-trained Conditional Generative Models

Authors: Shin'ya Yamaguchi, Sekitoshi Kanai, Atsutoshi Kumagai, Daiki Chijiwa, Hisashi Kashima

Abstract: Transfer learning is crucial in training deep neural networks on new target tasks. Current transfer learning methods always assume at least one of (i) source and target task label spaces overlap, (ii) source datasets are available, and (iii) target network architectures are consistent with source ones. However, holding these assumptions is difficult in practical settings because the target task ra… ▽ More Transfer learning is crucial in training deep neural networks on new target tasks. Current transfer learning methods always assume at least one of (i) source and target task label spaces overlap, (ii) source datasets are available, and (iii) target network architectures are consistent with source ones. However, holding these assumptions is difficult in practical settings because the target task rarely has the same labels as the source task, the source dataset access is restricted due to storage costs and privacy, and the target architecture is often specialized to each task. To transfer source knowledge without these assumptions, we propose a transfer learning method that uses deep generative models and is composed of the following two stages: pseudo pre-training (PP) and pseudo semi-supervised learning (P-SSL). PP trains a target architecture with an artificial dataset synthesized by using conditional source generative models. P-SSL applies SSL algorithms to labeled target data and unlabeled pseudo samples, which are generated by cascading the source classifier and generative models to condition them with target samples. Our experimental results indicate that our method can outperform the baselines of scratch training and knowledge distillation. △ Less

Submitted 29 September, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

Comments: 24 pages, 6 figures

arXiv:2202.13062 [pdf, other]

Learning-based Collision-free Planning on Arbitrary Optimization Criteria in the Latent Space through cGANs

Authors: Tomoki Ando, Hiroto Iino, Hiroki Mori, Ryota Torishima, Kuniyuki Takahashi, Shoichiro Yamaguchi, Daisuke Okanohara, Tetsuya Ogata

Abstract: We propose a new method for collision-free planning using Conditional Generative Adversarial Networks (cGANs) to transform between the robot's joint space and a latent space that captures only collision-free areas of the joint space, conditioned by an obstacle map. Generating multiple plausible trajectories is convenient in applications such as the manipulation of a robot arm by enabling the selec… ▽ More We propose a new method for collision-free planning using Conditional Generative Adversarial Networks (cGANs) to transform between the robot's joint space and a latent space that captures only collision-free areas of the joint space, conditioned by an obstacle map. Generating multiple plausible trajectories is convenient in applications such as the manipulation of a robot arm by enabling the selection of trajectories that avoids collision with the robot or surrounding environment. In the proposed method, various trajectories that avoid obstacles can be generated by connecting the start and goal state with arbitrary line segments in this generated latent space. Our method provides this collision-free latent space, after which any planner, using any optimization conditions, can be used to generate the most suitable paths on the fly. We successfully verified this method with a simulated and actual UR5e 6-DoF robotic arm. We confirmed that different trajectories could be generated depending on optimization conditions. △ Less

Submitted 5 February, 2023; v1 submitted 26 February, 2022; originally announced February 2022.

Comments: 19 pages, 7 figures. An accompanying video is available at https://www.youtube.com/watch?v=IJUxdmaSwy0. arXiv admin note: text overlap with arXiv:2202.07203

arXiv:2202.07203 [pdf, other]

Collision-free Path Planning in the Latent Space through cGANs

Authors: Tomoki Ando, Hiroki Mori, Ryota Torishima, Kuniyuki Takahashi, Shoichiro Yamaguchi, Daisuke Okanohara, Tetsuya Ogata

Abstract: We show a new method for collision-free path planning by cGANs by map** its latent space to only the collision-free areas of the robot joint space. Our method simply provides this collision-free latent space after which any planner, using any optimization conditions, can be used to generate the most suitable paths on the fly. We successfully verified this method with a simulated two-link robot a… ▽ More We show a new method for collision-free path planning by cGANs by map** its latent space to only the collision-free areas of the robot joint space. Our method simply provides this collision-free latent space after which any planner, using any optimization conditions, can be used to generate the most suitable paths on the fly. We successfully verified this method with a simulated two-link robot arm. △ Less

Submitted 15 February, 2022; originally announced February 2022.

Comments: 10pages, 9figures

arXiv:2202.04237 [pdf, other]

Learning Robust Convolutional Neural Networks with Relevant Feature Focusing via Explanations

Authors: Kazuki Adachi, Shin'ya Yamaguchi

Abstract: Existing image recognition techniques based on convolutional neural networks (CNNs) basically assume that the training and test datasets are sampled from i.i.d distributions. However, this assumption is easily broken in the real world because of the distribution shift that occurs when the co-occurrence relations between objects and backgrounds in input images change. Under this type of distributio… ▽ More Existing image recognition techniques based on convolutional neural networks (CNNs) basically assume that the training and test datasets are sampled from i.i.d distributions. However, this assumption is easily broken in the real world because of the distribution shift that occurs when the co-occurrence relations between objects and backgrounds in input images change. Under this type of distribution shift, CNNs learn to focus on features that are not task-relevant, such as backgrounds from the training data, and degrade their accuracy on the test data. To tackle this problem, we propose relevant feature focusing (ReFF). ReFF detects task-relevant features and regularizes CNNs via explanation outputs (e.g., Grad-CAM). Since ReFF is composed of post-hoc explanation modules, it can be easily applied to off-the-shelf CNNs. Furthermore, ReFF requires no additional inference cost at test time because it is only used for regularization while training. We demonstrate that CNNs trained with ReFF focus on features relevant to the target task and that ReFF improves the test-time accuracy. △ Less

Submitted 23 March, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

Comments: Accepted by ICME 2022

arXiv:2106.09269 [pdf, other]

Pruning Randomly Initialized Neural Networks with Iterative Randomization

Authors: Daiki Chijiwa, Shin'ya Yamaguchi, Yasutoshi Ida, Kenji Umakoshi, Tomohiro Inoue

Abstract: Pruning the weights of randomly initialized neural networks plays an important role in the context of lottery ticket hypothesis. Ramanujan et al. (2020) empirically showed that only pruning the weights can achieve remarkable performance instead of optimizing the weight values. However, to achieve the same level of performance as the weight optimization, the pruning approach requires more parameter… ▽ More Pruning the weights of randomly initialized neural networks plays an important role in the context of lottery ticket hypothesis. Ramanujan et al. (2020) empirically showed that only pruning the weights can achieve remarkable performance instead of optimizing the weight values. However, to achieve the same level of performance as the weight optimization, the pruning approach requires more parameters in the networks before pruning and thus more memory space. To overcome this parameter inefficiency, we introduce a novel framework to prune randomly initialized neural networks with iteratively randomizing weight values (IteRand). Theoretically, we prove an approximation theorem in our framework, which indicates that the randomizing operations are provably effective to reduce the required number of the parameters. We also empirically demonstrate the parameter efficiency in multiple experiments on CIFAR-10 and ImageNet. The code is available at: https://github.com/dchiji-ntt/iterand △ Less

Submitted 5 April, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021); Selected for a spotlight presentation

arXiv:2106.02343 [pdf, other]

F-Drop&Match: GANs with a Dead Zone in the High-Frequency Domain

Authors: Shin'ya Yamaguchi, Sekitoshi Kanai

Abstract: Generative adversarial networks built from deep convolutional neural networks (GANs) lack the ability to exactly replicate the high-frequency components of natural images. To alleviate this issue, we introduce two novel training techniques called frequency drop** (F-Drop) and frequency matching (F-Match). The key idea of F-Drop is to filter out unnecessary high-frequency components from the inpu… ▽ More Generative adversarial networks built from deep convolutional neural networks (GANs) lack the ability to exactly replicate the high-frequency components of natural images. To alleviate this issue, we introduce two novel training techniques called frequency drop** (F-Drop) and frequency matching (F-Match). The key idea of F-Drop is to filter out unnecessary high-frequency components from the input images of the discriminators. This simple modification prevents the discriminators from being confused by perturbations of the high-frequency components. In addition, F-Drop makes the GANs focus on fitting in the low-frequency domain, in which there are the dominant components of natural images. F-Match minimizes the difference between real and fake images in the frequency domain for generating more realistic images. F-Match is implemented as a regularization term in the objective functions of the generators; it penalizes the batch mean error in the frequency domain. F-Match helps the generators to fit in the high-frequency domain filtered out by F-Drop to the real image. We experimentally demonstrate that the combination of F-Drop and F-Match improves the generative performance of GANs in both the frequency and spatial domain on multiple image benchmarks. △ Less

Submitted 18 August, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

Comments: Accepted to ICCV 2021; Added experiments on StyleGAN2-ADA

arXiv:2010.02558 [pdf, other]

Constraining Logits by Bounded Function for Adversarial Robustness

Authors: Sekitoshi Kanai, Masanori Yamada, Shin'ya Yamaguchi, Hiroshi Takahashi, Yasutoshi Ida

Abstract: We propose a method for improving adversarial robustness by addition of a new bounded function just before softmax. Recent studies hypothesize that small logits (inputs of softmax) by logit regularization can improve adversarial robustness of deep learning. Following this hypothesis, we analyze norms of logit vectors at the optimal point under the assumption of universal approximation and explore… ▽ More We propose a method for improving adversarial robustness by addition of a new bounded function just before softmax. Recent studies hypothesize that small logits (inputs of softmax) by logit regularization can improve adversarial robustness of deep learning. Following this hypothesis, we analyze norms of logit vectors at the optimal point under the assumption of universal approximation and explore new methods for constraining logits by addition of a bounded function before softmax. We theoretically and empirically reveal that small logits by addition of a common activation function, e.g., hyperbolic tangent, do not improve adversarial robustness since input vectors of the function (pre-logit vectors) can have large norms. From the theoretical findings, we develop the new bounded function. The addition of our function improves adversarial robustness because it makes logit and pre-logit vectors have small norms. Since our method only adds one activation function before softmax, it is easy to combine our method with adversarial training. Our experiments demonstrate that our method is comparable to logit regularization methods in terms of accuracies on adversarially perturbed datasets without adversarial training. Furthermore, it is superior or comparable to logit regularization methods and a recent defense method (TRADES) when using adversarial training. △ Less

Submitted 6 October, 2020; originally announced October 2020.

Comments: 19 pages, 16 figures

arXiv:2008.01883 [pdf, other]

When is invariance useful in an Out-of-Distribution Generalization problem ?

Authors: Masanori Koyama, Shoichiro Yamaguchi

Abstract: The goal of Out-of-Distribution (OOD) generalization problem is to train a predictor that generalizes on all environments. Popular approaches in this field use the hypothesis that such a predictor shall be an \textit{invariant predictor} that captures the mechanism that remains constant across environments. While these approaches have been experimentally successful in various case studies, there i… ▽ More The goal of Out-of-Distribution (OOD) generalization problem is to train a predictor that generalizes on all environments. Popular approaches in this field use the hypothesis that such a predictor shall be an \textit{invariant predictor} that captures the mechanism that remains constant across environments. While these approaches have been experimentally successful in various case studies, there is still much room for the theoretical validation of this hypothesis. This paper presents a new set of theoretical conditions necessary for an invariant predictor to achieve the OOD optimality. Our theory not only applies to non-linear cases, but also generalizes the necessary condition used in \citet{rojas2018invariant}. We also derive Inter Gradient Alignment algorithm from our theory and demonstrate its competitiveness on MNIST-derived benchmark datasets as well as on two of the three \textit{Invariance Unit Tests} proposed by \citet{aubinlinear}. △ Less

Submitted 25 November, 2021; v1 submitted 4 August, 2020; originally announced August 2020.

arXiv:2007.05722 [pdf, other]

Do We Need Sound for Sound Source Localization?

Authors: Takashi Oya, Shohei Iwase, Ryota Natsume, Takahiro Itazuri, Shugo Yamaguchi, Shigeo Morishima

Abstract: During the performance of sound source localization which uses both visual and aural information, it presently remains unclear how much either image or sound modalities contribute to the result, i.e. do we need both image and sound for sound source localization? To address this question, we develop an unsupervised learning system that solves sound source localization by decomposing this task into… ▽ More During the performance of sound source localization which uses both visual and aural information, it presently remains unclear how much either image or sound modalities contribute to the result, i.e. do we need both image and sound for sound source localization? To address this question, we develop an unsupervised learning system that solves sound source localization by decomposing this task into two steps: (i) "potential sound source localization", a step that localizes possible sound sources using only visual information (ii) "object selection", a step that identifies which objects are actually sounding using aural information. Our overall system achieves state-of-the-art performance in sound source localization, and more importantly, we find that despite the constraint on available information, the results of (i) achieve similar performance. From this observation and further experiments, we show that visual information is dominant in "sound" source localization when evaluated with the currently adopted benchmark dataset. Moreover, we show that the majority of sound-producing objects within the samples in this dataset can be inherently identified using only visual information, and thus that the dataset is inadequate to evaluate a system's capability to leverage aural information. As an alternative, we present an evaluation protocol that enforces both visual and aural information to be leveraged, and verify this property through several experiments. △ Less

Submitted 11 July, 2020; originally announced July 2020.

Comments: Paper: 14 pages, 6 figures. Supplementary Material: 6 pages, 3 figures. Videos and Codes will be released later

arXiv:1912.11603 [pdf, other]

Image Enhanced Rotation Prediction for Self-Supervised Learning

Authors: Shin'ya Yamaguchi, Sekitoshi Kanai, Tetsuya Shioda, Shoichiro Takeda

Abstract: The rotation prediction (Rotation) is a simple pretext-task for self-supervised learning (SSL), where models learn useful representations for target vision tasks by solving pretext-tasks. Although Rotation captures information of object shapes, it hardly captures information of textures. To tackle this problem, we introduce a novel pretext-task called image enhanced rotation prediction (IE-Rot) fo… ▽ More The rotation prediction (Rotation) is a simple pretext-task for self-supervised learning (SSL), where models learn useful representations for target vision tasks by solving pretext-tasks. Although Rotation captures information of object shapes, it hardly captures information of textures. To tackle this problem, we introduce a novel pretext-task called image enhanced rotation prediction (IE-Rot) for SSL. IE-Rot simultaneously solves Rotation and another pretext-task based on image enhancement (e.g., sharpening and solarizing) while maintaining simplicity. Through the simultaneous prediction of rotation and image enhancement, models learn representations to capture the information of not only object shapes but also textures. Our experimental results show that IE-Rot models outperform Rotation on various standard benchmarks including ImageNet classification, PASCAL-VOC detection, and COCO detection/segmentation. △ Less

Submitted 4 June, 2021; v1 submitted 25 December, 2019; originally announced December 2019.

Comments: Accepted to IEEE ICIP 2021. The title has been changed from "Multiple Pretext-Task for Self-Supervised Learning via Mixing Multiple Image Transformations"

arXiv:1912.11597 [pdf, other]

Effective Data Augmentation with Multi-Domain Learning GANs

Authors: Shin'ya Yamaguchi, Sekitoshi Kanai, Takeharu Eda

Abstract: For deep learning applications, the massive data development (e.g., collecting, labeling), which is an essential process in building practical applications, still incurs seriously high costs. In this work, we propose an effective data augmentation method based on generative adversarial networks (GANs), called Domain Fusion. Our key idea is to import the knowledge contained in an outer dataset to a… ▽ More For deep learning applications, the massive data development (e.g., collecting, labeling), which is an essential process in building practical applications, still incurs seriously high costs. In this work, we propose an effective data augmentation method based on generative adversarial networks (GANs), called Domain Fusion. Our key idea is to import the knowledge contained in an outer dataset to a target model by using a multi-domain learning GAN. The multi-domain learning GAN simultaneously learns the outer and target dataset and generates new samples for the target tasks. The simultaneous learning process makes GANs generate the target samples with high fidelity and variety. As a result, we can obtain accurate models for the target tasks by using these generated samples even if we only have an extremely low volume target dataset. We experimentally evaluate the advantages of Domain Fusion in image classification tasks on 3 target datasets: CIFAR-100, FGVC-Aircraft, and Indoor Scene Recognition. When trained on each target dataset reduced the samples to 5,000 images, Domain Fusion achieves better classification accuracy than the data augmentation using fine-tuned GANs. Furthermore, we show that Domain Fusion improves the quality of generated samples, and the improvements can contribute to higher accuracy. △ Less

Submitted 25 December, 2019; originally announced December 2019.

Comments: AAAI-2020

arXiv:1911.08444 [pdf, other]

MANGA: Method Agnostic Neural-policy Generalization and Adaptation

Authors: Homanga Bharadhwaj, Shoichiro Yamaguchi, Shin-ichi Maeda

Abstract: In this paper we target the problem of transferring policies across multiple environments with different dynamics parameters and motor noise variations, by introducing a framework that decouples the processes of policy learning and system identification. Efficiently transferring learned policies to an unknown environment with changes in dynamics configurations in the presence of motor noise is ver… ▽ More In this paper we target the problem of transferring policies across multiple environments with different dynamics parameters and motor noise variations, by introducing a framework that decouples the processes of policy learning and system identification. Efficiently transferring learned policies to an unknown environment with changes in dynamics configurations in the presence of motor noise is very important for operating robots in the real world, and our work is a novel attempt in that direction. We introduce MANGA: Method Agnostic Neural-policy Generalization and Adaptation, that trains dynamics conditioned policies and efficiently learns to estimate the dynamics parameters of the environment given off-policy state-transition rollouts in the environment. Our scheme is agnostic to the type of training method used - both reinforcement learning (RL) and imitation learning (IL) strategies can be used. We demonstrate the effectiveness of our approach by experimenting with four different MuJoCo agents and comparing against previously proposed transfer baselines. △ Less

Submitted 19 November, 2019; originally announced November 2019.

Comments: Under Review. Video available at https://drive.google.com/file/d/12GsDq3iQDXEutE-xpzXxqrEfD6dYhKjs/view?usp=sharing Other details will be made available in the author's webpage www.homangabharadhwaj.com

arXiv:1910.03253 [pdf, other]

Motion Generation Considering Situation with Conditional Generative Adversarial Networks for Throwing Robots

Authors: Kyo Kutsuzawa, Hitoshi Kusano, Ayaka Kume, Shoichiro Yamaguchi

Abstract: When robots work in a cluttered environment, the constraints for motions change frequently and the required action can change even for the same task. However, planning complex motions from direct calculation has the risk of resulting in poor performance local optima. In addition, machine learning approaches often require relearning for novel situations. In this paper, we propose a method of search… ▽ More When robots work in a cluttered environment, the constraints for motions change frequently and the required action can change even for the same task. However, planning complex motions from direct calculation has the risk of resulting in poor performance local optima. In addition, machine learning approaches often require relearning for novel situations. In this paper, we propose a method of searching appropriate motions by using conditional Generative Adversarial Networks (cGANs), which can generate motions based on the conditions by mimicking training datasets. By training cGANs with various motions for a task, its latent space is fulfilled with the valid motions for the task. The appropriate motions can be found efficiently by searching the latent space of the trained cGANs instead of the motion space, while avoiding poor local optima. We demonstrate that the proposed method successfully works for an object-throwing task to given target positions in both numerical simulation and real-robot experiments. The proposed method resulted in three times higher accuracy with 2.5 times faster calculation time than searching the action space directly. △ Less

Submitted 8 October, 2019; originally announced October 2019.

arXiv:1906.08412 [pdf, other]

Data Interpolating Prediction: Alternative Interpretation of Mixup

Authors: Takuya Shimada, Shoichiro Yamaguchi, Kohei Hayashi, Sosuke Kobayashi

Abstract: Data augmentation by mixing samples, such as Mixup, has widely been used typically for classification tasks. However, this strategy is not always effective due to the gap between augmented samples for training and original samples for testing. This gap may prevent a classifier from learning the optimal decision boundary and increase the generalization error. To overcome this problem, we propose an… ▽ More Data augmentation by mixing samples, such as Mixup, has widely been used typically for classification tasks. However, this strategy is not always effective due to the gap between augmented samples for training and original samples for testing. This gap may prevent a classifier from learning the optimal decision boundary and increase the generalization error. To overcome this problem, we propose an alternative framework called Data Interpolating Prediction (DIP). Unlike common data augmentations, we encapsulate the sample-mixing process in the hypothesis class of a classifier so that train and test samples are treated equally. We derive the generalization bound and show that DIP helps to reduce the original Rademacher complexity. Also, we empirically demonstrate that DIP can outperform existing Mixup. △ Less

Submitted 19 June, 2019; originally announced June 2019.

Comments: Presented at the 2nd Learning from Limited Labeled Data (LLD) Workshop at ICLR 2019

arXiv:1906.04868 [pdf, other]

Semi-flat minima and saddle points by embedding neural networks to overparameterization

Authors: Kenji Fukumizu, Shoichiro Yamaguchi, Yoh-ichi Mototake, Mirai Tanaka

Abstract: We theoretically study the landscape of the training error for neural networks in overparameterized cases. We consider three basic methods for embedding a network into a wider one with more hidden units, and discuss whether a minimum point of the narrower network gives a minimum or saddle point of the wider one. Our results show that the networks with smooth and ReLU activation have different part… ▽ More We theoretically study the landscape of the training error for neural networks in overparameterized cases. We consider three basic methods for embedding a network into a wider one with more hidden units, and discuss whether a minimum point of the narrower network gives a minimum or saddle point of the wider one. Our results show that the networks with smooth and ReLU activation have different partially flat landscapes around the embedded point. We also relate these results to a difference of their generalization abilities in overparameterized realization. △ Less

Submitted 14 June, 2019; v1 submitted 11 June, 2019; originally announced June 2019.

Comments: 38 pages, 4 figures

arXiv:1904.10595 [pdf, other]

Influences of Human Demographics, Brand Familiarity and Security Backgrounds on Homograph Recognition

Authors: Tran Phuong Thao, Yukiko Sawaya, Hoang-Quoc Nguyen-Son, Akira Yamada, Ayumu Kubota, Tran Van Sang, Rie Shigetomi Yamaguchi

Abstract: Homograph attack is a way that attackers deceive victims about which website domain name they are communicating with by exploiting the fact that many characters look alike. The attack becomes serious and is raising broad attention when recently many brand domains have been attacked such as Apple Inc., Adobe Inc., Lloyds Bank, etc. We first design a survey of human demographics, brand familiarity,… ▽ More Homograph attack is a way that attackers deceive victims about which website domain name they are communicating with by exploiting the fact that many characters look alike. The attack becomes serious and is raising broad attention when recently many brand domains have been attacked such as Apple Inc., Adobe Inc., Lloyds Bank, etc. We first design a survey of human demographics, brand familiarity, and security backgrounds and apply it to 2,067 participants. We build a regression model to study which factors affect participants' ability in recognizing homograph domains. We find that for different levels of visual similarity, the participants exhibit different abilities. 13.95% of participants can recognize non-homographs while 16.60% of participants can recognize homographs whose the visual similarity with the target brand domains is under 99.9%; but when the similarity increases to 99.9%, the number of participants who can recognize homographs significantly drops down to only 0.19%; and for the homographs with 100% of visual similarity, there is no way for the participants to recognize. We also find that female participants tend to recognize homographs better the male but male participants tend to able to recognize non-homographs better than females. Security knowledge is a significant factor affecting both homographs and non-homographs; surprisingly, people who have strong security knowledge tend to be able to recognize homographs but not non-homographs. Furthermore, people who work or are educated in computer science or computer engineering do not appear as a factor affecting the ability in recognizing homographs; however, interestingly, right after they are explained about the homograph attack, people who work or are educated in computer science or computer engineering are the ones who can capture the situation the most quickly. △ Less

Submitted 26 January, 2020; v1 submitted 23 April, 2019; originally announced April 2019.

arXiv:1903.09538 [pdf]

Use of Ghost Cytometry to Differentiate Cells with Similar Gross Morphologic Characteristics

Authors: Hiroaki Adachi, Yoko Kawamura, Keiji Nakagawa, Ryoichi Horisaki, Issei Sato, Satoko Yamaguchi, Katsuhito Fujiu, Kayo Waki, Hiroyuki Noji, Sadao Ota

Abstract: Imaging flow cytometry shows significant potential for increasing our understanding of heterogeneous and complex life systems and is useful for biomedical applications. Ghost cytometry is a recently proposed approach for directly analyzing compressively measured signals, thereby relieving the computational bottleneck observed in high-throughput cytometry based on morphological information. While t… ▽ More Imaging flow cytometry shows significant potential for increasing our understanding of heterogeneous and complex life systems and is useful for biomedical applications. Ghost cytometry is a recently proposed approach for directly analyzing compressively measured signals, thereby relieving the computational bottleneck observed in high-throughput cytometry based on morphological information. While this image-free approach could distinguish different cell types using the same fluorescence staining method, further strict controls are sometimes required to clearly demonstrate that the classification is based on detailed morphologic analysis. In this study, we show that ghost cytometry can be used to classify cell populations of the same type but with different fluorescence distributions in space, supporting the strength of our image-free approach for morphologic cell analysis. △ Less

Submitted 22 March, 2019; originally announced March 2019.

arXiv:1902.02992 [pdf, other]

A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning

Authors: Yoshihiro Nagano, Shoichiro Yamaguchi, Yasuhiro Fujita, Masanori Koyama

Abstract: Hyperbolic space is a geometry that is known to be well-suited for representation learning of data with an underlying hierarchical structure. In this paper, we present a novel hyperbolic distribution called \textit{pseudo-hyperbolic Gaussian}, a Gaussian-like distribution on hyperbolic space whose density can be evaluated analytically and differentiated with respect to the parameters. Our distribu… ▽ More Hyperbolic space is a geometry that is known to be well-suited for representation learning of data with an underlying hierarchical structure. In this paper, we present a novel hyperbolic distribution called \textit{pseudo-hyperbolic Gaussian}, a Gaussian-like distribution on hyperbolic space whose density can be evaluated analytically and differentiated with respect to the parameters. Our distribution enables the gradient-based learning of the probabilistic models on hyperbolic space that could never have been considered before. Also, we can sample from this hyperbolic probability distribution without resorting to auxiliary means like rejection sampling. As applications of our distribution, we develop a hyperbolic-analog of variational autoencoder and a method of probabilistic word embedding on hyperbolic space. We demonstrate the efficacy of our distribution on various datasets including MNIST, Atari 2600 Breakout, and WordNet. △ Less

Submitted 9 May, 2019; v1 submitted 8 February, 2019; originally announced February 2019.

Comments: 20 pages, 12 figures

arXiv:1812.05296 [pdf]

Aerial Robot Model based design and verification of the single and multi-agent inspection application development

Authors: Seiko P. Yamaguchi, Masaru Sakuma, Takaki Ueno, Filip Karolonek, Tadeusz Uhl, Ankit A. Ravankar, Takanori Emaru, Yukinori Kobayashi

Abstract: In recent decade, potential application of Unmanned Aerial Vehicles (UAV) has enabled replacement of various operations in hard-to-access areas, such as, inspection, surveillance or search and rescue applications in challenging and complex environments. Furthermore, aerial robotics application with multi-agent systems are anticipated to further extend its potential. However, one of the major diffi… ▽ More In recent decade, potential application of Unmanned Aerial Vehicles (UAV) has enabled replacement of various operations in hard-to-access areas, such as, inspection, surveillance or search and rescue applications in challenging and complex environments. Furthermore, aerial robotics application with multi-agent systems are anticipated to further extend its potential. However, one of the major difficulties in aerial robotics applications is the testing of the elaborated system within safety concerns, especially when multiple agents are simultaneously applied. Thus, virtual prototy** and simulation-based development can serve in development, assessment and improvement of the aerial robot applications. In this research, two examples of the specific applications are highlighted, harbor structure and facilities inspection with UAV, and development of autonomous positioning of multi-UAVs communication relaying system. In this research, virtual prototype was designed and further simulated in multi-body simulation (MBS) feigning the sensing and actuating equipment behaviors. Simultaneous simulation of the control and application system running with software in the loop (SITL) method is utilized to assess the designed hardware behavior with modular application nodes running in Robot Operating System. Furthermore, prepared simulation environment is assessed with multi-agent system, proposed in previous research with autonomous position control of communication relaying system. Application of the virtual prototype's simulation environment enables further examination of the proposed system within comparison degree with postfield tests. The research aims to contribute through case assessment of the design process to safer, time and cost-efficient development and application design in the field of aerial robotics. △ Less

Submitted 13 December, 2018; originally announced December 2018.

Comments: 3 pages, 5 figures, conference paper

Showing 1–34 of 34 results for author: Yamaguchi, S