Search | arXiv e-print repository

Deep Bayesian Filter for Bayes-faithful Data Assimilation

Authors: Yuta Tarumi, Keisuke Fukuda, Shin-ichi Maeda

Abstract: State estimation for nonlinear state space models is a challenging task. Existing assimilation methodologies predominantly assume Gaussian posteriors on physical space, where true posteriors become inevitably non-Gaussian. We propose Deep Bayesian Filtering (DBF) for data assimilation on nonlinear state space models (SSMs). DBF constructs new latent variables $h_t$ on a new latent (``fancy'') spac… ▽ More State estimation for nonlinear state space models is a challenging task. Existing assimilation methodologies predominantly assume Gaussian posteriors on physical space, where true posteriors become inevitably non-Gaussian. We propose Deep Bayesian Filtering (DBF) for data assimilation on nonlinear state space models (SSMs). DBF constructs new latent variables $h_t$ on a new latent (``fancy'') space and assimilates observations $o_t$. By (i) constraining the state transition on fancy space to be linear and (ii) learning a Gaussian inverse observation operator $q(h_t|o_t)$, posteriors always remain Gaussian for DBF. Quite distinctively, the structured design of posteriors provides an analytic formula for the recursive computation of posteriors without accumulating Monte-Carlo sampling errors over time steps. DBF seeks the Gaussian inverse observation operators $q(h_t|o_t)$ and other latent SSM parameters (e.g., dynamics matrix) by maximizing the evidence lower bound. Experiments show that DBF outperforms model-based approaches and latent assimilation methods in various tasks and conditions. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: Main text 9 pages

arXiv:2404.17381 [pdf, other]

Frequency-Guided Multi-Level Human Action Anomaly Detection with Normalizing Flows

Authors: Shun Maeda, Chunzhi Gu, Jun Yu, Shogo Tokai, Shangce Gao, Chao Zhang

Abstract: We introduce the task of human action anomaly detection (HAAD), which aims to identify anomalous motions in an unsupervised manner given only the pre-determined normal category of training action samples. Compared to prior human-related anomaly detection tasks which primarily focus on unusual events from videos, HAAD involves the learning of specific action labels to recognize semantically anomalo… ▽ More We introduce the task of human action anomaly detection (HAAD), which aims to identify anomalous motions in an unsupervised manner given only the pre-determined normal category of training action samples. Compared to prior human-related anomaly detection tasks which primarily focus on unusual events from videos, HAAD involves the learning of specific action labels to recognize semantically anomalous human behaviors. To address this task, we propose a normalizing flow (NF)-based detection framework where the sample likelihood is effectively leveraged to indicate anomalies. As action anomalies often occur in some specific body parts, in addition to the full-body action feature learning, we incorporate extra encoding streams into our framework for a finer modeling of body subsets. Our framework is thus multi-level to jointly discover global and local motion anomalies. Furthermore, to show awareness of the potentially jittery data during recording, we resort to discrete cosine transformation by converting the action samples from the temporal to the frequency domain to mitigate the issue of data instability. Extensive experimental results on two human action datasets demonstrate that our method outperforms the baselines formed by adapting state-of-the-art human activity AD approaches to our task of HAAD. △ Less

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.10999 [pdf]

Machine-Learning-Enhanced Soft Robotic System Inspired by Rectal Functions for Investigating Fecal incontinence

Authors: Zebing Mao, Sota Suzuki, Hiroyuki Nabae, Shoko Miyagawa, Koichi Suzumori, Shingo Maeda

Abstract: Fecal incontinence, arising from a myriad of pathogenic mechanisms, has attracted considerable global attention. Despite its significance, the replication of the defecatory system for studying fecal incontinence mechanisms remains limited largely due to social stigma and taboos. Inspired by the rectum's functionalities, we have developed a soft robotic system, encompassing a power supply, pressure… ▽ More Fecal incontinence, arising from a myriad of pathogenic mechanisms, has attracted considerable global attention. Despite its significance, the replication of the defecatory system for studying fecal incontinence mechanisms remains limited largely due to social stigma and taboos. Inspired by the rectum's functionalities, we have developed a soft robotic system, encompassing a power supply, pressure sensing, data acquisition systems, a flushing mechanism, a stage, and a rectal module. The innovative soft rectal module includes actuators inspired by sphincter muscles, both soft and rigid covers, and soft rectum mold. The rectal mold, fabricated from materials that closely mimic human rectal tissue, is produced using the mold replication fabrication method. Both the soft and rigid components of the mold are realized through the application of 3D-printing technology. The sphincter muscles-inspired actuators featuring double-layer pouch structures are modeled and optimized based on multilayer perceptron methods aiming to obtain high contractions ratios (100%), high generated pressure (9.8 kPa), and small recovery time (3 s). Upon assembly, this defecation robot is capable of smoothly expelling liquid faeces, performing controlled solid fecal cutting, and defecating extremely solid long faeces, thus closely replicating the human rectum and anal canal's functions. This defecation robot has the potential to assist humans in understanding the complex defecation system and contribute to the development of well-being devices related to defecation. △ Less

Submitted 1 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

arXiv:2310.01712 [pdf, other]

Generative Autoencoding of Dropout Patterns

Authors: Shunta Maeda

Abstract: We propose a generative model termed Deciphering Autoencoders. In this model, we assign a unique random dropout pattern to each data point in the training dataset and then train an autoencoder to reconstruct the corresponding data point using this pattern as information to be encoded. Even if a completely random dropout pattern is assigned to each data point regardless of their similarities, a suf… ▽ More We propose a generative model termed Deciphering Autoencoders. In this model, we assign a unique random dropout pattern to each data point in the training dataset and then train an autoencoder to reconstruct the corresponding data point using this pattern as information to be encoded. Even if a completely random dropout pattern is assigned to each data point regardless of their similarities, a sufficiently large encoder can smoothly map them to a low-dimensional latent space to reconstruct individual training data points. During inference, using a dropout pattern different from those used during training allows the model to function as a generator. Since the training of Deciphering Autoencoders relies solely on reconstruction error, it offers more stable training compared to other generative models. Despite their simplicity, Deciphering Autoencoders show sampling quality comparable to DCGAN on the CIFAR-10 dataset. △ Less

Submitted 27 June, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

arXiv:2310.00894 [pdf, other]

doi 10.1109/ICIP49359.2023.10223088

JPEG Information Regularized Deep Image Prior for Denoising

Authors: Tsukasa Takagi, Shinya Ishizaki, Shin-ichi Maeda

Abstract: Image denoising is a representative image restoration task in computer vision. Recent progress of image denoising from only noisy images has attracted much attention. Deep image prior (DIP) demonstrated successful image denoising from only a noisy image by inductive bias of convolutional neural network architectures without any pre-training. The major challenge of DIP based image denoising is that… ▽ More Image denoising is a representative image restoration task in computer vision. Recent progress of image denoising from only noisy images has attracted much attention. Deep image prior (DIP) demonstrated successful image denoising from only a noisy image by inductive bias of convolutional neural network architectures without any pre-training. The major challenge of DIP based image denoising is that DIP would completely recover the original noisy image unless applying early stop**. For early stop** without a ground-truth clean image, we propose to monitor JPEG file size of the recovered image during optimization as a proxy metric of noise levels in the recovered image. Our experiments show that the compressed image file size works as an effective metric for early stop**. △ Less

Submitted 2 October, 2023; originally announced October 2023.

Comments: IEEE International Conference on Image Processing (ICIP 2023)

arXiv:2309.08312 [pdf, other]

Two-fingered Hand with Gear-type Synchronization Mechanism with Magnet for Improved Small and Offset Objects Gras**: F2 Hand

Authors: Naoki Fukaya, Avinash Ummadisingu, Kuniyuki Takahashi, Guilherme Maeda, Shin-ichi Maeda

Abstract: A problem that plagues robotic gras** is the misalignment of the object and gripper due to difficulties in precise localization, actuation, etc. Under-actuated robotic hands with compliant mechanisms are used to adapt and compensate for these inaccuracies. However, these mechanisms come at the cost of controllability and coordination. For instance, adaptive functions that let the fingers of a tw… ▽ More A problem that plagues robotic gras** is the misalignment of the object and gripper due to difficulties in precise localization, actuation, etc. Under-actuated robotic hands with compliant mechanisms are used to adapt and compensate for these inaccuracies. However, these mechanisms come at the cost of controllability and coordination. For instance, adaptive functions that let the fingers of a two-fingered gripper adapt independently may affect the coordination necessary for gras** small objects. In this work, we develop a two-fingered robotic hand capable of gras** objects that are offset from the gripper's center, while still having the requisite coordination for gras** small objects via a novel gear-type synchronization mechanism with a magnet. This gear synchronization mechanism allows the adaptive finger's tips to be aligned enabling it to grasp objects as small as toothpicks and washers. The magnetic component allows this coordination to automatically turn off when needed, allowing for the gras** of objects that are offset/misaligned from the gripper. This equips the hand with the capability of gras** light, fragile objects (strawberries, creampuffs, etc) to heavy frying pan lids, all while maintaining their position and posture which is vital in numerous applications that require precise positioning or careful manipulation. △ Less

Submitted 20 September, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: 8 pages. Accepted at IEEE IROS 2023. An accompanying video is available at https://www.youtube.com/watch?v=RAO7Qb2ZGNs

arXiv:2306.10656 [pdf, other]

Virtual Human Generative Model: Masked Modeling Approach for Learning Human Characteristics

Authors: Kenta Oono, Nontawat Charoenphakdee, Kotatsu Bito, Zhengyan Gao, Yoshiaki Ota, Shoichiro Yamaguchi, Yohei Sugawara, Shin-ichi Maeda, Kunihiko Miyoshi, Yuki Saito, Koki Tsuda, Hiroshi Maruyama, Kohei Hayashi

Abstract: Identifying the relationship between healthcare attributes, lifestyles, and personality is vital for understanding and improving physical and mental conditions. Machine learning approaches are promising for modeling their relationships and offering actionable suggestions. In this paper, we propose Virtual Human Generative Model (VHGM), a machine learning model for estimating attributes about healt… ▽ More Identifying the relationship between healthcare attributes, lifestyles, and personality is vital for understanding and improving physical and mental conditions. Machine learning approaches are promising for modeling their relationships and offering actionable suggestions. In this paper, we propose Virtual Human Generative Model (VHGM), a machine learning model for estimating attributes about healthcare, lifestyles, and personalities. VHGM is a deep generative model trained with masked modeling to learn the joint distribution of attributes conditioned on known ones. Using heterogeneous tabular datasets, VHGM learns more than 1,800 attributes efficiently. We numerically evaluate the performance of VHGM and its training techniques. As a proof-of-concept of VHGM, we present several applications demonstrating user scenarios, such as virtual measurements of healthcare attributes and hypothesis verifications of lifestyles. △ Less

Submitted 14 August, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

Comments: 14 pages, 4 figures

arXiv:2306.00229 [pdf, other]

Minotaur: A SIMD-Oriented Synthesizing Superoptimizer

Authors: Zhengyang Liu, Stefan Mada, John Regehr

Abstract: Minotaur is a superoptimizer for LLVM's intermediate representation that focuses on integer SIMD instructions, both portable and specific to x86-64. We created it to attack problems in finding missing peephole optimizations for SIMD instructions-this is challenging because there are many such instructions and they can be semantically complex. Minotaur runs a hybrid synthesis algorithm where instru… ▽ More Minotaur is a superoptimizer for LLVM's intermediate representation that focuses on integer SIMD instructions, both portable and specific to x86-64. We created it to attack problems in finding missing peephole optimizations for SIMD instructions-this is challenging because there are many such instructions and they can be semantically complex. Minotaur runs a hybrid synthesis algorithm where instructions are enumerated concretely, but literal constants are generated by the solver. We use Alive2 as a verification engine; to do this we modified it to support synthesis and also to support a large subset of Intel's vector instruction sets (SSE, AVX, AVX2, and AVX-512). Minotaur finds many profitable optimizations that are missing from LLVM. It achieves limited speedups on the integer parts of SPEC CPU2017, around 1.3%, and it speeds up the test suite for the libYUV library by 2.2%, on average, and by 1.64x maximum, when targeting an Intel Cascade Lake processor. △ Less

Submitted 12 July, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

arXiv:2305.10162 [pdf, other]

Orienting undirected phylogenetic networks to tree-child network

Authors: Shunsuke Maeda, Yusuke Kaneko, Hideaki Muramatsu, Yukihiro Murakami, Momoko Hayamizu

Abstract: Phylogenetic networks are used to represent the evolutionary history of species. They are versatile when compared to traditional phylogenetic trees, as they capture more complex evolutionary events such as hybridization and horizontal gene transfer. Distance-based methods such as the Neighbor-Net algorithm are widely used to compute phylogenetic networks from data. However, the output is necessari… ▽ More Phylogenetic networks are used to represent the evolutionary history of species. They are versatile when compared to traditional phylogenetic trees, as they capture more complex evolutionary events such as hybridization and horizontal gene transfer. Distance-based methods such as the Neighbor-Net algorithm are widely used to compute phylogenetic networks from data. However, the output is necessarily an undirected graph, posing a great challenge to deduce the direction of genetic flow in order to infer the true evolutionary history. Recently, Huber et al. investigated two different computational problems relevant to orienting undirected phylogenetic networks into directed ones. In this paper, we consider the problem of orienting an undirected binary network into a tree-child network. We give some necessary conditions for determining the tree-child orientability, such as a tight upper bound on the size of tree-child orientable graphs, as well as many interesting examples. In addition, we introduce new families of undirected phylogenetic networks, the jellyfish graphs and ladder graphs, that are orientable but not tree-child orientable. We also prove that any ladder graph can be made tree-child orientable by adding extra leaves, and describe a simple algorithm for orienting a ladder graph to a tree-child network with the minimum number of extra leaves. We pose many open problems as well. △ Less

Submitted 17 May, 2023; originally announced May 2023.

Comments: 14 pages, 15 figures

MSC Class: 05C20

arXiv:2304.12770 [pdf, other]

Controlling Posterior Collapse by an Inverse Lipschitz Constraint on the Decoder Network

Authors: Yuri Kinoshita, Kenta Oono, Kenji Fukumizu, Yuichi Yoshida, Shin-ichi Maeda

Abstract: Variational autoencoders (VAEs) are one of the deep generative models that have experienced enormous success over the past decades. However, in practice, they suffer from a problem called posterior collapse, which occurs when the encoder coincides, or collapses, with the prior taking no information from the latent structure of the input data into consideration. In this work, we introduce an invers… ▽ More Variational autoencoders (VAEs) are one of the deep generative models that have experienced enormous success over the past decades. However, in practice, they suffer from a problem called posterior collapse, which occurs when the encoder coincides, or collapses, with the prior taking no information from the latent structure of the input data into consideration. In this work, we introduce an inverse Lipschitz neural network into the decoder and, based on this architecture, provide a new method that can control in a simple and clear manner the degree of posterior collapse for a wide range of VAE models equipped with a concrete theoretical guarantee. We also illustrate the effectiveness of our method through several numerical experiments. △ Less

Submitted 2 February, 2024; v1 submitted 25 April, 2023; originally announced April 2023.

Comments: accepted to ICML 2023, some notations adjusted from the submitted version

arXiv:2302.09376 [pdf, other]

Why is parameter averaging beneficial in SGD? An objective smoothing perspective

Authors: Atsushi Nitanda, Ryuhei Kikuchi, Shugo Maeda, Denny Wu

Abstract: It is often observed that stochastic gradient descent (SGD) and its variants implicitly select a solution with good generalization performance; such implicit bias is often characterized in terms of the sharpness of the minima. Kleinberg et al. (2018) connected this bias with the smoothing effect of SGD which eliminates sharp local minima by the convolution using the stochastic gradient noise. We f… ▽ More It is often observed that stochastic gradient descent (SGD) and its variants implicitly select a solution with good generalization performance; such implicit bias is often characterized in terms of the sharpness of the minima. Kleinberg et al. (2018) connected this bias with the smoothing effect of SGD which eliminates sharp local minima by the convolution using the stochastic gradient noise. We follow this line of research and study the commonly-used averaged SGD algorithm, which has been empirically observed in Izmailov et al. (2018) to prefer a flat minimum and therefore achieves better generalization. We prove that in certain problem settings, averaged SGD can efficiently optimize the smoothed objective which avoids sharp local minima. In experiments, we verify our theory and show that parameter averaging with an appropriate step size indeed leads to significant improvement in the performance of SGD. △ Less

Submitted 26 May, 2024; v1 submitted 18 February, 2023; originally announced February 2023.

Comments: 27pages, AISTATS2024

arXiv:2207.09228 [pdf, other]

Image Super-Resolution with Deep Dictionary

Authors: Shunta Maeda

Abstract: Since the first success of Dong et al., the deep-learning-based approach has become dominant in the field of single-image super-resolution. This replaces all the handcrafted image processing steps of traditional sparse-coding-based methods with a deep neural network. In contrast to sparse-coding-based methods, which explicitly create high/low-resolution dictionaries, the dictionaries in deep-learn… ▽ More Since the first success of Dong et al., the deep-learning-based approach has become dominant in the field of single-image super-resolution. This replaces all the handcrafted image processing steps of traditional sparse-coding-based methods with a deep neural network. In contrast to sparse-coding-based methods, which explicitly create high/low-resolution dictionaries, the dictionaries in deep-learning-based methods are implicitly acquired as a nonlinear combination of multiple convolutions. One disadvantage of deep-learning-based methods is that their performance is degraded for images created differently from the training dataset (out-of-domain images). We propose an end-to-end super-resolution network with a deep dictionary (SRDD), where a high-resolution dictionary is explicitly learned without sacrificing the advantages of deep learning. Extensive experiments show that explicit learning of high-resolution dictionary makes the network more robust for out-of-domain test images while maintaining the performance of the in-domain test images. △ Less

Submitted 19 July, 2022; originally announced July 2022.

Comments: ECCV 2022

arXiv:2206.06556 [pdf, other]

F3 Hand: A Versatile Robot Hand Inspired by Human Thumb and Index Fingers

Authors: Naoki Fukaya, Avinash Ummadisingu, Guilherme Maeda, Shin-ichi Maeda

Abstract: It is challenging to grasp numerous objects with varying sizes and shapes with a single robot hand. To address this, we propose a new robot hand called the 'F3 hand' inspired by the complex movements of human index finger and thumb. The F3 hand attempts to realize complex human-like gras** movements by combining a parallel motion finger and a rotational motion finger with an adaptive function. I… ▽ More It is challenging to grasp numerous objects with varying sizes and shapes with a single robot hand. To address this, we propose a new robot hand called the 'F3 hand' inspired by the complex movements of human index finger and thumb. The F3 hand attempts to realize complex human-like gras** movements by combining a parallel motion finger and a rotational motion finger with an adaptive function. In order to confirm the performance of our hand, we attached it to a mobile manipulator - the Toyota Human Support Robot (HSR) and conducted gras** experiments. In our results, we show that it is able to grasp all YCB objects (82 in total), including washers with outer diameters as small as 6.4mm. We also built a system for intuitive operation with a 3D mouse and grasp an additional 24 objects, including small toothpicks and paper clips and large pitchers and cracker boxes. The F3 hand is able to achieve a 98% success rate in gras** even under imprecise control and positional offsets. Furthermore, owing to the finger's adaptive function, we demonstrate characteristics of the F3 hand that facilitate the gras** of soft objects such as strawberries in a desirable posture. △ Less

Submitted 16 June, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: 8 pages. Accepted at IEEE RO-MAN 2022. An accompanying video is available at https://www.youtube.com/watch?v=l6GK5XTbty8

arXiv:2205.07066 [pdf, other]

F1 Hand: A Versatile Fixed-Finger Gripper for Delicate Teleoperation and Autonomous Gras**

Authors: Guilherme Maeda, Naoki Fukaya, Shin-ichi Maeda

Abstract: Teleoperation is often limited by the ability of an operator to react and predict the behavior of the robot as it interacts with the environment. For example, to grasp small objects on a table, the teleoperator needs to predict the position of the fingertips before the fingers are closed to avoid them hitting the table. For that reason, we developed the F1 hand, a single-motor gripper that facilit… ▽ More Teleoperation is often limited by the ability of an operator to react and predict the behavior of the robot as it interacts with the environment. For example, to grasp small objects on a table, the teleoperator needs to predict the position of the fingertips before the fingers are closed to avoid them hitting the table. For that reason, we developed the F1 hand, a single-motor gripper that facilitates teleoperation with the use of a fixed finger. The hand is capable of gras** objects as thin as a paper clip, and as heavy and large as a cordless drill. The applicability of the hand can be expanded by replacing the fixed finger with different shapes. This flexibility makes the hand highly versatile while being easy and cheap to develop. However, due to the atypical asymmetric structure and actuation of the hand usual gras** strategies no longer apply. Thus, we propose a controller that approximates actuation symmetry by using the motion of the whole arm. The F1 hand and its controller are compared side-by-side with the original Toyota Human Support Robot (HSR) gripper in teleoperation using 22 objects from the YCB dataset in addition to small objects. The gras** time and peak contact forces could be decreased by 20% and 70%, respectively while increasing success rates by 5%. Using an off-the-shelf grasp pose estimator for autonomous gras**, the system achieved similar success rates to the original HSR gripper, at the order of 80%. △ Less

Submitted 14 May, 2022; originally announced May 2022.

Comments: Accepted for publication at the IEEE Robotics and Automation Letters (RA-L)

arXiv:2109.13432 [pdf, other]

Warp-Refine Propagation: Semi-Supervised Auto-labeling via Cycle-consistency

Authors: Aditya Ganeshan, Alexis Vallet, Yasunori Kudo, Shin-ichi Maeda, Tommi Kerola, Rares Ambrus, Dennis Park, Adrien Gaidon

Abstract: Deep learning models for semantic segmentation rely on expensive, large-scale, manually annotated datasets. Labelling is a tedious process that can take hours per image. Automatically annotating video sequences by propagating sparsely labeled frames through time is a more scalable alternative. In this work, we propose a novel label propagation method, termed Warp-Refine Propagation, that combines… ▽ More Deep learning models for semantic segmentation rely on expensive, large-scale, manually annotated datasets. Labelling is a tedious process that can take hours per image. Automatically annotating video sequences by propagating sparsely labeled frames through time is a more scalable alternative. In this work, we propose a novel label propagation method, termed Warp-Refine Propagation, that combines semantic cues with geometric cues to efficiently auto-label videos. Our method learns to refine geometrically-warped labels and infuse them with learned semantic priors in a semi-supervised setting by leveraging cycle consistency across time. We quantitatively show that our method improves label-propagation by a noteworthy margin of 13.1 mIoU on the ApolloScape dataset. Furthermore, by training with the auto-labelled frames, we achieve competitive results on three semantic-segmentation benchmarks, improving the state-of-the-art by a large margin of 1.8 and 3.61 mIoU on NYU-V2 and KITTI, while matching the current best results on Cityscapes. △ Less

Submitted 27 September, 2021; originally announced September 2021.

Comments: 16 pages, 12 figures, including supplementary material. To be published in ICCV 2021

arXiv:2108.11018 [pdf, other]

A Scaling Law for Synthetic-to-Real Transfer: How Much Is Your Pre-training Effective?

Authors: Hiroaki Mikami, Kenji Fukumizu, Shogo Murai, Shuji Suzuki, Yuta Kikuchi, Taiji Suzuki, Shin-ichi Maeda, Kohei Hayashi

Abstract: Synthetic-to-real transfer learning is a framework in which a synthetically generated dataset is used to pre-train a model to improve its performance on real vision tasks. The most significant advantage of using synthetic images is that the ground-truth labels are automatically available, enabling unlimited expansion of the data size without human cost. However, synthetic data may have a huge doma… ▽ More Synthetic-to-real transfer learning is a framework in which a synthetically generated dataset is used to pre-train a model to improve its performance on real vision tasks. The most significant advantage of using synthetic images is that the ground-truth labels are automatically available, enabling unlimited expansion of the data size without human cost. However, synthetic data may have a huge domain gap, in which case increasing the data size does not improve the performance. How can we know that? In this study, we derive a simple scaling law that predicts the performance from the amount of pre-training data. By estimating the parameters of the law, we can judge whether we should increase the data or change the setting of image synthesis. Further, we analyze the theory of transfer learning by considering learning dynamics and confirm that the derived generalization bound is consistent with our empirical findings. We empirically validated our scaling law on various experimental settings of benchmark tasks, model sizes, and complexities of synthetic images. △ Less

Submitted 8 October, 2021; v1 submitted 24 August, 2021; originally announced August 2021.

arXiv:2105.12946 [pdf, other]

Uncertainty-Aware Self-Supervised Target-Mass Gras** of Granular Foods

Authors: Kuniyuki Takahashi, Wilson Ko, Avinash Ummadisingu, Shin-ichi Maeda

Abstract: Food packing industry workers typically pick a target amount of food by hand from a food tray and place them in containers. Since menus are diverse and change frequently, robots must adapt and learn to handle new foods in a short time-span. Learning to grasp a specific amount of granular food requires a large training dataset, which is challenging to collect reasonably quickly. In this study, we p… ▽ More Food packing industry workers typically pick a target amount of food by hand from a food tray and place them in containers. Since menus are diverse and change frequently, robots must adapt and learn to handle new foods in a short time-span. Learning to grasp a specific amount of granular food requires a large training dataset, which is challenging to collect reasonably quickly. In this study, we propose ways to reduce the necessary amount of training data by augmenting a deep neural network with models that estimate its uncertainty through self-supervised learning. To further reduce human effort, we devise a data collection system that automatically generates labels. We build on the idea that we can grasp sufficiently well if there is at least one low-uncertainty (high-confidence) grasp point among the various grasp point candidates. We evaluate the methods we propose in this work on a variety of granular foods -- coffee beans, rice, oatmeal and peanuts -- each of which has a different size, shape and material properties such as volumetric mass density or friction. For these foods, we show significantly improved grasp accuracy of user-specified target masses using smaller datasets by incorporating uncertainty. △ Less

Submitted 27 May, 2021; originally announced May 2021.

Comments: 7 pages. Accepted to ICRA2021. An accompanying video is available at the following link: https://youtu.be/5pLkg7SpmiE

arXiv:2010.13086 [pdf]

Entangled and correlated photon mixed strategy for social decision making

Authors: Shion Maeda, Nicolas Chauvet, Hayato Saigo, Hirokazu Hori, Guillaume Bachelier, Serge Huant, Makoto Naruse

Abstract: Collective decision making is important for maximizing total benefits while preserving equality among individuals in the competitive multi-armed bandit (CMAB) problem, wherein multiple players try to gain higher rewards from multiple slot machines. The CMAB problem represents an essential aspect of applications such as resource management in social infrastructure. In a previous study, we theoretic… ▽ More Collective decision making is important for maximizing total benefits while preserving equality among individuals in the competitive multi-armed bandit (CMAB) problem, wherein multiple players try to gain higher rewards from multiple slot machines. The CMAB problem represents an essential aspect of applications such as resource management in social infrastructure. In a previous study, we theoretically and experimentally demonstrated that entangled photons can physically resolve the difficulty of the CMAB problem. This decision-making strategy completely avoids decision conflicts while ensuring equality. However, decision conflicts can sometimes be beneficial if they yield greater rewards than non-conflicting decisions, indicating that greedy actions may provide positive effects depending on the given environment. In this study, we demonstrate a mixed strategy of entangled- and correlated-photon-based decision-making so that total rewards can be enhanced when compared to the entangled-photon-only decision strategy. We show that an optimal mixture of entangled- and correlated-photon-based strategies exists depending on the dynamics of the reward environment as well as the difficulty of the given problem. This study paves the way for utilizing both quantum and classical aspects of photons in a mixed manner for decision making and provides yet another example of the supremacy of mixed strategies known in game theory, especially in evolutionary game theory. △ Less

Submitted 25 October, 2020; originally announced October 2020.

arXiv:2006.01488 [pdf, other]

Meta Learning as Bayes Risk Minimization

Authors: Shin-ichi Maeda, Toshiki Nakanishi, Masanori Koyama

Abstract: Meta-Learning is a family of methods that use a set of interrelated tasks to learn a model that can quickly learn a new query task from a possibly small contextual dataset. In this study, we use a probabilistic framework to formalize what it means for two tasks to be related and reframe the meta-learning problem into the problem of Bayesian risk minimization (BRM). In our formulation, the BRM opti… ▽ More Meta-Learning is a family of methods that use a set of interrelated tasks to learn a model that can quickly learn a new query task from a possibly small contextual dataset. In this study, we use a probabilistic framework to formalize what it means for two tasks to be related and reframe the meta-learning problem into the problem of Bayesian risk minimization (BRM). In our formulation, the BRM optimal solution is given by the predictive distribution computed from the posterior distribution of the task-specific latent variable conditioned on the contextual dataset, and this justifies the philosophy of Neural Process. However, the posterior distribution in Neural Process violates the way the posterior distribution changes with the contextual dataset. To address this problem, we present a novel Gaussian approximation for the posterior distribution that generalizes the posterior of the linear Gaussian model. Unlike that of the Neural Process, our approximation of the posterior distributions converges to the maximum likelihood estimate with the same rate as the true posterior distribution. We also demonstrate the competitiveness of our approach on benchmark datasets. △ Less

Submitted 2 June, 2020; originally announced June 2020.

arXiv:2002.11397 [pdf, other]

Unpaired Image Super-Resolution using Pseudo-Supervision

Authors: Shunta Maeda

Abstract: In most studies on learning-based image super-resolution (SR), the paired training dataset is created by downscaling high-resolution (HR) images with a predetermined operation (e.g., bicubic). However, these methods fail to super-resolve real-world low-resolution (LR) images, for which the degradation process is much more complicated and unknown. In this paper, we propose an unpaired SR method usi… ▽ More In most studies on learning-based image super-resolution (SR), the paired training dataset is created by downscaling high-resolution (HR) images with a predetermined operation (e.g., bicubic). However, these methods fail to super-resolve real-world low-resolution (LR) images, for which the degradation process is much more complicated and unknown. In this paper, we propose an unpaired SR method using a generative adversarial network that does not require a paired/aligned training dataset. Our network consists of an unpaired kernel/noise correction network and a pseudo-paired SR network. The correction network removes noise and adjusts the kernel of the inputted LR image; then, the corrected clean LR image is upscaled by the SR network. In the training phase, the correction network also produces a pseudo-clean LR image from the inputted HR image, and then a map** from the pseudo-clean LR image to the inputted HR image is learned by the SR network in a paired manner. Because our SR network is independent of the correction network, well-studied existing network architectures and pixel-wise loss functions can be integrated with the proposed framework. Experiments on diverse datasets show that the proposed method is superior to existing solutions to the unpaired SR problem. △ Less

Submitted 26 February, 2020; originally announced February 2020.

Comments: 10 pages, 10 figures

arXiv:1911.08724 [pdf, other]

Fast and Flexible Image Blind Denoising via Competition of Experts

Authors: Shunta Maeda

Abstract: Fast and flexible processing are two essential requirements for a number of practical applications of image denoising. Current state-of-the-art methods, however, still require either high computational cost or limited scopes of the target. We introduce an efficient ensemble network trained via a competition of expert networks, as an application for image blind denoising. We realize automatic divis… ▽ More Fast and flexible processing are two essential requirements for a number of practical applications of image denoising. Current state-of-the-art methods, however, still require either high computational cost or limited scopes of the target. We introduce an efficient ensemble network trained via a competition of expert networks, as an application for image blind denoising. We realize automatic division of unlabeled noisy datasets into clusters respectively optimized to enhance denoising performance. The architecture is scalable, can be extended to deal with diverse noise sources/levels without increasing the computation time. Taking advantage of this method, we save up to approximately 90% of computational cost without sacrifice of the denoising performance compared to single network models with identical architectures. We also compare the proposed method with several existing algorithms and observe significant outperformance over prior arts in terms of computational efficiency. △ Less

Submitted 20 November, 2019; originally announced November 2019.

Comments: 9 pages, 9 figures

arXiv:1911.08444 [pdf, other]

MANGA: Method Agnostic Neural-policy Generalization and Adaptation

Authors: Homanga Bharadhwaj, Shoichiro Yamaguchi, Shin-ichi Maeda

Abstract: In this paper we target the problem of transferring policies across multiple environments with different dynamics parameters and motor noise variations, by introducing a framework that decouples the processes of policy learning and system identification. Efficiently transferring learned policies to an unknown environment with changes in dynamics configurations in the presence of motor noise is ver… ▽ More In this paper we target the problem of transferring policies across multiple environments with different dynamics parameters and motor noise variations, by introducing a framework that decouples the processes of policy learning and system identification. Efficiently transferring learned policies to an unknown environment with changes in dynamics configurations in the presence of motor noise is very important for operating robots in the real world, and our work is a novel attempt in that direction. We introduce MANGA: Method Agnostic Neural-policy Generalization and Adaptation, that trains dynamics conditioned policies and efficiently learns to estimate the dynamics parameters of the environment given off-policy state-transition rollouts in the environment. Our scheme is agnostic to the type of training method used - both reinforcement learning (RL) and imitation learning (IL) strategies can be used. We demonstrate the effectiveness of our approach by experimenting with four different MuJoCo agents and comparing against previously proposed transfer baselines. △ Less

Submitted 19 November, 2019; originally announced November 2019.

Comments: Under Review. Video available at https://drive.google.com/file/d/12GsDq3iQDXEutE-xpzXxqrEfD6dYhKjs/view?usp=sharing Other details will be made available in the author's webpage www.homangabharadhwaj.com

arXiv:1909.09540 [pdf, other]

Reconnaissance and Planning algorithm for constrained MDP

Authors: Shin-ichi Maeda, Hayato Watahiki, Shintarou Okada, Masanori Koyama

Abstract: Practical reinforcement learning problems are often formulated as constrained Markov decision process (CMDP) problems, in which the agent has to maximize the expected return while satisfying a set of prescribed safety constraints. In this study, we propose a novel simulator-based method to approximately solve a CMDP problem without making any compromise on the safety constraints. We achieve this b… ▽ More Practical reinforcement learning problems are often formulated as constrained Markov decision process (CMDP) problems, in which the agent has to maximize the expected return while satisfying a set of prescribed safety constraints. In this study, we propose a novel simulator-based method to approximately solve a CMDP problem without making any compromise on the safety constraints. We achieve this by decomposing the CMDP into a pair of MDPs; reconnaissance MDP and planning MDP. The purpose of reconnaissance MDP is to evaluate the set of actions that are safe, and the purpose of planning MDP is to maximize the return while using the actions authorized by reconnaissance MDP. RMDP can define a set of safe policies for any given set of safety constraint, and this set of safe policies can be used to solve another CMDP problem with different reward. Our method is not only computationally less demanding than the previous simulator-based approaches to CMDP, but also capable of finding a competitive reward-seeking policy in a high dimensional environment, including those involving multiple moving obstacles. △ Less

Submitted 20 September, 2019; originally announced September 2019.

arXiv:1908.04471 [pdf, other]

Einconv: Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks

Authors: Kohei Hayashi, Taiki Yamaguchi, Yohei Sugawara, Shin-ichi Maeda

Abstract: Tensor decomposition methods are widely used for model compression and fast inference in convolutional neural networks (CNNs). Although many decompositions are conceivable, only CP decomposition and a few others have been applied in practice, and no extensive comparisons have been made between available methods. Previous studies have not determined how many decompositions are available, nor which… ▽ More Tensor decomposition methods are widely used for model compression and fast inference in convolutional neural networks (CNNs). Although many decompositions are conceivable, only CP decomposition and a few others have been applied in practice, and no extensive comparisons have been made between available methods. Previous studies have not determined how many decompositions are available, nor which of them is optimal. In this study, we first characterize a decomposition class specific to CNNs by adopting a flexible graphical notation. The class includes such well-known CNN modules as depthwise separable convolution layers and bottleneck layers, but also previously unknown modules with nonlinear activations. We also experimentally compare the tradeoff between prediction accuracy and time/space complexity for modules found by enumerating all possible decompositions, or by using a neural architecture search. We find some nonlinear decompositions outperform existing ones. △ Less

Submitted 27 November, 2019; v1 submitted 12 August, 2019; originally announced August 2019.

Comments: NeurIPS 2019

arXiv:1905.13021 [pdf, other]

Robustness to Adversarial Perturbations in Learning from Incomplete Data

Authors: Amir Najafi, Shin-ichi Maeda, Masanori Koyama, Takeru Miyato

Abstract: What is the role of unlabeled data in an inference problem, when the presumed underlying distribution is adversarially perturbed? To provide a concrete answer to this question, this paper unifies two major learning frameworks: Semi-Supervised Learning (SSL) and Distributionally Robust Learning (DRL). We develop a generalization theory for our framework based on a number of novel complexity measure… ▽ More What is the role of unlabeled data in an inference problem, when the presumed underlying distribution is adversarially perturbed? To provide a concrete answer to this question, this paper unifies two major learning frameworks: Semi-Supervised Learning (SSL) and Distributionally Robust Learning (DRL). We develop a generalization theory for our framework based on a number of novel complexity measures, such as an adversarial extension of Rademacher complexity and its semi-supervised analogue. Moreover, our analysis is able to quantify the role of unlabeled data in the generalization under a more general condition compared to the existing theoretical works in SSL. Based on our framework, we also present a hybrid of DRL and EM algorithms that has a guaranteed convergence rate. When implemented with deep neural networks, our method shows a comparable performance to those of the state-of-the-art on a number of real-world benchmark datasets. △ Less

Submitted 24 May, 2019; originally announced May 2019.

Comments: 41 pages, 9 figures

arXiv:1902.01020 [pdf, other]

Graph Warp Module: an Auxiliary Module for Boosting the Power of Graph Neural Networks in Molecular Graph Analysis

Authors: Katsuhiko Ishiguro, Shin-ichi Maeda, Masanori Koyama

Abstract: Graph Neural Network (GNN) is a popular architecture for the analysis of chemical molecules, and it has numerous applications in material and medicinal science. Current lines of GNNs developed for molecular analysis, however, do not fit well on the training set, and their performance does not scale well with the complexity of the network. In this paper, we propose an auxiliary module to be attache… ▽ More Graph Neural Network (GNN) is a popular architecture for the analysis of chemical molecules, and it has numerous applications in material and medicinal science. Current lines of GNNs developed for molecular analysis, however, do not fit well on the training set, and their performance does not scale well with the complexity of the network. In this paper, we propose an auxiliary module to be attached to a GNN that can boost the representation power of the model without hindering with the original GNN architecture. Our auxiliary module can be attached to a wide variety of GNNs, including those that are used commonly in biochemical applications. With our auxiliary architecture, the performances of many GNNs used in practice improve more consistently, achieving the state-of-the-art performance on popular molecular graph datasets. △ Less

Submitted 24 May, 2019; v1 submitted 3 February, 2019; originally announced February 2019.

Comments: Augmented experiments, title slightly modified

arXiv:1810.11748 [pdf, other]

DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback

Authors: Riku Arakawa, Sosuke Kobayashi, Yuya Unno, Yuta Tsuboi, Shin-ichi Maeda

Abstract: Exploration has been one of the greatest challenges in reinforcement learning (RL), which is a large obstacle in the application of RL to robotics. Even with state-of-the-art RL algorithms, building a well-learned agent often requires too many trials, mainly due to the difficulty of matching its actions with rewards in the distant future. A remedy for this is to train an agent with real-time feedb… ▽ More Exploration has been one of the greatest challenges in reinforcement learning (RL), which is a large obstacle in the application of RL to robotics. Even with state-of-the-art RL algorithms, building a well-learned agent often requires too many trials, mainly due to the difficulty of matching its actions with rewards in the distant future. A remedy for this is to train an agent with real-time feedback from a human observer who immediately gives rewards for some actions. This study tackles a series of challenges for introducing such a human-in-the-loop RL scheme. The first contribution of this work is our experiments with a precisely modeled human observer: binary, delay, stochasticity, unsustainability, and natural reaction. We also propose an RL method called DQN-TAMER, which efficiently uses both human feedback and distant rewards. We find that DQN-TAMER agents outperform their baselines in Maze and Taxi simulated environments. Furthermore, we demonstrate a real-world human-in-the-loop RL application where a camera automatically recognizes a user's facial expressions as feedback to the agent while the agent explores a maze. △ Less

Submitted 27 October, 2018; originally announced October 2018.

arXiv:1807.01985 [pdf, other]

BayesGrad: Explaining Predictions of Graph Convolutional Networks

Authors: Hirotaka Akita, Kosuke Nakago, Tomoki Komatsu, Yohei Sugawara, Shin-ichi Maeda, Yukino Baba, Hisashi Kashima

Abstract: Recent advances in graph convolutional networks have significantly improved the performance of chemical predictions, raising a new research question: "how do we explain the predictions of graph convolutional networks?" A possible approach to answer this question is to visualize evidence substructures responsible for the predictions. For chemical property prediction tasks, the sample size of the tr… ▽ More Recent advances in graph convolutional networks have significantly improved the performance of chemical predictions, raising a new research question: "how do we explain the predictions of graph convolutional networks?" A possible approach to answer this question is to visualize evidence substructures responsible for the predictions. For chemical property prediction tasks, the sample size of the training data is often small and/or a label imbalance problem occurs, where a few samples belong to a single class and the majority of samples belong to the other classes. This can lead to uncertainty related to the learned parameters of the machine learning model. To address this uncertainty, we propose BayesGrad, utilizing the Bayesian predictive distribution, to define the importance of each node in an input graph, which is computed efficiently using the dropout technique. We demonstrate that BayesGrad successfully visualizes the substructures responsible for the label prediction in the artificial experiment, even when the sample size is small. Furthermore, we use a real dataset to evaluate the effectiveness of the visualization. The basic idea of BayesGrad is not limited to graph-structured data and can be applied to other data types. △ Less

Submitted 4 July, 2018; originally announced July 2018.

arXiv:1805.06386 [pdf, other]

Neural Multi-scale Image Compression

Authors: Ken Nakanishi, Shin-ichi Maeda, Takeru Miyato, Daisuke Okanohara

Abstract: This study presents a new lossy image compression method that utilizes the multi-scale features of natural images. Our model consists of two networks: multi-scale lossy autoencoder and parallel multi-scale lossless coder. The multi-scale lossy autoencoder extracts the multi-scale image features to quantized variables and the parallel multi-scale lossless coder enables rapid and accurate lossless c… ▽ More This study presents a new lossy image compression method that utilizes the multi-scale features of natural images. Our model consists of two networks: multi-scale lossy autoencoder and parallel multi-scale lossless coder. The multi-scale lossy autoencoder extracts the multi-scale image features to quantized variables and the parallel multi-scale lossless coder enables rapid and accurate lossless coding of the quantized variables via encoding/decoding the variables in parallel. Our proposed model achieves comparable performance to the state-of-the-art model on Kodak and RAISE-1k dataset images, and it encodes a PNG image of size $768 \times 512$ in 70 ms with a single GPU and a single CPU process and decodes it into a high-fidelity image in approximately 200 ms. △ Less

Submitted 16 May, 2018; originally announced May 2018.

Comments: 15 pages, 15 figures

arXiv:1802.07564 [pdf, other]

Clipped Action Policy Gradient

Authors: Yasuhiro Fujita, Shin-ichi Maeda

Abstract: Many continuous control tasks have bounded action spaces. When policy gradient methods are applied to such tasks, out-of-bound actions need to be clipped before execution, while policies are usually optimized as if the actions are not clipped. We propose a policy gradient estimator that exploits the knowledge of actions being clipped to reduce the variance in estimation. We prove that our estimato… ▽ More Many continuous control tasks have bounded action spaces. When policy gradient methods are applied to such tasks, out-of-bound actions need to be clipped before execution, while policies are usually optimized as if the actions are not clipped. We propose a policy gradient estimator that exploits the knowledge of actions being clipped to reduce the variance in estimation. We prove that our estimator, named clipped action policy gradient (CAPG), is unbiased and achieves lower variance than the conventional estimator that ignores action bounds. Experimental results demonstrate that CAPG generally outperforms the conventional estimator, indicating that it is a better policy gradient estimator for continuous control tasks. The source code is available at https://github.com/pfnet-research/capg. △ Less

Submitted 22 June, 2018; v1 submitted 21 February, 2018; originally announced February 2018.

Comments: Accepted at ICML 2018

arXiv:1711.10168 [pdf, other]

Semi-supervised learning of hierarchical representations of molecules using neural message passing

Authors: Hai Nguyen, Shin-ichi Maeda, Kenta Oono

Abstract: With the rapid increase of compound databases available in medicinal and material science, there is a growing need for learning representations of molecules in a semi-supervised manner. In this paper, we propose an unsupervised hierarchical feature extraction algorithm for molecules (or more generally, graph-structured objects with fixed number of types of nodes and edges), which is applicable to… ▽ More With the rapid increase of compound databases available in medicinal and material science, there is a growing need for learning representations of molecules in a semi-supervised manner. In this paper, we propose an unsupervised hierarchical feature extraction algorithm for molecules (or more generally, graph-structured objects with fixed number of types of nodes and edges), which is applicable to both unsupervised and semi-supervised tasks. Our method extends recently proposed Paragraph Vector algorithm and incorporates neural message passing to obtain hierarchical representations of subgraphs. We applied our method to an unsupervised task and demonstrated that it outperforms existing proposed methods in several benchmark datasets. We also experimentally showed that semi-supervised tasks enhanced predictive performance compared with supervised ones with labeled molecules only. △ Less

Submitted 28 November, 2017; v1 submitted 28 November, 2017; originally announced November 2017.

Comments: 8 pages, 2 figures. Appeared as a poster presentation in workshop on Machine Learning for Molecules and Materials in NIPS 2017

arXiv:1706.10031 [pdf, other]

Neural Sequence Model Training via $α$-divergence Minimization

Authors: Sotetsu Koyamada, Yuta Kikuchi, Atsunori Kanemura, Shin-ichi Maeda, Shin Ishii

Abstract: We propose a new neural sequence model training method in which the objective function is defined by $α$-divergence. We demonstrate that the objective function generalizes the maximum-likelihood (ML)-based and reinforcement learning (RL)-based objective functions as special cases (i.e., ML corresponds to $α\to 0$ and RL to $α\to1$). We also show that the gradient of the objective function can be c… ▽ More We propose a new neural sequence model training method in which the objective function is defined by $α$-divergence. We demonstrate that the objective function generalizes the maximum-likelihood (ML)-based and reinforcement learning (RL)-based objective functions as special cases (i.e., ML corresponds to $α\to 0$ and RL to $α\to1$). We also show that the gradient of the objective function can be considered a mixture of ML- and RL-based objective gradients. The experimental results of a machine translation task show that minimizing the objective function with $α> 0$ outperforms $α\to 0$, which corresponds to ML-based methods. △ Less

Submitted 30 June, 2017; originally announced June 2017.

Comments: 2017 ICML Workshop on Learning to Generate Natural Language (LGNL 2017)

arXiv:1704.03976 [pdf, other]

Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning

Authors: Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Shin Ishii

Abstract: We propose a new regularization method based on virtual adversarial loss: a new measure of local smoothness of the conditional label distribution given input. Virtual adversarial loss is defined as the robustness of the conditional label distribution around each input data point against local perturbation. Unlike adversarial training, our method defines the adversarial direction without label info… ▽ More We propose a new regularization method based on virtual adversarial loss: a new measure of local smoothness of the conditional label distribution given input. Virtual adversarial loss is defined as the robustness of the conditional label distribution around each input data point against local perturbation. Unlike adversarial training, our method defines the adversarial direction without label information and is hence applicable to semi-supervised learning. Because the directions in which we smooth the model are only "virtually" adversarial, we call our method virtual adversarial training (VAT). The computational cost of VAT is relatively low. For neural networks, the approximated gradient of virtual adversarial loss can be computed with no more than two pairs of forward- and back-propagations. In our experiments, we applied VAT to supervised and semi-supervised learning tasks on multiple benchmark datasets. With a simple enhancement of the algorithm based on the entropy minimization principle, our VAT achieves state-of-the-art performance for semi-supervised learning tasks on SVHN and CIFAR-10. △ Less

Submitted 27 June, 2018; v1 submitted 12 April, 2017; originally announced April 2017.

Comments: To be appeared in IEEE Transactions on Pattern Analysis and Machine Intelligence

arXiv:1509.01004 [pdf, other]

Bayesian Masking: Sparse Bayesian Estimation with Weaker Shrinkage Bias

Authors: Yohei Kondo, Kohei Hayashi, Shin-ichi Maeda

Abstract: A common strategy for sparse linear regression is to introduce regularization, which eliminates irrelevant features by letting the corresponding weights be zeros. However, regularization often shrinks the estimator for relevant features, which leads to incorrect feature selection. Motivated by the above-mentioned issue, we propose Bayesian masking (BM), a sparse estimation method which imposes no… ▽ More A common strategy for sparse linear regression is to introduce regularization, which eliminates irrelevant features by letting the corresponding weights be zeros. However, regularization often shrinks the estimator for relevant features, which leads to incorrect feature selection. Motivated by the above-mentioned issue, we propose Bayesian masking (BM), a sparse estimation method which imposes no regularization on the weights. The key concept of BM is to introduce binary latent variables that randomly mask features. Estimating the masking rates determines the relevance of the features automatically. We derive a variational Bayesian inference algorithm that maximizes the lower bound of the factorized information criterion (FIC), which is a recently developed asymptotic criterion for evaluating the marginal log-likelihood. In addition, we propose reparametrization to accelerate the convergence of the derived algorithm. Finally, we show that BM outperforms Lasso and automatic relevance determination (ARD) in terms of the sparsity-shrinkage trade-off. △ Less

Submitted 6 October, 2015; v1 submitted 3 September, 2015; originally announced September 2015.

arXiv:1507.00677 [pdf, other]

Distributional Smoothing with Virtual Adversarial Training

Authors: Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, Shin Ishii

Abstract: We propose local distributional smoothness (LDS), a new notion of smoothness for statistical model that can be used as a regularization term to promote the smoothness of the model distribution. We named the LDS based regularization as virtual adversarial training (VAT). The LDS of a model at an input datapoint is defined as the KL-divergence based robustness of the model distribution against local… ▽ More We propose local distributional smoothness (LDS), a new notion of smoothness for statistical model that can be used as a regularization term to promote the smoothness of the model distribution. We named the LDS based regularization as virtual adversarial training (VAT). The LDS of a model at an input datapoint is defined as the KL-divergence based robustness of the model distribution against local perturbation around the datapoint. VAT resembles adversarial training, but distinguishes itself in that it determines the adversarial direction from the model distribution alone without using the label information, making it applicable to semi-supervised learning. The computational cost for VAT is relatively low. For neural network, the approximated gradient of the LDS can be computed with no more than three pairs of forward and back propagations. When we applied our technique to supervised and semi-supervised learning for the MNIST dataset, it outperformed all the training methods other than the current state of the art method, which is based on a highly advanced generative model. We also applied our method to SVHN and NORB, and confirmed our method's superior performance over the current state of the art semi-supervised method applied to these datasets. △ Less

Submitted 11 June, 2016; v1 submitted 2 July, 2015; originally announced July 2015.

Comments: Under review as a conference paper at ICLR 2016

arXiv:1504.05665 [pdf, ps, other]

Rebuilding Factorized Information Criterion: Asymptotically Accurate Marginal Likelihood

Authors: Kohei Hayashi, Shin-ichi Maeda, Ryohei Fujimaki

Abstract: Factorized information criterion (FIC) is a recently developed approximation technique for the marginal log-likelihood, which provides an automatic model selection framework for a few latent variable models (LVMs) with tractable inference algorithms. This paper reconsiders FIC and fills theoretical gaps of previous FIC studies. First, we reveal the core idea of FIC that allows generalization for a… ▽ More Factorized information criterion (FIC) is a recently developed approximation technique for the marginal log-likelihood, which provides an automatic model selection framework for a few latent variable models (LVMs) with tractable inference algorithms. This paper reconsiders FIC and fills theoretical gaps of previous FIC studies. First, we reveal the core idea of FIC that allows generalization for a broader class of LVMs, including continuous LVMs, in contrast to previous FICs, which are applicable only to binary LVMs. Second, we investigate the model selection mechanism of the generalized FIC. Our analysis provides a formal justification of FIC as a model selection criterion for LVMs and also a systematic procedure for pruning redundant latent variables that have been removed heuristically in previous studies. Third, we provide an interpretation of FIC as a variational free energy and uncover a few previously-unknown their relationships. A demonstrative study on Bayesian principal component analysis is provided and numerical experiments support our theoretical results. △ Less

Submitted 22 April, 2015; originally announced April 2015.

arXiv:1412.7003 [pdf, other]

A Bayesian encourages dropout

Authors: Shin-ichi Maeda

Abstract: Dropout is one of the key techniques to prevent the learning from overfitting. It is explained that dropout works as a kind of modified L2 regularization. Here, we shed light on the dropout from Bayesian standpoint. Bayesian interpretation enables us to optimize the dropout rate, which is beneficial for learning of weight parameters and prediction after learning. The experiment result also encoura… ▽ More Dropout is one of the key techniques to prevent the learning from overfitting. It is explained that dropout works as a kind of modified L2 regularization. Here, we shed light on the dropout from Bayesian standpoint. Bayesian interpretation enables us to optimize the dropout rate, which is beneficial for learning of weight parameters and prediction after learning. The experiment result also encourages the optimization of the dropout. △ Less

Submitted 30 December, 2014; v1 submitted 22 December, 2014; originally announced December 2014.

Showing 1–37 of 37 results for author: Mada, S