Search | arXiv e-print repository

Discovering Multiple Solutions from a Single Task in Offline Reinforcement Learning

Abstract: Recent studies on online reinforcement learning (RL) have demonstrated the advantages of learning multiple behaviors from a single task, as in the case of few-shot adaptation to a new environment. Although this approach is expected to yield similar benefits in offline RL, appropriate methods for learning multiple solutions have not been fully investigated in previous studies. In this study, we therefore addressed the problem of finding multiple solutions from a single task in offline RL. We propose algorithms that can learn multiple solutions in offline RL, and empirically investigate their performance. Our experimental results show that the proposed algorithm learns multiple qualitatively and quantitatively distinctive solutions in offline RL.

Submitted 9 June, 2024; originally announced June 2024.

Comments: ICML 2024, 21 pages

arXiv:2006.08210 [pdf, other]

Hyperbolic Neural Networks++

Authors: Ryohei Shimizu, Yusuke Mukuta, Tatsuya Harada

Abstract: Hyperbolic spaces, which have the capacity to embed tree structures without distortion owing to their exponential volume growth, have recently been applied to machine learning to better capture the hierarchical nature of data. In this study, we generalize the fundamental components of neural networks in a single hyperbolic geometry model, namely, the Poincaré ball model. This novel methodology constructs a multinomial logistic regression, fully-connected layers, convolutional layers, and attention mechanisms under a unified mathematical interpretation, without increasing the parameters. Experiments show the superior parameter efficiency of our methods compared to conventional hyperbolic components, and stability and outperformance over their Euclidean counterparts.

Submitted 17 March, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: The Ninth International Conference on Learning Representations (ICLR 2021)

arXiv:2003.07849 [pdf, other]

Blur, Noise, and Compression Robust Generative Adversarial Networks

Authors: Takuhiro Kaneko, Tatsuya Harada

Abstract: Generative adversarial networks (GANs) have gained considerable attention owing to their ability to reproduce images. However, they can recreate training images faithfully despite image degradation in the form of blur, noise, and compression, generating similarly degraded images. To solve this problem, the recently proposed noise robust GAN (NR-GAN) provides a partial solution by demonstrating the ability to learn a clean image generator directly from noisy images using a two-generator model comprising image and noise generators. However, its application is limited to noise, which is relatively easy to decompose owing to its additive and reversible characteristics, and its application to irreversible image degradation, in the form of blur, compression, and combination of all, remains a challenge. To address these problems, we propose blur, noise, and compression robust GAN (BNCR-GAN) that can learn a clean image generator directly from degraded images without knowledge of degradation parameters (e.g., blur kernel types, noise amounts, or quality factor values). Inspired by NR-GAN, BNCR-GAN uses a multiple-generator model composed of image, blur-kernel, noise, and quality-factor generators. However, in contrast to NR-GAN, to address irreversible characteristics, we introduce masking architectures adjusting degradation strength values in a data-driven manner using bypasses before and after degradation. Furthermore, to suppress uncertainty caused by the combination of blur, noise, and compression, we introduce adaptive consistency losses imposing consistency between irreversible degradation processes according to the degradation strengths. We demonstrate the effectiveness of BNCR-GAN through large-scale comparative studies on CIFAR-10 and a generality analysis on FFHQ. In addition, we demonstrate the applicability of BNCR-GAN in image restoration.

Submitted 23 June, 2021; v1 submitted 17 March, 2020; originally announced March 2020.

Comments: Accepted to CVPR 2021. Project page: https://takuhirok.github.io/BNCR-GAN/

arXiv:1911.11776 [pdf, other]

Noise Robust Generative Adversarial Networks

Authors: Takuhiro Kaneko, Tatsuya Harada

Abstract: Generative adversarial networks (GANs) are neural networks that learn data distributions through adversarial training. In intensive studies, recent GANs have shown promising results for reproducing training images. However, in spite of noise, they reproduce images with fidelity. As an alternative, we propose a novel family of GANs called noise robust GANs (NR-GANs), which can learn a clean image generator even when training images are noisy. In particular, NR-GANs can solve this problem without having complete noise information (e.g., the noise distribution type, noise amount, or signal-noise relationship). To achieve this, we introduce a noise generator and train it along with a clean image generator. However, without any constraints, there is no incentive to generate an image and noise separately. Therefore, we propose distribution and transformation constraints that encourage the noise generator to capture only the noise-specific components. In particular, considering such constraints under different assumptions, we devise two variants of NR-GANs for signal-independent noise and three variants of NR-GANs for signal-dependent noise. On three benchmark datasets, we demonstrate the effectiveness of NR-GANs in noise robust image generation. Furthermore, we show the applicability of NR-GANs in image denoising. Our code is available at https://github.com/takuhirok/NR-GAN/.

Submitted 31 March, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

Comments: Accepted to CVPR 2020. Project page: https://takuhirok.github.io/NR-GAN/

arXiv:1910.01409 [pdf, other]

A General Upper Bound for Unsupervised Domain Adaptation

Authors: Dexuan Zhang, Tatsuya Harada

Abstract: In this work, we present a novel upper bound of target error to address the problem for unsupervised domain adaptation. Recent studies reveal that a deep neural network can learn transferable features which generalize well to novel tasks. Furthermore, a theory proposed by Ben-David et al. (2010) provides an upper bound for target error when transferring the knowledge, which can be summarized as minimizing the source error and distance between marginal distributions simultaneously. However, common methods based on the theory usually ignore the joint error such that samples from different classes might be mixed together when matching marginal distribution. In such a case, no matter how we minimize the marginal discrepancy, the target error is not bounded due to an increasing joint error. To address this problem, we propose a general upper bound taking joint error into account, such that the undesirable case can be properly penalized. In addition, we utilize constrained hypothesis space to further formalize a tighter bound as well as a novel cross margin discrepancy to measure the dissimilarity between hypotheses which alleviates instability during adversarial learning. Extensive empirical evidence shows that our proposal outperforms related approaches in image classification error rates on standard domain adaptation benchmarks.

Submitted 4 October, 2019; v1 submitted 3 October, 2019; originally announced October 2019.

Comments: 20 pages

arXiv:1910.00216 [pdf, other]

Revisiting Fine-tuning for Few-shot Learning

Authors: Akihiro Nakamura, Tatsuya Harada

Abstract: Few-shot learning is the process of learning novel classes using only a few examples and it remains a challenging task in machine learning. Many sophisticated few-shot learning algorithms have been proposed based on the notion that networks can easily overfit to novel examples if they are simply fine-tuned using only a few examples. In this study, we show that in the commonly used low-resolution mini-ImageNet dataset, the fine-tuning method achieves higher accuracy than common few-shot learning algorithms in the 1-shot task and nearly the same accuracy as that of the state-of-the-art algorithm in the 5-shot task. We then evaluate our method with more practical tasks, namely the high-resolution single-domain and cross-domain tasks. With both tasks, we show that our method achieves higher accuracy than common few-shot learning algorithms. We further analyze the experimental results and show that: 1) the retraining process can be stabilized by employing a low learning rate, 2) using adaptive gradient optimizers during fine-tuning can increase test accuracy, and 3) test accuracy can be improved by updating the entire network when a large domain-shift exists between base and novel classes.

Submitted 3 October, 2019; v1 submitted 1 October, 2019; originally announced October 2019.

Comments: 10 pages

arXiv:1909.12655 [pdf, other]

Rethinking Task and Metrics of Instance Segmentation on 3D Point Clouds

Authors: Kosuke Arase, Yusuke Mukuta, Tatsuya Harada

Abstract: Instance segmentation on 3D point clouds is one of the most extensively researched areas toward the realization of autonomous cars and robots. Certain existing studies have split input point clouds into small regions such as 1m x 1m; one reason for this is that models in the studies cannot consume a large number of points because of the large space complexity. However, because such small regions occasionally include a very small number of instances belonging to the same class, an evaluation using existing metrics such as mAP is largely affected by the category recognition performance. To address these problems, we propose a new method with space complexity O(Np) such that large regions can be consumed, as well as novel metrics for tasks that are independent of the categories or size of the inputs. Our method learns a mapping from input point clouds to an embedding space, where the embeddings form clusters for each instance and distinguish instances using these clusters during testing. Our method achieves state-of-the-art performance using both existing and the proposed metrics. Moreover, we show that our new metric can evaluate the performance of a task without being affected by any other condition.

Submitted 27 September, 2019; originally announced September 2019.

Comments: The 4th Workshop on Geometry Meets Deep Learning (ICCV Workshop 2019)

arXiv:1906.01861 [pdf, other]

Scalable Generative Models for Graphs with Graph Attention Mechanism

Authors: Wataru Kawai, Yusuke Mukuta, Tatsuya Harada

Abstract: Graphs are ubiquitous real-world data structures, and generative models that approximate distributions over graphs and derive new samples from them have significant importance. Among the known challenges in graph generation tasks, scalability handling of large graphs and datasets is one of the most important for practical applications. Recently, an increasing number of graph generative models have been proposed and have demonstrated impressive results. However, scalability is still an unresolved problem due to the complex generation process or difficulty in training parallelization. In this paper, we first define scalability from three different perspectives: number of nodes, data, and node/edge labels. Then, we propose GRAM, a generative model for graphs that is scalable in all three contexts, especially in training. We aim to achieve scalability by employing a novel graph attention mechanism, formulating the likelihood of graphs in a simple and general manner. Also, we apply two techniques to reduce computational complexity. Furthermore, we construct a unified and non-domain-specific evaluation metric in node/edge-labeled graph generation tasks by combining a graph kernel and Maximum Mean Discrepancy. Our experiments on synthetic and real-world graphs demonstrated the scalability of our models and their superior performance compared with baseline methods.

Submitted 3 October, 2019; v1 submitted 5 June, 2019; originally announced June 2019.

Comments: 22 pages, 14 figures

arXiv:1906.01857 [pdf, other]

Invariant Feature Coding using Tensor Product Representation

Authors: Yusuke Mukuta, Tatsuya Harada

Abstract: In this study, a novel feature coding method that exploits invariance for transformations represented by a finite group of orthogonal matrices is proposed. We prove that the group-invariant feature vector contains sufficient discriminative information when learning a linear classifier using convex loss minimization. Based on this result, a novel feature model that explicitly considers group action is proposed for principal component analysis and k-means clustering, which are commonly used in most feature coding methods, and global feature functions. Although the global feature functions are in general complex nonlinear functions, the group action on this space can be easily calculated by constructing these functions as tensor-product representations of basic representations, resulting in an explicit form of invariant feature functions. The effectiveness of our method is demonstrated on several image datasets.

Submitted 8 March, 2023; v1 submitted 5 June, 2019; originally announced June 2019.

Comments: 26 pages, 41 figures

arXiv:1906.01851 [pdf, other]

Compact Approximation for Polynomial of Covariance Feature

Authors: Yusuke Mukuta, Tatsuaki Machida, Tatsuya Harada

Abstract: Covariance pooling is a feature pooling method with good classification accuracy. Because covariance features consist of second-order statistics, the scale of the feature elements varies. Therefore, normalizing covariance features using a matrix square root affects the performance improvement. When pooling methods are applied to local features extracted from CNN models, the accuracy increases when the pooling function is back-propagatable and the feature-extraction model is learned in an end-to-end manner. Recently, the iterative polynomial approximation method for the matrix square root of a covariance feature was proposed, and resulted in a faster and more stable training than the methods based on singular-value decomposition. In this paper, we propose an extension of compact bilinear pooling, which is a compact approximation of the standard covariance feature, to the polynomials of the covariance feature. Subsequently, we apply the proposed approximation to the polynomial corresponding to the matrix square root to obtain a compact approximation for the square root of the covariance feature. Our method approximates a higher-dimensional polynomial of a covariance by the weighted sum of the approximate features corresponding to a pair of local features based on the similarity of the local features. We apply our method to standard fine-grained image recognition datasets and demonstrate that the proposed method shows comparable accuracy with fewer dimensions than the original feature.

Submitted 5 June, 2019; originally announced June 2019.

Comments: 9 pages, 7 figures

arXiv:1905.02185 [pdf, other]

Label-Noise Robust Multi-Domain Image-to-Image Translation

Authors: Takuhiro Kaneko, Tatsuya Harada

Abstract: Multi-domain image-to-image translation is a problem where the goal is to learn mappings among multiple domains. This problem is challenging in terms of scalability because it requires the learning of numerous mappings, the number of which increases in proportion to the number of domains. However, generative adversarial networks (GANs) have emerged recently as a powerful framework for this problem. In particular, label-conditional extensions (e.g., StarGAN) have become a promising solution owing to their ability to address this problem using only a single unified model. Nonetheless, a limitation is that they rely on the availability of large-scale clean-labeled data, which are often laborious or impractical to collect in a real-world scenario. To overcome this limitation, we propose a novel model called the label-noise robust image-to-image translation model (RMIT) that can learn a clean label conditional generator even when only noisy labeled data are available. In particular, we propose a novel loss called the virtual cycle consistency loss that is able to regularize cyclic reconstruction independently of noisy labeled data, and we introduce advanced techniques to boost the performance in practice. Our experimental results demonstrate that RMIT is useful for obtaining label-noise robustness in various settings including synthetic and real-world noise.

Submitted 6 May, 2019; originally announced May 2019.

arXiv:1812.07405 [pdf, ps, other]

TWINs: Two Weighted Inconsistency-reduced Networks for Partial Domain Adaptation

Authors: Toshihiko Matsuura, Kuniaki Saito, Tatsuya Harada

Abstract: The task of unsupervised domain adaptation is proposed to transfer the knowledge of a label-rich domain (source domain) to a label-scarce domain (target domain). Matching feature distributions between different domains is a widely applied method for the aforementioned task. However, the method does not perform well when classes in the two domains are not identical. Specifically, when the classes of the target correspond to a subset of those of the source, target samples can be incorrectly aligned with the classes that exist only in the source. This problem setting is termed as partial domain adaptation (PDA). In this study, we propose a novel method called Two Weighted Inconsistency-reduced Networks (TWINs) for PDA. We utilize two classification networks to estimate the ratio of the target samples in each class with which a classification loss is weighted to adapt the classes present in the target domain. Furthermore, to extract discriminative features for the target, we propose to minimize the divergence between domains measured by the classifiers' inconsistency on target samples. We empirically demonstrate that reducing the inconsistency between two networks is effective for PDA and that our method outperforms other existing methods with a large margin in several datasets.

Submitted 18 December, 2018; originally announced December 2018.

arXiv:1811.11165 [pdf, other]

Label-Noise Robust Generative Adversarial Networks

Authors: Takuhiro Kaneko, Yoshitaka Ushiku, Tatsuya Harada

Abstract: Generative adversarial networks (GANs) are a framework that learns a generative distribution through adversarial training. Recently, their class-conditional extensions (e.g., conditional GAN (cGAN) and auxiliary classifier GAN (AC-GAN)) have attracted much attention owing to their ability to learn the disentangled representations and to improve the training stability. However, their training requires the availability of large-scale accurate class-labeled data, which are often laborious or impractical to collect in a real-world scenario. To remedy this, we propose a novel family of GANs called label-noise robust GANs (rGANs), which, by incorporating a noise transition model, can learn a clean label conditional generative distribution even when training labels are noisy. In particular, we propose two variants: rAC-GAN, which is a bridging model between AC-GAN and the label-noise robust classification model, and rcGAN, which is an extension of cGAN and solves this problem with no reliance on any classifier. In addition to providing the theoretical background, we demonstrate the effectiveness of our models through extensive experiments using diverse GAN configurations, various noise settings, and multiple evaluation metrics (in which we tested 402 conditions in total). Our code is available at https://github.com/takuhirok/rGAN/.
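
Illustrative note (not part of the arXiv listing): the abstract above mentions a noise transition model. The following is a minimal, generic sketch of that idea, assuming a symmetric label-noise matrix; the function names and the symmetric-noise assumption are hypothetical and are not taken from the authors' code, and the sketch does not reproduce rAC-GAN or rcGAN.

```python
def symmetric_transition(num_classes, noise_rate):
    """Build T where T[i][j] = P(noisy label = j | clean label = i).

    Assumption for illustration: a label keeps its value with probability
    1 - noise_rate and otherwise flips uniformly to one of the other classes.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    off_diagonal = noise_rate / (num_classes - 1)
    return [
        [1.0 - noise_rate if i == j else off_diagonal for j in range(num_classes)]
        for i in range(num_classes)
    ]


def noisy_posterior(clean_probs, transition):
    """Map clean-label probabilities to noisy-label probabilities: p_noisy = p_clean @ T."""
    k = len(clean_probs)
    return [sum(clean_probs[i] * transition[i][j] for i in range(k)) for j in range(k)]


# Usage: a sample that is certainly class 0 under clean labels.
T = symmetric_transition(3, 0.3)
p_noisy = noisy_posterior([1.0, 0.0, 0.0], T)
```

In a setup like this, a model's clean-label prediction can be compared against observed noisy labels only after passing through the transition matrix, which is one common way such a model is used in label-noise robust learning.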

Submitted 2 May, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

Comments: Accepted to CVPR 2019 (Oral). Project page: https://takuhirok.github.io/rGAN/

arXiv:1811.11163 [pdf, other]

Class-Distinct and Class-Mutual Image Generation with GANs

Authors: Takuhiro Kaneko, Yoshitaka Ushiku, Tatsuya Harada

Abstract: Class-conditional extensions of generative adversarial networks (GANs), such as auxiliary classifier GAN (AC-GAN) and conditional GAN (cGAN), have garnered attention owing to their ability to decompose representations into class labels and other factors and to boost the training stability. However, a limitation is that they assume that each class is separable and ignore the relationship between classes even though class overlapping frequently occurs in a real-world scenario when data are collected on the basis of diverse or ambiguous criteria. To overcome this limitation, we address a novel problem called class-distinct and class-mutual image generation, in which the goal is to construct a generator that can capture between-class relationships and generate an image selectively conditioned on the class specificity. To solve this problem without additional supervision, we propose classifier's posterior GAN (CP-GAN), in which we redesign the generator input and the objective function of AC-GAN for class-overlapping data. Precisely, we incorporate the classifier's posterior into the generator input and optimize the generator so that the classifier's posterior of generated data corresponds with that of real data. We demonstrate the effectiveness of CP-GAN using both controlled and real-world class-overlapping data with a model configuration analysis and comparative study. Our code is available at https://github.com/takuhirok/CP-GAN/.

Submitted 24 July, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

Comments: Accepted to BMVC 2019 (Spotlight). Project page: https://takuhirok.github.io/CP-GAN/

arXiv:1711.10284 [pdf, other]

Between-class Learning for Image Classification

Authors: Yuji Tokozume, Yoshitaka Ushiku, Tatsuya Harada

Abstract: In this paper, we propose a novel learning method for image classification called Between-Class learning (BC learning). We generate between-class images by mixing two images belonging to different classes with a random ratio. We then input the mixed image to the model and train the model to output the mixing ratio. BC learning has the ability to impose constraints on the shape of the feature distributions, and thus the generalization ability is improved. BC learning is originally a method developed for sounds, which can be digitally mixed. Mixing two image data does not appear to make sense; however, we argue that because convolutional neural networks have an aspect of treating input data as waveforms, what works on sounds must also work on images. First, we propose a simple mixing method using internal divisions, which surprisingly proves to significantly improve performance. Second, we propose a mixing method that treats the images as waveforms, which leads to a further improvement in performance. As a result, we achieved 19.4% and 2.26% top-1 errors on ImageNet-1K and CIFAR-10, respectively.
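
Illustrative note (not part of the arXiv listing): the simple mixing by internal division described in the abstract above can be sketched roughly as follows. This is a minimal sketch under stated assumptions: images are flat lists of floats, the ratio is drawn uniformly from [0, 1), and the function name `mix_between_class` is hypothetical. The waveform-based mixing variant mentioned in the abstract is not shown.

```python
import random


def mix_between_class(x1, x2, label1, label2, num_classes, rng=random):
    """Mix two examples from different classes with a random ratio r.

    Returns the mixed input r*x1 + (1-r)*x2, a soft target that places r on
    label1 and 1-r on label2 (the mixing ratio the model is trained to
    output), and r itself.
    """
    if label1 == label2:
        raise ValueError("between-class mixing expects two different classes")
    if len(x1) != len(x2):
        raise ValueError("inputs must have the same length")
    r = rng.random()  # assumption: uniform ratio in [0, 1)
    mixed = [r * a + (1.0 - r) * b for a, b in zip(x1, x2)]
    target = [0.0] * num_classes
    target[label1] = r
    target[label2] = 1.0 - r
    return mixed, target, r


# Usage: mix an all-ones "image" of class 0 with an all-zeros "image" of class 2.
rng = random.Random(0)
mixed, target, r = mix_between_class([1.0, 1.0], [0.0, 0.0], 0, 2, 3, rng)
```

A training loop would then feed `mixed` to the classifier and penalize the divergence between its output distribution and `target`; the choice of divergence is left open here.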

Submitted 8 April, 2018; v1 submitted 28 November, 2017; originally announced November 2017.

Comments: 11 pages, 8 figures, published as a conference paper at CVPR 2018

arXiv:1711.10282 [pdf, other]

Learning from Between-class Examples for Deep Sound Recognition

Authors: Yuji Tokozume, Yoshitaka Ushiku, Tatsuya Harada

Abstract: Deep learning methods have achieved high performance in sound recognition tasks. Deciding how to feed the training data is important for further performance improvement. We propose a novel learning method for deep sound recognition: Between-Class learning (BC learning). Our strategy is to learn a discriminative feature space by recognizing the between-class sounds as between-class sounds. We generate between-class sounds by mixing two sounds belonging to different classes with a random ratio. We then input the mixed sound to the model and train the model to output the mixing ratio. The advantages of BC learning are not limited only to the increase in variation of the training data; BC learning leads to an enlargement of Fisher's criterion in the feature space and a regularization of the positional relationship among the feature distributions of the classes. The experimental results show that BC learning improves the performance on various sound recognition networks, datasets, and data augmentation schemes, in which BC learning proves to be always beneficial. Furthermore, we construct a new deep sound recognition network (EnvNet-v2) and train it with BC learning. As a result, we achieved a performance that surpasses the human level.

Submitted 28 February, 2018; v1 submitted 28 November, 2017; originally announced November 2017.

Comments: 13 pages, 6 figures, published as a conference paper at ICLR 2018

arXiv:1612.07976 [pdf, ps, other]

DeMIAN: Deep Modality Invariant Adversarial Network

Authors: Kuniaki Saito, Yusuke Mukuta, Yoshitaka Ushiku, Tatsuya Harada

Abstract: Obtaining common representations from different modalities is important in that they are interchangeable with each other in a classification problem. For example, we can train a classifier on image features in the common representations and apply it to the testing of the text features in the representations. Existing multi-modal representation learning methods mainly aim to extract rich information from paired samples and train a classifier by the corresponding labels; however, collecting paired samples and their labels simultaneously involves high labor costs. Addressing paired modal samples without their labels and single modal data with their labels independently is much easier than addressing labeled multi-modal data. To obtain the common representations under such a situation, we propose to make the distributions over different modalities similar in the learned representations, namely modality-invariant representations. In particular, we propose a novel algorithm for modality-invariant representation learning, named Deep Modality Invariant Adversarial Network (DeMIAN), which utilizes the idea of Domain Adaptation (DA). Using the modality-invariant representations learned by DeMIAN, we achieved better classification accuracy than with the state-of-the-art methods, especially for some benchmark datasets of zero-shot learning.

Submitted 27 December, 2016; v1 submitted 23 December, 2016; originally announced December 2016.

arXiv:1503.05743 [pdf, ps, other]

Implementation of a Practical Distributed Calculation System with Browsers and JavaScript, and Application to Distributed Deep Learning

Authors: Ken Miura, Tatsuya Harada

Abstract: Deep learning can achieve outstanding results in various fields. However, it requires such significant computational power that graphics processing units (GPUs) and/or numerous computers are often required for practical application. We have developed a new distributed calculation framework called "Sashimi" that allows any computer to be used as a distribution node simply by accessing a website. We have also developed a new JavaScript neural network framework called "Sukiyaki" that uses general purpose GPUs with web browsers. Sukiyaki performs 30 times faster than a conventional JavaScript library for learning deep convolutional neural networks (deep CNNs). The combination of Sashimi and Sukiyaki, as well as new distribution algorithms, demonstrates the distributed deep learning of deep CNNs using only web browsers on various devices. The libraries that comprise the proposed methods are available under the MIT license at http://mil-tokyo.github.io/.

Submitted 19 March, 2015; originally announced March 2015.

arXiv:1502.06064 [pdf, ps, other]

MILJS : Brand New JavaScript Libraries for Matrix Calculation and Machine Learning

Authors: Ken Miura, Tetsuaki Mano, Atsushi Kanehira, Yuichiro Tsuchiya, Tatsuya Harada

Abstract: MILJS is a collection of state-of-the-art, platform-independent, scalable, fast JavaScript libraries for matrix calculation and machine learning. Our core matrix calculation library, Sushi, exhibits far better performance than any other leading machine learning library written in JavaScript. In particular, our matrix multiplication is 177 times faster than the fastest JavaScript benchmark. Based on Sushi, a machine learning library called Tempura is provided, which supports various algorithms widely used in machine learning research. We also provide Soba as a visualization library. The implementations of our libraries are clearly written and properly documented, and thus are easy to get started with, as long as there is a web browser. These libraries are available from http://mil-tokyo.github.io/ under the MIT license.

Submitted 20 February, 2015; originally announced February 2015.

Showing 1–19 of 19 results for author: Harada, T