-
Flow matching achieves minimax optimal convergence
Authors:
Kenji Fukumizu,
Taiji Suzuki,
Noboru Isobe,
Kazusato Oko,
Masanori Koyama
Abstract:
Flow matching (FM) has gained significant attention as a simulation-free generative model. Unlike diffusion models, which are based on stochastic differential equations, FM employs a simpler approach by solving an ordinary differential equation with an initial condition from a normal distribution, thus streamlining the sample generation process. This paper discusses the convergence properties of FM in terms of the $p$-Wasserstein distance, a measure of distributional discrepancy. We establish that FM can achieve the minimax optimal convergence rate for $1 \leq p \leq 2$, presenting the first theoretical evidence that FM can reach convergence rates comparable to those of diffusion models. Our analysis extends existing frameworks by examining a broader class of mean and variance functions for the vector fields and identifies specific conditions necessary to attain these optimal rates.
Submitted 31 May, 2024;
originally announced May 2024.
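The sampling procedure described in the abstract above (draw an initial point from a normal distribution, then solve an ODE) can be sketched in a few lines. This is a minimal illustration, not the paper's method: it assumes a one-dimensional Gaussian target and uses a closed-form velocity field in place of a trained network. The names `velocity` and `generate` and the values `mean=2.0`, `std=0.5` are hypothetical.

```python
import math

def velocity(x, t, mean=2.0, std=0.5):
    """Closed-form velocity field carrying N(0, 1) at t=0 to N(mean, std^2) at t=1.

    Assumes the path x_t = t*mean + sigma_t * x_0 with
    sigma_t = sqrt((1 - t)^2 + (t*std)^2). A trained network would replace this.
    """
    var_t = (1.0 - t) ** 2 + (t * std) ** 2      # sigma_t squared
    half_dvar = t * std ** 2 - (1.0 - t)         # sigma_t * d(sigma_t)/dt
    return mean + half_dvar / var_t * (x - t * mean)

def generate(x0, steps=1000, mean=2.0, std=0.5):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with explicit Euler."""
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        x += dt * velocity(x, k * dt, mean, std)
    return x

if __name__ == "__main__":
    import random
    rng = random.Random(0)
    samples = [generate(rng.gauss(0.0, 1.0)) for _ in range(200)]
    print(sum(samples) / len(samples))  # sample mean, should be near `mean`
```

Under these assumptions the exact flow maps `x0` to `mean + std * x0`, so the Euler output can be checked against that value; no stochastic simulation is needed at generation time.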
-
Optimal Switching Networks for Paired-Egress Bell State Analyzer Pools
Authors:
Marii Koyama,
Claire Yun,
Amin Taherkhani,
Naphan Benchasattabuse,
Bernard Ousmane Sane,
Michal Hajdušek,
Shota Nagayama,
Rodney Van Meter
Abstract:
To scale quantum computers to useful levels, we must build networks of quantum computational nodes that can share entanglement for use in distributed forms of quantum algorithms. In one proposed architecture, node-to-node entanglement is created when nodes emit photons entangled with stationary memories, with the photons routed through a switched interconnect to a shared pool of Bell state analyzers (BSAs). Designs that optimize switching circuits will reduce loss and crosstalk, raising entanglement rates and fidelity. We present optimal designs for switched interconnects constrained to planar layouts, appropriate for silicon waveguides and Mach-Zehnder interferometer (MZI) $2 \times 2$ switch points. The architectures for the optimal designs are scalable and algorithmically structured to pair any arbitrary inputs in a rearrangeable, non-blocking way. For pairing $N$ inputs, $N(N - 2)/4$ switches are required, which is less than half the number of switches required for full permutation switching networks. An efficient routing algorithm is also presented for each architecture. These designs can also be employed in reverse for entanglement generation using a shared pool of entangled paired photon sources.
Submitted 16 May, 2024;
originally announced May 2024.
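The switch count quoted in the abstract above is easy to tabulate. The sketch below evaluates $N(N - 2)/4$ and compares it with $N(N - 1)/2$, which is assumed here as the size of a planar full-permutation network; that baseline formula is not stated in the abstract and is an assumption of this example, as are the function names.

```python
def pairing_switches(n):
    """Switch count N(N - 2)/4 quoted in the abstract for pairing N inputs."""
    if n < 2 or n % 2:
        raise ValueError("n must be an even integer >= 2")
    return n * (n - 2) // 4

def planar_permutation_switches(n):
    """Assumed baseline: N(N - 1)/2 switches for a planar full-permutation network."""
    return n * (n - 1) // 2

if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, pairing_switches(n), planar_permutation_switches(n))
```

With this baseline, the pairing network stays below half the permutation network's switch count for every even $N$, consistent with the claim in the abstract.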
-
Solar chromospheric heating by magnetohydrodynamic waves: dependence on magnetic field inclination
Authors:
Mayu Koyama,
Toshifumi Shimizu
Abstract:
A proposed mechanism for solar chromospheric heating is that magnetohydrodynamic waves propagate upward along magnetic field lines and dissipate their energy in the chromosphere. In particular, compressible magneto-acoustic waves may contribute to the heating. Theoretically, the components below the cutoff frequency cannot propagate into the chromosphere; however, the cutoff frequency depends on the inclination of the magnetic field lines. In this study, using high temporal cadence spectral data from IRIS and the Hinode SOT spectropolarimeter (SP) in plages, we investigated the dependence of the low-frequency waves on magnetic-field properties and quantitatively estimated the amount of energy dissipation in the chromosphere. The following results were obtained: (a) The amount of energy dissipated by the low-frequency component (3--6 mHz) increases with the field inclination below 40 degrees, whereas it decreases with the field inclination above 40 degrees. (b) The amount of energy is enhanced toward $10^4$ W/m$^2$, which is the energy required for heating in the chromospheric plage regions, when the magnetic field is stronger than 600 G and inclined by more than 40 degrees. (c) In the photosphere, the low-frequency component has much more power where the magnetic field is more inclined and weaker than 400 G. The results suggest that the observed low-frequency components can carry energy along the magnetic field lines and that only a specific range of field inclination angles and field strengths may allow the low-frequency component to carry a sufficient amount of energy into the chromosphere.
Submitted 17 March, 2024;
originally announced March 2024.
-
Extended Flow Matching: a Method of Conditional Generation with Generalized Continuity Equation
Authors:
Noboru Isobe,
Masanori Koyama,
Jinzhe Zhang,
Kohei Hayashi,
Kenji Fukumizu
Abstract:
The task of conditional generation is one of the most important applications of generative models, and numerous methods have been developed to date based on the celebrated flow-based models. However, many flow-based models in use today are not built to allow one to introduce an explicit inductive bias to how the conditional distribution to be generated changes with respect to conditions. This can result in unexpected behavior in the task of style transfer, for example. In this research, we introduce extended flow matching (EFM), a direct extension of flow matching that learns a "matrix field" corresponding to the continuous map from the space of conditions to the space of distributions. We show that we can introduce inductive bias to the conditional generation through the matrix field and demonstrate this fact with MMOT-EFM, a version of EFM that aims to minimize the Dirichlet energy or the sensitivity of the distribution with respect to conditions. We will present our theory along with experimental results that support the competitiveness of EFM in conditional generation.
Submitted 5 July, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Reduced Vietoris-Rips Complexes: New methods to compute Vietoris-Rips Persistent Homology
Authors:
Musashi Ayrton Koyama,
Vanessa Robins,
Katharine Turner,
Facundo Memoli
Abstract:
Computing Persistent Homology for large point clouds remains a bottleneck for the wider adoption of persistent homology by the scientific community. We present an algorithm which can compute the one-dimensional Vietoris-Rips Persistent Homology of point clouds in low dimensional Euclidean Space for point clouds on the order of $10^5$ points.
Submitted 30 July, 2023;
originally announced July 2023.
-
Neural Fourier Transform: A General Approach to Equivariant Representation Learning
Authors:
Masanori Koyama,
Kenji Fukumizu,
Kohei Hayashi,
Takeru Miyato
Abstract:
Symmetry learning has proven to be an effective approach for extracting the hidden structure of data, with the concept of equivariance relation playing the central role. However, most of the current studies are built on architectural theory and corresponding assumptions on the form of data. We propose Neural Fourier Transform (NFT), a general framework of learning the latent linear action of the group without assuming explicit knowledge of how the group acts on data. We present the theoretical foundations of NFT and show that the existence of a linear equivariant feature, which has been assumed ubiquitously in equivariance learning, is equivalent to the existence of a group invariant kernel on the dataspace. We also provide experimental results to demonstrate the application of NFT in typical scenarios with varying levels of knowledge about the acting group.
Submitted 14 February, 2024; v1 submitted 29 May, 2023;
originally announced May 2023.
-
Exponential sum approximations of finite completely monotonic functions
Authors:
Yohei M. Koyama
Abstract:
Bernstein's theorem (also called Hausdorff--Bernstein--Widder theorem) enables the integral representation of a completely monotonic function. We introduce a finite completely monotonic function, which is a completely monotonic function with a finite positive integral interval of the integral representation. We consider the exponential sum approximation of a finite completely monotonic function based on the Gaussian quadrature with a variable transformation. If the variable transformation is analytic on an open Bernstein ellipse, the maximum absolute error decreases at least geometrically with respect to the number of exponential functions. The maximization of the decreasing rate of the error bound can be achieved by using a variable transformation represented by Jacobi's delta amplitude function (also called dn function). The error curve is expanded by introducing basis functions, which are eigenfunctions of a fourth order differential operator, satisfy orthogonality conditions, and have the interlacing property of zeros by Kellogg's theorem.
Submitted 22 July, 2023; v1 submitted 21 January, 2023;
originally announced January 2023.
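The quadrature idea in the abstract above can be illustrated with a small worked example. The sketch assumes the specific function $f(x) = \int_1^2 e^{-xt}\,dt$ and a plain 3-point Gauss-Legendre rule with no variable transformation, so it shows only the basic construction and not the transformed rule the paper analyzes. The interval, the function, and all names are assumptions of this example.

```python
import math

A, B = 1.0, 2.0  # assumed integration interval of the integral representation

def f(x):
    """f(x) = integral over [A, B] of exp(-x t) dt, evaluated in closed form."""
    return B - A if x == 0 else (math.exp(-A * x) - math.exp(-B * x)) / x

# 3-point Gauss-Legendre rule on [-1, 1], mapped affinely to [A, B].
_XI = (-math.sqrt(0.6), 0.0, math.sqrt(0.6))
_W = (5 / 9, 8 / 9, 5 / 9)
RATES = [0.5 * (A + B) + 0.5 * (B - A) * xi for xi in _XI]   # decay rates t_i
COEFFS = [0.5 * (B - A) * w for w in _W]                     # weights c_i

def exp_sum(x):
    """Exponential sum sum_i c_i * exp(-t_i * x) approximating f(x)."""
    return sum(c * math.exp(-r * x) for c, r in zip(COEFFS, RATES))

# Maximum absolute error on a grid over [0, 20].
max_err = max(abs(f(0.1 * k) - exp_sum(0.1 * k)) for k in range(201))

if __name__ == "__main__":
    print(max_err)
```

Each quadrature node becomes the decay rate of one exponential term and each quadrature weight its coefficient, so adding nodes adds terms to the sum.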
-
Invariance-adapted decomposition and Lasso-type contrastive learning
Authors:
Masanori Koyama,
Takeru Miyato,
Kenji Fukumizu
Abstract:
Recent years have witnessed the effectiveness of contrastive learning in obtaining representations of a dataset that are useful in interpretation and downstream tasks. However, the mechanism behind this effectiveness has not been thoroughly analyzed, and many studies have been conducted to investigate the data structures captured by contrastive learning. In particular, the recent study of \citet{content_isolate} has shown that contrastive learning is capable of decomposing the data space into the space that is invariant to all augmentations and its complement. In this paper, we introduce the notion of invariance-adapted latent space that decomposes the data space into the intersections of the invariant spaces of each augmentation and their complements. This decomposition generalizes the one introduced in \citet{content_isolate}, and describes a structure that is analogous to the frequencies in the harmonic analysis of a group. We experimentally show that contrastive learning with a lasso-type metric can be used to find an invariance-adapted latent space, thereby suggesting a new potential for contrastive learning. We also investigate when such a latent space can be identified up to mixings within each component.
Submitted 13 October, 2022;
originally announced October 2022.
-
Unsupervised Learning of Equivariant Structure from Sequences
Authors:
Takeru Miyato,
Masanori Koyama,
Kenji Fukumizu
Abstract:
In this study, we present meta-sequential prediction (MSP), an unsupervised framework to learn the symmetry from the time sequence of length at least three. Our method leverages the stationary property (e.g. constant velocity, constant acceleration) of the time sequence to learn the underlying equivariant structure of the dataset by simply training the encoder-decoder model to be able to predict the future observations. We will demonstrate that, with our framework, the hidden disentangled structure of the dataset naturally emerges as a by-product by applying simultaneous block-diagonalization to the transition operators in the latent space, the procedure which is commonly used in representation theory to decompose the feature-space based on the type of response to group actions. We will showcase our method from both empirical and theoretical perspectives. Our result suggests that finding a simple structured relation and learning a model with extrapolation capability are two sides of the same coin. The code is available at https://github.com/takerum/meta_sequential_prediction.
Submitted 12 October, 2022;
originally announced October 2022.
-
Cyber Catalysis: N$_2$ Dissociation over Ruthenium Catalyst with Strong Metal-Support Interaction
Authors:
Gerardo Valadez Huerta,
Kaoru Hisama,
Katsutoshi Sato,
Katsutoshi Nagaoka,
Michihisa Koyama
Abstract:
Catalysis informatics is constantly developing, and significant advances in data mining, molecular simulation, and automation for computational design and high-throughput experimentation have been achieved. However, efforts to reveal the mechanisms of complex supported nanoparticle catalysts in cyberspace have proven to be unsuccessful thus far. This study fills this gap by exploring N$_2$ dissociation on a supported Ru nanoparticle as an example using a universal neural network potential. We calculated 200 catalyst configurations considering the reduction of the support and strong metal-support interaction (SMSI), eventually performing 15,600 calculations for various N$_2$ adsorption states. After successfully validating our results with experimental IR spectral data, we clarified key N$_2$ dissociation pathways behind the high activity of the SMSI surface and disclosed the maximum activity of catalysts reduced at 650 °C. Our method is readily applicable to other complex systems, and we believe it represents a key first step toward the digital transformation of investigations on heterogeneous catalysis.
Submitted 29 August, 2022;
originally announced August 2022.
-
Contrastive Representation Learning with Trainable Augmentation Channel
Authors:
Masanori Koyama,
Kentaro Minami,
Takeru Miyato,
Yarin Gal
Abstract:
In contrastive representation learning, data representation is trained so that it can classify the image instances even when the images are altered by augmentations. However, depending on the datasets, some augmentations can damage the information of the images beyond recognition, and such augmentations can result in collapsed representations. We present a partial solution to this problem by formalizing a stochastic encoding process in which there exists a tug-of-war between the data corruption introduced by the augmentations and the information preserved by the encoder. We show that, with the infoMax objective based on this framework, we can learn a data-dependent distribution of augmentations to avoid the collapse of the representation.
Submitted 15 November, 2021;
originally announced November 2021.
-
Calculations of Real-System Nanoparticles Using Universal Neural Network Potential PFP
Authors:
Gerardo Valadez Huerta,
Yusuke Nanba,
Iori Kurata,
Kosuke Nakago,
So Takamoto,
Chikashi Shinagawa,
Michihisa Koyama
Abstract:
It is essential to explore the stability and activity of real-system nanoparticles theoretically. While applications of theoretical methods for this purpose can be found in the literature, the high computational cost of conventional theoretical methods hinders their large-scale application to practical materials design. With the recent development of neural network algorithms along with the advancement of computer systems, neural network potentials have emerged as a promising candidate for the description of a wide range of materials, including metals and molecules, with a reasonable computational time. In this study, we successfully validate a universal neural network potential, PFP, for the description of monometallic Ru nanoparticles, PdRuCu ternary alloy nanoparticles, and the NO adsorption on Rh nanoparticles against first-principles calculations. We further conduct molecular dynamics simulations on the NO-Rh system and challenge the PFP to describe a large, supported Pt nanoparticle system.
Submitted 2 July, 2021;
originally announced July 2021.
-
When is invariance useful in an Out-of-Distribution Generalization problem ?
Authors:
Masanori Koyama,
Shoichiro Yamaguchi
Abstract:
The goal of the Out-of-Distribution (OOD) generalization problem is to train a predictor that generalizes on all environments. Popular approaches in this field use the hypothesis that such a predictor shall be an \textit{invariant predictor} that captures the mechanism that remains constant across environments. While these approaches have been experimentally successful in various case studies, there is still much room for the theoretical validation of this hypothesis. This paper presents a new set of theoretical conditions necessary for an invariant predictor to achieve the OOD optimality. Our theory not only applies to non-linear cases, but also generalizes the necessary condition used in \citet{rojas2018invariant}. We also derive the Inter Gradient Alignment algorithm from our theory and demonstrate its competitiveness on MNIST-derived benchmark datasets as well as on two of the three \textit{Invariance Unit Tests} proposed by \citet{aubinlinear}.
Submitted 25 November, 2021; v1 submitted 4 August, 2020;
originally announced August 2020.
-
Learning Structured Latent Factors from Dependent Data: A Generative Model Framework from Information-Theoretic Perspective
Authors:
Ruixiang Zhang,
Masanori Koyama,
Katsuhiko Ishiguro
Abstract:
Learning controllable and generalizable representation of multivariate data with desired structural properties remains a fundamental problem in machine learning. In this paper, we present a novel framework for learning generative models with various underlying structures in the latent space. We represent the inductive bias in the form of mask variables to model the dependency structure in the graphical model and extend the theory of multivariate information bottleneck to enforce it. Our model provides a principled approach to learn a set of semantically meaningful latent factors that reflect various types of desired structures like capturing correlation or encoding invariance, while also offering the flexibility to automatically estimate the dependency structure from data. We show that our framework unifies many existing generative models and can be applied to a variety of tasks including multi-modal data modeling, algorithmic fairness, and invariant risk minimization.
Submitted 2 October, 2020; v1 submitted 21 July, 2020;
originally announced July 2020.
-
First-principles study of the stability of bimetallic PdPt nanoparticles under a finite temperature
Authors:
Takayoshi Ishimoto,
Michihisa Koyama
Abstract:
To understand the vibrational and configurational entropy effects on the stability of core-shell and solid-solution bimetallic nanoparticles, we theoretically investigated the excess energy of PdPt nanoparticles, adopting the (PdPt)$_{201}$ model of ca. 2 nm and using the density functional method. The vibrational energy and entropy terms contributed to the total energy of both core-shell and solid-solution nanoparticles. The configurational entropy term was defined only for the solid-solution nanoparticles. Although the absolute values of the vibrational energy and entropy terms were much larger than that of the configurational entropy term, their contributions to the excess free energy were limited because they differ little between atomic configurations. The large contribution of the configurational entropy term to the excess free energy was clearly confirmed by our first-principles calculations. In estimating the stability of core-shell and solid-solution metal nanoparticles based on the excess energy, the configurational entropy term was the dominant factor.
Submitted 12 July, 2020;
originally announced July 2020.
-
Meta Learning as Bayes Risk Minimization
Authors:
Shin-ichi Maeda,
Toshiki Nakanishi,
Masanori Koyama
Abstract:
Meta-Learning is a family of methods that use a set of interrelated tasks to learn a model that can quickly learn a new query task from a possibly small contextual dataset. In this study, we use a probabilistic framework to formalize what it means for two tasks to be related and reframe the meta-learning problem into the problem of Bayesian risk minimization (BRM). In our formulation, the BRM optimal solution is given by the predictive distribution computed from the posterior distribution of the task-specific latent variable conditioned on the contextual dataset, and this justifies the philosophy of Neural Process. However, the posterior distribution in Neural Process violates the way the posterior distribution changes with the contextual dataset. To address this problem, we present a novel Gaussian approximation for the posterior distribution that generalizes the posterior of the linear Gaussian model. Unlike that of the Neural Process, our approximation of the posterior distributions converges to the maximum likelihood estimate with the same rate as the true posterior distribution. We also demonstrate the competitiveness of our approach on benchmark datasets.
Submitted 2 June, 2020;
originally announced June 2020.
-
Computational Bounds for Doing Harmonic Analysis on Permutation Modules of Finite Groups
Authors:
Michael Hansen,
Masanori Koyama,
Matthew B. A. McDermott,
Michael E. Orrison,
Sarah Wolff
Abstract:
We develop an approach to finding upper bounds for the number of arithmetic operations necessary for doing harmonic analysis on permutation modules of finite groups. The approach takes advantage of the intrinsic orbital structure of permutation modules, and it uses the multiplicities of irreducible submodules within individual orbital spaces to express the resulting computational bounds. We conclude by showing that these bounds are surprisingly small when dealing with certain permutation modules arising from the action of the symmetric group on tabloids.
Submitted 8 October, 2019;
originally announced October 2019.
-
Reconnaissance and Planning algorithm for constrained MDP
Authors:
Shin-ichi Maeda,
Hayato Watahiki,
Shintarou Okada,
Masanori Koyama
Abstract:
Practical reinforcement learning problems are often formulated as constrained Markov decision process (CMDP) problems, in which the agent has to maximize the expected return while satisfying a set of prescribed safety constraints. In this study, we propose a novel simulator-based method to approximately solve a CMDP problem without making any compromise on the safety constraints. We achieve this by decomposing the CMDP into a pair of MDPs: a reconnaissance MDP (RMDP) and a planning MDP. The purpose of the reconnaissance MDP is to evaluate the set of actions that are safe, and the purpose of the planning MDP is to maximize the return while using the actions authorized by the reconnaissance MDP. The RMDP can define a set of safe policies for any given set of safety constraints, and this set of safe policies can be used to solve another CMDP problem with a different reward. Our method is not only computationally less demanding than the previous simulator-based approaches to CMDP, but also capable of finding a competitive reward-seeking policy in a high dimensional environment, including those involving multiple moving obstacles.
Submitted 20 September, 2019;
originally announced September 2019.
-
Optuna: A Next-generation Hyperparameter Optimization Framework
Authors:
Takuya Akiba,
Shotaro Sano,
Toshihiko Yanase,
Takeru Ohta,
Masanori Koyama
Abstract:
The purpose of this study is to introduce new design criteria for next-generation hyperparameter optimization software. The criteria we propose include (1) a define-by-run API that allows users to construct the parameter search space dynamically, (2) efficient implementation of both searching and pruning strategies, and (3) an easy-to-setup, versatile architecture that can be deployed for various purposes, ranging from scalable distributed computing to light-weight experiments conducted via an interactive interface. In order to prove our point, we will introduce Optuna, optimization software that is the culmination of our effort in the development of next-generation optimization software. As optimization software designed with the define-by-run principle, Optuna is the first of its kind. We will present the design techniques that became necessary in the development of software that meets the above criteria, and demonstrate the power of our new design through experimental results and real world applications. Our software is available under the MIT license (https://github.com/pfnet/optuna/).
Submitted 25 July, 2019;
originally announced July 2019.
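The define-by-run idea in the abstract above, where the search space is constructed while the objective function executes, can be illustrated with a toy random search. This is a standard-library sketch of the concept only and is not Optuna's implementation: the `Trial` class, `random_search`, and the objective below are hypothetical stand-ins written for this example.

```python
import random

class Trial:
    """Hypothetical stand-in for a trial object; not Optuna's actual class."""

    def __init__(self, rng):
        self._rng = rng
        self.params = {}

    def suggest_int(self, name, low, high):
        self.params[name] = self._rng.randint(low, high)
        return self.params[name]

    def suggest_float(self, name, low, high):
        self.params[name] = self._rng.uniform(low, high)
        return self.params[name]

def objective(trial):
    # The search space is built while the function runs: how many width
    # parameters exist depends on the value sampled for n_layers.
    n_layers = trial.suggest_int("n_layers", 1, 3)
    widths = [trial.suggest_float(f"width_{i}", 0.0, 4.0) for i in range(n_layers)]
    return sum((w - 1.0) ** 2 for w in widths) + 0.1 * n_layers

def random_search(objective, n_trials=200, seed=0):
    """Run the objective n_trials times and keep the best (value, params)."""
    rng = random.Random(seed)
    best_value, best_params = float("inf"), None
    for _ in range(n_trials):
        trial = Trial(rng)
        value = objective(trial)
        if value < best_value:
            best_value, best_params = value, trial.params
    return best_value, best_params

if __name__ == "__main__":
    print(random_search(objective))
```

No search space is declared up front; the set of parameters recorded for a trial is whatever the objective requested on that run, which is the property the abstract calls define-by-run.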
-
Robustness to Adversarial Perturbations in Learning from Incomplete Data
Authors:
Amir Najafi,
Shin-ichi Maeda,
Masanori Koyama,
Takeru Miyato
Abstract:
What is the role of unlabeled data in an inference problem, when the presumed underlying distribution is adversarially perturbed? To provide a concrete answer to this question, this paper unifies two major learning frameworks: Semi-Supervised Learning (SSL) and Distributionally Robust Learning (DRL). We develop a generalization theory for our framework based on a number of novel complexity measures, such as an adversarial extension of Rademacher complexity and its semi-supervised analogue. Moreover, our analysis is able to quantify the role of unlabeled data in the generalization under a more general condition compared to the existing theoretical works in SSL. Based on our framework, we also present a hybrid of DRL and EM algorithms that has a guaranteed convergence rate. When implemented with deep neural networks, our method shows a comparable performance to those of the state-of-the-art on a number of real-world benchmark datasets.
Submitted 24 May, 2019;
originally announced May 2019.
-
A Graph Theoretic Framework of Recomputation Algorithms for Memory-Efficient Backpropagation
Authors:
Mitsuru Kusumoto,
Takuya Inoue,
Gentaro Watanabe,
Takuya Akiba,
Masanori Koyama
Abstract:
Recomputation algorithms collectively refer to a family of methods that aim to reduce the memory consumption of backpropagation by selectively discarding the intermediate results of the forward propagation and recomputing the discarded results as needed. In this paper, we propose a novel and efficient recomputation method that can be applied to a wider range of neural nets than previous methods. We use the language of graph theory to formalize the general recomputation problem of minimizing the computational overhead under a fixed memory budget constraint, and provide a dynamic programming solution to the problem. Our method can reduce the peak memory consumption on various benchmark networks by 36% to 81%, which outperforms the reduction achieved by other methods.
Submitted 28 May, 2019;
originally announced May 2019.
-
A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning
Authors:
Yoshihiro Nagano,
Shoichiro Yamaguchi,
Yasuhiro Fujita,
Masanori Koyama
Abstract:
Hyperbolic space is a geometry that is known to be well-suited for representation learning of data with an underlying hierarchical structure. In this paper, we present a novel hyperbolic distribution called \textit{pseudo-hyperbolic Gaussian}, a Gaussian-like distribution on hyperbolic space whose density can be evaluated analytically and differentiated with respect to the parameters. Our distribution enables the gradient-based learning of the probabilistic models on hyperbolic space that could never have been considered before. Also, we can sample from this hyperbolic probability distribution without resorting to auxiliary means like rejection sampling. As applications of our distribution, we develop a hyperbolic-analog of variational autoencoder and a method of probabilistic word embedding on hyperbolic space. We demonstrate the efficacy of our distribution on various datasets including MNIST, Atari 2600 Breakout, and WordNet.
Submitted 9 May, 2019; v1 submitted 8 February, 2019;
originally announced February 2019.
-
Graph Warp Module: an Auxiliary Module for Boosting the Power of Graph Neural Networks in Molecular Graph Analysis
Authors:
Katsuhiko Ishiguro,
Shin-ichi Maeda,
Masanori Koyama
Abstract:
Graph Neural Network (GNN) is a popular architecture for the analysis of chemical molecules, and it has numerous applications in material and medicinal science. Current lines of GNNs developed for molecular analysis, however, do not fit well on the training set, and their performance does not scale well with the complexity of the network. In this paper, we propose an auxiliary module to be attached to a GNN that can boost the representation power of the model without interfering with the original GNN architecture. Our auxiliary module can be attached to a wide variety of GNNs, including those that are used commonly in biochemical applications. With our auxiliary architecture, the performance of many GNNs used in practice improves more consistently, achieving state-of-the-art performance on popular molecular graph datasets.
Submitted 24 May, 2019; v1 submitted 3 February, 2019;
originally announced February 2019.
-
Spatially Controllable Image Synthesis with Internal Representation Collaging
Authors:
Ryohei Suzuki,
Masanori Koyama,
Takeru Miyato,
Taizan Yonetsuji,
Huachun Zhu
Abstract:
We present a novel CNN-based image editing strategy that allows the user to change the semantic information of an image over an arbitrary region by manipulating the feature-space representation of the image in a trained GAN model. We will present two variants of our strategy: (1) spatial conditional batch normalization (sCBN), a type of conditional batch normalization with user-specifiable spatial weight maps, and (2) feature-blending, a method of directly modifying the intermediate features. Our methods can be used to edit both artificial image and real image, and they both can be used together with any GAN with conditional normalization layers. We will demonstrate the power of our method through experiments on various types of GANs trained on different datasets. Code will be available at https://github.com/pfnet-research/neural-collage.
Submitted 9 April, 2019; v1 submitted 25 November, 2018;
originally announced November 2018.
-
Train Sparsely, Generate Densely: Memory-efficient Unsupervised Training of High-resolution Temporal GAN
Authors:
Masaki Saito,
Shunta Saito,
Masanori Koyama,
Sosuke Kobayashi
Abstract:
Training of Generative Adversarial Network (GAN) on a video dataset is a challenge because of the sheer size of the dataset and the complexity of each observation. In general, the computational cost of training GAN scales exponentially with the resolution. In this study, we present a novel memory efficient method of unsupervised learning of high-resolution video dataset whose computational cost scales only linearly with the resolution. We achieve this by designing the generator model as a stack of small sub-generators and training the model in a specific way. We train each sub-generator with its own specific discriminator. At the time of the training, we introduce between each pair of consecutive sub-generators an auxiliary subsampling layer that reduces the frame-rate by a certain ratio. This procedure can allow each sub-generator to learn the distribution of the video at different levels of resolution. We also need only a few GPUs to train a highly complex generator that far outperforms the predecessor in terms of inception scores.
Submitted 1 June, 2020; v1 submitted 22 November, 2018;
originally announced November 2018.
-
Spectral Normalization for Generative Adversarial Networks
Authors:
Takeru Miyato,
Toshiki Kataoka,
Masanori Koyama,
Yuichi Yoshida
Abstract:
One of the challenges in the study of generative adversarial networks is the instability of their training. In this paper, we propose a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator. Our new normalization technique is computationally light and easy to incorporate into existing implementations. We tested the efficacy of spectral normalization on the CIFAR10, STL-10, and ILSVRC2012 datasets, and we experimentally confirmed that spectrally normalized GANs (SN-GANs) are capable of generating images of better or equal quality relative to the previous training stabilization techniques.
Submitted 16 February, 2018;
originally announced February 2018.
-
cGANs with Projection Discriminator
Authors:
Takeru Miyato,
Masanori Koyama
Abstract:
We propose a novel, projection-based way to incorporate the conditional information into the discriminator of GANs that respects the role of the conditional information in the underlying probabilistic model. This approach is in contrast with most frameworks of conditional GANs used in application today, which use the conditional information by concatenating the (embedded) conditional vector to the feature vectors. With this modification, we were able to significantly improve the quality of the class conditional image generation on ILSVRC2012 (ImageNet) 1000-class image dataset from the current state-of-the-art result, and we achieved this with a single pair of a discriminator and a generator. We were also able to extend the application to super-resolution and succeeded in producing highly discriminative super-resolution images. This new structure also enabled high quality category transformation based on parametric functional transformation of conditional batch normalization layers in the generator.
Submitted 14 August, 2018; v1 submitted 15 February, 2018;
originally announced February 2018.
-
SPICE Simulation of tunnel FET aiming at 32 kHz crystal-oscillator operation
Authors:
Tetsufumi Tanamoto,
Chika Tanaka,
Satoshi Takaya,
Masato Koyama
Abstract:
We numerically investigate the possibility of using Tunnel field-effect transistor (TFET) in a 32 kHz crystal oscillator circuit to reduce power consumption. A simulation using SPICE (Simulation Program with Integrated Circuit Emphasis) is carried out based on a conventional CMOS transistor model. It is shown that the power consumption of TFET is one-tenth that of conventional low-power CMOS.
Submitted 19 October, 2017;
originally announced October 2017.
-
Non-explosivity of stochastically modeled reaction networks that are complex balanced
Authors:
David F. Anderson,
Daniele Cappelletti,
Masanori Koyama,
Thomas G. Kurtz
Abstract:
We consider stochastically modeled reaction networks and prove that if a constant solution to the Kolmogorov forward equation decays fast enough relatively to the transition rates, then the model is non-explosive. In particular, complex balanced reaction networks are non-explosive.
Submitted 18 May, 2018; v1 submitted 30 August, 2017;
originally announced August 2017.
-
Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning
Authors:
Takeru Miyato,
Shin-ichi Maeda,
Masanori Koyama,
Shin Ishii
Abstract:
We propose a new regularization method based on virtual adversarial loss: a new measure of local smoothness of the conditional label distribution given input. Virtual adversarial loss is defined as the robustness of the conditional label distribution around each input data point against local perturbation. Unlike adversarial training, our method defines the adversarial direction without label information and is hence applicable to semi-supervised learning. Because the directions in which we smooth the model are only "virtually" adversarial, we call our method virtual adversarial training (VAT). The computational cost of VAT is relatively low. For neural networks, the approximated gradient of virtual adversarial loss can be computed with no more than two pairs of forward- and back-propagations. In our experiments, we applied VAT to supervised and semi-supervised learning tasks on multiple benchmark datasets. With a simple enhancement of the algorithm based on the entropy minimization principle, our VAT achieves state-of-the-art performance for semi-supervised learning tasks on SVHN and CIFAR-10.
Submitted 27 June, 2018; v1 submitted 12 April, 2017;
originally announced April 2017.
-
Counting curves on surfaces
Authors:
Norman Do,
Musashi A. Koyama,
Daniel V. Mathews
Abstract:
In this paper we consider an elementary, and largely unexplored, combinatorial problem in low-dimensional topology. Consider a real 2-dimensional compact surface $S$, and fix a number of points $F$ on its boundary. We ask: how many configurations of disjoint arcs are there on $S$ whose boundary is $F$?
We find that this enumerative problem, counting curves on surfaces, has a rich structure. For instance, we show that the curve counts obey an effective recursion, in the general framework of topological recursion. Moreover, they exhibit quasi-polynomial behaviour.
This "elementary curve-counting" is in fact related to a more advanced notion of "curve-counting" from algebraic geometry or symplectic geometry. The asymptotics of this enumerative problem are closely related to the asymptotics of volumes of moduli spaces of curves, and the quasi-polynomials governing the enumerative problem encode intersection numbers on moduli spaces.
Furthermore, among several other results, we show that generating functions and differential forms for these curve counts exhibit structure that is reminiscent of the mathematical physics of free energies, partition functions, topological recursion, and quantum curves.
Submitted 28 January, 2016; v1 submitted 30 December, 2015;
originally announced December 2015.
-
Distributional Smoothing with Virtual Adversarial Training
Authors:
Takeru Miyato,
Shin-ichi Maeda,
Masanori Koyama,
Ken Nakae,
Shin Ishii
Abstract:
We propose local distributional smoothness (LDS), a new notion of smoothness for statistical models that can be used as a regularization term to promote the smoothness of the model distribution. We name the LDS-based regularization virtual adversarial training (VAT). The LDS of a model at an input datapoint is defined as the KL-divergence based robustness of the model distribution against local perturbation around the datapoint. VAT resembles adversarial training, but distinguishes itself in that it determines the adversarial direction from the model distribution alone without using the label information, making it applicable to semi-supervised learning. The computational cost for VAT is relatively low. For neural networks, the approximated gradient of the LDS can be computed with no more than three pairs of forward and back propagations. When we applied our technique to supervised and semi-supervised learning for the MNIST dataset, it outperformed all the training methods other than the current state of the art method, which is based on a highly advanced generative model. We also applied our method to SVHN and NORB, and confirmed our method's superior performance over the current state of the art semi-supervised method applied to these datasets.
Submitted 11 June, 2016; v1 submitted 2 July, 2015;
originally announced July 2015.
-
Size distribution of islands according to 2D growth model with 2 kinds of diffusion atoms
Authors:
R. Yamauchi,
X. M. Lu,
M. Koyama,
H. Sasakura,
Y. Nakata,
S. Muto
Abstract:
We simulated the growth of 2D islands with 2 kinds of diffusion atoms using the kinetic Monte Carlo (kMC) method. As a result, we found that the slow atoms tend to create nuclei and determine the island volume distribution, along with additional properties such as island density. We also conducted a theoretical analysis using the rate equation of the point-island model to confirm these results.
Submitted 15 March, 2015;
originally announced March 2015.
-
Deep learning of fMRI big data: a novel approach to subject-transfer decoding
Authors:
Sotetsu Koyamada,
Yumi Shikauchi,
Ken Nakae,
Masanori Koyama,
Shin Ishii
Abstract:
As a technology to read brain states from measurable brain activities, brain decoding is widely applied in industries and medical sciences. In spite of high demands in these applications for a universal decoder that can be applied to all individuals simultaneously, large variation in brain activities across individuals has limited the scope of many studies to the development of individual-specific decoders. In this study, we used a deep neural network (DNN), a nonlinear hierarchical model, to construct a subject-transfer decoder. Our decoder is the first successful DNN-based subject-transfer decoder. When applied to a large-scale functional magnetic resonance imaging (fMRI) database, our DNN-based decoder achieved higher decoding accuracy than other baseline methods, including support vector machine (SVM). In order to analyze the knowledge acquired by this decoder, we applied principal sensitivity analysis (PSA) to the decoder and visualized the discriminative features that are common to all subjects in the dataset. Our PSA successfully visualized the subject-independent features contributing to the subject-transferability of the trained decoder.
Submitted 31 January, 2015;
originally announced February 2015.
-
Principal Sensitivity Analysis
Authors:
Sotetsu Koyamada,
Masanori Koyama,
Ken Nakae,
Shin Ishii
Abstract:
We present a novel algorithm (Principal Sensitivity Analysis; PSA) to analyze the knowledge of the classifier obtained from supervised machine learning techniques. In particular, we define principal sensitivity map (PSM) as the direction on the input space to which the trained classifier is most sensitive, and use analogously defined k-th PSM to define a basis for the input space. We train neural networks with artificial data and real data, and apply the algorithm to the obtained supervised classifiers. We then visualize the PSMs to demonstrate the PSA's ability to decompose the knowledge acquired by the trained classifiers.
Submitted 11 March, 2015; v1 submitted 21 December, 2014;
originally announced December 2014.
-
An asymptotic relationship between coupling methods for stochastically modeled population processes
Authors:
David F. Anderson,
Masanori Koyama
Abstract:
This paper is concerned with elucidating a relationship between two common coupling methods for the continuous time Markov chain models utilized in the cell biology literature. The couplings considered here are primarily used in a computational framework by providing reductions in variance for different Monte Carlo estimators, thereby allowing for significantly more accurate results for a fixed amount of computational time. Common applications of the couplings include the estimation of parametric sensitivities via finite difference methods and the estimation of expectations via multi-level Monte Carlo algorithms. While a number of coupling strategies have been proposed for the models considered here, and a number of articles have experimentally compared the different strategies, to date there has been no mathematical analysis describing the connections between them. Such analyses are critical in order to determine the best use for each. In the current paper, we show a connection between the common reaction path (CRP) method and the split coupling (SC) method, which is termed coupled finite differences (CFD) in the parametric sensitivities literature. In particular, we show that the two couplings are both limits of a third coupling strategy we call the "local-CRP" coupling, with the split coupling method arising as a key parameter goes to infinity, and the common reaction path coupling arising as the same parameter goes to zero. The analysis helps explain why the split coupling method often provides a lower variance than does the common reaction path method, a fact previously shown experimentally.
Submitted 1 August, 2014; v1 submitted 12 March, 2014;
originally announced March 2014.
-
Realization of Carrier Tunneling from InAlAs Quantum Dots to AlAs
Authors:
Masataka Koyama,
Dai Suzuki,
Xiangmeng Lu,
Yoshiaki Nakata,
Shunichi Muto
Abstract:
With the aim of improving solar cell efficiency, a structure for realizing electron tunneling from In0.6Al0.4As quantum dots (QDs) through an Al0.4Ga0.6As barrier to AlAs has been grown using molecular beam epitaxy. The photoluminescence decay time decreased from 1.1 ns to 390 ps as the barrier thickness decreased from 4 to 2 nm, which indicates that the photo-excited carriers tunneled from the QDs to the AlAs X energy level for a barrier thickness of 2 nm in 0.6 ns, which is significantly longer than the tunneling time of GaAs and InAlAs quantum wells. We expect that this structure will assist in developing high-efficiency QD sensitized solar cells.
Submitted 21 March, 2013;
originally announced March 2013.
-
Size Distribution and Its Scaling Behavior of InAlAs/AlGaAs Quantum Dots Grown on GaAs by Molecular Beam Epitaxy
Authors:
X. M. Lu,
M. Koyama,
Y. Izumi,
Y. Nakata,
S. Adachi,
S. Muto
Abstract:
We studied the size distribution and its scaling behavior of self-assembled InAlAs/AlGaAs quantum dots (QDs) grown on GaAs with the Stranski-Krastanov (SK) mode by molecular beam epitaxy (MBE), at both 480°C and 510°C, as a function of InAlAs coverage. A scaling function of the volume was found for the first time in ternary alloy QDs. The function was similar to that of InAs/GaAs QDs, which agreed with the scaling function for the two-dimensional submonolayer homoepitaxy simulation with a critical island size of i = 1. However, a character of i = 0 was also found as a tail in the large volume.
Submitted 1 August, 2012;
originally announced August 2012.
-
Weak error analysis of numerical methods for stochastic models of population processes
Authors:
David F. Anderson,
Masanori Koyama
Abstract:
The simplest, and most common, stochastic model for population processes, including those from biochemistry and cell biology, are continuous time Markov chains. Simulation of such models is often relatively straightforward as there are easily implementable methods for the generation of exact sample paths. However, when using ensemble averages to approximate expected values, the computational complexity can become prohibitive as the number of computations per path scales linearly with the number of jumps of the process. When such methods become computationally intractable, approximate methods, which introduce a bias, can become advantageous. In this paper, we provide a general framework for understanding the weak error, or bias, induced by different numerical approximation techniques in the current setting. The analysis takes into account both the natural scalings within a given system and the step-size of the numerical method. Examples are provided to demonstrate the main analytical results as well as the reduction in computational complexity achieved by the approximate methods.
Submitted 29 February, 2012; v1 submitted 14 February, 2011;
originally announced February 2011.
-
Diffusion and activation of n-type dopants in germanium
Authors:
Masahiro Koike,
Yoshiki Kamata,
Tsunehiro Ino,
Daisuke Hagishima,
Kosuke Tatsumura,
Masato Koyama,
Akira Nishiyama
Abstract:
The diffusion and activation of $n$-type impurities (P and As) implanted into $p$-type Ge(100) substrates were examined under various dose and annealing conditions. The secondary ion mass spectrometry profiles of chemical concentrations indicated the existence of a sufficiently high number of impurities with increasing implanted doses. However, spreading resistance probe profiles of electrical concentrations showed electrical concentration saturation in spite of increasing doses and indicated poor activation of As relative to P in Ge. The relationships between the chemical and electrical concentrations of P in Ge and Si were calculated, taking into account the effect of incomplete ionization. The results indicated that the activation of P was almost the same in Ge and Si. The activation ratios obtained experimentally were similar to the calculated values, implying insufficient degeneration of Ge. The profiles of P in Ge substrates with and without damage generated by Ge ion implantation were compared, and it was clarified that the damage that may compensate the activated $n$-type dopants has no relationship with the activation of P in Ge.
Submitted 31 July, 2008; v1 submitted 27 March, 2007;
originally announced March 2007.
-
Dielectric Properties of Noncrystalline HfSiON
Authors:
Masahiro Koike,
Tsunehiro Ino,
Yuuichi Kamimuta,
Masato Koyama,
Yoshiki Kamata,
Masamichi Suzuki,
Yuichiro Mitani,
Akira Nishiyama
Abstract:
The dielectric properties of noncrystalline hafnium silicon oxynitride (HfSiON) films with a variety of atomic compositions were investigated. The films were deposited by reactive sputtering of Hf and Si in an O, N, and Ar mixture ambient. The bonding states, band-gap energies, atomic compositions, and crystallinities were confirmed by X-ray photoelectron spectroscopy (XPS), reflection electron energy loss spectroscopy (REELS), Rutherford backscattering spectrometry (RBS), and X-ray diffractometry (XRD), respectively. The optical (high-frequency) dielectric constants were determined as the square of the refractive indices measured by ellipsometry. The static dielectric constants were estimated electrically from the capacitance of Au/HfSiON/Si(100) structures. It was observed that low N incorporation in the films led to the formation of only Si-N bonds without Hf-N bonds. An abrupt decrease in band-gap energies was observed at atomic compositions corresponding to the boundary where Hf-N bonds start to form. By combining the data for the atomic concentrations and bonding states, we found that HfSiON can be regarded as a pseudo-quaternary alloy consisting of four insulating components: SiO$_2$, HfO$_2$, Si$_3$N$_4$, and Hf$_3$N$_4$. The optical and static dielectric constants for the films showed a nonlinear dependence on the N concentration, a behavior that can be understood in terms of abrupt Hf-N bond formation.
Submitted 23 January, 2006;
originally announced January 2006.
-
Electronic structures of Cr$_{1-δ}$X (X=S, Te) studied by Cr 2p soft x-ray magnetic circular dichroism
Authors:
K. Yaji,
A. Kimura,
C. Hirai,
M. Taniguchi,
M. Koyama,
H. Sato,
K. Shimada,
A. Tanaka,
T. Muro,
S. Imada,
S. Suga
Abstract:
Cr 2p core excited XAS and XMCD spectra of ferromagnetic Cr$_{1-δ}$Te with several concentrations of $δ$=0.11-0.33 and ferrimagnetic Cr$_{5}$S$_{6}$ have been measured. The observed XMCD lineshapes are found to depend only very weakly on $δ$ for Cr$_{1-δ}$Te. The experimental results are analyzed by means of a configuration-interaction cluster model calculation with consideration of hybridization and electron correlation effects. The values of the spin magnetic moment obtained from the cluster model analyses are in agreement with the results of the band structure calculation. The calculated result shows that the doped holes created by the Cr deficiency exist mainly in the Te 5p orbital of Cr$_{1-δ}$Te, whereas the holes are likely to be in the Cr 3d state for Cr$_{5}$S$_{6}$.
Submitted 11 June, 2004;
originally announced June 2004.
-
Soft X-ray magnetic circular dichroism study of the ferromagnetic Cr$_{1-δ}$Te
Authors:
K. Yaji,
A. Kimura,
C. Hirai,
H. Sato,
M. Taniguchi,
M. Koyama,
K. Shimada,
A. Tanaka,
T. Muro,
S. Imada,
S. Suga
Abstract:
The 2p core excited XAS and XMCD spectra of Cr$_{1-δ}$Te with several concentrations of $δ$=0.11-0.33 have been measured. The observed XMCD lineshapes are found to depend only very weakly on $δ$. The experimental results are analyzed in terms of the configuration-interaction picture with consideration of hybridization and electron correlation effects. The calculated result shows that CrTe can be classified as a charge-transfer-type material and that the holes created in Cr-deficient Cr$_{1-δ}$Te reside preferentially in Te 5p orbitals, which is consistent with the observed XMCD features and the reported band structure calculation.
Submitted 10 January, 2003; v1 submitted 9 January, 2003;
originally announced January 2003.
-
Electronic structure of Cr1-dS (d=0,0.17) with NiAs-type crystal structure
Authors:
M. Koyama,
H. Sato,
Y. Ueda,
C. Hirai,
M. Taniguchi
Abstract:
The valence-band and conduction-band electronic structures of CrS (d=0) and Cr5S6 (d=0.17) have been investigated by means of photoemission and inverse-photoemission spectroscopies. The bandwidth of the valence bands of Cr5S6 (8.5 eV) is wider than that of CrS (8.1 eV), though the Cr 3d partial density of states evaluated from Cr 3p-3d resonant photoemission spectroscopy is almost unchanged between the two compounds with respect to its shape, including binding energies. The Cr 3d (t2g) exchange splitting energies of CrS and Cr5S6 are determined to be 3.9 and 3.3 eV, respectively.
Submitted 15 November, 2002;
originally announced November 2002.