-
Shower Separation in Five Dimensions for Highly Granular Calorimeters using Machine Learning
Authors:
S. Lai,
J. Utehs,
A. Wilhahn,
M. C. Fouz,
O. Bach,
E. Brianne,
A. Ebrahimi,
K. Gadow,
P. Göttlicher,
O. Hartbrich,
D. Heuchel,
A. Irles,
K. Krüger,
J. Kvasnicka,
S. Lu,
C. Neubüser,
A. Provenza,
M. Reinecke,
F. Sefkow,
S. Schuwalow,
M. De Silva,
Y. Sudo,
H. L. Tran,
L. Liu,
R. Masuda
, et al. (26 additional authors not shown)
Abstract:
To achieve state-of-the-art jet energy resolution for Particle Flow, sophisticated energy clustering algorithms must be developed that can fully exploit available information to separate energy deposits from charged and neutral particles. Three published neural network-based shower separation models were applied to simulation and experimental data to measure the performance of the highly granular CALICE Analogue Hadronic Calorimeter (AHCAL) technological prototype in distinguishing the energy deposited by a single charged and a single neutral hadron for Particle Flow. The performance of models trained using only standard spatial, energy, and charged-track position information from an event was compared to that of models trained using timing information available from AHCAL, which is expected to improve sensitivity to shower development and, therefore, aid in clustering. Both simulation and experimental data were used to train and test the models, and their performances were compared. The best-performing neural network achieved significantly superior event reconstruction when timing information was utilised in training for the case where the charged hadron had more energy than the neutral one, motivating temporally sensitive calorimeters. All models under test tended to allocate energy deposited by the more energetic of the two showers to the less energetic one. Similar shower reconstruction performance was observed for a model trained on simulation and applied to data and a model trained on and applied to data.
Submitted 28 June, 2024;
originally announced July 2024.
-
STOLAS: STOchastic LAttice Simulation of cosmic inflation
Authors:
Yurino Mizuguchi,
Tomoaki Murata,
Yuichiro Tada
Abstract:
We develop a C++ package for the STOchastic LAttice Simulation (STOLAS) of cosmic inflation. It performs numerical lattice simulations in an application of the stochastic-$δN$ formalism. STOLAS can directly compute the three-dimensional map of the observable curvature perturbation without estimating its statistical properties. In its application to two toy models of inflation, chaotic inflation and Starobinsky's linear-potential inflation, we confirm that STOLAS is consistent with standard perturbation theory. Furthermore, by introducing the importance sampling technique, we succeed in numerically sampling the current abundance of primordial black holes in a non-perturbative way. The package is available at https://github.com/STOchasticLAtticeSimulation/STOLAS_dist.
Submitted 17 May, 2024;
originally announced May 2024.
-
DEGNN: Dual Experts Graph Neural Network Handling Both Edge and Node Feature Noise
Authors:
Tai Hasegawa,
Sukwon Yun,
Xin Liu,
Yin Jun Phua,
Tsuyoshi Murata
Abstract:
Graph Neural Networks (GNNs) have achieved notable success in various applications over graph data. However, recent research has revealed that real-world graphs often contain noise, and GNNs are susceptible to noise in the graph. To address this issue, several Graph Structure Learning (GSL) models have been introduced. While GSL models are tailored to enhance robustness against edge noise through edge reconstruction, a significant limitation surfaces: their high reliance on node features. This inherent dependence amplifies their susceptibility to noise within node features. Recognizing this vulnerability, we present DEGNN, a novel GNN model designed to mitigate noise in both edges and node features. The core idea of DEGNN is to design two separate experts: an edge expert and a node feature expert. These experts utilize self-supervised learning techniques to produce modified edges and node features. Leveraging these modified representations, DEGNN subsequently addresses downstream tasks, ensuring robustness against noise present in both edges and node features of real-world graphs. Notably, the modification process can be trained end-to-end, empowering DEGNN to adjust dynamically and achieve optimal edge and node representations for specific tasks. Comprehensive experiments demonstrate DEGNN's efficacy in managing noise, both in original real-world graphs and in graphs with synthetic noise.
Submitted 14 April, 2024;
originally announced April 2024.
-
Future-Proofing Class Incremental Learning
Authors:
Quentin Jodelet,
Xin Liu,
Yin Jun Phua,
Tsuyoshi Murata
Abstract:
Exemplar-Free Class Incremental Learning is a highly challenging setting where replay memory is unavailable. Methods relying on frozen feature extractors have drawn attention recently in this setting due to their impressive performance and lower computational costs. However, those methods are highly dependent on the data used to train the feature extractor and may struggle when an insufficient number of classes is available during the first incremental step. To overcome this limitation, we propose to use a pre-trained text-to-image diffusion model to generate synthetic images of future classes and use them to train the feature extractor. Experiments on the standard benchmarks CIFAR100 and ImageNet-Subset demonstrate that our proposed method can be used to improve state-of-the-art methods for exemplar-free class incremental learning, especially in the most difficult settings where the first incremental step contains only a few classes. Moreover, we show that using synthetic samples of future classes achieves higher performance than using real data from different classes, paving the way for better and less costly pre-training methods for incremental learning.
Submitted 4 April, 2024;
originally announced April 2024.
-
Software Compensation for Highly Granular Calorimeters using Machine Learning
Authors:
S. Lai,
J. Utehs,
A. Wilhahn,
O. Bach,
E. Brianne,
A. Ebrahimi,
K. Gadow,
P. Göttlicher,
O. Hartbrich,
D. Heuchel,
A. Irles,
K. Krüger,
J. Kvasnicka,
S. Lu,
C. Neubüser,
A. Provenza,
M. Reinecke,
F. Sefkow,
S. Schuwalow,
M. De Silva,
Y. Sudo,
H. L. Tran,
E. Buhmann,
E. Garutti,
S. Huck
, et al. (39 additional authors not shown)
Abstract:
A neural network for software compensation was developed for the highly granular CALICE Analogue Hadronic Calorimeter (AHCAL). The neural network uses spatial and temporal event information from the AHCAL and energy information, which is expected to improve sensitivity to shower development and the neutron fraction of the hadron shower. The neural network method produced a depth-dependent energy weighting and a time-dependent threshold for enhancing energy deposits consistent with the timescale of evaporation neutrons. Additionally, it was observed to learn an energy weighting indicative of longitudinal leakage correction. The method also produced a linear detector response and outperformed a published control method in resolution for every particle energy studied.
Submitted 7 March, 2024;
originally announced March 2024.
-
Modularity-based selection of the number of slices in temporal network clustering
Authors:
Patrik Seiron,
Axel Lindegren,
Matteo Magnani,
Christian Rohner,
Tsuyoshi Murata,
Petter Holme
Abstract:
A popular way to cluster a temporal network is to transform it into a sequence of networks, also called slices, where each slice corresponds to a time interval and contains the vertices and edges existing in that interval. A reason to perform this transformation is that after a network has been sliced, existing algorithms designed to find clusters in multilayer networks can be used. However, to use this approach, we need to know how many slices to generate. This chapter discusses how to select the number of slices when generalized modularity is used to identify the clusters.
Submitted 24 November, 2023;
originally announced November 2023.
-
Parity-violating scalar trispectrum from a rolling axion during inflation
Authors:
Tomohiro Fujita,
Tomoaki Murata,
Ippei Obata,
Maresuke Shiraishi
Abstract:
We study a mechanism for generating the trispectrum (4-point correlation) of the curvature perturbation through the dynamics of a spectator axion field and a U(1) gauge field during inflation. Owing to the Chern-Simons coupling, only one helicity mode of the gauge field experiences a tachyonic instability and sources scalar perturbations. The sourced curvature perturbation exhibits a parity-violating nature, which can be tested through its trispectrum. We numerically compute the parity-even and parity-odd components of the sourced trispectrum. It is found that the ratio of the parity-odd to the parity-even mode can reach O(10%) in an exact equilateral momentum configuration. We also investigate a quasi-equilateral shape where only one of the momenta is slightly longer than the other three, and find that the parity-odd mode can reach and, more interestingly, surpass the parity-even one. This may help us to interpret a large parity-odd trispectrum signal extracted from BOSS galaxy-clustering data.
Submitted 19 March, 2024; v1 submitted 5 October, 2023;
originally announced October 2023.
-
Influence maximization on temporal networks: a review
Authors:
Eric Yanchenko,
Tsuyoshi Murata,
Petter Holme
Abstract:
Influence maximization (IM) is an important topic in network science where a small seed set is chosen to maximize the spread of influence on a network. Recently, this problem has attracted attention on temporal networks where the network structure changes with time. IM on such dynamically varying networks is the topic of this review. We first categorize methods into two main paradigms: single and multiple seeding. In single seeding, nodes activate at the beginning of the diffusion process, and most methods either efficiently estimate the influence spread and select nodes with a greedy algorithm, or use a node-ranking heuristic. Nodes activate at different time points in the multiple seeding problem, via either sequential seeding, maintenance seeding or node probing paradigms. Throughout this review, we give special attention to deploying these algorithms in practice while also discussing existing solutions for real-world applications. We conclude by sharing important future research directions and challenges.
Submitted 30 June, 2023;
originally announced July 2023.
-
Class-Incremental Learning using Diffusion Model for Distillation and Replay
Authors:
Quentin Jodelet,
Xin Liu,
Yin Jun Phua,
Tsuyoshi Murata
Abstract:
Class-incremental learning aims to learn new classes in an incremental fashion without forgetting the previously learned ones. Several research works have shown how additional data can be used by incremental models to help mitigate catastrophic forgetting. In this work, following the recent breakthrough in text-to-image generative models and their wide distribution, we propose the use of a pretrained Stable Diffusion model as a source of additional data for class-incremental learning. Compared to competitive methods that rely on external, often unlabeled, datasets of real images, our approach can generate synthetic samples belonging to the same classes as the previously encountered images. This allows us to use those additional data samples not only in the distillation loss but also for replay in the classification loss. Experiments on the competitive benchmarks CIFAR100, ImageNet-Subset, and ImageNet demonstrate how this new approach can be used to further improve the performance of state-of-the-art methods for class-incremental learning on large scale datasets.
Submitted 9 October, 2023; v1 submitted 30 June, 2023;
originally announced June 2023.
-
Link prediction for ex ante influence maximization on temporal networks
Authors:
Eric Yanchenko,
Tsuyoshi Murata,
Petter Holme
Abstract:
Influence maximization (IM) is the task of finding the most important nodes in order to maximize the spread of influence or information on a network. This task is typically studied on static or temporal networks where the complete topology of the graph is known. In practice, however, the seed nodes must be selected before observing the future evolution of the network. In this work, we consider this realistic ex ante setting where $p$ time steps of the network have been observed before selecting the seed nodes. Then the influence is calculated after the network continues to evolve for a total of $T>p$ time steps. We address this problem by using statistical, non-negative matrix factorization and graph neural network link prediction algorithms to predict the future evolution of the network and then applying existing influence maximization algorithms to the predicted networks. Additionally, the output of the link prediction methods can be used to construct novel IM algorithms. We apply the proposed methods to eight real-world and synthetic networks to compare their performance using the Susceptible-Infected (SI) diffusion model. We demonstrate that it is possible to construct quality seed sets in the ex ante setting, as we achieve influence spread within 87% of the optimal spread on seven of eight networks. In many settings, choosing seed nodes based only on historical edges provides results comparable to those obtained by treating the future graph snapshots as known. The proposed heuristics based on the link prediction model are also some of the best-performing methods. These findings indicate that, for these eight networks under the SI model, the latent process which determines the most influential nodes may not have large temporal variation. Thus, knowing the future status of the network is not necessary to obtain good results for ex ante IM.
Submitted 12 September, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
DIFF2: Differential Private Optimization via Gradient Differences for Nonconvex Distributed Learning
Authors:
Tomoya Murata,
Taiji Suzuki
Abstract:
Differential private optimization for nonconvex smooth objectives is considered. In previous work, the best known utility bound is $\widetilde O(\sqrt{d}/(n\varepsilon_\mathrm{DP}))$ in terms of the squared full gradient norm, which is achieved by Differential Private Gradient Descent (DP-GD) as an instance, where $n$ is the sample size, $d$ is the problem dimensionality and $\varepsilon_\mathrm{DP}$ is the differential privacy parameter. To improve the best known utility bound, we propose a new differential private optimization framework called DIFF2 (DIFFerential private optimization via gradient DIFFerences) that constructs a differential private global gradient estimator with possibly quite small variance based on communicated gradient differences rather than gradients themselves. It is shown that DIFF2 with a gradient descent subroutine achieves the utility of $\widetilde O(d^{2/3}/(n\varepsilon_\mathrm{DP})^{4/3})$, which can be significantly better than the previous one in terms of the dependence on the sample size $n$. To the best of our knowledge, this is the first fundamental result to improve the standard utility $\widetilde O(\sqrt{d}/(n\varepsilon_\mathrm{DP}))$ for nonconvex objectives. Additionally, a more computation- and communication-efficient subroutine is combined with DIFF2, and its theoretical analysis is also given. Numerical experiments are conducted to validate the superiority of the DIFF2 framework.
Submitted 3 June, 2023; v1 submitted 8 February, 2023;
originally announced February 2023.
-
How does SU($N$)-natural inflation isotropize the universe?
Authors:
Tomoaki Murata,
Tomohiro Fujita,
Tsutomu Kobayashi
Abstract:
We study the homogeneous and anisotropic dynamics of pseudoscalar inflation coupled to an SU($N$) gauge field. To see how the initially anisotropic universe is isotropized in such an inflation model, we derive the equations to obtain axisymmetric SU($N$) gauge field configurations in Bianchi type-I geometry and discuss a method to identify their isotropic subsets, which are the candidates for their late-time attractor. Each isotropic solution is characterized by the corresponding SU(2) subalgebra of the SU($N$) algebra. It is shown numerically that the isotropic universe is a universal late-time attractor in the case of the SU(3) gauge field. Interestingly, we find that a transition between two distinct gauge-field configurations characterized by different SU(2) subalgebras can occur during inflation. We clarify the conditions for this to occur. This transition could leave an observable imprint on the CMB and the primordial gravitational wave background.
Submitted 9 February, 2023; v1 submitted 17 November, 2022;
originally announced November 2022.
-
Maps to toric varieties and toric degenerations
Authors:
Takuya Murata,
Lara Bossinger
Abstract:
We study and construct maps to toric varieties. In the process, we generalize torus embeddings to the non-projective case. Moreover, we give an analog of Cox's construction of toric varieties as GIT quotients of affine spaces for the non-normal case after T. Kajiwara.
The main focus of the paper is an application to toric degenerations, (proper) families whose special fibers are not-necessarily-normal toric varieties. We give a negative answer to a question of I. Dolgachev and K. Kaveh as to whether a toric degeneration can be constructed as a degeneration by projection. In the classical topology over the complex numbers, we recover an alternative construction of integrable systems, as was done in M. Harada and K. Kaveh, "Integrable systems, toric degenerations and Okounkov bodies", using a deformation retract. In particular, we have an analogue of a moment map from a variety admitting a toric degeneration to its Newton--Okounkov polytope.
Submitted 1 July, 2024; v1 submitted 24 October, 2022;
originally announced October 2022.
-
Versatile Single-Loop Method for Gradient Estimator: First and Second Order Optimality, and its Application to Federated Learning
Authors:
Kazusato Oko,
Shunta Akiyama,
Tomoya Murata,
Taiji Suzuki
Abstract:
While variance reduction methods have shown great success in solving large scale optimization problems, many of them suffer from accumulated errors and, therefore, periodically require the full gradient computation. In this paper, we present a single-loop algorithm named SLEDGE (Single-Loop mEthoD for Gradient Estimator) for finite-sum nonconvex optimization, which does not require periodic refresh of the gradient estimator but achieves nearly optimal gradient complexity. Unlike existing methods, SLEDGE has the advantage of versatility: (i) second-order optimality, (ii) exponential convergence in the PL region, and (iii) smaller complexity under less heterogeneity of data.
We build an efficient federated learning algorithm by exploiting these favorable properties. We show the first- and second-order optimality of the output and also provide analysis under PL conditions. When the local budget is sufficiently large and clients are less (Hessian-)heterogeneous, the algorithm requires fewer communication rounds than existing methods such as FedAvg, SCAFFOLD, and Mime. The superiority of our method is verified in numerical experiments.
Submitted 4 October, 2022; v1 submitted 1 September, 2022;
originally announced September 2022.
-
Simulating reaction time for Eureka effect in visual object recognition using artificial neural network
Authors:
Kazufumi Hosoda,
Shigeto Seno,
Tsutomu Murata
Abstract:
The human brain can recognize objects hidden in even severely degraded images after observing them for a while, which is known as a type of Eureka effect, possibly associated with human creativity. A previous psychological study suggests that the basis of this "Eureka recognition" lies in neural processes involving the coincidence of multiple stochastic activities. Here we constructed an artificial-neural-network-based model that simulated the characteristics of human Eureka recognition.
Submitted 30 June, 2022;
originally announced July 2022.
-
Modularity Optimization as a Training Criterion for Graph Neural Networks
Authors:
Tsuyoshi Murata,
Naveed Afzal
Abstract:
Graph convolution is a recent scalable method for performing deep feature learning on attributed graphs by aggregating local node information over multiple layers. Such layers only consider attribute information of node neighbors in the forward model and do not incorporate knowledge of global network structure in the learning task. In particular, the modularity function provides a convenient source of information about the community structure of networks. In this work we investigate how incorporating community-structure-preservation objectives into the graph convolutional model affects the quality of learned representations. We incorporate the objectives in two ways: through an explicit regularization term in the cost function in the output layer, and as an additional loss term computed via an auxiliary layer. We report the effect of community-structure-preserving terms in the graph convolutional architectures. Experimental evaluation on two attributed bibliographic networks showed that the incorporation of the community-preserving objective improves semi-supervised node classification accuracy in the sparse label regime.
Submitted 30 June, 2022;
originally announced July 2022.
-
Visual-based Positioning and Pose Estimation
Authors:
Somnuk Phon-Amnuaisuk,
Ken T. Murata,
La-Or Kovavisaruch,
Tiong-Hoo Lim,
Praphan Pavarangkoon,
Takamichi Mizuhara
Abstract:
Recent advances in deep learning and computer vision offer an excellent opportunity to investigate high-level visual analysis tasks such as human localization and human pose estimation. Although the performance of human localization and human pose estimation has significantly improved in recent reports, they are not perfect, and erroneous localization and pose estimation can be expected among video frames. Studies on the integration of these techniques into a generic pipeline that is robust to noise introduced by those errors are still lacking. This paper addresses that gap. We explored and developed two working pipelines suited to visual-based positioning and pose estimation tasks. Analyses of the proposed pipelines were conducted on a badminton game. We showed that the concept of tracking by detection could work well, and errors in position and pose could be effectively handled by a linear interpolation technique using information from nearby frames. The results showed that the Visual-based Positioning and Pose Estimation could deliver position and pose estimations with good spatial and temporal resolutions.
Submitted 20 April, 2022;
originally announced April 2022.
-
Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning
Authors:
Tomoya Murata,
Taiji Suzuki
Abstract:
In recent centralized nonconvex distributed learning and federated learning, local methods are one of the promising approaches to reduce communication time. However, existing work has mainly focused on studying first-order optimality guarantees. On the other hand, second-order optimality guaranteed algorithms, i.e., algorithms escaping saddle points, have been extensively studied in the non-distributed optimization literature. In this paper, we study a new local algorithm called Bias-Variance Reduced Local Perturbed SGD (BVR-L-PSGD) that combines the existing bias-variance reduced gradient estimator with parameter perturbation to find second-order optimal points in centralized nonconvex distributed optimization. BVR-L-PSGD enjoys second-order optimality with nearly the same communication complexity as the best known one of BVR-L-SGD to find first-order optimality. In particular, the communication complexity is better than that of non-local methods when the local dataset heterogeneity is smaller than the smoothness of the local loss. In an extreme case, the communication complexity approaches $\widetilde Θ(1)$ when the local dataset heterogeneity goes to zero. Numerical results validate our theoretical findings.
Submitted 12 October, 2022; v1 submitted 12 February, 2022;
originally announced February 2022.
-
Leaping Through Time with Gradient-based Adaptation for Recommendation
Authors:
Nuttapong Chairatanakul,
Hoang NT,
Xin Liu,
Tsuyoshi Murata
Abstract:
Modern recommender systems are required to adapt to changes in user preferences and item popularity. This problem is known as the temporal dynamics problem, and it is one of the main challenges in recommender system modeling. Unlike the popular recurrent modeling approach, we propose a new solution to the temporal dynamics problem, named LeapRec, which uses trajectory-based meta-learning to model time dependencies. LeapRec characterizes temporal dynamics by two complementary components named global time leap (GTL) and ordered time leap (OTL). By design, GTL learns long-term patterns by finding the shortest learning path across unordered temporal data. Cooperatively, OTL learns short-term patterns by considering the sequential nature of the temporal data. Our experimental results show that LeapRec consistently outperforms the state-of-the-art methods on several datasets and recommendation metrics. Furthermore, we provide an empirical study of the interaction between GTL and OTL, showing the effects of long- and short-term modeling.
Submitted 28 December, 2021; v1 submitted 10 December, 2021;
originally announced December 2021.
-
Simplifying approach to Node Classification in Graph Neural Networks
Authors:
Sunil Kumar Maurya,
Xin Liu,
Tsuyoshi Murata
Abstract:
Graph Neural Networks have become one of the indispensable tools to learn from graph-structured data, and their usefulness has been shown in a wide variety of tasks. In recent years, there have been tremendous improvements in architecture design, resulting in better performance on various prediction tasks. In general, these neural architectures combine node feature aggregation and feature transformation using a learnable weight matrix in the same layer. This makes it challenging to analyze the importance of node features aggregated from various hops and the expressiveness of the neural network layers. As different graph datasets show varying levels of homophily and heterophily in features and class label distribution, it becomes essential to understand which features are important for the prediction tasks without any prior information. In this work, we decouple the node feature aggregation step and depth of graph neural network, and empirically analyze how different aggregated features play a role in prediction performance. We show that not all features generated via aggregation steps are useful, and often using these less informative features can be detrimental to the performance of the GNN model. Through our experiments, we show that learning certain subsets of these features can lead to better performance on a wide variety of datasets. We propose to use softmax as a regularizer and "soft-selector" of features aggregated from neighbors at different hop distances, and L2-Normalization over GNN layers. Combining these techniques, we present a simple and shallow model, Feature Selection Graph Neural Network (FSGNN), and show empirically that the proposed model achieves comparable or even higher accuracy than state-of-the-art GNN models in nine benchmark datasets for the node classification task, with remarkable improvements up to 51.1%.
Submitted 12 November, 2021;
originally announced November 2021.
-
Natural Image Reconstruction from fMRI using Deep Learning: A Survey
Authors:
Zarina Rakhimberdina,
Quentin Jodelet,
Xin Liu,
Tsuyoshi Murata
Abstract:
With the advent of brain imaging techniques and machine learning tools, much effort has been devoted to building computational models to capture the encoding of visual information in the human brain. One of the most challenging brain decoding tasks is the accurate reconstruction of the perceived natural images from brain activities measured by functional magnetic resonance imaging (fMRI). In this work, we survey the most recent deep learning methods for natural image reconstruction from fMRI. We examine these methods in terms of architectural design, benchmark datasets, and evaluation metrics and present a fair performance evaluation across standardized evaluation metrics. Finally, we discuss the strengths and limitations of existing studies and present potential future directions.
Submitted 24 November, 2021; v1 submitted 18 October, 2021;
originally announced October 2021.
-
Cross-lingual Transfer for Text Classification with Dictionary-based Heterogeneous Graph
Authors:
Nuttapong Chairatanakul,
Noppayut Sriwatanasakdi,
Nontawat Charoenphakdee,
Xin Liu,
Tsuyoshi Murata
Abstract:
In cross-lingual text classification, it is required that task-specific training data in high-resource source languages are available, where the task is identical to that of a low-resource target language. However, collecting such training data can be infeasible because of the labeling cost, task characteristics, and privacy concerns. This paper proposes an alternative solution that uses only task-independent word embeddings of high-resource languages and bilingual dictionaries. First, we construct a dictionary-based heterogeneous graph (DHG) from bilingual dictionaries. This opens the possibility of using graph neural networks for cross-lingual transfer. The remaining challenge is the heterogeneity of DHG because multiple languages are considered. To address this challenge, we propose a dictionary-based heterogeneous graph neural network (DHGNet) that effectively handles the heterogeneity of DHG by two-step aggregations, which are word-level and language-level aggregations. Experimental results demonstrate that our method outperforms pretrained models even though it does not have access to large corpora. Furthermore, it can perform well even though dictionaries contain many incorrect translations. Its robustness allows the usage of a wider range of dictionaries such as an automatically constructed dictionary and crowdsourced dictionary, which are convenient for real-world applications.
Submitted 9 September, 2021; v1 submitted 9 September, 2021;
originally announced September 2021.
-
Dynamics of inflation with mutually orthogonal vector fields in a closed universe
Authors:
Tomoaki Murata,
Tsutomu Kobayashi
Abstract:
We study the dynamics of a homogeneous, isotropic, and positively curved universe in the presence of an SU(2) gauge field or a triplet of mutually orthogonal vector fields. In the SU(2) case we use the previously known ansatz for the gauge-field configuration, but the case without non-abelian symmetries is more nontrivial and we develop a new ansatz. In particular, we consider axion-SU(2) inflation and inflation with vector fields having U(1)$\times$U(1)$\times$U(1) symmetry, and analyze their dynamics in detail numerically. Novel effects of the spatial curvature come into play through vector fields, which cause unconventional pre-inflationary dynamics. It is found that the closed universe with vector fields is slightly more stable against collapse than that filled solely with an inflaton field.
Submitted 20 September, 2021; v1 submitted 15 July, 2021;
originally announced July 2021.
-
Improving Graph Neural Networks with Simple Architecture Design
Authors:
Sunil Kumar Maurya,
Xin Liu,
Tsuyoshi Murata
Abstract:
Graph Neural Networks have emerged as a useful tool to learn on the data by applying additional constraints based on the graph structure. These graphs are often created with assumed intrinsic relations between the entities. In recent years, there have been tremendous improvements in the architecture design, pushing the performance up in various prediction tasks. In general, these neural architectures combine layer depth and node feature aggregation steps. This makes it challenging to analyze the importance of features at various hops and the expressiveness of the neural network layers. As different graph datasets show varying levels of homophily and heterophily in features and class label distribution, it becomes essential to understand which features are important for the prediction tasks without any prior information. In this work, we decouple the node feature aggregation step and depth of graph neural network and introduce several key design strategies for graph neural networks. More specifically, we propose to use softmax as a regularizer and "Soft-Selector" of features aggregated from neighbors at different hop distances; and "Hop-Normalization" over GNN layers. Combining these techniques, we present a simple and shallow model, Feature Selection Graph Neural Network (FSGNN), and show empirically that the proposed model outperforms other state of the art GNN models and achieves up to 64% improvements in accuracy on node classification tasks. Moreover, analyzing the learned soft-selection parameters of the model provides a simple way to study the importance of features in the prediction tasks. Finally, we demonstrate with experiments that the model is scalable for large graphs with millions of nodes and billions of edges.
Submitted 17 May, 2021;
originally announced May 2021.
-
The isotropic attractor solution of axion-SU(2) inflation: Universal isotropization in Bianchi type-I geometry
Authors:
Ira Wolfson,
Azadeh Maleknejad,
Tomoaki Murata,
Eiichiro Komatsu,
Tsutomu Kobayashi
Abstract:
SU(2) gauge fields coupled to an axion field can acquire an isotropic background solution during inflation. We study homogeneous but anisotropic inflationary solutions in the presence of such (massless) gauge fields. A gauge field in the cosmological background may pose a threat to spatial isotropy. We show, however, that such models $\textit{generally}$ isotropize in Bianchi type-I geometry, and the isotropic solution is the attractor. Restricting the setup by adding an axial symmetry, we revisit the numerical analysis presented in Wolfson et al. (2020). We find that the numerical breakdown reported in the previous analysis is an artifact of a parametrization singularity. We use a new parametrization that is well-defined all over the phase space. We show that the system respects the cosmic no-hair conjecture and the anisotropies always dilute away within a few e-folds.
Submitted 27 September, 2021; v1 submitted 12 May, 2021;
originally announced May 2021.
-
Balanced softmax cross-entropy for incremental learning with and without memory
Authors:
Quentin Jodelet,
Xin Liu,
Tsuyoshi Murata
Abstract:
When incrementally trained on new classes, deep neural networks are subject to catastrophic forgetting, which leads to an extreme deterioration of their performance on the old classes while learning the new ones. Using a small memory containing a few samples from past classes has been shown to be an effective method to mitigate catastrophic forgetting. However, due to the limited size of the replay memory, there is a large imbalance between the number of samples for the new and the old classes in the training dataset, resulting in bias in the final model. To address this issue, we propose to use the Balanced Softmax Cross-Entropy and show that it can be seamlessly combined with state-of-the-art approaches for class-incremental learning in order to improve their accuracy while also potentially decreasing the computational cost of the training procedure. We further extend this approach to the more demanding class-incremental learning without memory setting and achieve competitive results with memory-based approaches. Experiments on the challenging ImageNet, ImageNet-Subset and CIFAR100 benchmarks with various settings demonstrate the benefits of our approach.
Submitted 14 November, 2022; v1 submitted 23 March, 2021;
originally announced March 2021.
-
Investigation of alpha particle induced reactions on natural silver in the 40-50 MeV energy range
Authors:
F. Ditrói,
S. Takács,
H. Haba,
Y. Komori,
M. Aikawa,
M. Saito,
T. Murata
Abstract:
Natural silver targets have been irradiated by using a 50 MeV alpha-particle beam in order to measure the activation cross sections of radioisotopes in the 40-50 MeV energy range. Among the radio-products there are medically important isotopes such as $^{110m}$In and $^{111}$In. For optimizing the production of these radioisotopes and regarding their purity and specific activity the cross section data for every produced radioisotope are important. New data are measured in this energy range and the results of some previous measurements have been confirmed. Physical yield curves have been calculated by using the new cross section data completed with the results from the literature.
Submitted 19 March, 2021;
originally announced March 2021.
-
Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning
Authors:
Tomoya Murata,
Taiji Suzuki
Abstract:
Recently, local SGD has received much attention and has been extensively studied in the distributed learning community to overcome the communication bottleneck problem. However, the superiority of local SGD to minibatch SGD only holds in quite limited situations. In this paper, we study a new local algorithm called Bias-Variance Reduced Local SGD (BVR-L-SGD) for nonconvex distributed optimization. Algorithmically, our proposed bias and variance reduced local gradient estimator fully utilizes small second-order heterogeneity of local objectives and suggests randomly picking one of the local models instead of taking their average when workers are synchronized. Theoretically, under small heterogeneity of local objectives, we show that BVR-L-SGD achieves better communication complexity than both the previous non-local and local methods under mild conditions, and in particular BVR-L-SGD is the first method that breaks the barrier of communication complexity $Θ(1/\varepsilon)$ for general nonconvex smooth objectives when the heterogeneity is small and the local computation budget is large. Numerical results are given to verify the theoretical findings and give empirical evidence of the superiority of our method.
Submitted 13 June, 2021; v1 submitted 5 February, 2021;
originally announced February 2021.
-
Stacked Graph Filter
Authors:
Hoang NT,
Takanori Maehara,
Tsuyoshi Murata
Abstract:
We study Graph Convolutional Networks (GCN) from the graph signal processing viewpoint by addressing a difference between learning graph filters with fully connected weights versus trainable polynomial coefficients. We find that by stacking graph filters with learnable polynomial parameters, we can build a highly adaptive and robust vertex classification model. Our treatment here relaxes the low-frequency (or equivalently, high homophily) assumptions in existing vertex classification models, resulting in a more ubiquitous solution in terms of spectral properties. Empirically, by using only one hyper-parameter setting, our model achieves strong results on most benchmark datasets across the frequency spectrum.
Submitted 22 November, 2020;
originally announced November 2020.
-
Sparse identification of nonlinear dynamics with low-dimensionalized flow representations
Authors:
Kai Fukami,
Takaaki Murata,
Kai Zhang,
Koji Fukagata
Abstract:
We perform a sparse identification of nonlinear dynamics (SINDy) for low-dimensionalized complex flow phenomena. We first apply the SINDy with two regression methods, the thresholded least square algorithm (TLSA) and the adaptive Lasso (Alasso) which show reasonable ability with a wide range of sparsity constant in our preliminary tests, to a two-dimensional single cylinder wake at $Re_D=100$, its transient process, and a wake of two-parallel cylinders, as examples of high-dimensional fluid data. To handle these high dimensional data with SINDy whose library matrix is suitable for low-dimensional variable combinations, a convolutional neural network-based autoencoder (CNN-AE) is utilized. The CNN-AE is employed to map a high-dimensional dynamics into a low-dimensional latent space. The SINDy then seeks a governing equation of the mapped low-dimensional latent vector. Temporal evolution of high-dimensional dynamics can be provided by combining the predicted latent vector by SINDy with the CNN decoder which can remap the low-dimensional latent vector to the original dimension. The SINDy can provide a stable solution as the governing equation of the latent dynamics and the CNN-SINDy based modeling can reproduce high-dimensional flow fields successfully, although more terms are required to represent the transient flow and the two-parallel cylinder wake than the periodic shedding. A nine-equation turbulent shear flow model is finally considered to examine the applicability of SINDy to turbulence, although without using CNN-AE. The present results suggest that the proposed scheme with an appropriate parameter choice enables us to analyze high-dimensional nonlinear dynamics with interpretable low-dimensional manifolds.
Submitted 1 August, 2021; v1 submitted 23 October, 2020;
originally announced October 2020.
-
Generic tropical initial ideals of Cohen-Macaulay algebras
Authors:
Kiumars Kaveh,
Christopher Manon,
Takuya Murata
Abstract:
We study the generic tropical initial ideals of a positively graded Cohen-Macaulay algebra $R$ over an algebraically closed field $\mathbf{k}$. Building on work of Römer and Schmitz, we give a formula for each initial ideal, and we express the associated quasivaluations in terms of certain $I$-adic filtrations. As a corollary, we show that in the case that $R$ is a domain, every initial ideal coming from the codimension-$1$ skeleton of the tropical variety is prime, so "generic presentations of Cohen-Macaulay domains are well-poised in codimension-$1$."
Submitted 15 January, 2021; v1 submitted 10 September, 2020;
originally announced September 2020.
-
MetAL: Active Semi-Supervised Learning on Graphs via Meta Learning
Authors:
Kaushalya Madhawa,
Tsuyoshi Murata
Abstract:
The objective of active learning (AL) is to train classification models with fewer labeled instances by selecting only the most informative instances for labeling. The AL algorithms designed for other data types such as images and text do not perform well on graph-structured data. Although a few heuristics-based AL algorithms have been proposed for graphs, a principled approach is lacking. In this paper, we propose MetAL, an AL approach that selects unlabeled instances that directly improve the future performance of a classification model. For a semi-supervised learning problem, we formulate the AL task as a bilevel optimization problem. Based on recent work in meta-learning, we use the meta-gradients to approximate the impact of retraining the model with any unlabeled instance on the model performance. Using multiple graph datasets belonging to different domains, we demonstrate that MetAL efficiently outperforms existing state-of-the-art AL algorithms.
Submitted 22 July, 2020;
originally announced July 2020.
-
Graph Convolutional Networks for Graphs Containing Missing Features
Authors:
Hibiki Taguchi,
Xin Liu,
Tsuyoshi Murata
Abstract:
Graph Convolutional Network (GCN) has experienced great success in graph analysis tasks. It works by smoothing the node features across the graph. The current GCN models overwhelmingly assume that the node feature information is complete. However, real-world graph data are often incomplete and contain missing features. Traditionally, people have to estimate and fill in the unknown features based on imputation techniques and then apply GCN. However, the processes of feature filling and graph learning are separated, resulting in degraded and unstable performance. This problem becomes more serious when a large number of features are missing. We propose an approach that adapts GCN to graphs containing missing features. In contrast to the traditional strategy, our approach integrates the processing of missing features and graph learning within the same neural network architecture. Our idea is to represent the missing data by a Gaussian Mixture Model (GMM) and calculate the expected activation of neurons in the first hidden layer of GCN, while keeping the other layers of the network unchanged. This enables us to learn the GMM parameters and network weight parameters in an end-to-end manner. Notably, our approach does not increase the computational complexity of GCN and it is consistent with GCN when the features are complete. We demonstrate through extensive experiments that our approach significantly outperforms the imputation-based methods in node classification and link prediction tasks. We show that the performance of our approach for the case with a low level of missing features is even superior to GCN for the case with complete features.
Submitted 6 December, 2020; v1 submitted 9 July, 2020;
originally announced July 2020.
-
Gradient Descent in RKHS with Importance Labeling
Authors:
Tomoya Murata,
Taiji Suzuki
Abstract:
Labeling is often expensive and is a fundamental limitation of supervised learning. In this paper, we study the importance labeling problem, in which we are given many unlabeled data and select a limited number of them to be labeled, and then a learning algorithm is executed on the selected ones. We propose a new importance labeling scheme that can effectively select an informative subset of unlabeled data in least squares regression in Reproducing Kernel Hilbert Spaces (RKHS). We analyze the generalization error of gradient descent combined with our labeling scheme and show that the proposed algorithm achieves the optimal rate of convergence in much wider settings and especially gives much better generalization ability in a small label noise setting than the usual uniform sampling scheme. Numerical experiments verify our theoretical findings.
Submitted 12 April, 2021; v1 submitted 18 June, 2020;
originally announced June 2020.
-
Machine-learning-based reduced order modeling for unsteady flows around bluff bodies of various shapes
Authors:
Kazuto Hasegawa,
Kai Fukami,
Takaaki Murata,
Koji Fukagata
Abstract:
We propose a method to construct a reduced order model with machine learning for unsteady flows. The present machine-learned reduced order model (ML-ROM) is constructed by combining a convolutional neural network autoencoder (CNN-AE) and a long short-term memory (LSTM), which are trained in a sequential manner. First, the CNN-AE is trained using direct numerical simulation (DNS) data so as to map the high-dimensional flow data into low-dimensional latent space. Then, the LSTM is utilized to establish a temporal prediction system for the low-dimensionalized vectors obtained by CNN-AE. As a test case, we consider flows around a bluff body whose shape is defined using a combination of trigonometric functions with random amplitudes. The present ML-ROMs are trained on a set of 80 bluff body shapes and tested on a different set of 20 bluff body shapes not used for training, with both training and test shapes chosen from the same random distribution. The flow fields are confirmed to be well reproduced by the present ML-ROM in terms of various statistics. We also focus on the influence of two main parameters: (1) the latent vector size in the CNN-AE, and (2) the time step size between the mapped vectors used for the LSTM. The present results show that the ML-ROM works well even for unseen shapes of bluff bodies when these parameters are properly chosen, which implies great potential for the present type of ML-ROM to be applied to more complex flows.
Submitted 17 March, 2020;
originally announced March 2020.
-
Nonlinear mode decomposition with convolutional neural networks for fluid dynamics
Authors:
Takaaki Murata,
Kai Fukami,
Koji Fukagata
Abstract:
We present a new nonlinear mode decomposition method to visualize the decomposed flow fields, named the mode decomposing convolutional neural network autoencoder (MD-CNN-AE). The proposed method is applied to a flow around a circular cylinder at $Re_D=100$ as a test case. The flow attributes are mapped into two modes in the latent space and then these two modes are visualized in the physical space. Because the MD-CNN-AEs with nonlinear activation functions show lower reconstruction errors than the proper orthogonal decomposition (POD), the nonlinearity contained in the activation function is considered the key to improve the capability of the model. It is found by applying POD to each field decomposed using the MD-CNN-AE with hyperbolic tangent activation that a single nonlinear MD-CNN-AE mode contains multiple orthogonal bases, in contrast to the linear methods, i.e., POD and the MD-CNN-AE with linear activation. We further assess the proposed MD-CNN-AE by applying it to a transient process of a circular cylinder wake in order to examine its capability for flows containing high-order spatial modes. The present results suggest a great potential for the nonlinear MD-CNN-AE to be used for feature extraction of flow fields in lower dimension than POD, while retaining interpretable relationships with the conventional POD modes.
Submitted 10 October, 2019; v1 submitted 7 June, 2019;
originally announced June 2019.
-
Accelerated Sparsified SGD with Error Feedback
Authors:
Tomoya Murata,
Taiji Suzuki
Abstract:
A stochastic gradient method for synchronous distributed optimization is studied. To reduce communication cost, we focus in particular on compressing the communicated gradients. Several works have shown that {\it{sparsified}} stochastic gradient descent (SGD) with {\it{error feedback}} asymptotically achieves the same rate as (non-sparsified) parallel SGD. However, from the viewpoint of non-asymptotic behavior, the compression error may cause slower convergence than non-sparsified SGD in early iterations. This is problematic in practical situations since early stopping is often adopted to maximize the generalization ability of learned models. To improve on the previous results, we propose and theoretically analyse a sparsified stochastic gradient method with an error feedback scheme combined with {\it{Nesterov's acceleration}}. It is shown that the per-iteration communication cost necessary for maintaining the same rate as vanilla SGD can be smaller than that of non-accelerated methods in convex and even in nonconvex optimization problems. This indicates that our proposed method makes better use of compressed information than previous methods. Numerical experiments are provided and empirically validate our theoretical findings.
Submitted 18 June, 2020; v1 submitted 29 May, 2019;
originally announced May 2019.
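As an illustration of the error feedback idea mentioned in the abstract above, the following is a minimal sketch of one worker-side step of plain (non-accelerated) top-k sparsified SGD with error feedback. It is not the authors' accelerated method; the function names `top_k` and `ef_sparsified_step`, the choice of top-k as the compressor, and the single-worker setting are all assumptions made for the example.

```python
import numpy as np

def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec and zero the rest."""
    out = np.zeros_like(vec)
    if k > 0:
        idx = np.argpartition(np.abs(vec), -k)[-k:]
        out[idx] = vec[idx]
    return out

def ef_sparsified_step(x, grad, memory, lr, k):
    """One step of sparsified SGD with error feedback (hypothetical helper).

    The part of the scaled gradient that the compressor drops is stored
    in `memory` and added back before compressing on the next step.
    """
    corrected = lr * grad + memory   # re-inject previously dropped mass
    sent = top_k(corrected, k)       # sparse update that would be communicated
    new_memory = corrected - sent    # residual kept locally
    return x - sent, new_memory

x = np.zeros(4)
grad = np.array([1.0, -3.0, 0.5, 2.0])
x_new, memory = ef_sparsified_step(x, grad, np.zeros(4), lr=1.0, k=2)
```

Only `k` coordinates are transmitted per step, while the residual in `memory` ensures that no gradient information is permanently discarded.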
-
Learning Graph Neural Networks with Noisy Labels
Authors:
Hoang NT,
Choong Jun Jin,
Tsuyoshi Murata
Abstract:
We study the robustness to symmetric label noise of GNNs training procedures. By combining the nonlinear neural message-passing models (e.g. Graph Isomorphism Networks, GraphSAGE, etc.) with loss correction methods, we present a noise-tolerant approach for the graph classification task. Our experiments show that test accuracy can be improved under the artificial symmetric noisy setting.
Submitted 4 May, 2019;
originally announced May 2019.
-
Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method for Stochastic Sparse Linear Regression with Limited Attribute Observation
Authors:
Tomoya Murata,
Taiji Suzuki
Abstract:
We develop new stochastic gradient methods for efficiently solving sparse linear regression in a partial attribute observation setting, where learners are only allowed to observe a fixed number of actively chosen attributes per example at training and prediction times. It is shown that the methods achieve essentially a sample complexity of $O(1/\varepsilon)$ to attain an error of $\varepsilon$ under a variant of restricted eigenvalue condition, and the rate has better dependency on the problem dimension than existing methods. Particularly, if the smallest magnitude of the non-zero components of the optimal solution is not too small, the rate of our proposed {\it Hybrid} algorithm can be boosted to near the minimax optimal sample complexity of {\it full information} algorithms. The core ideas are (i) efficient construction of an unbiased gradient estimator by the iterative usage of the hard thresholding operator for configuring an exploration algorithm; and (ii) an adaptive combination of the exploration and an exploitation algorithms for quickly identifying the support of the optimum and efficiently searching the optimal parameter in its support. Experimental results are presented to validate our theoretical findings and the superiority of our proposed methods.
Submitted 30 November, 2018; v1 submitted 5 September, 2018;
originally announced September 2018.
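For readers unfamiliar with the hard thresholding operator named in the abstract above, the sketch below shows the classical operator and one full-gradient iterative hard thresholding (IHT) step for least squares. It assumes full attribute observation and therefore does not reproduce the paper's limited-observation exploration or hybrid algorithms; `hard_threshold` and `iht_step` are hypothetical names.

```python
import numpy as np

def hard_threshold(w, s):
    """Project w onto s-sparse vectors: keep the s largest-magnitude entries."""
    out = np.zeros_like(w)
    if s > 0:
        keep = np.argsort(np.abs(w))[-s:]
        out[keep] = w[keep]
    return out

def iht_step(w, X, y, lr, s):
    """One classical IHT step for the loss (1/2n) * ||X w - y||^2.

    Illustrative assumption: every attribute of every example is observed.
    """
    grad = X.T @ (X @ w - y) / len(y)
    return hard_threshold(w - lr * grad, s)

w_sparse = hard_threshold(np.array([0.1, -2.0, 0.3, 1.5]), 2)
w_next = iht_step(np.zeros(3), np.eye(3), np.array([3.0, 0.0, 1.0]), lr=3.0, s=1)
```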
-
Exploring the Applications of Faster R-CNN and Single-Shot Multi-box Detection in a Smart Nursery Domain
Authors:
Somnuk Phon-Amnuaisuk,
Ken T. Murata,
Praphan Pavarangkoon,
Kazunori Yamamoto,
Takamichi Mizuhara
Abstract:
The ultimate goal of a baby detection task concerns detecting the presence of a baby and other objects in a sequence of 2D images, tracking them and understanding the semantic contents of the scene. Recent advances in deep learning and computer vision offer various powerful tools for general object detection that can be applied to a baby detection task. In this paper, the Faster Region-based Convolutional Neural Network and the Single-Shot Multi-Box Detection approaches are explored. They are two state-of-the-art object detectors, based on the region proposal tactic and the multi-box tactic, respectively. The presence of a baby in the scene, as obtained from these detectors tested using different pre-trained models, is discussed. This study is important since the behaviors of these detectors in a baby detection task using different pre-trained models are still not well understood. This exploratory study reveals many useful insights into the applications of these object detectors in the smart nursery domain.
Submitted 26 August, 2018;
originally announced August 2018.
-
Spectral Pruning: Compressing Deep Neural Networks via Spectral Analysis and its Generalization Error
Authors:
Taiji Suzuki,
Hiroshi Abe,
Tomoya Murata,
Shingo Horiuchi,
Kotaro Ito,
Tokuma Wachi,
So Hirai,
Masatoshi Yukishima,
Tomoaki Nishimura
Abstract:
Compression techniques for deep neural network models are becoming very important for the efficient execution of high-performance deep learning systems on edge-computing devices. The concept of model compression is also important for analyzing the generalization error of deep learning, known as the compression-based error bound. However, there is still a huge gap between practically effective compression methods and their rigorous grounding in statistical learning theory. To resolve this issue, we develop a new theoretical framework for model compression and propose a new pruning method called {\it spectral pruning} based on this framework. We define the ``degrees of freedom'' to quantify the intrinsic dimensionality of a model by using the eigenvalue distribution of the covariance matrix across the internal nodes and show that the compression ability is essentially controlled by this quantity. Moreover, we present a sharp generalization error bound of the compressed model and characterize the bias--variance tradeoff induced by the compression procedure. We apply our method to several datasets to justify our theoretical analyses and show the superiority of the proposed method.
Submitted 13 July, 2020; v1 submitted 26 August, 2018;
originally announced August 2018.
-
Exploring Partially Observed Networks with Nonparametric Bandits
Authors:
Kaushalya Madhawa,
Tsuyoshi Murata
Abstract:
Real-world networks such as social and communication networks are too large to be observed entirely. Such networks are often partially observed such that network size, network topology, and nodes of the original network are unknown. In this paper we formalize the Adaptive Graph Exploring problem. We assume that we are given an incomplete snapshot of a large network and additional nodes can be discovered by querying nodes in the currently observed network. The goal of this problem is to maximize the number of observed nodes within a given query budget. Which set of nodes should be queried to maximize the size of the observed network? We formulate this problem as an exploration-exploitation problem and propose a novel nonparametric multi-arm bandit (MAB) algorithm for identifying which nodes to query. Our contributions include: (1) $i$KNN-UCB, a novel nonparametric MAB algorithm that applies $k$-nearest neighbor UCB to the setting where the arms are presented in a vector space, (2) a theoretical guarantee that the $i$KNN-UCB algorithm has sublinear regret, and (3) an application of the $i$KNN-UCB algorithm to synthetic networks and real-world networks from different domains, showing that our method discovers up to 40% more nodes compared to existing baselines.
Submitted 19 April, 2018;
originally announced April 2018.
-
On degenerations of projective varieties to complexity-one T-varieties
Authors:
Kiumars Kaveh,
Christopher Manon,
Takuya Murata
Abstract:
Let $R$ be a positively graded finitely generated $\textbf{k}$-domain with Krull dimension $d+1$. We show that there is a homogeneous valuation $\mathfrak{v}: R \setminus \{0\} \to \mathbb{Z}^d$ of rank $d$ such that the associated graded $\text{gr}_\mathfrak{v}(R)$ is finitely generated. This then implies that any polarized $d$-dimensional projective variety $X$ has a flat deformation over $\mathbb{A}^1$, with reduced and irreducible fibers, to a polarized projective complexity-one $T$-variety (i.e. a variety with a faithful action of a $(d-1)$-dimensional torus $T$). As an application we conclude that any $d$-dimensional complex smooth projective variety $X$ equipped with an integral Kähler form has a proper $(d-1)$-dimensional Hamiltonian torus action on an open dense subset that extends continuously to all of $X$.
Submitted 13 January, 2020; v1 submitted 8 August, 2017;
originally announced August 2017.
-
Energy flow in biological system: Bioenergy transduction of V1-ATPase molecular rotary motor from E. hirae
Authors:
Ichiro Yamato,
Takeshi Murata,
Andrei Khrennikov
Abstract:
We classify research fields in biology into those on flows of materials, energy, and information. As a representative energy transducing machinery in biology, our research target, V1-ATPase from a bacterium Enterococcus hirae, a typical molecular rotary motor is introduced. Structures of several intermediates of the rotary motor are described and the molecular mechanism of the motor converting chemical energy into mechanical energy is discussed. Comments and considerations on the information flow in biology, especially on the thermodynamic entropy in quantum physical and biological systems, are added in a separate section containing the biologist friendly presentation of this complicated question.
Submitted 27 May, 2017;
originally announced May 2017.
-
Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization
Authors:
Tomoya Murata,
Taiji Suzuki
Abstract:
In this paper, we develop a new accelerated stochastic gradient method for efficiently solving the convex regularized empirical risk minimization problem in mini-batch settings. The use of mini-batches is becoming a gold standard in the machine learning community, because mini-batch settings stabilize the gradient estimate and can easily make good use of parallel computing. The core of our proposed method is the incorporation of our new "double acceleration" technique and a variance reduction technique. We theoretically analyze our proposed method and show that it substantially improves the mini-batch efficiencies of previous accelerated stochastic methods, and essentially only needs mini-batches of size $\sqrt{n}$ to achieve the optimal iteration complexities for both non-strongly and strongly convex objectives, where $n$ is the training set size. Further, we show that even in non-mini-batch settings, our method achieves the best known convergence rate for both non-strongly and strongly convex objectives.
Submitted 19 September, 2017; v1 submitted 1 March, 2017;
originally announced March 2017.
-
Towards a Unified Model of Neutrino-Nucleus Reactions for Neutrino Oscillation Experiments
Authors:
S. X. Nakamura,
H. Kamano,
Y. Hayato,
M. Hirai,
W. Horiuchi,
S. Kumano,
T. Murata,
K. Saito,
M. Sakuda,
T. Sato,
Y. Suzuki
Abstract:
A precise description of neutrino-nucleus reactions will play a key role in addressing fundamental questions such as the leptonic CP violation and the neutrino mass hierarchy through analyzing data from next-generation neutrino oscillation experiments. The neutrino energy relevant to the neutrino-nucleus reactions spans a broad range and, accordingly, the dominant reaction mechanism varies across the energy region from quasi-elastic scattering through nucleon resonance excitations to deep inelastic scattering. This corresponds to transitions of the effective degree of freedom for theoretical description from nucleons through meson-baryon to quarks. The main purpose of this review is to report our recent efforts towards a unified description of the neutrino-nucleus reactions over the wide energy range; recent overall progress in the field is also sketched. Starting with an overview of the current status of neutrino-nucleus scattering experiments, we formulate the cross section to be commonly used for the reactions over all the energy regions. A description of the neutrino-nucleon reactions follows and, in particular, a dynamical coupled-channels model for meson productions in and beyond the $Δ$(1232) region is discussed in detail. We then discuss the neutrino-nucleus reactions, putting emphasis on our theoretical approaches. We start the discussion with electroweak processes in few-nucleon systems studied with the correlated Gaussian method. Then we describe quasi-elastic scattering with nuclear spectral functions, and meson productions with a $Δ$-hole model. Nuclear modifications of the parton distribution functions determined through a global analysis are also discussed. Finally, we discuss issues to be addressed for future developments.
Submitted 11 April, 2017; v1 submitted 4 October, 2016;
originally announced October 2016.
-
Neutrino Induced 4He Break-up Reaction -- Application of the Maximum Entropy Method in Calculating Nuclear Strength Function
Authors:
T. Murata,
W. Horiuchi,
T. Sato,
S. X. Nakamura
Abstract:
The maximum entropy method is examined as a new tool for solving the ill-posed inversion problem involved in the Lorentz integral transformation (LIT) method. As an example, we apply the method to the spin-dipole strength function of 4He. We show that the method can be successfully used for inversion of LIT, provided the LIT function is available with a sufficient accuracy.
Submitted 11 April, 2016;
originally announced April 2016.
-
Stochastic dual averaging methods using variance reduction techniques for regularized empirical risk minimization problems
Authors:
Tomoya Murata,
Taiji Suzuki
Abstract:
We consider a composite convex minimization problem associated with regularized empirical risk minimization, which often arises in machine learning. We propose two new stochastic gradient methods that are based on stochastic dual averaging method with variance reduction. Our methods generate a sparser solution than the existing methods because we do not need to take the average of the history of the solutions. This is favorable in terms of both interpretability and generalization. Moreover, our methods have theoretical support for both a strongly and a non-strongly convex regularizer and achieve the best known convergence rates among existing nonaccelerated stochastic gradient methods.
Submitted 8 March, 2016;
originally announced March 2016.
-
Extraction of Neutrino Flux from the Inclusive Muon Cross Section
Authors:
Tomoya Murata,
Toru Sato
Abstract:
We have studied a method to extract neutrino flux from the data of neutrino-nucleus reaction by using maximum entropy method. We demonstrate a promising example to extract neutrino flux from the inclusive cross section of muon production without selecting a particular reaction process such as quasi-elastic nucleon knockout.
Submitted 23 January, 2015;
originally announced January 2015.
-
Detecting network communities beyond assortativity-related attributes
Authors:
Xin Liu,
Tsuyoshi Murata,
Ken Wakita
Abstract:
In network science, assortativity refers to the tendency of links to exist between nodes with similar attributes. In social networks, for example, links tend to exist between individuals of similar age, nationality, location, race, income, educational level, religious belief, and language. Thus, various attributes jointly affect the network topology. An interesting problem is to detect community structure beyond some specific assortativity-related attributes $ρ$, i.e., to take out the effect of $ρ$ on network topology and reveal the hidden community structure that is due to other attributes. An approach to this problem is to redefine the null model of the modularity measure, so as to simulate the effect of $ρ$ on network topology. However, a challenge is that we do not know to what extent the network topology is affected by $ρ$ and by other attributes. In this paper, we propose Dist-Modularity, which allows us to freely choose any suitable function to simulate the effect of $ρ$. Such freedom can help us probe the effect of $ρ$ and detect the hidden communities that are due to other attributes. We test the effectiveness of Dist-Modularity on synthetic benchmarks and two real-world networks.
Submitted 14 July, 2014;
originally announced July 2014.
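To make the phrase "redefine the null model of the modularity measure" in the abstract above concrete, here is a minimal sketch of the standard modularity $Q = \frac{1}{2m}\sum_{ij}(A_{ij} - P_{ij})\,\delta(c_i, c_j)$ with a replaceable null model $P$. The default is the usual configuration-model term $k_i k_j / 2m$; the optional `null_model` argument is a hypothetical hook added for illustration and is not the Dist-Modularity definition from the paper.

```python
import numpy as np

def modularity(A, communities, null_model=None):
    """Modularity of a partition of an undirected graph.

    A           : symmetric adjacency matrix, shape (n, n)
    communities : length-n sequence of community labels
    null_model  : optional (n, n) matrix of expected edge weights;
                  defaults to the configuration model k_i * k_j / (2m).
    """
    A = np.asarray(A, dtype=float)
    two_m = A.sum()                         # equals 2m for an undirected graph
    if null_model is None:
        k = A.sum(axis=1)                   # node degrees
        null_model = np.outer(k, k) / two_m
    labels = np.asarray(communities)
    same = labels[:, None] == labels[None, :]   # delta(c_i, c_j)
    return ((A - null_model) * same).sum() / two_m

# Two disjoint edges: 0-1 and 2-3, each treated as its own community.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])
q = modularity(A, [0, 0, 1, 1])
```

Passing a different `null_model` matrix changes which links count as "expected", which is the general mechanism the abstract describes for factoring out the effect of a known attribute.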