State-Free Inference of State-Space Models: The Transfer Function Approach

Authors: Rom N. Parnichkun, Stefano Massaroli, Alessandro Moro, Jimmy T. H. Smith, Ramin Hasani, Mathias Lechner, Qi An, Christopher Ré, Hajime Asama, Stefano Ermon, Taiji Suzuki, Atsushi Yamashita, Michael Poli

Abstract: We approach designing a state-space model for deep learning applications through its dual representation, the transfer function, and uncover a highly efficient sequence parallel inference algorithm that is state-free: unlike other proposed algorithms, state-free inference does not incur any significant memory or computational cost with an increase in state size. We achieve this using properties of the proposed frequency domain transfer function parametrization, which enables direct computation of its corresponding convolutional kernel's spectrum via a single Fast Fourier Transform. Our experimental results across multiple sequence lengths and state sizes illustrate, on average, a 35% training speed improvement over S4 layers -- parametrized in time-domain -- on the Long Range Arena benchmark, while delivering state-of-the-art downstream performances over other attention-free approaches. Moreover, we report improved perplexity in language modeling over a long convolutional Hyena baseline, by simply introducing our transfer function parametrization. Our code is available at https://github.com/ruke1ire/RTF.

Submitted 1 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: Resubmission 02/06/2024: Fixed minor typo of recurrent form RTF

arXiv:2307.16160 [pdf, ps, other]

Motion Degeneracy in Self-supervised Learning of Elevation Angle Estimation for 2D Forward-Looking Sonar

Authors: Yusheng Wang, Yonghoon Ji, Chujie Wu, Hiroshi Tsuchiya, Hajime Asama, Atsushi Yamashita

Abstract: 2D forward-looking sonar is a crucial sensor for underwater robotic perception. A well-known problem in this field is estimating missing information in the elevation direction during sonar imaging. There are demands to estimate 3D information per image for 3D mapping and robot navigation during fly-through missions. Recent learning-based methods have demonstrated their strengths, but there are still drawbacks. Supervised learning methods have achieved high-quality results but may require further efforts to acquire 3D ground-truth labels. The existing self-supervised method requires pretraining using synthetic images with 3D supervision. This study aims to realize stable self-supervised learning of elevation angle estimation without pretraining using synthetic images. Failures during self-supervised learning may be caused by motion degeneracy problems. We first analyze the motion field of 2D forward-looking sonar, which is related to the main supervision signal. We utilize a modern learning framework and prove that if the training dataset is built with effective motions, the network can be trained in a self-supervised manner without the knowledge of synthetic data. Both simulation and real experiments validate the proposed method.

Submitted 31 July, 2023; v1 submitted 30 July, 2023; originally announced July 2023.

Comments: IROS2023

arXiv:2304.08146 [pdf, ps, other]

2D Forward Looking Sonar Simulation with Ground Echo Modeling

Authors: Yusheng Wang, Chujie Wu, Yonghoon Ji, Hiroshi Tsuchiya, Hajime Asama, Atsushi Yamashita

Abstract: Imaging sonar produces clear images in underwater environments, independent of water turbidity and lighting conditions. The next generation 2D forward looking sonars are compact in size and able to generate high-resolution images which facilitate underwater robotics research. Considering the difficulties and expenses of implementing experiments in underwater environments, tremendous work has been focused on sonar image simulation. However, sonar artifacts like multi-path reflection were not sufficiently discussed, which cannot be ignored in water tank environments. In this paper, we focus on the influence of echoes from the flat ground. We propose a method to simulate the ground echo effect physically in acoustic images. We model the multi-bounce situations using the single-bounce framework for computation efficiency. We compare the real image captured in the water tank with the synthetic images to validate the proposed methods.

Submitted 24 February, 2024; v1 submitted 17 April, 2023; originally announced April 2023.

Comments: Final version of UR2023

arXiv:2208.00233 [pdf, other]

Learning Pseudo Front Depth for 2D Forward-Looking Sonar-based Multi-view Stereo

Authors: Yusheng Wang, Yonghoon Ji, Hiroshi Tsuchiya, Hajime Asama, Atsushi Yamashita

Abstract: Retrieving the missing dimension information in acoustic images from 2D forward-looking sonar is a well-known problem in the field of underwater robotics. There are works attempting to retrieve 3D information from a single image which allows the robot to generate 3D maps with fly-through motion. However, owing to the unique image formulation principle, estimating 3D information from a single image faces severe ambiguity problems. Classical methods of multi-view stereo can avoid the ambiguity problems, but may require a large number of viewpoints to generate an accurate model. In this work, we propose a novel learning-based multi-view stereo method to estimate 3D information. To better utilize the information from multiple frames, an elevation plane sweeping method is proposed to generate the depth-azimuth-elevation cost volume. The volume after regularization can be considered as a probabilistic volumetric representation of the target. Instead of performing regression on the elevation angles, we use pseudo front depth from the cost volume to represent the 3D information which can avoid the 2D-3D problem in acoustic imaging. High-accuracy results can be generated with only two or three images. Synthetic datasets were generated to simulate various underwater targets. We also built the first real dataset with accurate ground truth in a large scale water tank. Experimental results demonstrate the superiority of our method, compared to other state-of-the-art methods.

Submitted 30 July, 2022; originally announced August 2022.

Comments: Accepted at IROS 2022

arXiv:2106.11581 [pdf, other]

Continuous-Depth Neural Models for Dynamic Graph Prediction

Authors: Michael Poli, Stefano Massaroli, Clayton M. Rabideau, Junyoung Park, Atsushi Yamashita, Hajime Asama, Jinkyoo Park

Abstract: We introduce the framework of continuous-depth graph neural networks (GNNs). Neural graph differential equations (Neural GDEs) are formalized as the counterpart to GNNs where the input-output relationship is determined by a continuum of GNN layers, blending discrete topological structures and differential equations. The proposed framework is shown to be compatible with static GNN models and is extended to dynamic and stochastic settings through hybrid dynamical system theory. Here, Neural GDEs improve performance by exploiting the underlying dynamics geometry, further introducing the ability to accommodate irregularly sampled data. Results prove the effectiveness of the proposed models across applications, such as traffic forecasting or prediction in genetic regulatory networks.

Submitted 22 June, 2021; originally announced June 2021.

Comments: Extended version of the workshop paper "Graph Neural Ordinary Differential Equations". arXiv admin note: substantial text overlap with arXiv:1911.07532

arXiv:2106.04165 [pdf, other]

Neural Hybrid Automata: Learning Dynamics with Multiple Modes and Stochastic Transitions

Authors: Michael Poli, Stefano Massaroli, Luca Scimeca, Seong Joon Oh, Sanghyuk Chun, Atsushi Yamashita, Hajime Asama, Jinkyoo Park, Animesh Garg

Abstract: Effective control and prediction of dynamical systems often require appropriate handling of continuous-time and discrete, event-triggered processes. Stochastic hybrid systems (SHSs), common across engineering domains, provide a formalism for dynamical systems subject to discrete, possibly stochastic, state jumps and multi-modal continuous-time flows. Despite the versatility and importance of SHSs across applications, a general procedure for the explicit learning of both discrete events and multi-mode continuous dynamics remains an open problem. This work introduces Neural Hybrid Automata (NHAs), a recipe for learning SHS dynamics without a priori knowledge on the number of modes and inter-modal transition dynamics. NHAs provide a systematic inference method based on normalizing flows, neural differential equations and self-supervision. We showcase NHAs on several tasks, including mode recovery and flow learning in systems with stochastic transitions, and end-to-end learning of hierarchical robot controllers.

Submitted 8 June, 2021; originally announced June 2021.

arXiv:2106.03885 [pdf, other]

Differentiable Multiple Shooting Layers

Authors: Stefano Massaroli, Michael Poli, Sho Sonoda, Taji Suzuki, Jinkyoo Park, Atsushi Yamashita, Hajime Asama

Abstract: We detail a novel class of implicit neural models. Leveraging time-parallel methods for differential equations, Multiple Shooting Layers (MSLs) seek solutions of initial value problems via parallelizable root-finding algorithms. MSLs broadly serve as drop-in replacements for neural ordinary differential equations (Neural ODEs) with improved efficiency in number of function evaluations (NFEs) and wall-clock inference time. We develop the algorithmic framework of MSLs, analyzing the different choices of solution methods from a theoretical and computational perspective. MSLs are showcased in long horizon optimal control of ODEs and PDEs and as latent models for sequence generation. Finally, we investigate the speedups obtained through application of MSL inference in neural controlled differential equations (Neural CDEs) for time series classification of medical data.

Submitted 7 June, 2021; originally announced June 2021.

arXiv:2106.03780 [pdf, other]

doi 10.1109/LCSYS.2021.3086672

Learning Stochastic Optimal Policies via Gradient Descent

Authors: Stefano Massaroli, Michael Poli, Stefano Peluchetti, Jinkyoo Park, Atsushi Yamashita, Hajime Asama

Abstract: We systematically develop a learning-based treatment of stochastic optimal control (SOC), relying on direct optimization of parametric control policies. We propose a derivation of adjoint sensitivity results for stochastic differential equations through direct application of variational calculus. Then, given an objective function for a predetermined task specifying the desiderata for the controller, we optimize their parameters via iterative gradient descent methods. In doing so, we extend the range of applicability of classical SOC techniques, often requiring strict assumptions on the functional form of system and control. We verify the performance of the proposed approach on a continuous-time, finite horizon portfolio optimization with proportional transaction costs.

Submitted 7 June, 2021; originally announced June 2021.

Journal ref: IEEE Control Systems Letters, 2021

arXiv:2101.06563 [pdf, other]

doi 10.1080/01691864.2020.1869586

Stereo Camera Visual SLAM with Hierarchical Masking and Motion-state Classification at Outdoor Construction Sites Containing Large Dynamic Objects

Authors: Runqiu Bao, Ren Komatsu, Renato Miyagusuku, Masaki Chino, Atsushi Yamashita, Hajime Asama

Abstract: At modern construction sites, utilizing GNSS (Global Navigation Satellite System) to measure the real-time location and orientation (i.e. pose) of construction machines and navigate them is very common. However, GNSS is not always available. Replacing GNSS with on-board cameras and visual simultaneous localization and mapping (visual SLAM) to navigate the machines is a cost-effective solution. Nevertheless, at construction sites, multiple construction machines will usually work together and side-by-side, causing large dynamic occlusions in the cameras' view. Standard visual SLAM cannot handle large dynamic occlusions well. In this work, we propose a motion segmentation method to efficiently extract static parts from crowded dynamic scenes to enable robust tracking of camera ego-motion. Our method utilizes semantic information combined with object-level geometric constraints to quickly detect the static parts of the scene. Then, we perform a two-step coarse-to-fine ego-motion tracking with reference to the static parts. This leads to a novel dynamic visual SLAM formation. We test our proposals through a real implementation based on ORB-SLAM2, and datasets we collected from real construction sites. The results show that when standard visual SLAM fails, our method can still retain accurate camera ego-motion tracking in real-time. Compared to state-of-the-art dynamic visual SLAM methods, ours shows outstanding efficiency and competitive result trajectory accuracy.

Submitted 16 January, 2021; originally announced January 2021.

Comments: This is an Accepted Manuscript of an article published by Taylor & Francis in Advanced Robotics on Jan. 11th, 2021, available online: https://www.tandfonline.com/doi/full/10.1080/01691864.2020.1869586 [Article DOI:10.1080/01691864.2020.1869586]

Journal ref: Advanced Robotics (2021) 1-14

arXiv:2101.05537 [pdf, other]

Optimal Energy Shaping via Neural Approximators

Authors: Stefano Massaroli, Michael Poli, Federico Califano, Jinkyoo Park, Atsushi Yamashita, Hajime Asama

Abstract: We introduce optimal energy shaping as an enhancement of classical passivity-based control methods. A promising feature of passivity theory, alongside stability, has traditionally been claimed to be intuitive performance tuning along the execution of a given task. However, a systematic approach to adjust performance within a passive control framework has yet to be developed, as each method relies on few and problem-specific practical insights. Here, we cast the classic energy-shaping control design process in an optimal control framework; once a task-dependent performance metric is defined, an optimal solution is systematically obtained through an iterative procedure relying on neural networks and gradient-based optimization. The proposed method is validated on state-regulation tasks.

Submitted 14 January, 2021; originally announced January 2021.

arXiv:2011.03716 [pdf, other]

doi 10.1109/LCSYS.2020.3042827

Data-Driven Koopman Controller Synthesis Based on the Extended $\mathcal{H}_2$ Norm Characterization

Authors: Daisuke Uchida, Atsushi Yamashita, Hajime Asama

Abstract: This paper presents a new data-driven controller synthesis based on the Koopman operator and the extended $\mathcal{H}_2$ norm characterization of discrete-time linear systems. We model dynamical systems as polytope sets which are derived from multiple data-driven linear models obtained by the finite approximation of the Koopman operator and then used to design robust feedback controllers combined with the $\mathcal{H}_2$ norm characterization. The use of the $\mathcal{H}_2$ norm characterization is aimed to deal with the model uncertainty that arises due to the nature of the data-driven setting of the problem. The effectiveness of the proposed controller synthesis is investigated through numerical simulations.

Submitted 7 November, 2020; originally announced November 2020.

Comments: 6 pages, 6 figures

MSC Class: 93-08

Journal ref: IEEE Control Systems Letters, Vol. 5, No. 5, pp. 1795-1800, 2021

arXiv:2009.09346 [pdf, other]

TorchDyn: A Neural Differential Equations Library

Authors: Michael Poli, Stefano Massaroli, Atsushi Yamashita, Hajime Asama, Jinkyoo Park

Abstract: Continuous-depth learning has recently emerged as a novel perspective on deep learning, improving performance in tasks related to dynamical systems and density estimation. Core to these approaches is the neural differential equation, whose forward passes are the solutions of an initial value problem parametrized by a neural network. Unlocking the full potential of continuous-depth models requires a different set of software tools, due to peculiar differences compared to standard discrete neural networks, e.g., inference must be carried out via numerical solvers. We introduce TorchDyn, a PyTorch library dedicated to continuous-depth learning, designed to elevate neural differential equations to be as accessible as regular plug-and-play deep learning primitives. This objective is achieved by identifying and subdividing different variants into common essential components, which can be combined and freely repurposed to obtain complex compositional architectures. TorchDyn further offers step-by-step tutorials and benchmarks designed to guide researchers and contributors.

Submitted 19 September, 2020; originally announced September 2020.

arXiv:2007.09601 [pdf, other]

Hypersolvers: Toward Fast Continuous-Depth Models

Authors: Michael Poli, Stefano Massaroli, Atsushi Yamashita, Hajime Asama, Jinkyoo Park

Abstract: The infinite-depth paradigm pioneered by Neural ODEs has launched a renaissance in the search for novel dynamical system-inspired deep learning primitives; however, their utilization in problems of non-trivial size has often proved impossible due to poor computational scalability. This work paves the way for scalable Neural ODEs with time-to-prediction comparable to traditional discrete networks. We introduce hypersolvers, neural networks designed to solve ODEs with low overhead and theoretical guarantees on accuracy. The synergistic combination of hypersolvers and Neural ODEs allows for cheap inference and unlocks a new frontier for practical application of continuous-depth models. Experimental evaluations on standard benchmarks, such as sampling for continuous normalizing flows, reveal consistent pareto efficiency over classical numerical methods.

Submitted 29 December, 2020; v1 submitted 19 July, 2020; originally announced July 2020.

arXiv:2007.06891 [pdf, other]

doi 10.1109/IROS45743.2020.9340981

360$^\circ$ Depth Estimation from Multiple Fisheye Images with Origami Crown Representation of Icosahedron

Authors: Ren Komatsu, Hiromitsu Fujii, Yusuke Tamura, Atsushi Yamashita, Hajime Asama

Abstract: In this study, we present a method for all-around depth estimation from multiple omnidirectional images for indoor environments. In particular, we focus on plane-sweeping stereo as the method for depth estimation from the images. We propose a new icosahedron-based representation and ConvNets for omnidirectional images, which we name "CrownConv" because the representation resembles a crown made of origami. CrownConv can be applied to both fisheye images and equirectangular images to extract features. Furthermore, we propose icosahedron-based spherical sweeping for generating the cost volume on an icosahedron from the extracted features. The cost volume is regularized using the three-dimensional CrownConv, and the final depth is obtained by depth regression from the cost volume. Our proposed method is robust to camera alignments by using the extrinsic camera parameters; therefore, it can achieve precise depth estimation even when the camera alignment differs from that in the training dataset. We evaluate the proposed model on synthetic datasets and demonstrate its effectiveness. As our proposed method is computationally efficient, the depth is estimated from four fisheye images in less than a second using a laptop with a GPU. Therefore, it is suitable for real-world robotics applications. Our source code is available at https://github.com/matsuren/crownconv360depth.

Submitted 14 July, 2020; originally announced July 2020.

Comments: 8 pages, Accepted to the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2020). For supplementary video, see https://youtu.be/_vVD-zDMvyM

arXiv:2003.08063 [pdf, other]

Stable Neural Flows

Authors: Stefano Massaroli, Michael Poli, Michelangelo Bin, Jinkyoo Park, Atsushi Yamashita, Hajime Asama

Abstract: We introduce a provably stable variant of neural ordinary differential equations (neural ODEs) whose trajectories evolve on an energy functional parametrised by a neural network. Stable neural flows provide an implicit guarantee on asymptotic stability of the depth-flows, leading to robustness against input perturbations and low computational burden for the numerical solver. The learning procedure is cast as an optimal control problem, and an approximate solution is proposed based on adjoint sensitivity analysis. We further introduce novel regularizers designed to ease the optimization process and speed up convergence. The proposed model class is evaluated on non-linear classification and function approximation tasks.

Submitted 18 March, 2020; originally announced March 2020.

arXiv:2002.08071 [pdf, other]

Dissecting Neural ODEs

Authors: Stefano Massaroli, Michael Poli, Jinkyoo Park, Atsushi Yamashita, Hajime Asama

Abstract: Continuous deep learning architectures have recently re-emerged as Neural Ordinary Differential Equations (Neural ODEs). This infinite-depth approach theoretically bridges the gap between deep learning and dynamical systems, offering a novel perspective. However, deciphering the inner working of these models is still an open challenge, as most applications apply them as generic black-box modules. In this work we "open the box", further developing the continuous-depth formulation with the aim of clarifying the influence of several design choices on the underlying dynamics.

Submitted 11 January, 2021; v1 submitted 19 February, 2020; originally announced February 2020.

arXiv:1911.07532 [pdf, other]

Graph Neural Ordinary Differential Equations

Authors: Michael Poli, Stefano Massaroli, Junyoung Park, Atsushi Yamashita, Hajime Asama, Jinkyoo Park

Abstract: We introduce the framework of continuous-depth graph neural networks (GNNs). Graph neural ordinary differential equations (GDEs) are formalized as the counterpart to GNNs where the input-output relationship is determined by a continuum of GNN layers, blending discrete topological structures and differential equations. The proposed framework is shown to be compatible with various static and autoregressive GNN models. Results prove general effectiveness of GDEs: in static settings they offer computational advantages by incorporating numerical methods in their forward pass; in dynamic settings, on the other hand, they are shown to improve performance by exploiting the geometry of the underlying dynamics.

Submitted 22 June, 2021; v1 submitted 18 November, 2019; originally announced November 2019.

Comments: Accepted [Spotlight] at the AAAI workshop DLGMA20. For the extended version, see "Continuous-Depth Neural Models for Dynamic Graph Prediction"

arXiv:1909.02702 [pdf, other]

Port-Hamiltonian Approach to Neural Network Training

Authors: Stefano Massaroli, Michael Poli, Federico Califano, Angela Faragasso, Jinkyoo Park, Atsushi Yamashita, Hajime Asama

Abstract: Neural networks are discrete entities: subdivided into discrete layers and parametrized by weights which are iteratively optimized via difference equations. Recent work proposes networks with layer outputs which are no longer quantized but are solutions of an ordinary differential equation (ODE); however, these networks are still optimized via discrete methods (e.g. gradient descent). In this paper, we explore a different direction: namely, we propose a novel framework for learning in which the parameters themselves are solutions of ODEs. By viewing the optimization process as the evolution of a port-Hamiltonian system, we can ensure convergence to a minimum of the objective function. Numerical experiments have been performed to show the validity and effectiveness of the proposed methods.

Submitted 5 September, 2019; originally announced September 2019.

Comments: To appear in the Proceedings of the 58th IEEE Conference on Decision and Control (CDC 2019). The first two authors contributed equally to the work

Showing 1–18 of 18 results for author: Asama, H