Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches

Authors: Qing Yu, Mikihiro Tanaka, Kent Fujiwara

Abstract: To build a cross-modal latent space between 3D human motion and language, acquiring large-scale and high-quality human motion data is crucial. However, unlike the abundance of image data, the scarcity of motion data has limited the performance of existing motion-language models. To counter this, we introduce "motion patches", a new representation of motion sequences, and propose using Vision Transformers (ViT) as motion encoders via transfer learning, aiming to extract useful knowledge from the image domain and apply it to the motion domain. These motion patches, created by dividing and sorting skeleton joints based on body parts in motion sequences, are robust to varying skeleton structures, and can be regarded as color image patches in ViT. We find that transfer learning with pre-trained weights of ViT obtained through training with 2D image data can boost the performance of motion analysis, presenting a promising direction for addressing the issue of limited motion data. Our extensive experiments show that the proposed motion patches, used jointly with ViT, achieve state-of-the-art performance in the benchmarks of text-to-motion retrieval, and other novel challenging tasks, such as cross-skeleton recognition, zero-shot motion classification, and human interaction recognition, which are currently impeded by the lack of data.

Submitted 7 May, 2024; originally announced May 2024.

Comments: Accepted to CVPR 2024, Project website: https://yu1ut.com/MotionPatches-HP/
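To make the motion-patch idea concrete, here is a minimal Python sketch under stated assumptions: the toy 7-joint skeleton, its BODY_PARTS grouping, the nearest-index resampling, and the square patch_size are all hypothetical illustrations, not details given in the abstract. Each body part's joints are resampled to a fixed number of columns, so skeletons with different joint counts yield patches of the same shape, and a joint's (x, y, z) triple plays the role of an RGB pixel.

```python
from typing import Dict, List, Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) position of one joint
Frame = List[Joint]                 # all joints of the skeleton at one time step
Patch = List[List[Joint]]           # patch_size frames x patch_size joints

# HYPOTHETICAL grouping of a toy 7-joint skeleton into body-part chains.
# Joint 0 = pelvis, 1 = neck, 2 = head, 3/4 = hands, 5/6 = feet (assumed).
BODY_PARTS: Dict[str, List[int]] = {
    "torso": [0, 1, 2],
    "left_arm": [1, 3],
    "right_arm": [1, 4],
    "left_leg": [0, 5],
    "right_leg": [0, 6],
}


def resample(chain: Sequence[int], size: int) -> List[int]:
    """Map a chain of any length onto `size` columns by nearest-index sampling."""
    n = len(chain)
    return [chain[min(n - 1, (i * n) // size)] for i in range(size)]


def motion_patches(motion: List[Frame], patch_size: int) -> List[Patch]:
    """Split a motion into square patches, body part by body part.

    Each patch covers `patch_size` consecutive frames and `patch_size`
    resampled joints of one body part; the (x, y, z) triple of a joint
    plays the role of an RGB pixel. Trailing frames that do not fill a
    whole window are dropped.
    """
    patches: List[Patch] = []
    for chain in BODY_PARTS.values():
        cols = resample(chain, patch_size)
        for start in range(0, len(motion) - patch_size + 1, patch_size):
            window = motion[start:start + patch_size]
            patches.append([[frame[j] for j in cols] for frame in window])
    return patches


# Toy motion: 8 frames of the 7-joint skeleton; joint j at frame t sits at (t, j, 0).
motion = [[(float(t), float(j), 0.0) for j in range(7)] for t in range(8)]
patches = motion_patches(motion, patch_size=4)
print(len(patches))  # 10 = 5 body parts x 2 time windows
```

Each of the 10 patches is 4 frames by 4 joints by 3 coordinates. Feeding such patches into a pretrained ViT would additionally require matching its expected patch size and input resolution, which this sketch does not attempt.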

arXiv:2401.16755 [pdf, other]

Diffusion model for relational inference

Authors: Shuhan Zheng, Ziqiang Li, Kantaro Fujiwara, Gouhei Tanaka

Abstract: Dynamical behaviors of complex interacting systems, including brain activities, financial price movements, and physical collective phenomena, are associated with underlying interactions between the system's components. The issue of uncovering interaction relations in such systems using observable dynamics is called relational inference. In this study, we propose a Diffusion model for Relational Inference (DiffRI), inspired by a self-supervised method for probabilistic time series imputation. DiffRI learns to infer the probability of the presence of connections between components through conditional diffusion modeling.

Submitted 20 June, 2024; v1 submitted 30 January, 2024; originally announced January 2024.
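DiffRI builds on conditional diffusion modeling. As background only, the sketch below shows the standard forward (noising) process that diffusion models of this kind typically start from, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The linear beta schedule and its endpoints (1e-4 to 0.02) are common DDPM defaults assumed here; this is not DiffRI's relational-inference model or its conditioning scheme.

```python
import math
import random
from typing import List, Tuple


def linear_betas(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Noise variances beta_1..beta_T increasing linearly (assumed schedule)."""
    denom = max(1, num_steps - 1)
    return [beta_start + (beta_end - beta_start) * i / denom for i in range(num_steps)]


def alpha_bars(betas: List[float]) -> List[float]:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0: List[float], t: int, abar: List[float], rng: random.Random) -> Tuple[List[float], List[float]]:
    """Draw x_t ~ q(x_t | x_0) in closed form; also return the noise eps,
    which is the usual regression target for the denoising network."""
    a = abar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


abar = alpha_bars(linear_betas(50))
xt, eps = q_sample([1.0, -2.0], t=10, abar=abar, rng=random.Random(0))
```

Because alpha_bar_t shrinks toward zero as t grows, later steps carry less of x_0 and more noise; a conditional model learns to reverse these steps given side information.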

arXiv:2307.01237 [pdf, other]

Dynamical Graph Echo State Networks with Snapshot Merging for Dissemination Process Classification

Authors: Ziqiang Li, Kantaro Fujiwara, Gouhei Tanaka

Abstract: Dissemination Process Classification (DPC) is a popular application of temporal graph classification. The aim of DPC is to classify different spreading patterns of information or pestilence within a community represented by discrete-time temporal graphs. Recently, a reservoir computing-based model named Dynamical Graph Echo State Network (DynGESN) has been proposed for processing temporal graphs with relatively high effectiveness and low computational costs. In this study, we propose a novel model that combines a new data augmentation strategy, called snapshot merging, with DynGESN to handle DPC tasks. In our model, the snapshot merging strategy forms new snapshots by merging neighboring snapshots over time, and multiple reservoir encoders then capture spatiotemporal features from the merged snapshots. Finally, logistic regression decodes the sum-pooled embeddings into classification results. Experimental results on six benchmark DPC datasets show that our proposed model achieves better classification performance than DynGESN and several kernel-based models.

Submitted 3 July, 2023; originally announced July 2023.
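A minimal sketch of snapshot merging, assuming each snapshot is a set of undirected edges stored as (u, v) tuples and that "merging" means taking the union of edges over a sliding window of consecutive snapshots. Both the representation and the merge rule are assumptions, since the abstract does not specify them.

```python
from typing import FrozenSet, List, Set, Tuple

Edge = Tuple[int, int]          # (u, v) node pair, assumed undirected with u < v
Snapshot = FrozenSet[Edge]      # edge set of the graph at one time step


def merge_snapshots(snapshots: List[Snapshot], window: int) -> List[Snapshot]:
    """Merge each run of `window` consecutive snapshots into one snapshot.

    ASSUMED merge rule: the union of the edge sets in the window, slid one
    step at a time, so T snapshots become T - window + 1 merged snapshots.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    merged: List[Snapshot] = []
    for start in range(len(snapshots) - window + 1):
        edges: Set[Edge] = set()
        for snap in snapshots[start:start + window]:
            edges |= snap
        merged.append(frozenset(edges))
    return merged


timeline = [frozenset({(0, 1)}), frozenset({(1, 2)}), frozenset({(2, 3)})]
# One merged sequence per window size, e.g. to feed separate reservoir encoders.
views = {w: merge_snapshots(timeline, w) for w in (1, 2, 3)}
```

Producing one merged sequence per window size is one plausible reading of "multiple reservoir encoders"; the reservoir encoders and the logistic-regression decoder themselves are not sketched.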

arXiv:2203.03103 [pdf, other]

Prediction of transport property via machine learning molecular movements

Authors: Ikki Yasuda, Yusei Kobayashi, Katsuhiro Endo, Yoshihiro Hayakawa, Kazuhiko Fujiwara, Kuniaki Yajima, Noriyoshi Arai, Kenji Yasuoka

Abstract: Molecular dynamics (MD) simulations are increasingly being combined with machine learning (ML) to predict material properties. The molecular configurations obtained from MD are represented by multiple features, such as thermodynamic properties, and are used as the ML input. However, to accurately find the input–output patterns, ML requires a sufficiently large dataset, whose required size depends on the complexity of the ML model. Generating such a large dataset from MD simulations is not ideal because of their high computational cost. In this study, we present a simple supervised ML method to predict the transport properties of materials. To simplify the model, an unsupervised ML method obtains an efficient representation of molecular movements. This method was applied to predict the viscosity of lubricant molecules confined under shear flow. Furthermore, the simplicity of the model facilitates its interpretation, helping to understand the molecular mechanics of viscosity. We revealed two types of molecular mechanisms that contribute to low viscosity.

Submitted 6 March, 2022; originally announced March 2022.

arXiv:2007.13354 [pdf]

doi 10.1016/j.aca.2019.08.064

Feature visualization of Raman spectrum analysis with deep convolutional neural network

Authors: Masashi Fukuhara, Kazuhiko Fujiwara, Yoshihiro Maruyama, Hiroyasu Itoh

Abstract: We demonstrate a recognition and feature visualization method that uses a deep convolutional neural network for Raman spectrum analysis. The visualization is achieved by calculating important regions in the spectra from weights in pooling and fully-connected layers. The method is first examined for simple Lorentzian spectra, then applied to the spectra of pharmaceutical compounds and numerically mixed amino acids. We investigate the effects of the size and number of convolution filters on the extracted regions for Raman-peak signals using the Lorentzian spectra. Visualizing the extracted features confirms that the Raman peaks contribute to the recognition. A near-zero weight value is obtained in the background-level region, which appears to be used for baseline correction. Common component extraction is confirmed by an evaluation of numerically mixed amino acid spectra. High weight values appear at the common peaks and negative values at the distinctive peaks, even though the model is given one-hot vectors as the training labels (without a mix ratio). The proposed method is potentially suitable for applications such as validating trained models and ensuring the reliability of common component extraction from compound samples in spectral analysis.

Submitted 27 July, 2020; originally announced July 2020.

Journal ref: Analytica Chimica Acta, Volume 1087, 9 December 2019, Pages 11-19
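One common way to compute important spectral regions from pooling and fully-connected weights is class activation mapping (CAM): with global average pooling before the final layer, the importance at position t for a class is the FC-weighted sum of the channel feature maps at t. The sketch below assumes that CAM-style setup and nearest-neighbour upsampling back to the spectrum length; the paper's exact computation may differ.

```python
from typing import List


def class_activation_1d(feature_maps: List[List[float]], fc_weights: List[float], spectrum_len: int) -> List[float]:
    """CAM-style importance along a 1D spectrum.

    feature_maps: last conv layer output, one list per channel (all the same length).
    fc_weights:   weight from each channel's global-average-pooled value to the target class.
    Returns one importance value per spectral point after nearest-neighbour upsampling.
    """
    if len(feature_maps) != len(fc_weights):
        raise ValueError("need one FC weight per feature-map channel")
    length = len(feature_maps[0])
    # Weighted sum over channels at every position of the (downsampled) feature axis.
    cam = [sum(w * fmap[t] for w, fmap in zip(fc_weights, feature_maps)) for t in range(length)]
    # Stretch back to the spectrum's resolution, which pooling had reduced.
    return [cam[min(length - 1, (i * length) // spectrum_len)] for i in range(spectrum_len)]


importance = class_activation_1d([[1.0, 0.0], [0.0, 2.0]], [0.5, -1.0], spectrum_len=4)
print(importance)  # [0.5, 0.5, -2.0, -2.0]
```

Because global average pooling is linear, averaging the map over its (un-upsampled) positions recovers the class score minus its bias, which is what lets this decomposition attribute the score to spectral regions. Negative FC weights yield negative importance.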

arXiv:1809.04820 [pdf, other]

Canonical and Compact Point Cloud Representation for Shape Classification

Authors: Kent Fujiwara, Ikuro Sato, Mitsuru Ambai, Yuichi Yoshida, Yoshiaki Sakakura

Abstract: We present a novel compact point cloud representation that is inherently invariant to scale, coordinate change, and point permutation. The key idea is to parametrize a distance field around an individual shape into a unique, canonical, and compact vector in an unsupervised manner. We first project a distance field to a 4D canonical space using singular value decomposition. We then train a neural network for each instance to non-linearly embed its distance field into network parameters. We employ a bias-free Extreme Learning Machine (ELM) with ReLU activation units, which has a scale-factor commutative property between layers. We demonstrate the descriptiveness of the instance-wise, shape-embedded network parameters by using them to classify shapes in 3D datasets. Our learning-based representation requires minimal augmentation and simple neural networks, whereas previous approaches demand numerous representations to handle coordinate change and point permutation.

Submitted 13 September, 2018; originally announced September 2018.

Comments: 16 pages, 5 figures
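The scale-factor commutative property of a bias-free ReLU network follows from ReLU(a z) = a ReLU(z) for any a >= 0 and from the linearity of bias-free layers, so f(a x) = a f(x). The toy network below uses arbitrary weights chosen for illustration (not trained by ELM or any other method) and checks this numerically; it does not reproduce the paper's per-instance ELM training or its SVD canonicalization.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Bias-free linear layer: y = W v."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def bias_free_relu_net(layers: List[Matrix], x: Vector) -> Vector:
    """Apply W_L ReLU(... ReLU(W_1 x)); no biases anywhere, no ReLU on the output."""
    h = x
    for i, w in enumerate(layers):
        h = matvec(w, h)
        if i < len(layers) - 1:
            h = relu(h)
    return h


# Arbitrary toy weights (illustrative only): 2 inputs -> 3 hidden units -> 1 output.
layers = [[[1.0, -2.0], [0.5, 1.0], [-1.0, 0.0]], [[1.0, 2.0, -1.0]]]
x = [0.3, -0.7]
scale = 3.0
lhs = bias_free_relu_net(layers, [scale * xi for xi in x])  # f(a x)
rhs = [scale * yi for yi in bias_free_relu_net(layers, x)]  # a f(x)
```

The property holds only for non-negative scale factors: here f(x) is about 1.7, but f(-x) is about 0.8 rather than -1.7, because ReLU does not commute with sign flips. Adding any bias term would also break the identity.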

arXiv:1808.02320 [pdf, other]

doi 10.11309/jssst.32.1_47

A Survey of Refactoring Detection Techniques Based on Change History Analysis

Authors: Eunjong Choi, Kenji Fujiwara, Norihiro Yoshida, Shinpei Hayashi

Abstract: Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure. Not only researchers but also practitioners need to know about past refactoring instances performed in a software development project. So far, a number of techniques have been proposed for automatic detection of refactoring instances. Those techniques have been presented at various international conferences and in journals; however, it is difficult for researchers and practitioners to grasp the current status of studies on refactoring detection techniques. In this survey paper, we review various refactoring detection techniques, especially techniques based on change history analysis. First, we give the definition and categorization of refactoring detection methods, and then introduce refactoring detection techniques based on change history analysis. Finally, we discuss possible future research directions for refactoring detection.

Submitted 7 August, 2018; originally announced August 2018.

Comments: This article is a private translation of the article published in the JSSST journal Computer Software

Journal ref: JSSST journal Computer Software, 32(1):47-59, 2015

Showing 1–7 of 7 results for author: Fujiwara, K