
Top-Down Bayesian Posterior Sampling for Sum-Product Networks

Abstract: Sum-product networks (SPNs) are probabilistic models characterized by exact and fast evaluation of fundamental probabilistic operations. Their superior computational tractability has led to applications in many fields, such as machine learning with time constraints or accuracy requirements and real-time systems. The structural constraints of SPNs supporting fast inference, however, lead to increased learning-time complexity and can be an obstacle to building highly expressive SPNs. This study aimed to develop a Bayesian learning approach that can be efficiently implemented on large-scale SPNs. We derived a new full conditional probability of Gibbs sampling by marginalizing multiple random variables to expeditiously obtain the posterior distribution. The complexity analysis revealed that our sampling algorithm works efficiently even for the largest possible SPN. Furthermore, we proposed a hyperparameter tuning method that balances the diversity of the prior distribution and optimization efficiency in large-scale SPNs. Our method has improved learning-time complexity and demonstrated computational speed tens to more than one hundred times faster and superior predictive performance in numerical experiments on more than 20 datasets.

Submitted 18 June, 2024; originally announced June 2024.

Comments: KDD 2024

arXiv:2405.16747 [pdf, other]

Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective

Authors: Akiyoshi Tomihari, Issei Sato

Abstract: The two-stage fine-tuning (FT) method, linear probing then fine-tuning (LP-FT), consistently outperforms linear probing (LP) and FT alone in terms of accuracy for both in-distribution (ID) and out-of-distribution (OOD) data. This success is largely attributed to the preservation of pre-trained features, achieved through a near-optimal linear head obtained during LP. However, despite the widespread use of large language models, the exploration of complex architectures such as Transformers remains limited. In this paper, we analyze the training dynamics of LP-FT for classification models on the basis of the neural tangent kernel (NTK) theory. Our analysis decomposes the NTK matrix into two components, highlighting the importance of the linear head norm alongside the prediction accuracy at the start of the FT stage. We also observe a significant increase in the linear head norm during LP, stemming from training with the cross-entropy (CE) loss, which effectively minimizes feature changes. Furthermore, we find that this increased norm can adversely affect model calibration, a challenge that can be addressed by temperature scaling. Additionally, we extend our analysis with the NTK to the low-rank adaptation (LoRA) method and validate its effectiveness. Our experiments with a Transformer-based model on natural language processing tasks across multiple benchmarks confirm our theoretical analysis and demonstrate the effectiveness of LP-FT in fine-tuning language models. Code is available at https://github.com/tom4649/lp-ft_ntk.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.05995 [pdf, ps, other]

Absolute zeta functions and periodicity of quantum walks on cycles

Authors: Jirô Akahori, Norio Konno, Iwao Sato, Yuma Tamura

Abstract: The quantum walk is a quantum counterpart of the classical random walk. On the other hand, absolute zeta functions can be considered as zeta functions over $\mathbb{F}_1$. This study presents a connection between quantum walks and absolute zeta functions. In this paper, we focus on Hadamard walks and $3$-state Grover walks on cycle graphs. The Hadamard walks and the Grover walks are typical models of the quantum walks. We consider the periods and zeta functions of such quantum walks. Moreover, we derive the explicit forms of the absolute zeta functions of corresponding zeta functions. Also, it is shown that our zeta functions of quantum walks are absolute automorphic forms.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 17 pages

arXiv:2402.09050 [pdf, other]

End-to-End Training Induces Information Bottleneck through Layer-Role Differentiation: A Comparative Analysis with Layer-wise Training

Authors: Keitaro Sakamoto, Issei Sato

Abstract: End-to-end (E2E) training, optimizing the entire model through error backpropagation, fundamentally supports the advancements of deep learning. Despite its high performance, E2E training faces the problems of memory consumption, parallel computing, and discrepancy with the functionalities of the actual brain. Various alternative methods have been proposed to overcome these difficulties; however, no one can yet match the performance of E2E training, thereby falling short in practicality. Furthermore, there is no deep understanding regarding differences in the trained model properties beyond the performance gap. In this paper, we reconsider why E2E training demonstrates a superior performance through a comparison with layer-wise training, a non-E2E method that locally sets errors. On the basis of the observation that E2E training has an advantage in propagating input information, we analyze the information plane dynamics of intermediate representations based on the Hilbert-Schmidt independence criterion (HSIC). The results of our normalized HSIC value analysis reveal the E2E training ability to exhibit different information dynamics across layers, in addition to efficient information propagation. Furthermore, we show that this layer-role differentiation leads to the final representation following the information bottleneck principle. It suggests the need to consider the cooperative interactions between layers, not just the final layer when analyzing the information bottleneck of deep learning.

Submitted 31 May, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

Comments: TMLR2024

arXiv:2402.00280 [pdf, ps, other]

A quantization of interacting particle systems

Authors: Jirô Akahori, Norio Konno, Rikuki Okamoto, Iwao Sato

Abstract: Interacting particle systems studied in this paper are probabilistic cellular automata with nearest-neighbor interaction including the Domany-Kinzel model. A special case of the Domany-Kinzel model is directed percolation. We regard the interacting particle system as a Markov chain on a graph. Then we present a new quantization of the interacting particle system. After that, we introduce a zeta function of the quantized model and give its determinant expression. Moreover, we calculate the absolute zeta function of the quantized model for the Domany-Kinzel model.

Submitted 23 March, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

Comments: 16 pages

Journal ref: Quantum Information and Computation, Vol.24, No.3&4, pp.210-226 (2024)

arXiv:2310.17951 [pdf, other]

Understanding Parameter Saliency via Extreme Value Theory

Authors: Shuo Wang, Issei Sato

Abstract: Deep neural networks have been increasingly implemented throughout society in recent years. It is useful to identify which parameters trigger misclassification in diagnosing undesirable model behaviors. The concept of parameter saliency is proposed and used to diagnose convolutional neural networks (CNNs) by ranking convolution filters that may have caused misclassification on the basis of parameter saliency. It is also shown that fine-tuning the top ranking salient filters efficiently corrects misidentification on ImageNet. However, there is still a knowledge gap in terms of understanding why parameter saliency ranking can find the filters inducing misidentification. In this work, we attempt to bridge the gap by analyzing parameter saliency ranking from a statistical viewpoint, namely, extreme value theory. We first show that the existing work implicitly assumes that the gradient norm computed for each filter follows a normal distribution. Then, we clarify the relationship between parameter saliency and the score based on the peaks-over-threshold (POT) method, which is often used to model extreme values. Finally, we reformulate parameter saliency in terms of the POT method, where this reformulation is regarded as statistical anomaly detection and does not require the implicit assumptions of the existing parameter-saliency formulation. Our experimental results demonstrate that our reformulation can detect malicious filters as well.
Furthermore, we show that the existing parameter saliency method exhibits a bias against the depth of layers in deep neural networks. In particular, this bias has the potential to inhibit the discovery of filters that cause misidentification in situations where domain shift occurs. In contrast, parameter saliency based on POT shows less of this bias.

Submitted 5 December, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

arXiv:2310.06379 [pdf, other]

Initialization Bias of Fourier Neural Operator: Revisiting the Edge of Chaos

Authors: Takeshi Koshizuka, Masahiro Fujisawa, Yusuke Tanaka, Issei Sato

Abstract: This paper investigates the initialization bias of the Fourier neural operator (FNO). A mean-field theory for FNO is established, analyzing the behavior of the random FNO from an \emph{edge of chaos} perspective. We uncover that the forward and backward propagation behaviors exhibit characteristics unique to FNO, induced by mode truncation, while also showcasing similarities to those of densely connected networks. Building upon this observation, we also propose an edge of chaos initialization scheme for FNO to mitigate the negative initialization bias leading to training instability. Experimental results show the effectiveness of our initialization scheme, enabling stable training of deep FNO without skip-connection.

Submitted 15 February, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

arXiv:2307.14023 [pdf, other]

Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?

Authors: Tokio Kajitsuka, Issei Sato

Abstract: Existing analyses of the expressive capacity of Transformer models have required excessively deep layers for data memorization, leading to a discrepancy with the Transformers actually used in practice. This is primarily due to the interpretation of the softmax function as an approximation of the hardmax function. By clarifying the connection between the softmax function and the Boltzmann operator, we prove that a single layer of self-attention with low-rank weight matrices possesses the capability to perfectly capture the context of an entire input sequence. As a consequence, we show that one-layer and single-head Transformers have a memorization capacity for finite samples, and that Transformers consisting of one self-attention layer with two feed-forward neural networks are universal approximators for continuous permutation equivariant functions on a compact domain.

Submitted 29 January, 2024; v1 submitted 26 July, 2023; originally announced July 2023.

Comments: ICLR 2024

MSC Class: 68T07 ACM Class: I.2.0

arXiv:2307.07106 [pdf, ps, other]

Absolute zeta functions for zeta functions of quantum cellular automata

Authors: Jirô Akahori, Norio Konno, Iwao Sato

Abstract: Our previous work dealt with the zeta function for the interacting particle system (IPS) including quantum cellular automaton (QCA) as a typical model in the study of "IPS/Zeta Correspondence". On the other hand, the absolute zeta function is a zeta function over $\mathbb{F}_1$ defined by a function satisfying an absolute automorphy. This paper proves that a new zeta function given by QCA is an absolute automorphic form of weight depending on the size of the configuration space. As an example, we calculate an absolute zeta function for a tensor-type QCA, and show that it is expressed as the multiple gamma function. In addition, we obtain its functional equation by the multiple sine function.

Submitted 17 January, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

Comments: 14 pages

Journal ref: Quantum Information and Computation, Vol.23, No.15&16, pp.1261-1274 (2023)

arXiv:2305.19743 [pdf, other]

Towards Monocular Shape from Refraction

Authors: Antonin Sulc, Imari Sato, Bastian Goldluecke, Tali Treibitz

Abstract: Refraction is a common physical phenomenon and has long been researched in computer vision. Objects imaged through a refractive object appear distorted in the image as a function of the shape of the interface between the media. This hinders many computer vision applications, but can be utilized for obtaining the geometry of the refractive interface. Previous approaches for refractive surface recovery largely relied on various priors or additional information like multiple images of the analyzed surface. In contrast, we claim that a simple energy function based on Snell's law enables the reconstruction of an arbitrary refractive surface geometry using just a single image and known background texture and geometry. In the case of a single point, Snell's law has two degrees of freedom, therefore to estimate a surface depth, we need additional information. We show that solving for an entire surface at once introduces implicit parameter-free spatial regularization and yields convincing results when an intelligent initial guess is provided. We demonstrate our approach through simulations and real-world experiments, where the reconstruction shows encouraging results in the single-frame monocular setting.

Submitted 31 May, 2023; originally announced May 2023.

Comments: 12 pages, 6 figures, The 32nd British Machine Vision Conference (BMVC)

Journal ref: 32nd British Machine Vision Conference 2021, BMVA Press, 2021

arXiv:2305.16573 [pdf, other]

Exploring Weight Balancing on Long-Tailed Recognition Problem

Authors: Naoya Hasegawa, Issei Sato

Abstract: Recognition problems in long-tailed data, in which the sample size per class is heavily skewed, have gained importance because the distribution of the sample size per class in a dataset is generally exponential unless the sample size is intentionally adjusted. Various methods have been devised to address these problems. Recently, weight balancing, which combines well-known classical regularization techniques with two-stage training, has been proposed. Despite its simplicity, it is known for its high performance compared with existing methods devised in various ways. However, there is a lack of understanding as to why this method is effective for long-tailed data. In this study, we analyze weight balancing by focusing on neural collapse and the cone effect at each training stage and find that it can be decomposed into an increase in Fisher's discriminant ratio of the feature extractor caused by weight decay and cross entropy loss and implicit logit adjustment caused by weight decay and class-balanced loss. Our analysis enables the training method to be further simplified by reducing the number of training stages to one while increasing accuracy. Code is available at https://github.com/HN410/Exploring-Weight-Balancing-on-Long-Tailed-Recognition-Problem.

Submitted 28 April, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: Paper accepted for publication at ICLR 2024

arXiv:2302.09583 [pdf, ps, other]

Alternating Walk/Zeta Correspondence

Authors: Takashi Komatsu, Norio Konno, Iwao Sato

Abstract: We consider the alternating zeta function and the alternating $L$-function of a graph $G$, and express them by using the Ihara zeta function of $G$. Next, we define a generalized alternating zeta function of a graph, and express the generalized alternating zeta function of a vertex-transitive regular graph by spectra of the transition probability matrix of the symmetric simple random walk on it and its Laplacian. Furthermore, we present an integral expression for the limit of the generalized alternating zeta functions of a series of vertex-transitive regular graphs. As an example, we treat the generalized alternating zeta functions of a finite torus. Finally, we treat the relation between the Mahler measure and the alternating zeta function of a graph.

Submitted 19 February, 2023; originally announced February 2023.

Comments: 24 pages. arXiv admin note: text overlap with arXiv:2205.00457; text overlap with arXiv:1905.13182 by other authors

MSC Class: 05C50; 15A15

arXiv:2212.13704 [pdf, other]

Ronkin/Zeta Correspondence

Authors: Takashi Komatsu, Norio Konno, Iwao Sato, Kohei Sato

Abstract: The Ronkin function was defined by Ronkin in the consideration of the zeros of almost periodic functions. Recently, this function has been used in various research fields in mathematics, physics and so on. Especially in mathematics, it has close connections with tropical geometry, amoebas, Newton polytopes and dimer models. On the other hand, we have investigated a new class of zeta functions for various kinds of walks including quantum walks in a series of our previous work on Zeta Correspondence. The quantum walk is a quantum counterpart of the random walk. In this paper, we present a new relation between the Ronkin function and our zeta function for random walks and quantum walks. Firstly we consider this relation in the case of one-dimensional random walks. Afterwards we deal with higher-dimensional random walks. For comparison with the case of the quantum walk, we also treat the case of one-dimensional quantum walks. Our results bridge between the Ronkin function and the zeta function via quantum walks for the first time.

Submitted 12 February, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

Comments: 19 pages. arXiv admin note: substantial text overlap with arXiv:2202.05966

arXiv:2212.10352 [pdf, other]

Fixed-Weight Difference Target Propagation

Authors: Tatsukichi Shibuya, Nakamasa Inoue, Rei Kawakami, Ikuro Sato

Abstract: Target Propagation (TP) is a biologically more plausible algorithm than the error backpropagation (BP) to train deep networks, and improving practicality of TP is an open issue. TP methods require the feedforward and feedback networks to form layer-wise autoencoders for propagating the target values generated at the output layer. However, this causes certain drawbacks; e.g., careful hyperparameter tuning is required to synchronize the feedforward and feedback training, and more frequent updates are usually required for the feedback path than for the feedforward path. Learning of the feedforward and feedback networks is sufficient to make TP methods capable of training, but is having these layer-wise autoencoders a necessary condition for TP to work? We answer this question by presenting Fixed-Weight Difference Target Propagation (FW-DTP) that keeps the feedback weights constant during training. We confirmed that this simple method, which naturally resolves the abovementioned problems of TP, can still deliver informative target values to hidden layers for a given task; indeed, FW-DTP consistently achieves higher test performance than a baseline, the Difference Target Propagation (DTP), on four classification datasets. We also present a novel propagation architecture that explains the exact form of the feedback function of DTP to analyze FW-DTP.

Submitted 19 December, 2022; originally announced December 2022.

Comments: Accepted at the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23). 9 pages and 3 figures in main manuscript; 11 pages and 5 figures in supplementary material

arXiv:2211.11492 [pdf, other]

ClipCrop: Conditioned Cropping Driven by Vision-Language Model

Authors: Zhihang Zhong, Mingxi Cheng, Zhirong Wu, Yuhui Yuan, Yinqiang Zheng, Ji Li, Han Hu, Stephen Lin, Yoichi Sato, Imari Sato

Abstract: Image cropping has progressed tremendously under the data-driven paradigm. However, current approaches do not account for the intentions of the user, which is an issue especially when the composition of the input image is complex. Moreover, labeling of cropping data is costly and hence the amount of data is limited, leading to poor generalization performance of current algorithms in the wild. In this work, we take advantage of vision-language models as a foundation for creating robust and user-intentional cropping algorithms. By adapting a transformer decoder with a pre-trained CLIP-based detection model, OWL-ViT, we develop a method to perform cropping with a text or image query that reflects the user's intention as guidance. In addition, our pipeline design allows the model to learn text-conditioned aesthetic cropping with a small cropping dataset, while inheriting the open-vocabulary ability acquired from millions of text-image pairs. We validate our model through extensive experiments on existing datasets as well as a new cropping test set we compiled that is characterized by content ambiguity.

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2211.11423 [pdf, other]

Blur Interpolation Transformer for Real-World Motion from Blur

Authors: Zhihang Zhong, Mingdeng Cao, Xiang Ji, Yinqiang Zheng, Imari Sato

Abstract: This paper studies the challenging problem of recovering motion from blur, also known as joint deblurring and interpolation or blur temporal super-resolution. The challenges are twofold: 1) the current methods still leave considerable room for improvement in terms of visual quality even on the synthetic dataset, and 2) poor generalization to real-world data. To this end, we propose a blur interpolation transformer (BiT) to effectively unravel the underlying temporal correlation encoded in blur. Based on multi-scale residual Swin transformer blocks, we introduce dual-end temporal supervision and temporally symmetric ensembling strategies to generate effective features for time-varying motion rendering. In addition, we design a hybrid camera system to collect the first real-world dataset of one-to-many blur-sharp video pairs. Experimental results show that BiT has a significant gain over the state-of-the-art methods on the public dataset Adobe240. Besides, the proposed real-world dataset effectively helps the model generalize well to real blurry scenarios. Code and data are available at https://github.com/zzh-tech/BiT.

Submitted 7 March, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: Accepted by CVPR2023

arXiv:2211.10382 [pdf, other]

doi 10.1145/3551626.3564942

Informative Sample-Aware Proxy for Deep Metric Learning

Authors: Aoyu Li, Ikuro Sato, Kohta Ishikawa, Rei Kawakami, Rio Yokota

Abstract: Among various supervised deep metric learning methods, proxy-based approaches have achieved high retrieval accuracies. Proxies, which are class-representative points in an embedding space, receive updates based on proxy-sample similarities in a similar manner to sample representations. In existing methods, a relatively small number of samples can produce large gradient magnitudes (i.e., hard samples), and a relatively large number of samples can produce small gradient magnitudes (i.e., easy samples); these can play a major part in updates. Assuming that acquiring too much sensitivity to such extreme sets of samples would deteriorate the generalizability of a method, we propose a novel proxy-based method called Informative Sample-Aware Proxy (Proxy-ISA), which directly modifies a gradient weighting factor for each sample using a scheduled threshold function, so that the model is more sensitive to the informative samples. Extensive experiments on the CUB-200-2011, Cars-196, Stanford Online Products and In-shop Clothes Retrieval datasets demonstrate the superiority of Proxy-ISA compared with the state-of-the-art methods.

Submitted 18 November, 2022; originally announced November 2022.

Comments: Accepted at ACM Multimedia Asia (MMAsia) 2022

arXiv:2211.08583 [pdf, other]

Empirical Study on Optimizer Selection for Out-of-Distribution Generalization

Authors: Hiroki Naganuma, Kartik Ahuja, Shiro Takagi, Tetsuya Motokawa, Rio Yokota, Kohta Ishikawa, Ikuro Sato, Ioannis Mitliagkas

Abstract: Modern deep learning systems do not generalize well when the test data distribution is slightly different to the training data distribution. While much promising work has been accomplished to address this fragility, a systematic study of the role of optimizers and their out-of-distribution generalization performance has not been undertaken. In this study, we examine the performance of popular first-order optimizers for different classes of distributional shift under empirical risk minimization and invariant risk minimization. We address this question for image and text classification using DomainBed, WILDS, and Backgrounds Challenge as testbeds for studying different types of shifts -- namely correlation and diversity shift. We search over a wide range of hyperparameters and examine classification accuracy (in-distribution and out-of-distribution) for over 20,000 models. We arrive at the following findings, which we expect to be helpful for practitioners: i) adaptive optimizers (e.g., Adam) perform worse than non-adaptive optimizers (e.g., SGD, momentum SGD) on out-of-distribution performance. In particular, even though there is no significant difference in in-distribution performance, we show a measurable difference in out-of-distribution performance. ii) in-distribution performance and out-of-distribution performance exhibit three types of behavior depending on the dataset -- linear returns, increasing returns, and diminishing returns.
For example, in the training of natural language data using Adam, fine-tuning the performance of in-distribution performance does not significantly contribute to the out-of-distribution generalization performance.

Submitted 5 June, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: Accepted to TMLR

arXiv:2207.10123 [pdf, other]

Animation from Blur: Multi-modal Blur Decomposition with Motion Guidance

Authors: Zhihang Zhong, Xiao Sun, Zhirong Wu, Yinqiang Zheng, Stephen Lin, Imari Sato

Abstract: We study the challenging problem of recovering detailed motion from a single motion-blurred image. Existing solutions to this problem estimate a single image sequence without considering the motion ambiguity for each region. Therefore, the results tend to converge to the mean of the multi-modal possibilities. In this paper, we explicitly account for such motion ambiguity, allowing us to generate multiple plausible solutions all in sharp detail. The key idea is to introduce a motion guidance representation, which is a compact quantization of 2D optical flow with only four discrete motion directions. Conditioned on the motion guidance, the blur decomposition is led to a specific, unambiguous solution by using a novel two-stage decomposition network. We propose a unified framework for blur decomposition, which supports various interfaces for generating our motion guidance, including human input, motion information from adjacent video frames, and learning from a video dataset. Extensive experiments on synthesized datasets and real-world data show that the proposed framework is qualitatively and quantitatively superior to previous methods, and also offers the merit of producing physically plausible and diverse solutions. Code is available at https://github.com/zzh-tech/Animation-from-Blur.

Submitted 20 July, 2022; originally announced July 2022.

Comments: ECCV2022

arXiv:2207.01847 [pdf, other]

PoF: Post-Training of Feature Extractor for Improving Generalization

Authors: Ikuro Sato, Ryota Yamada, Masayuki Tanaka, Nakamasa Inoue, Rei Kawakami

Abstract: It has been intensively investigated that the local shape, especially flatness, of the loss landscape near a minimum plays an important role for generalization of deep models. We developed a training algorithm called PoF: Post-Training of Feature Extractor that updates the feature extractor part of an already-trained deep model to search a flatter minimum. The characteristics are two-fold: 1) Feature extractor is trained under parameter perturbations in the higher-layer parameter space, based on observations that suggest flattening higher-layer parameter space, and 2) the perturbation range is determined in a data-driven manner aiming to reduce a part of test loss caused by the positive loss curvature. We provide a theoretical analysis that shows the proposed algorithm implicitly reduces the target Hessian components as well as the loss. Experimental results show that PoF improved model performance against baseline methods on both CIFAR-10 and CIFAR-100 datasets for only 10-epoch post-training, and on SVHN dataset for 50-epoch post-training. Source code is available at: https://github.com/DensoITLab/PoF-v1

Submitted 5 July, 2022; originally announced July 2022.

Comments: Accepted to ICML2022. Contains a link to the code

arXiv:2206.01606 [pdf, ps, other]

Excess risk analysis for epistemic uncertainty with application to variational inference

Authors: Futoshi Futami, Tomoharu Iwata, Naonori Ueda, Issei Sato, Masashi Sugiyama

Abstract: Bayesian deep learning plays an important role especially for its ability to evaluate epistemic uncertainty (EU). Due to computational complexity issues, approximation methods such as variational inference (VI) have been used in practice to obtain posterior distributions and their generalization abilities have been analyzed extensively, for example, by PAC-Bayesian theory; however, little analysis exists on EU, although many numerical experiments have been conducted on it. In this study, we analyze the EU of supervised learning in approximate Bayesian inference by focusing on its excess risk. First, we theoretically show the novel relations between generalization error and the widely used EU measurements, such as the variance and mutual information of predictive distribution, and derive their convergence behaviors. Next, we clarify how the objective function of VI regularizes the EU. With this analysis, we propose a new objective function for VI that directly controls the prediction performance and the EU based on the PAC-Bayesian theory. Numerical experiments show that our algorithm significantly improves the EU evaluation over the existing VI methods.

Submitted 11 October, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

arXiv:2206.00944 [pdf, other]

Feature Space Particle Inference for Neural Network Ensembles

Authors: Shingo Yashima, Teppei Suzuki, Kohta Ishikawa, Ikuro Sato, Rei Kawakami

Abstract: Ensembles of deep neural networks demonstrate improved performance over single models. For enhancing the diversity of ensemble members while keeping their performance, particle-based inference methods offer a promising approach from a Bayesian perspective. However, the best way to apply these methods to neural networks is still unclear: seeking samples from the weight-space posterior suffers from inefficiency due to the over-parameterization issues, while seeking samples directly from the function-space posterior often results in serious underfitting. In this study, we propose optimizing particles in the feature space where the activation of a specific intermediate layer lies to address the above-mentioned difficulties. Our method encourages each member to capture distinct features, which is expected to improve ensemble prediction robustness. Extensive evaluation on real-world datasets shows that our model significantly outperforms the gold-standard Deep Ensembles on various metrics, including accuracy, calibration, and robustness. Code is available at https://github.com/DensoITLab/featurePI .

Submitted 2 June, 2022; originally announced June 2022.

Comments: ICML2022

arXiv:2205.07320 [pdf, other]

Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective

Authors: Keitaro Sakamoto, Issei Sato

Abstract: The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that can be trained from the initial weights independently, called winning tickets, the initial large learning rate does not work well in deep neural networks such as ResNet. However, since the initial large learning rate generally helps the optimizer to converge to flatter minima, we hypothesize that the winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability. In this paper, we confirm this hypothesis and show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior. On the basis of our experimental findings that flatness is useful for improving accuracy and robustness to label noise and that the distance from the initial weights is deeply involved in winning tickets, we offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets. Finally, we revisit existing algorithms for finding winning tickets from a PAC-Bayesian perspective and provide new insights into these methods.

Submitted 28 September, 2022; v1 submitted 15 May, 2022; originally announced May 2022.

Comments: NeurIPS 2022

arXiv:2205.00457 [pdf, ps, other]

Metzler/Zeta Correspondence

Authors: Yusuke Ide, Takashi Komatsu, Norio Konno, Iwao Sato

Abstract: We present an explicit formula for the determinant on the Metzler matrix of a digraph $D$. Furthermore, we introduce a walk-type zeta function with respect to this Metzler matrix of the symmetric digraph of a finite torus, and express its limit formula by using the integral expression.

Submitted 1 July, 2022; v1 submitted 1 May, 2022; originally announced May 2022.

Comments: 16 pages

MSC Class: 05C50; 15A15

arXiv:2204.13849 [pdf, other]

Goldilocks-curriculum Domain Randomization and Fractal Perlin Noise with Application to Sim2Real Pneumonia Lesion Detection

Authors: Takahiro Suzuki, Shouhei Hanaoka, Issei Sato

Abstract: A computer-aided detection (CAD) system based on machine learning is expected to assist radiologists in making a diagnosis. It is desirable to build CAD systems for the various types of diseases accumulating daily in a hospital. An obstacle in developing a CAD system for a disease is that the number of medical images is typically too small to improve the performance of the machine learning model. In this paper, we aim to explore ways to address this problem through a sim2real transfer approach in medical image fields. To build a platform to evaluate the performance of sim2real transfer methods in the field of medical imaging, we construct a benchmark dataset that consists of $101$ chest X-images with difficult-to-identify pneumonia lesions judged by an experienced radiologist and a simulator based on fractal Perlin noise and the X-ray principle for generating pseudo pneumonia lesions. We then develop a novel domain randomization method, called Goldilocks-curriculum domain randomization (GDR) and evaluate our method in this platform.

Submitted 28 April, 2022; originally announced April 2022.

arXiv:2204.08226 [pdf, other]

Empirical Evaluation and Theoretical Analysis for Representation Learning: A Survey

Authors: Kento Nozawa, Issei Sato

Abstract: Representation learning enables us to automatically extract generic feature representations from a dataset to solve another machine learning task. Recently, extracted feature representations by a representation learning algorithm and a simple predictor have exhibited state-of-the-art performance on several machine learning tasks. Despite its remarkable progress, there exist various ways to evaluate representation learning algorithms depending on the application because of the flexibility of representation learning. To understand the current representation learning, we review evaluation methods of representation learning algorithms and theoretical analyses. On the basis of our evaluation survey, we also discuss the future direction of representation learning. Note that this survey is the extended version of Nozawa and Sato (2022).

Submitted 18 April, 2022; originally announced April 2022.

Comments: The extended version of "Kento Nozawa and Issei Sato. Evaluation Methods for Representation Learning: A Survey. In IJCAI-ECAI Survey Track, 2022."

arXiv:2204.04853 [pdf, other]

Neural Lagrangian Schrödinger Bridge: Diffusion Modeling for Population Dynamics

Authors: Takeshi Koshizuka, Issei Sato

Abstract: Population dynamics is the study of temporal and spatial variation in the size of populations of organisms and is a major part of population ecology. One of the main difficulties in analyzing population dynamics is that we can only obtain observation data with coarse time intervals from fixed-point observations due to experimental costs or measurement constraints. Recently, modeling population dynamics by using continuous normalizing flows (CNFs) and dynamic optimal transport has been proposed to infer the sample trajectories from a fixed-point observed population. While the sample behavior in CNFs is deterministic, the actual sample in biological systems moves in an essentially random yet directional manner. Moreover, when a sample moves from point A to point B in dynamical systems, its trajectory typically follows the principle of least action in which the corresponding action has the smallest possible value. To satisfy these requirements of the sample trajectories, we formulate the Lagrangian Schrödinger bridge (LSB) problem and propose to solve it approximately by modeling the advection-diffusion process with regularized neural SDE. We also develop a model architecture that enables faster computation of the loss function. Experimental results show that the proposed method can efficiently approximate the population-level dynamics even for high-dimensional data and that using the prior knowledge introduced by the Lagrangian enables us to estimate the sample-level dynamics with stochastic behavior.

Submitted 26 February, 2023; v1 submitted 10 April, 2022; originally announced April 2022.

Comments: Published at ICLR 2023 (notable top 25%)

arXiv:2203.13694 [pdf, other]

Implicit Neural Representations for Variable Length Human Motion Generation

Authors: Pablo Cervantes, Yusuke Sekikawa, Ikuro Sato, Koichi Shinoda

Abstract: We propose an action-conditional human motion generation method using variational implicit neural representations (INR). The variational formalism enables action-conditional distributions of INRs, from which one can easily sample representations to generate novel human motion sequences. Our method offers variable-length sequence generation by construction because a part of INR is optimized for a whole sequence of arbitrary length with temporal embeddings. In contrast, previous works reported difficulties with modeling variable-length sequences. We confirm that our method with a Transformer decoder outperforms all relevant methods on HumanAct12, NTU-RGBD, and UESTC datasets in terms of realism and diversity of generated motions. Surprisingly, even our method with an MLP decoder consistently outperforms the state-of-the-art Transformer-based auto-encoder. In particular, we show that variable-length motions generated by our method are better than fixed-length motions generated by the state-of-the-art method in terms of realism and diversity. Code at https://github.com/PACerv/ImplicitMotion.

Submitted 15 July, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

Comments: Accepted to ECCV 2022

arXiv:2203.06451 [pdf, other]

Bringing Rolling Shutter Images Alive with Dual Reversed Distortion

Authors: Zhihang Zhong, Mingdeng Cao, Xiao Sun, Zhirong Wu, Zhongyi Zhou, Yinqiang Zheng, Stephen Lin, Imari Sato

Abstract: Rolling shutter (RS) distortion can be interpreted as the result of picking a row of pixels from instant global shutter (GS) frames over time during the exposure of the RS camera. This means that the information of each instant GS frame is partially, yet sequentially, embedded into the row-dependent distortion. Inspired by this fact, we address the challenging task of reversing this process, i.e., extracting undistorted GS frames from images suffering from RS distortion. However, since RS distortion is coupled with other factors such as readout settings and the relative velocity of scene elements to the camera, models that only exploit the geometric correlation between temporally adjacent images suffer from poor generality in processing data with different readout settings and dynamic scenes with both camera motion and object motion. In this paper, instead of two consecutive frames, we propose to exploit a pair of images captured by dual RS cameras with reversed RS directions for this highly challenging task. Grounded on the symmetric and complementary nature of dual reversed distortion, we develop a novel end-to-end model, IFED, to generate dual optical flow sequence through iterative learning of the velocity field during the RS time. Extensive experimental results demonstrate that IFED is superior to naive cascade schemes, as well as the state-of-the-art which utilizes adjacent RS images. Most importantly, although it is trained on a synthetic dataset, IFED is shown to be effective at retrieving GS frame sequences from real-world RS distorted images of dynamic scenes. Code is available at https://github.com/zzh-tech/Dual-Reversed-RS.

Submitted 20 July, 2022; v1 submitted 12 March, 2022; originally announced March 2022.

Comments: ECCV2022 Oral

arXiv:2202.06001 [pdf, other]

The Ihara expression for generalized weighted zeta functions of Bartholdi type on finite digraphs

Authors: Ayaka Ishikawa, Hideaki Morita, Iwao Sato

Abstract: The Ihara expression of a weighted zeta function for a general finite digraph is given. It unifies all the Ihara expressions obtained for known zeta functions for finite digraphs. Any digraph in this paper permits multi-edges and multi-loops.

Submitted 12 February, 2022; originally announced February 2022.

arXiv:2202.05966 [pdf, ps, other]

Mahler/Zeta Correspondence

Authors: Takashi Komatsu, Norio Konno, Iwao Sato, Shunya Tamura

Abstract: The Mahler measure was introduced by Mahler in the study of number theory. It is known that the Mahler measure appears in different areas of mathematics and physics. On the other hand, we have investigated a new class of zeta functions for various kinds of walks including quantum walks by a series of our previous work on "Zeta Correspondence". The quantum walk is a quantum counterpart of the random walk. In this paper, we present a new relation between the Mahler measure and our zeta function for quantum walks. Firstly we consider this relation in the case of one-dimensional quantum walks. Afterwards we deal with higher-dimensional quantum walks. For comparison with the case of the quantum walk, we also treat the case of higher-dimensional random walks. Our results bridge between the Mahler measure and the zeta function via quantum walks for the first time.

Submitted 11 February, 2022; originally announced February 2022.

Comments: 27 pages. arXiv admin note: text overlap with arXiv:2109.07664, arXiv:2104.10287

arXiv:2201.03973 [pdf, ps, other]

A Generalized Grover/Zeta Correspondence

Authors: Takashi Komatsu, Norio Konno, Iwao Sato, Shunya Tamura

Abstract: We introduce a generalized Grover matrix of a graph and present an explicit formula for its characteristic polynomial. As a corollary, we give the spectra for the generalized Grover matrix of a regular graph. Next, we define a zeta function and a generalized zeta function of a graph $G$ with respect to its generalized Grover matrix as an analog of the Ihara zeta function and present explicit formulas for their zeta functions for a vertex-transitive graph. As applications, we express the limit on the generalized zeta functions of a family of finite vertex-transitive regular graphs by an integral. Furthermore, we give the limit on the generalized zeta functions of a family of finite tori as an integral expression.

Submitted 9 January, 2022; originally announced January 2022.

Comments: 10 pages. arXiv admin note: text overlap with arXiv:2011.14162

MSC Class: 60F05; 05C50; 15A15; 05C25

arXiv:2110.05076 [pdf, other]

A Closer Look at Prototype Classifier for Few-shot Image Classification

Authors: Mingcheng Hou, Issei Sato

Abstract: The prototypical network is a prototype classifier based on meta-learning and is widely used for few-shot learning because it classifies unseen examples by constructing class-specific prototypes without adjusting hyper-parameters during meta-testing. Interestingly, recent research has attracted a lot of attention, showing that training a new linear classifier, which does not use a meta-learning algorithm, performs comparably with the prototypical network. However, the training of a new linear classifier requires the retraining of the classifier every time a new class appears. In this paper, we analyze how a prototype classifier works equally well without training a new linear classifier or meta-learning. We experimentally find that directly using the feature vectors, which are extracted by using standard pre-trained models to construct a prototype classifier in meta-testing, does not perform as well as the prototypical network and training new linear classifiers on the feature vectors of pre-trained models. Thus, we derive a novel generalization bound for a prototypical classifier and show that the transformation of a feature vector can improve the performance of prototype classifiers. We experimentally investigate several normalization methods for minimizing the derived bound and find that the same performance can be obtained by using the L2 normalization and minimizing the ratio of the within-class variance to the between-class variance without training a new classifier or meta-learning.

Submitted 15 September, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: 21 pages with 10 appendix section Our paper has been accepted in 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

arXiv:2110.00716 [pdf, ps, other]

Quantum walks driven by quantum coins with two multiple eigenvalues

Authors: Norio Konno, Iwao Sato, Etsuo Segawa, Yutaka Shikano

Abstract: We consider a spectral analysis on the quantum walks on graph $G=(V,E)$ with the local coin operators $\{C_u\}_{u\in V}$ and the flip flop shift. The quantum coin operators have commonly two distinct eigenvalues $\kappa,\kappa'$ and $p=\dim(\ker(\kappa-C_u))$ for any $u\in V$ with $1\leq p\leq \delta(G)$, where $\delta(G)$ is the minimum degree of $G$. We show that this quantum walk can be decomposed into a cellular automaton on $\ell^2(V;\mathbb{C}^p)$ whose time evolution is described by a self adjoint operator $T$ and its remainder. We obtain how the eigenvalues and its eigenspace of $T$ are lifted up to as those of the original quantum walk. As an application, we express the eigenpolynomial of the Grover walk on $\mathbb{Z}^d$ with the moving shift in the Fourier space.

Submitted 1 October, 2021; originally announced October 2021.

Comments: 29 pages, 1 figure

arXiv:2108.13753 [pdf, other]

Disentanglement Analysis with Partial Information Decomposition

Authors: Seiya Tokui, Issei Sato

Abstract: We propose a framework to analyze how multivariate representations disentangle ground-truth generative factors. A quantitative analysis of disentanglement has been based on metrics designed to compare how one variable explains each generative factor. Current metrics, however, may fail to detect entanglement that involves more than two variables, e.g., representations that duplicate and rotate generative factors in high dimensional spaces. In this work, we establish a framework to analyze information sharing in a multivariate representation with Partial Information Decomposition and propose a new disentanglement metric. This framework enables us to understand disentanglement in terms of uniqueness, redundancy, and synergy. We develop an experimental protocol to assess how increasingly entangled representations are evaluated with each metric and confirm that the proposed metric correctly responds to entanglement. Through experiments on variational autoencoders, we find that models with similar disentanglement scores have a variety of characteristics in entanglement, for each of which a distinct strategy may be required to obtain a disentangled representation.

Submitted 9 February, 2022; v1 submitted 31 August, 2021; originally announced August 2021.

Comments: ICLR 2022

arXiv:2107.03590 [pdf, ps, other]

CTM/Zeta Correspondence

Authors: Takashi Komatsu, Norio Konno, Iwao Sato

Abstract: In our previous work, we investigated the relation between zeta functions and discrete-time models including random and quantum walks. In this paper, we introduce a zeta function for the continuous-time model (CTM) and consider CTMs including the corresponding random and quantum walks on the d-dimensional torus.

Submitted 15 March, 2022; v1 submitted 8 July, 2021; originally announced July 2021.

Comments: 9 pages, minor corrections, Quantum Studies: Mathematics and Foundations, Volume 9, pp.165-173 (2022)

arXiv:2107.03300 [pdf, ps, other]

Vertex-Face/Zeta correspondence

Authors: Takashi Komatsu, Norio Konno, Iwao Sato

Abstract: We present the characteristic polynomial for the transition matrix of a vertex-face walk on a graph, and obtain its spectra. Furthermore, we express the characteristic polynomial for the transition matrix of a vertex-face walk on the 2-dimensional torus by using its adjacency matrix, and obtain its spectra. As an application, we define a new walk-type zeta function with respect to the transition matrix of a vertex-face walk on the 2-dimensional torus, and present its explicit formula.

Submitted 4 July, 2021; originally announced July 2021.

Comments: 14 pages. arXiv admin note: text overlap with arXiv:2103.12971, arXiv:2011.14162

MSC Class: 60F05; 05C10; 05C50; 15A15

arXiv:2106.16028 [pdf, other]

Real-world Video Deblurring: A Benchmark Dataset and An Efficient Recurrent Neural Network

Authors: Zhihang Zhong, Ye Gao, Yinqiang Zheng, Bo Zheng, Imari Sato

Abstract: Real-world video deblurring in real time still remains a challenging task due to the complexity of spatially and temporally varying blur itself and the requirement of low computational cost. To improve the network efficiency, we adopt residual dense blocks into RNN cells, so as to efficiently extract the spatial features of the current frame. Furthermore, a global spatio-temporal attention module is proposed to fuse the effective hierarchical features from past and future frames to help better deblur the current frame. Another issue that needs to be addressed urgently is the lack of a real-world benchmark dataset. Thus, we contribute a novel dataset (BSD) to the community, by collecting paired blurry/sharp video clips using a co-axis beam splitter acquisition system. Experimental results show that the proposed method (ESTRNN) can achieve better deblurring performance both quantitatively and qualitatively with less computational cost against state-of-the-art video deblurring methods. In addition, cross-validation experiments between datasets illustrate the high generality of BSD over the synthetic datasets. The code and dataset are released at https://github.com/zzh-tech/ESTRNN.

Submitted 15 October, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

Comments: Accepted by IJCV (extended version of ECCV2020)

arXiv:2106.05010 [pdf, ps, other]

Loss function based second-order Jensen inequality and its application to particle variational inference

Authors: Futoshi Futami, Tomoharu Iwata, Naonori Ueda, Issei Sato, Masashi Sugiyama

Abstract: Bayesian model averaging, obtained as the expectation of a likelihood function by a posterior distribution, has been widely used for prediction, evaluation of uncertainty, and model selection. Various approaches have been developed to efficiently capture the information in the posterior distribution; one such approach is the optimization of a set of models simultaneously with interaction to ensure the diversity of the individual models in the same way as ensemble learning. A representative approach is particle variational inference (PVI), which uses an ensemble of models as an empirical approximation for the posterior distribution. PVI iteratively updates each model with a repulsion force to ensure the diversity of the optimized models. However, despite its promising performance, a theoretical understanding of this repulsion and its association with the generalization ability remains unclear. In this paper, we tackle this problem in light of PAC-Bayesian analysis. First, we provide a new second-order Jensen inequality, which has the repulsion term based on the loss function. Thanks to the repulsion term, it is tighter than the standard Jensen inequality. Then, we derive a novel generalization error bound and show that it can be reduced by enhancing the diversity of models. Finally, we derive a new PVI that optimizes the generalization error bound directly. Numerical experiments demonstrate that the performance of the proposed PVI compares favorably with existing methods.

Submitted 9 June, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

arXiv:2105.11599 [pdf, other]

Multi-view 3D Reconstruction of a Texture-less Smooth Surface of Unknown Generic Reflectance

Authors: Ziang Cheng, Hongdong Li, Yuta Asano, Yinqiang Zheng, Imari Sato

Abstract: Recovering the 3D geometry of a purely texture-less object with generally unknown surface reflectance (e.g. non-Lambertian) is regarded as a challenging task in multi-view reconstruction. The major obstacle revolves around establishing cross-view correspondences where photometric constancy is violated. This paper proposes a simple and practical solution to overcome this challenge based on a co-located camera-light scanner device. Unlike existing solutions, we do not explicitly solve for correspondence. Instead, we argue the problem is generally well-posed by multi-view geometrical and photometric constraints, and can be solved from a small number of input views. We formulate the reconstruction task as a joint energy minimization over the surface geometry and reflectance. Although this energy is highly non-convex, we develop an optimization algorithm that robustly recovers globally optimal shape and reflectance even from a random initialization. Extensive experiments on both simulated and real data have validated our method, and possible future extensions are discussed.

Submitted 24 May, 2021; originally announced May 2021.

Comments: Accepted to CVPR2021

arXiv:2105.04056 [pdf, ps, other]

IPS/Zeta Correspondence

Authors: Takashi Komatsu, Norio Konno, Iwao Sato

Abstract: Our previous works presented zeta functions by the Konno-Sato theorem or the Fourier analysis for one-particle models including random walks, correlated random walks, quantum walks, and open quantum random walks. This paper introduces a new zeta function for multi-particle models with probabilistic or quantum interactions, called the interacting particle system (IPS). We compute the zeta function for some tensor-type IPSs.

Submitted 6 February, 2022; v1 submitted 9 May, 2021; originally announced May 2021.

Comments: 19 pages

Journal ref: Quantum Information and Computation, Vol.22, No.3 & 4, pp.251-269 (2022)

arXiv:2105.02678 [pdf, ps, other]

The trace formula with respect to the twisted Grover matrix of a mixed digraph

Authors: Takashi Komatsu, Sho Kubota, Norio Konno, Iwao Sato

Abstract: We define a zeta function with respect to the twisted Grover matrix of a mixed digraph, and present an exponential expression and a determinant expression of this zeta function. As an application, we give a trace formula with respect to the twisted Grover matrix of a mixed digraph.

Submitted 5 May, 2021; originally announced May 2021.

Comments: 17 pages

MSC Class: 05C50; 11M06

arXiv:2105.02677 [pdf, ps, other]

The scattering matrix with respect to an Hermitian matrix of a graph

Authors: Takashi Komatsu, Norio Konno, Iwao Sato

Abstract: Recently, Gnutzmann and Smilansky presented a formula for the bond scattering matrix of a graph with respect to a Hermitian matrix. We present another proof of Gnutzmann and Smilansky's formula by a technique used in the zeta function of a graph. Furthermore, we generalize Gnutzmann and Smilansky's formula to a regular covering of a graph. Finally, we define an $L$-function of a graph, and present a determinant expression. As a corollary, we express the generalization of Gnutzmann and Smilansky's formula to a regular covering of a graph by using its $L$-functions.

Submitted 5 May, 2021; originally announced May 2021.

Comments: 21 pages. arXiv admin note: substantial text overlap with arXiv:1211.4719

MSC Class: 05C50; 15A15

arXiv:2104.10287 [pdf, ps, other]

Walk/Zeta Correspondence

Authors: Takashi Komatsu, Norio Konno, Iwao Sato

Abstract: Our previous work presented explicit formulas for the generalized zeta function and the generalized Ihara zeta function corresponding to the Grover walk and the positive-support version of the Grover walk on the regular graph via the Konno-Sato theorem, respectively. This paper extends these walks to a class of walks including random walks, correlated random walks, quantum walks, and open quantum random walks on the torus by the Fourier analysis.

Submitted 20 December, 2022; v1 submitted 20 April, 2021; originally announced April 2021.

Comments: 31 pages

Journal ref: Journal of Statistical Physics, volume 190, Article number: 36 (2023)

arXiv:2104.05014 [pdf, other]

One Ring to Rule Them All: a simple solution to multi-view 3D-Reconstruction of shapes with unknown BRDF via a small Recurrent ResNet

Authors: Ziang Cheng, Hongdong Li, Richard Hartley, Yinqiang Zheng, Imari Sato

Abstract: This paper proposes a simple method which solves an open problem of multi-view 3D-Reconstruction for objects with unknown and generic surface materials, imaged by a freely moving camera and a freely moving point light source. The object can have arbitrary (e.g. non-Lambertian), spatially-varying (or everywhere different) surface reflectances (svBRDF). Our solution consists of two small-sized neural networks (dubbed the 'Shape-Net' and 'BRDFNet'), each having about 1,000 neurons, used to parameterize the unknown shape and unknown svBRDF, respectively. Key to our method is a special network design (namely, a ResNet with a global feedback or 'ring' connection), which has a provable guarantee for finding a valid diffeomorphic shape parameterization. Although the underlying problem is highly non-convex and hence impractical to solve by traditional optimization techniques, our method converges reliably to high quality solutions, even without initialization. Extensive experiments demonstrate the superiority of our method, and it naturally enables a wide range of special-effect applications including novel-view-synthesis, relighting, material retouching, and shape exchange without additional coding effort. We encourage the reader to view our demo video for better visualizations.

Submitted 11 April, 2021; originally announced April 2021.

arXiv:2104.01601 [pdf, other]

Towards Rolling Shutter Correction and Deblurring in Dynamic Scenes

Authors: Zhihang Zhong, Yinqiang Zheng, Imari Sato

Abstract: Joint rolling shutter correction and deblurring (RSCD) techniques are critical for the prevalent CMOS cameras. However, current approaches are still based on conventional energy optimization and are developed for static scenes. To enable learning-based approaches to address the real-world RSCD problem, we contribute the first dataset, BS-RSCD, which includes both ego-motion and object-motion in dynamic scenes. Real distorted and blurry videos with corresponding ground truth are recorded simultaneously via a beam-splitter-based acquisition system. Since direct application of existing individual rolling shutter correction (RSC) or global shutter deblurring (GSD) methods on RSCD leads to undesirable results due to inherent flaws in the network architecture, we further present the first learning-based model (JCD) for RSCD. The key idea is that we adopt bi-directional warping streams for displacement compensation, while also preserving the non-warped deblurring stream for details restoration. The experimental results demonstrate that JCD achieves state-of-the-art performance on the realistic RSCD dataset (BS-RSCD) and the synthetic RSC dataset (Fastec-RS). The dataset and code are available at https://github.com/zzh-tech/RSCD.

Submitted 4 April, 2021; originally announced April 2021.

Comments: To be published in CVPR 2021

arXiv:2103.12971 [pdf, ps, other]

Grover/Zeta Correspondence based on the Konno-Sato theorem

Authors: Takashi Komatsu, Norio Konno, Iwao Sato

Abstract: Recently the Ihara zeta function for the finite graph was extended to infinite graphs by Clair and Chinta et al. In this paper, we obtain the same expressions by a different approach from their analytical method. Our new approach is to take a suitable limit of a sequence of finite graphs via the Konno-Sato theorem. This theorem is related to explicit formulas of characteristic polynomials for the evolution matrix of the Grover walk. The walk is one of the most well-investigated quantum walks, which are quantum counterparts of classical random walks. We call the relation between the Grover walk and the zeta function based on the Konno-Sato theorem "Grover/Zeta Correspondence" here.

Submitted 19 August, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

Comments: 12 pages. arXiv admin note: text overlap with arXiv:2011.14162

Journal ref: Quantum Information Processing, Volume 20, Article number: 268 (2021)

arXiv:2103.09414 [pdf, other]

Toward Neural-Network-Guided Program Synthesis and Verification

Authors: Naoki Kobayashi, Taro Sekiyama, Issei Sato, Hiroshi Unno

Abstract: We propose a novel framework of program and invariant synthesis called neural network-guided synthesis. We first show that, by suitably designing and training neural networks, we can extract logical formulas over integers from the weights and biases of the trained neural networks. Based on the idea, we have implemented a tool to synthesize formulas from positive/negative examples and implication constraints, and obtained promising experimental results. We also discuss two applications of our synthesis method. One is the use of our tool for qualifier discovery in the framework of ICE-learning-based CHC solving, which can in turn be applied to program verification and inductive invariant synthesis. Another application is to a new program development framework called oracle-based programming, which is a neural-network-guided variation of Solar-Lezama's program synthesis by sketching.

Submitted 25 August, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

Comments: A summary will appear in Proceedings of SAS 2021, Springer LNCS

arXiv:2102.12232 [pdf, ps, other]

Abelian Neural Networks

Authors: Kenshin Abe, Takanori Maehara, Issei Sato

Abstract: We study the problem of modeling a binary operation that satisfies some algebraic requirements. We first construct a neural network architecture for Abelian group operations and derive a universal approximation property. Then, we extend it to Abelian semigroup operations using the characterization of associative symmetric polynomials. Both models take advantage of the analytic invertibility of invertible neural networks. For each case, by repeating the binary operations, we can represent a function for multiset input thanks to the algebraic structure. Naturally, our multiset architecture has size-generalization ability, which has not been obtained in existing methods. Further, we show that modeling the Abelian group operation itself is useful in a word analogy task. We train our models over fixed word embeddings and demonstrate improved performance over the original word2vec and another naive learning method.

Submitted 24 February, 2021; originally announced February 2021.

arXiv:2102.09486 [pdf, ps, other]

Zeta functions of periodic graphs derived from quantum walk

Authors: Takashi Komatsu, Norio Konno, Iwao Sato

Abstract: We define a zeta function of a finite graph derived from the time evolution matrix of a quantum walk, and give its determinant expression. Furthermore, we generalize the above result to a periodic graph.

Submitted 11 February, 2021; originally announced February 2021.

Comments: 16 pages. arXiv admin note: substantial text overlap with arXiv:1910.12782

MSC Class: 60F05; 05C50; 15A15; 05C25

Showing 1–50 of 154 results for author: Sato, I