Showing 51–100 of 154 results for author: Sato, I

  1. arXiv:2102.06866  [pdf, other]

    cs.LG stat.ML

    Understanding Negative Samples in Instance Discriminative Self-supervised Representation Learning

    Authors: Kento Nozawa, Issei Sato

    Abstract: Instance discriminative self-supervised representation learning has attracted attention thanks to its unsupervised nature and informative feature representation for downstream tasks. In practice, it commonly uses a larger number of negative samples than the number of supervised classes. However, there is an inconsistency in the existing analysis; theoretically, a large number of negative samp…

    Submitted 14 January, 2022; v1 submitted 13 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021. 26 pages, 6 figures, and 6 tables

  2. arXiv:2102.00678  [pdf, other]

    cs.LG stat.ML

    Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification

    Authors: Nan Lu, Shida Lei, Gang Niu, Issei Sato, Masashi Sugiyama

    Abstract: To cope with high annotation costs, training a classifier only from weakly supervised data has attracted a great deal of attention these days. Among various approaches, strengthening supervision from completely unsupervised classification is a promising direction, which typically employs class priors as the only supervision and trains a binary classifier from unlabeled (U) datasets. While existing…

    Submitted 11 June, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: ICML2021 camera-ready version

  3. arXiv:2012.09619  [pdf, ps, other]

    math.PR math.CO

    A Characteristic Polynomial for The Transition Probability Matrix of A Correlated Random Walk on A Graph

    Authors: Takashi Komatsu, Norio Konno, Iwao Sato

    Abstract: We define a correlated random walk (CRW) induced from the time evolution matrix (the Grover matrix) of the Grover walk on a graph $G$, and present a formula for the characteristic polynomial of the transition probability matrix of this CRW by using a determinant expression for the generalized weighted zeta function of $G$. As applications, we give the spectrum of the transition probability matrice…

    Submitted 18 December, 2020; v1 submitted 17 December, 2020; originally announced December 2020.

    Comments: 16 pages. arXiv admin note: text overlap with arXiv:2011.14162

    MSC Class: 05C50; 15A15

  4. arXiv:2011.14162  [pdf, ps, other]

    math.CO math-ph

    A note on the Grover walk and the generalized Ihara zeta function of the one-dimensional integer lattice

    Authors: Takashi Komatsu, Norio Konno, Iwao Sato

    Abstract: Chinta, Jorgenson and Karlsson introduced a generalized version of the determinant formula for the Ihara zeta function associated to finite or infinite regular graphs. On the other hand, Konno and Sato obtained a formula of the characteristic polynomial of the Grover matrix by using the determinant expression for the second weighted zeta function of a finite graph. In this paper, we focus on a rel…

    Submitted 15 December, 2021; v1 submitted 28 November, 2020; originally announced November 2020.

    Comments: 8 pages, Yokohama Mathematical Journal (in press)

  5. arXiv:2011.11152  [pdf, other]

    cs.LG cs.AI

    On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective

    Authors: Zeke Xie, Zhiqiang Xu, Jingzhao Zhang, Issei Sato, Masashi Sugiyama

    Abstract: Weight decay is a simple yet powerful regularization technique that has been very widely used in training of deep neural networks (DNNs). While weight decay has attracted much attention, previous studies fail to discover some overlooked pitfalls on large gradient norms resulting from weight decay. In this paper, we discover that weight decay can unfortunately lead to large gradient norms at the fina…

    Submitted 19 October, 2023; v1 submitted 22 November, 2020; originally announced November 2020.

    Comments: NeurIPS 2023, 21 pages, 20 figures. Keywords: Weight Decay, Regularization, Optimization, Deep Learning

  6. arXiv:2011.06220  [pdf, other]

    cs.LG

    Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting

    Authors: Zeke Xie, Fengxiang He, Shaopeng Fu, Issei Sato, Dacheng Tao, Masashi Sugiyama

    Abstract: Deep learning is often criticized by two serious issues which rarely exist in natural nervous systems: overfitting and catastrophic forgetting. It can even memorize randomly labelled data, which has little knowledge behind the instance-label pairs. When a deep network continually learns over time by accommodating new tasks, it usually quickly overwrites the knowledge learned from previous tasks. R…

    Submitted 10 May, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

    Comments: Accepted by Neural Computation, MIT Press; 20 pages; 13 figures; Key Words: Neural Variability, Neuroscience, Deep Learning, Label Noise, Catastrophic Forgetting

  7. arXiv:2008.00645  [pdf, other]

    cs.LG stat.ML

    Active Classification with Uncertainty Comparison Queries

    Authors: Zhenghang Cui, Issei Sato

    Abstract: Noisy pairwise comparison feedback has been incorporated to improve the overall query complexity of interactively learning binary classifiers. The \textit{positivity comparison oracle} is used to provide feedback on which is more likely to be positive given a pair of data points. Because it is impossible to infer accurate labels using this oracle alone \textit{without knowing the classification th…

    Submitted 28 October, 2020; v1 submitted 3 August, 2020; originally announced August 2020.

    Comments: Code and Dataset: https://github.com/zchenry/uncertainty-comparison

  8. arXiv:2007.01659  [pdf, other]

    stat.ML cs.LG

    Diagnostic Uncertainty Calibration: Towards Reliable Machine Predictions in Medical Domain

    Authors: Takahiro Mimori, Keiko Sasada, Hirotaka Matsui, Issei Sato

    Abstract: We propose an evaluation framework for class probability estimates (CPEs) in the presence of label uncertainty, which is commonly observed as diagnosis disagreement between experts in the medical domain. We also formalize evaluation metrics for higher-order statistics, including inter-rater disagreement, to assess predictions on label uncertainty. Moreover, we propose a novel post-hoc method calle…

    Submitted 22 March, 2021; v1 submitted 3 July, 2020; originally announced July 2020.

    Comments: 31 pages, 6 figures

  9. arXiv:2006.15815  [pdf, other]

    cs.LG stat.ML

    Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum

    Authors: Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama

    Abstract: Adaptive Moment Estimation (Adam), which combines Adaptive Learning Rate and Momentum, would be the most popular stochastic optimizer for accelerating the training of deep neural networks. However, it is empirically known that Adam often generalizes worse than Stochastic Gradient Descent (SGD). The purpose of this paper is to unveil the mystery of this behavior in the diffusion theoretical framewo…

    Submitted 14 June, 2022; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: ICML2022, Long Oral Presentation, 30 pages, 14 figures, Key Words: Deep Learning Theory, Optimization, Adam, Adaptive Inertia, Flat Minima

  10. arXiv:2006.08306  [pdf, other]

    cs.LG stat.ML

    LFD-ProtoNet: Prototypical Network Based on Local Fisher Discriminant Analysis for Few-shot Learning

    Authors: Kei Mukaiyama, Issei Sato, Masashi Sugiyama

    Abstract: The prototypical network (ProtoNet) is a few-shot learning framework that performs metric learning and classification using the distance to prototype representations of each class. It has attracted a great deal of attention recently since it is simple to implement, highly extensible, and performs well in experiments. However, it only takes into account the mean of the support vectors as prototypes…

    Submitted 25 September, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: 20 pages

    MSC Class: 68T01(Primary); 68T05(Secondary)

  11. arXiv:2006.07571  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    $γ$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a Robust Divergence Estimator

    Authors: Masahiro Fujisawa, Takeshi Teshima, Issei Sato, Masashi Sugiyama

    Abstract: Approximate Bayesian computation (ABC) is a likelihood-free inference method that has been employed in various applications. However, ABC can be sensitive to outliers if a data discrepancy measure is chosen inappropriately. In this paper, we propose to use a nearest-neighbor-based $γ$-divergence estimator as a data discrepancy measure. We show that our estimator possesses a suitable theoretical ro…

    Submitted 5 March, 2021; v1 submitted 13 June, 2020; originally announced June 2020.

    Comments: The 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021); 48 pages, 22 figures

  12. arXiv:2006.06207  [pdf, other]

    stat.ML cs.LG

    Pairwise Supervision Can Provably Elicit a Decision Boundary

    Authors: Han Bao, Takuya Shimada, Liyuan Xu, Issei Sato, Masashi Sugiyama

    Abstract: Similarity learning is a general problem to elicit useful representations by predicting the relationship between a pair of patterns. This problem is related to various important preprocessing tasks such as metric learning, kernel learning, and contrastive learning. A classifier built upon the representations is expected to perform well in downstream classification; however, little theory has been…

    Submitted 28 February, 2022; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: In Proceedings of AISTATS2021

  13. arXiv:2005.09341  [pdf, ps, other]

    math.CO math.NT

    The limit theorem with respect to the matrices on non-backtracking paths of a graph

    Authors: Takehiro Hasegawa, Takashi Komatsu, Norio Konno, Hayato Saigo, Seiken Saito, Iwao Sato, Shingo Sugiyama

    Abstract: We give a limit theorem with respect to the matrices related to non-backtracking paths of a regular graph. The limit obtained closely resembles the $k$th moments of the arcsine law. Furthermore, we obtain the asymptotics of the averages of the $p^m$th Fourier coefficients of the cusp forms related to the Ramanujan graphs defined by A. Lubotzky, R. Phillips and P. Sarnak.

    Submitted 16 October, 2022; v1 submitted 19 May, 2020; originally announced May 2020.

    Comments: The draft is improved: Typos are fixed, Corollaries 4.3 and 4.5 are improved, and Proposition 4.6 and Remark 4.7 are added

    MSC Class: 05C38 (Primary); 05C50; 11F30 (Secondary)

  14. arXiv:2005.04107  [pdf, other]

    cs.GR cs.HC cs.LG

    Sequential Gallery for Interactive Visual Design Optimization

    Authors: Yuki Koyama, Issei Sato, Masataka Goto

    Abstract: Visual design tasks often involve tuning many design parameters. For example, color grading of a photograph involves many parameters, some of which non-expert users might be unfamiliar with. We propose a novel user-in-the-loop optimization method that allows users to efficiently find an appropriate parameter set by exploring such a high-dimensional design space through much easier two-dimensional…

    Submitted 8 May, 2020; originally announced May 2020.

    Comments: To be published at ACM Trans. Graph. (Proc. SIGGRAPH 2020); Project page available at https://koyama.xyz/project/sequential_gallery/

    Journal ref: ACM Trans. Graph. 39, 4 (July 2020), pp.88:1-88:12

  15. arXiv:2003.04691  [pdf, other]

    stat.ML cs.LG

    Time-varying Gaussian Process Bandit Optimization with Non-constant Evaluation Time

    Authors: Hideaki Imamura, Nontawat Charoenphakdee, Futoshi Futami, Issei Sato, Junya Honda, Masashi Sugiyama

    Abstract: The Gaussian process bandit is a problem in which we want to find a maximizer of a black-box function with the minimum number of function evaluations. If the black-box function varies with time, then time-varying Bayesian optimization is a promising framework. However, a drawback with current methods is in the assumption that the evaluation time for every observation is constant, which can be unre…

    Submitted 10 March, 2020; v1 submitted 10 March, 2020; originally announced March 2020.

  16. arXiv:2002.03497  [pdf, other]

    cs.LG stat.ML

    Few-shot Domain Adaptation by Causal Mechanism Transfer

    Authors: Takeshi Teshima, Issei Sato, Masashi Sugiyama

    Abstract: We study few-shot supervised domain adaptation (DA) for regression problems, where only a few labeled target domain data and many labeled source domain data are available. Many of the current DA methods base their transfer assumptions on either parametrized distribution shift or apparent distribution similarities, e.g., identical conditionals or small distributional discrepancies. However, these a…

    Submitted 18 August, 2020; v1 submitted 9 February, 2020; originally announced February 2020.

    Comments: 33 pages, 3 figures. Camera-ready version for Thirty-seventh International Conference on Machine Learning (ICML 2020)

  17. arXiv:2002.03495  [pdf, other]

    cs.LG stat.ML

    A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima

    Authors: Zeke Xie, Issei Sato, Masashi Sugiyama

    Abstract: Stochastic Gradient Descent (SGD) and its variants are mainstream methods for training deep networks in practice. SGD is known to find a flat minimum that often generalizes well. However, it is mathematically unclear how deep learning can select a flat minimum among so many minima. To answer the question quantitatively, we develop a density diffusion theory (DDT) to reveal how minima selection qua…

    Submitted 15 January, 2021; v1 submitted 9 February, 2020; originally announced February 2020.

    Comments: ICLR 2021; 28 pages; 19 figures

  18. arXiv:2001.07847  [pdf, other]

    eess.IV cs.CV cs.LG

    A versatile anomaly detection method for medical images with a flow-based generative model in semi-supervision setting

    Authors: H. Shibata, S. Hanaoka, Y. Nomura, T. Nakao, I. Sato, D. Sato, N. Hayashi, O. Abe

    Abstract: Oversight in medical images is a crucial problem, and timely reporting of medical images is desired. Therefore, an all-purpose anomaly detection method that can detect virtually all types of lesions/diseases in a given image is strongly desired. However, few commercially available and versatile anomaly detection methods for medical images have been provided so far. Recently, anomaly detection meth…

    Submitted 20 October, 2020; v1 submitted 21 January, 2020; originally announced January 2020.

  19. arXiv:1911.09011  [pdf, other]

    stat.ML cs.LG

    Bayesian interpretation of SGD as Ito process

    Authors: Soma Yokoi, Issei Sato

    Abstract: The current interpretation of stochastic gradient descent (SGD) as a stochastic process lacks generality in that its numerical scheme restricts continuous-time dynamics as well as the loss function and the distribution of gradient noise. We introduce a simplified scheme with milder conditions that flexibly interprets SGD as a discrete-time approximation of an Ito process. The scheme also works as…

    Submitted 20 November, 2019; originally announced November 2019.

  20. arXiv:1911.06181  [pdf, other]

    cs.CV cs.LG

    Adversarial Transformations for Semi-Supervised Learning

    Authors: Teppei Suzuki, Ikuro Sato

    Abstract: We propose a Regularization framework based on Adversarial Transformations (RAT) for semi-supervised learning. RAT is designed to enhance robustness of the output distribution of class prediction for a given data against input perturbation. RAT is an extension of Virtual Adversarial Training (VAT) in such a way that RAT adversarially transforms data along the underlying data distribution by a rich…

    Submitted 18 November, 2019; v1 submitted 13 November, 2019; originally announced November 2019.

    Comments: Accepted by AAAI 2020

  21. arXiv:1911.06060  [pdf, ps, other]

    math.CO

    A zeta function related to the transition matrix of the discrete-time quantum walk on a graph

    Authors: Norio Konno, Iwao Sato, Etsuo Segawa

    Abstract: We present the structure theorem for the positive support of the cube of the Grover transition matrix of the discrete-time quantum walk (the Grover walk) on a general graph $G$ under same condition. Thus, we introduce a zeta function on the positive support of the cube of the Grover transition matrix of $G$, and present its Euler product and its determinant expression. As a corollary, we give the…

    Submitted 14 November, 2019; originally announced November 2019.

    Comments: arXiv admin note: text overlap with arXiv:1103.0079

    MSC Class: 60F50; 05C50; 15A15; 05C60

  22. arXiv:1910.12782  [pdf, ps, other]

    math.CO

    Zeta functions with respect to general coined quantum walk of periodic graphs

    Authors: Takashi Komatsu, Norio Konno, Iwao Sato

    Abstract: We define a zeta function of a graph by using the time evolution matrix of a general coined quantum walk on it, and give a determinant expression for the zeta function of a finite graph. Furthermore, we present a determinant expression for the zeta function of an (infinite) periodic graph.

    Submitted 28 October, 2019; originally announced October 2019.

    Comments: 14 pages

    MSC Class: 60F50; 05C50; 15A15; 05C60

  23. arXiv:1908.09051  [pdf, ps, other]

    math-ph

    A walk on max-plus algebra

    Authors: Sennosuke Watanabe, Akiko Fukuda, Etsuo Segawa, Iwao Sato

    Abstract: Max-plus algebra is a kind of idempotent semiring over $\mathbb{R}_{\max}:=\mathbb{R}\cup\{-\infty\}$ with two operations $\oplus := \max$ and $\otimes := +$. In this paper, we introduce a new model of a walk on one dimensional lattice on $\mathbb{Z}$, as an analogue of the quantum walk, over the max-plus algebra and we call it max-plus walk. In the conventional quantum walk, the summation of the…

    Submitted 29 August, 2019; v1 submitted 23 August, 2019; originally announced August 2019.

    Comments: 17 pages, 1 figure

  24. arXiv:1907.10225  [pdf, ps, other]

    cs.LG stat.ML

    Classification from Triplet Comparison Data

    Authors: Zhenghang Cui, Nontawat Charoenphakdee, Issei Sato, Masashi Sugiyama

    Abstract: Learning from triplet comparison data has been extensively studied in the context of metric learning, where we want to learn a distance metric between two instances, and ordinal embedding, where we want to learn an embedding in a Euclidean space of the given instances that preserves the comparison order as well as possible. Unlike fully-labeled data, triplet comparison data can be collected in a…

    Submitted 18 April, 2020; v1 submitted 23 July, 2019; originally announced July 2019.

    Comments: Code: https://github.com/zchenry/triplet_classification

  25. arXiv:1906.09840  [pdf, other]

    cs.GR cs.CV cs.HC cs.LG

    Interactive Optimization of Generative Image Modeling using Sequential Subspace Search and Content-based Guidance

    Authors: Toby Chong Long Hin, I-Chao Shen, Issei Sato, Takeo Igarashi

    Abstract: Generative image modeling techniques such as GAN demonstrate highly convincing image generation results. However, user interaction is often necessary to obtain the desired results. Existing attempts add interactivity but require either tailored architectures or extra data. We present a human-in-the-optimization method that allows users to directly explore and search the latent vector space of gener…

    Submitted 29 August, 2020; v1 submitted 24 June, 2019; originally announced June 2019.

    Comments: 13 pages, Toby Chong Long Hin and I-Chao Shen contributed equally to the paper

  26. arXiv:1906.01150  [pdf, other]

    cs.LG stat.ML

    Breaking Inter-Layer Co-Adaptation by Classifier Anonymization

    Authors: Ikuro Sato, Kohta Ishikawa, Guoqing Liu, Masayuki Tanaka

    Abstract: This study addresses an issue of co-adaptation between a feature extractor and a classifier in a neural network. A naive joint optimization of a feature extractor and a classifier often brings situations in which an excessively complex feature distribution adapted to a very specific classifier degrades the test performance. We introduce a method called Feature-extractor Optimization through Classi…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: 9 pages. Accepted to ICML 2019

  27. arXiv:1905.11623  [pdf, other]

    cs.LG stat.ML

    Solving NP-Hard Problems on Graphs with Extended AlphaGo Zero

    Authors: Kenshin Abe, Zijian Xu, Issei Sato, Masashi Sugiyama

    Abstract: There have been increasing challenges to solve combinatorial optimization problems by machine learning. Khalil et al. proposed an end-to-end reinforcement learning framework, S2V-DQN, which automatically learns graph embeddings to construct solutions to a wide range of problems. To improve the generalization ability of their Q-learning method, we propose a novel learning strategy based on AlphaGo…

    Submitted 7 March, 2020; v1 submitted 28 May, 2019; originally announced May 2019.

  28. arXiv:1905.00593  [pdf, other]

    cs.CV

    Directing DNNs Attention for Facial Attribution Classification using Gradient-weighted Class Activation Mapping

    Authors: Xi Yang, Bojian Wu, Issei Sato, Takeo Igarashi

    Abstract: Deep neural networks (DNNs) have a high accuracy on image classification tasks. However, DNNs trained by such dataset with co-occurrence bias may rely on wrong features while making decisions for classification. It will greatly affect the transferability of pre-trained DNNs. In this paper, we propose an interactive method to direct classifiers paying attentions to the regions that are manually spe…

    Submitted 2 May, 2019; originally announced May 2019.

    Comments: CVPR-19 Workshop on Explainable AI

  29. arXiv:1904.11717  [pdf, other]

    cs.LG stat.ML

    Classification from Pairwise Similarities/Dissimilarities and Unlabeled Data via Empirical Risk Minimization

    Authors: Takuya Shimada, Han Bao, Issei Sato, Masashi Sugiyama

    Abstract: Pairwise similarities and dissimilarities between data points might be easier to obtain than fully labeled data in real-world classification problems, e.g., in privacy-aware situations. To handle such pairwise information, an empirical risk minimization approach has been proposed, giving an unbiased estimator of the classification risk that can be computed only from pairwise similarities and unlab…

    Submitted 26 April, 2019; originally announced April 2019.

  30. arXiv:1903.12053  [pdf]

    eess.IV

    Imaging cytometry without image reconstruction (ghost cytometry)

    Authors: Sadao Ota, Ryoichi Horisaki, Yoko Kawamura, Issei Sato, Hiroyuki Noji

    Abstract: Imaging and analysis of many single cells hold great potential in our understanding of heterogeneous and complex life systems and in enabling biomedical applications. We here introduce a recently realized image-free "imaging" cytometry technology, which we call ghost cytometry. While a compressive ghost imaging technique utilizing object's motion relative to a projected static light pattern allows…

    Submitted 27 March, 2019; originally announced March 2019.

  31. arXiv:1903.09538  [pdf]

    q-bio.QM cs.LG eess.IV

    Use of Ghost Cytometry to Differentiate Cells with Similar Gross Morphologic Characteristics

    Authors: Hiroaki Adachi, Yoko Kawamura, Keiji Nakagawa, Ryoichi Horisaki, Issei Sato, Satoko Yamaguchi, Katsuhito Fujiu, Kayo Waki, Hiroyuki Noji, Sadao Ota

    Abstract: Imaging flow cytometry shows significant potential for increasing our understanding of heterogeneous and complex life systems and is useful for biomedical applications. Ghost cytometry is a recently proposed approach for directly analyzing compressively measured signals, thereby relieving the computational bottleneck observed in high-throughput cytometry based on morphological information. While t…

    Submitted 22 March, 2019; originally announced March 2019.

  32. arXiv:1903.06009  [pdf, other]

    eess.IV cs.LG stat.ML

    On Learning from Ghost Imaging without Imaging

    Authors: Issei Sato

    Abstract: Computational ghost imaging is an imaging technique in which an object is imaged from light collected using a single-pixel detector with no spatial resolution. Recently, ghost cytometry has been proposed for a high-speed cell-classification method that involves ghost imaging and machine learning in flow cytometry. Ghost cytometry skips the reconstruction of cell images from signals and directly us…

    Submitted 29 May, 2019; v1 submitted 14 March, 2019; originally announced March 2019.

  33. arXiv:1903.02750  [pdf, other]

    stat.ML cs.LG

    On Transformations in Stochastic Gradient MCMC

    Authors: Soma Yokoi, Takuma Otsuka, Issei Sato

    Abstract: Stochastic gradient Langevin dynamics (SGLD) is a computationally efficient sampler for Bayesian posterior inference given a large scale dataset. Although SGLD is designed for unbounded random variables, many practical models incorporate variables with boundaries such as non-negative ones or those in a finite interval. To bridge this gap, we consider mapping unbounded samples into the target inter…

    Submitted 20 June, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

  34. arXiv:1902.04247  [pdf, ps, other]

    cs.LG cs.CL stat.ML

    PAC-Bayes Analysis of Sentence Representation

    Authors: Kento Nozawa, Issei Sato

    Abstract: Learning sentence vectors from an unlabeled corpus has attracted attention because such vectors can represent sentences in a lower dimensional and continuous space. Simple heuristics using pre-trained word vectors are widely applied to machine learning tasks. However, they are not well understood from a theoretical perspective. We analyze learning sentence vectors from a transfer learning perspect…

    Submitted 13 February, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

    Comments: fix styles

  35. arXiv:1902.01056  [pdf, other]

    cs.LG stat.ML

    Online Multiclass Classification Based on Prediction Margin for Partial Feedback

    Authors: Takuo Kaneko, Issei Sato, Masashi Sugiyama

    Abstract: We consider the problem of online multiclass classification with partial feedback, where an algorithm predicts a class for a new instance in each round and only receives its correctness. Although several methods have been developed for this problem, recent challenging real-world applications require further performance improvement. In this paper, we propose a novel online learning algorithm inspir…

    Submitted 4 February, 2019; originally announced February 2019.

  36. arXiv:1902.00468  [pdf, other]

    stat.ML cs.LG

    Multilevel Monte Carlo Variational Inference

    Authors: Masahiro Fujisawa, Issei Sato

    Abstract: We propose a variance reduction framework for variational inference using the Multilevel Monte Carlo (MLMC) method. Our framework is built on reparameterized gradient estimators and "recycles" parameters obtained from past update history in optimization. In addition, our framework provides a new optimization algorithm based on stochastic gradient descent (SGD) that adaptively estimates the sample…

    Submitted 2 December, 2021; v1 submitted 1 February, 2019; originally announced February 2019.

    Comments: 44 pages, 10 figures; Journal of Machine Learning Research (JMLR)

  37. arXiv:1901.11351  [pdf, other]

    cs.LG stat.ML

    Semi-Supervised Ordinal Regression Based on Empirical Risk Minimization

    Authors: Taira Tsuchiya, Nontawat Charoenphakdee, Issei Sato, Masashi Sugiyama

    Abstract: Ordinal regression is aimed at predicting an ordinal class label. In this paper, we consider its semi-supervised formulation, in which we have unlabeled data along with ordinal-labeled data to train an ordinal regressor. There are several metrics to evaluate the performance of ordinal regression, such as the mean absolute error, mean zero-one error, and mean squared error. However, the existing st…

    Submitted 10 June, 2021; v1 submitted 31 January, 2019; originally announced January 2019.

    Comments: 38 pages, 9 figures

  38. arXiv:1901.04653  [pdf, other]

    stat.ML cs.LG

    Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks using PAC-Bayesian Analysis

    Authors: Yusuke Tsuzuku, Issei Sato, Masashi Sugiyama

    Abstract: The notion of flat minima has played a key role in the generalization studies of deep learning models. However, existing definitions of the flatness are known to be sensitive to the rescaling of parameters. The issue suggests that the previous definitions of the flatness might not be a good measure of generalization, because generalization is invariant to such rescalings. In this paper, from the P…

    Submitted 28 January, 2019; v1 submitted 14 January, 2019; originally announced January 2019.

  39. Pathological Evidence Exploration in Deep Retinal Image Diagnosis

    Authors: Yuhao Niu, Lin Gu, Feng Lu, Feifan Lv, Zongji Wang, Imari Sato, Zijian Zhang, Yangyan Xiao, Xunzhang Dai, Tingting Cheng

    Abstract: Though deep learning has shown successful performance in classifying the label and severity stage of certain disease, most of them give little evidence on how to make prediction. Here, we propose to exploit the interpretability of deep learning application in medical diagnosis. Inspired by Koch's Postulates, a well-known strategy in medical research to identify the property of pathogen, we define a p…

    Submitted 6 December, 2018; originally announced December 2018.

    Comments: to appear in AAAI (2019). The first two authors contributed equally to the paper. Corresponding Author: Feng Lu

    Journal ref: AAAI 2019: 1093-1101

  40. arXiv:1811.12104  [pdf, other]

    cs.CV

    Generating Easy-to-Understand Referring Expressions for Target Identifications

    Authors: Mikihiro Tanaka, Takayuki Itamochi, Kenichi Narioka, Ikuro Sato, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: This paper addresses the generation of referring expressions that not only refer to objects correctly but also let humans find them quickly. As a target becomes relatively less salient, identifying referred objects itself becomes more difficult. However, the existing studies regarded all sentences that refer to objects correctly as equally good, ignoring whether they are easily understood by human…

    Submitted 29 August, 2019; v1 submitted 29 November, 2018; originally announced November 2018.

  41. arXiv:1811.02116  [pdf, ps, other]

    quant-ph

    Eigenbasis of the Evolution Operator of 2-Tessellable Quantum Walks

    Authors: Yusuke Higuchi, Renato Portugal, Iwao Sato, Etsuo Segawa

    Abstract: Staggered quantum walks on graphs are based on the concept of graph tessellation and generalize some well-known discrete-time quantum walk models. In this work, we address the class of 2-tessellable quantum walks with the goal of obtaining an eigenbasis of the evolution operator. By interpreting the evolution operator as a quantum Markov chain on an underlying multigraph, we define the concept of…

    Submitted 5 November, 2018; originally announced November 2018.

    Comments: 21 pages, 3 figures

  42. arXiv:1809.04997  [pdf, other

    cs.LG stat.ML

    Clipped Matrix Completion: A Remedy for Ceiling Effects

    Authors: Takeshi Teshima, Miao Xu, Issei Sato, Masashi Sugiyama

    Abstract: We consider the problem of recovering a low-rank matrix from its clipped observations. Clipping is conceivable in many scientific areas that obstructs statistical analyses. On the other hand, matrix completion (MC) methods can recover a low-rank matrix from various information deficits by using the principle of low-rank completion. However, the current theoretical guarantees for low-rank MC do not… ▽ More

    Submitted 4 March, 2019; v1 submitted 13 September, 2018; originally announced September 2018.

    Comments: 36 pages, 3 figures, The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

  43. arXiv:1809.04820  [pdf, other

    cs.CV

    Canonical and Compact Point Cloud Representation for Shape Classification

    Authors: Kent Fujiwara, Ikuro Sato, Mitsuru Ambai, Yuichi Yoshida, Yoshiaki Sakakura

    Abstract: We present a novel compact point cloud representation that is inherently invariant to scale, coordinate change and point permutation. The key idea is to parametrize a distance field around an individual shape into a unique, canonical, and compact vector in an unsupervised manner. We firstly project a distance field to a $4$D canonical space using singular value decomposition. We then train a neura… ▽ More

    Submitted 13 September, 2018; originally announced September 2018.

    Comments: 16 pages, 5 figures

  44. arXiv:1809.04098  [pdf, other

    cs.CV

    On the Structural Sensitivity of Deep Convolutional Networks to the Directions of Fourier Basis Functions

    Authors: Yusuke Tsuzuku, Issei Sato

    Abstract: Data-agnostic quasi-imperceptible perturbations on inputs are known to degrade recognition accuracy of deep convolutional networks severely. This phenomenon is considered to be a potential security issue. Moreover, some results on statistical generalization guarantees indicate that the phenomenon can be a key to improve the networks' generalization. However, the characteristics of the shared direc… ▽ More

    Submitted 17 April, 2019; v1 submitted 11 September, 2018; originally announced September 2018.

    Comments: CVPR 2019

  45. arXiv:1809.03839  [pdf, other

    cs.LG stat.ML

    Unsupervised Domain Adaptation Based on Source-guided Discrepancy

    Authors: Seiichi Kuroki, Nontawat Charoenphakdee, Han Bao, Junya Honda, Issei Sato, Masashi Sugiyama

    Abstract: Unsupervised domain adaptation is the problem setting where data generating distributions in the source and target domains are different, and labels in the target domain are unavailable. One important question in unsupervised domain adaptation is how to measure the difference between the source and target domains. A previously proposed discrepancy that does not use the source domain labels require… ▽ More

    Submitted 19 November, 2018; v1 submitted 11 September, 2018; originally announced September 2018.

    Comments: To appear in AAAI-19

  46. arXiv:1805.07912  [pdf, ps, other

    stat.ML cs.LG

    Bayesian posterior approximation via greedy particle optimization

    Authors: Futoshi Futami, Zhenghang Cui, Issei Sato, Masashi Sugiyama

    Abstract: In Bayesian inference, the posterior distributions are difficult to obtain analytically for complex models such as neural networks. Variational inference usually uses a parametric distribution for approximation, from which we can easily draw samples. Recently discrete approximation by particles has attracted attention because of its high expression ability. An example is Stein variational gradient… ▽ More

    Submitted 31 January, 2019; v1 submitted 21 May, 2018; originally announced May 2018.

  47. arXiv:1803.04232  [pdf, other

    stat.ML

    Variational Inference for Gaussian Process with Panel Count Data

    Authors: Hongyi Ding, Young Lee, Issei Sato, Masashi Sugiyama

    Abstract: We present the first framework for Gaussian-process-modulated Poisson processes when the temporal data appear in the form of panel counts. Panel count data frequently arise when experimental subjects are observed only at discrete time points and only the numbers of occurrences of the events between subsequent observation times are available. The exact occurrence timestamps of the events are unknow… ▽ More

    Submitted 12 March, 2018; originally announced March 2018.

  48. arXiv:1802.04551  [pdf, other

    stat.ML cs.HC cs.LG

    Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model

    Authors: Hideaki Imamura, Issei Sato, Masashi Sugiyama

    Abstract: While crowdsourcing has become an important means to label data, there is great interest in estimating the ground truth from unreliable labels produced by crowdworkers. The Dawid and Skene (DS) model is one of the most well-known models in the study of crowdsourcing. Despite its practical popularity, theoretical error analysis for the DS model has been conducted only under restrictive assumptions… ▽ More

    Submitted 9 June, 2018; v1 submitted 13 February, 2018; originally announced February 2018.

    Comments: Accepted to ICML2018 (International Conference on Machine Learning)

  49. arXiv:1802.04034  [pdf, other

    cs.CV cs.LG stat.ML

    Lipschitz-Margin Training: Scalable Certification of Perturbation Invariance for Deep Neural Networks

    Authors: Yusuke Tsuzuku, Issei Sato, Masashi Sugiyama

    Abstract: High sensitivity of neural networks against malicious perturbations on inputs causes security concerns. To take a steady step towards robust classifiers, we aim to create neural network models provably defended from perturbations. Prior certification work requires strong assumptions on network structures and massive computational costs, and thus the range of their applications was limited. From th… ▽ More

    Submitted 31 October, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

    Comments: To appear in NIPS2018

  50. arXiv:1802.03877  [pdf, other

    stat.ML

    Gaussian Process Classification with Privileged Information by Soft-to-Hard Labeling Transfer

    Authors: Ryosuke Kamesawa, Issei Sato, Masashi Sugiyama

    Abstract: Learning using privileged information is an attractive problem setting that helps many learning scenarios in the real world. A state-of-the-art method of Gaussian process classification (GPC) with privileged information is GPC+, which incorporates privileged information into a noise term of the likelihood. A drawback of GPC+ is that it requires numerical quadrature to calculate the posterior distr… ▽ More

    Submitted 11 February, 2018; originally announced February 2018.