Showing 1–8 of 8 results for author: Matsushima, S

  1. arXiv:2403.06499  [pdf, other]

    stat.ML cs.IT cs.LG

    Detection of Unobserved Common Causes based on NML Code in Discrete, Mixed, and Continuous Variables

    Authors: Masatoshi Kobayashi, Kohei Miyaguchi, Shin Matsushima

    Abstract: Causal discovery in the presence of unobserved common causes from observational data only is a crucial but challenging problem. We categorize all possible causal relationships between two random variables into the following four categories and aim to identify one from observed data: two cases in which either of the direct causality exists, a case that variables are independent, and a case that var…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: submitted to Journal of Data Mining and Knowledge Discovery

  2. arXiv:2203.14188  [pdf, ps, other]

    cs.LG cs.CY cs.DC

    mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations

    Authors: Toyotaro Suzumura, Akiyoshi Sugiki, Hiroyuki Takizawa, Akira Imakura, Hiroshi Nakamura, Kenjiro Taura, Tomohiro Kudoh, Toshihiro Hanawa, Yuji Sekiya, Hiroki Kobayashi, Shin Matsushima, Yohei Kuga, Ryo Nakamura, Renhe Jiang, Junya Kawase, Masatoshi Hanai, Hiroshi Miyazaki, Tsutomu Ishizaki, Daisuke Shimotoku, Daisuke Miyamoto, Kento Aida, Atsuko Takefusa, Takashi Kurimoto, Koji Sasayama, Naoya Kitagawa, et al. (8 additional authors not shown)

    Abstract: The growing amount of data and advances in data science have created a need for a new kind of cloud platform that provides users with flexibility, strong security, and the ability to couple with supercomputers and edge devices through high-performance networks. We have built such a nation-wide cloud platform, called "mdx" to meet this need. The mdx platform's virtualization service, jointly operat…

    Submitted 26 March, 2022; originally announced March 2022.

  3. arXiv:1802.03001  [pdf, ps, other]

    stat.ML cs.LG

    Statistical Learnability of Generalized Additive Models based on Total Variation Regularization

    Authors: Shin Matsushima

    Abstract: A generalized additive model (GAM, Hastie and Tibshirani (1987)) is a nonparametric model by the sum of univariate functions with respect to each explanatory variable, i.e., $f({\mathbf x}) = \sum f_j(x_j)$, where $x_j\in\mathbb{R}$ is $j$-th component of a sample ${\mathbf x}\in \mathbb{R}^p$. In this paper, we introduce the total variation (TV) of a function as a measure of the complexity of fun…

    Submitted 16 February, 2018; v1 submitted 8 February, 2018; originally announced February 2018.

  4. Grafting for Combinatorial Boolean Model using Frequent Itemset Mining

    Authors: Taito Lee, Shin Matsushima, Kenji Yamanishi

    Abstract: This paper introduces the combinatorial Boolean model (CBM), which is defined as the class of linear combinations of conjunctions of Boolean attributes. This paper addresses the issue of learning CBM from labeled data. CBM is of high knowledge interpretability but naïve learning of it requires exponentially large computation time with respect to data dimension and sample size. To overcome this com…

    Submitted 13 November, 2017; v1 submitted 7 November, 2017; originally announced November 2017.

    Journal ref: Data Min Knowl Disc 34, 101-123 (2020)

  5. arXiv:1604.04706  [pdf, other]

    cs.LG stat.ML

    DS-MLR: Exploiting Double Separability for Scaling up Distributed Multinomial Logistic Regression

    Authors: Parameswaran Raman, Sriram Srinivasan, Shin Matsushima, Xinhua Zhang, Hyokun Yun, S. V. N. Vishwanathan

    Abstract: Scaling multinomial logistic regression to datasets with very large number of data points and classes is challenging. This is primarily because one needs to compute the log-partition function on every data point. This makes distributing the computation hard. In this paper, we present a distributed stochastic gradient descent based optimization method (DS-MLR) for scaling up multinomial logistic re…

    Submitted 3 August, 2018; v1 submitted 16 April, 2016; originally announced April 2016.

  6. arXiv:1506.02761  [pdf, other]

    cs.CL cs.LG stat.ML

    WordRank: Learning Word Embeddings via Robust Ranking

    Authors: Shihao Ji, Hyokun Yun, Pinar Yanardag, Shin Matsushima, S. V. N. Vishwanathan

    Abstract: Embedding words in a vector space has gained a lot of attention in recent years. While state-of-the-art methods provide efficient computation of word similarities via a low-dimensional matrix embedding, their motivation is often left unclear. In this paper, we argue that word embedding can be naturally viewed as a ranking problem due to the ranking nature of the evaluation metrics. Then, based on…

    Submitted 27 September, 2016; v1 submitted 8 June, 2015; originally announced June 2015.

    Comments: Conference on Empirical Methods in Natural Language Processing (EMNLP), November 1-5, 2016, Austin, Texas, USA

  7. arXiv:1504.01446  [pdf, other]

    cs.LG quant-ph

    Totally Corrective Boosting with Cardinality Penalization

    Authors: Vasil S. Denchev, Nan Ding, Shin Matsushima, S. V. N. Vishwanathan, Hartmut Neven

    Abstract: We propose a totally corrective boosting algorithm with explicit cardinality regularization. The resulting combinatorial optimization problems are not known to be efficiently solvable with existing classical methods, but emerging quantum optimization technology gives hope for achieving sparser models in practice. In order to demonstrate the utility of our algorithm, we use a distributed classical…

    Submitted 6 April, 2015; originally announced April 2015.

  8. Distributed Stochastic Optimization of the Regularized Risk

    Authors: Shin Matsushima, Hyokun Yun, Xinhua Zhang, S. V. N. Vishwanathan

    Abstract: Many machine learning algorithms minimize a regularized risk, and stochastic optimization is widely used for this task. When working with massive data, it is desirable to perform stochastic optimization in parallel. Unfortunately, many existing stochastic optimization algorithms cannot be parallelized efficiently. In this paper we show that one can rewrite the regularized risk minimization problem…

    Submitted 9 June, 2015; v1 submitted 17 June, 2014; originally announced June 2014.

    Journal ref: ECML PKDD 2017: Machine Learning and Knowledge Discovery in Databases pp 460-476