
Showing 1–24 of 24 results for author: Yokoi, S

  1. arXiv:2406.12353  [pdf, other]

    stat.ML cs.LG

    Top-Down Bayesian Posterior Sampling for Sum-Product Networks

    Authors: Soma Yokoi, Issei Sato

    Abstract: Sum-product networks (SPNs) are probabilistic models characterized by exact and fast evaluation of fundamental probabilistic operations. Its superior computational tractability has led to applications in many fields, such as machine learning with time constraints or accuracy requirements and real-time systems. The structural constraints of SPNs supporting fast inference, however, lead to increased…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  2. arXiv:2310.15921  [pdf, other]

    cs.CL

    Contrastive Learning-based Sentence Encoders Implicitly Weight Informative Words

    Authors: Hiroto Kurita, Goro Kobayashi, Sho Yokoi, Kentaro Inui

    Abstract: The performance of sentence encoders can be significantly improved through the simple practice of fine-tuning using contrastive loss. A natural question arises: what characteristics do models acquire during contrastive learning? This paper theoretically and experimentally shows that contrastive-based sentence encoders implicitly weight words based on information-theoretic quantities; that is, more…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 16 pages, 6 figures, accepted to EMNLP 2023 Findings (short paper)

  3. arXiv:2306.04116  [pdf, other]

    cs.CL cs.LG

    Unbalanced Optimal Transport for Unbalanced Word Alignment

    Authors: Yuki Arase, Han Bao, Sho Yokoi

    Abstract: Monolingual word alignment is crucial to model semantic interactions between sentences. In particular, null alignment, a phenomenon in which words have no corresponding counterparts, is pervasive and critical in handling semantically divergent sentences. Identification of null alignment is useful on its own to reason about the semantic similarity of sentences by indicating there exists information…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: Accepted for the Annual Meeting of the Association for Computational Linguistics (ACL 2023)

  4. arXiv:2305.18294  [pdf, other]

    cs.CL

    Transformer Language Models Handle Word Frequency in Prediction Head

    Authors: Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui

    Abstract: Prediction head is a crucial component of Transformer language models. Despite its direct impact on prediction, this component has often been overlooked in analyzing Transformers. In this study, we investigate the inner workings of the prediction head, specifically focusing on bias parameters. Our experiments with BERT and GPT-2 models reveal that the biases in their word prediction heads play a s…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: 11 pages, 12 figures, accepted to ACL 2023 Findings (short paper)

  5. arXiv:2302.00456  [pdf, other]

    cs.CL

    Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps

    Authors: Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui

    Abstract: Transformers are ubiquitous in wide tasks. Interpreting their internals is a pivotal goal. Nevertheless, their particular components, feed-forward (FF) blocks, have typically been less analyzed despite their substantial parameter amounts. We analyze the input contextualization effects of FF blocks by rendering them in the attention maps as a human-friendly visualization scheme. Our experiments wit…

    Submitted 15 April, 2024; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: ICLR 2024 Spotlight; 37 pages, 32 figures, 3 tables

  6. arXiv:2212.09663  [pdf, other]

    cs.CL

    Norm of Word Embedding Encodes Information Gain

    Authors: Momose Oyama, Sho Yokoi, Hidetoshi Shimodaira

    Abstract: Distributed representations of words encode lexical semantic information, but what type of information is encoded and how? Focusing on the skip-gram with negative-sampling method, we found that the squared norm of static word embedding encodes the information gain conveyed by the word; the information gain is defined by the Kullback-Leibler divergence of the co-occurrence distribution of the word…

    Submitted 2 November, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: 23 pages, EMNLP 2023

  7. arXiv:2211.06229  [pdf, other]

    cs.CL

    Improving word mover's distance by leveraging self-attention matrix

    Authors: Hiroaki Yamagiwa, Sho Yokoi, Hidetoshi Shimodaira

    Abstract: Measuring the semantic similarity between two sentences is still an important task. The word mover's distance (WMD) computes the similarity via the optimal alignment between the sets of word embeddings. However, WMD does not utilize word order, making it challenging to distinguish sentences with significant overlaps of similar words, even if they are semantically very different. Here, we attempt t…

    Submitted 2 November, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: 24 pages, accepted to EMNLP 2023 Findings

  8. arXiv:2210.13034  [pdf, other]

    cs.CL cs.LG

    Subspace Representations for Soft Set Operations and Sentence Similarities

    Authors: Yoichi Ishibashi, Sho Yokoi, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: In the field of natural language processing (NLP), continuous vector representations are crucial for capturing the semantic meanings of individual words. Yet, when it comes to the representations of sets of words, the conventional vector-based approaches often struggle with expressiveness and lack the essential set operations such as union, intersection, and complement. Inspired by quantum logic,…

    Submitted 9 April, 2024; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: Accepted at NAACL 2024

  9. arXiv:2210.09495  [pdf, ps, other]

    cs.CV

    5th Place Solution to Kaggle Google Universal Image Embedding Competition

    Authors: Noriaki Ota, Shingo Yokoi, Shinsuke Yamaoka

    Abstract: In this paper, we present our solution, which placed 5th in the kaggle Google Universal Image Embedding Competition in 2022. We use the ViT-H visual encoder of CLIP from the openclip repository as a backbone and train a head model composed of BatchNormalization and Linear layers using ArcFace. The dataset used was a subset of products10K, GLDv2, GPR1200, and Food101. And applying TTA for part of i…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: 3 pages, 1 figures

  10. arXiv:2109.13497  [pdf, other]

    cs.CL cs.LG

    Instance-Based Neural Dependency Parsing

    Authors: Hiroki Ouchi, Jun Suzuki, Sosuke Kobayashi, Sho Yokoi, Tatsuki Kuribayashi, Masashi Yoshikawa, Kentaro Inui

    Abstract: Interpretable rationales for model predictions are crucial in practical applications. We develop neural models that possess an interpretable inference process for dependency parsing. Our models adopt instance-based inference, where dependency edges are extracted and labeled by comparing them to edges in a training set. The training edges are explicitly used for the predictions; thus, it is easy to…

    Submitted 28 September, 2021; originally announced September 2021.

    Comments: 15 pages, accepted to TACL 2021

  11. arXiv:2109.07152  [pdf, other]

    cs.CL

    Incorporating Residual and Normalization Layers into Analysis of Masked Language Models

    Authors: Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui

    Abstract: Transformer architecture has become ubiquitous in the natural language processing field. To interpret the Transformer-based models, their attention patterns have been extensively analyzed. However, the Transformer architecture is not only composed of the multi-head attention; other components can also contribute to Transformers' progressive performance. In this study, we extended the scope of the…

    Submitted 15 September, 2021; originally announced September 2021.

    Comments: 22 pages, accepted to EMNLP 2021 main conference

  12. arXiv:2105.08585  [pdf, other]

    cs.CL

    Revisiting Additive Compositionality: AND, OR and NOT Operations with Word Embeddings

    Authors: Masahiro Naito, Sho Yokoi, Geewook Kim, Hidetoshi Shimodaira

    Abstract: It is well-known that typical word embedding methods such as Word2Vec and GloVe have the property that the meaning can be composed by adding up the embeddings (additive compositionality). Several theories have been proposed to explain additive compositionality, but the following questions remain unanswered: (Q1) The assumptions of those theories do not hold for the practical word embedding. (Q2) O…

    Submitted 19 December, 2022; v1 submitted 18 May, 2021; originally announced May 2021.

    Comments: 13pages; v1: accepted at ACL-IJCNLP 2021 Student Research Workshop; v2: minor revision

    MSC Class: 68T50

  13. arXiv:2103.00899  [pdf, ps, other]

    cs.LG

    Computationally Efficient Wasserstein Loss for Structured Labels

    Authors: Ayato Toyokuni, Sho Yokoi, Hisashi Kashima, Makoto Yamada

    Abstract: The problem of estimating the probability distribution of labels has been widely studied as a label distribution learning (LDL) problem, whose applications include age estimation, emotion analysis, and semantic segmentation. We propose a tree-Wasserstein distance regularized LDL algorithm, focusing on hierarchical text classification tasks. We propose predicting the entire label hierarchy using ne…

    Submitted 1 March, 2021; originally announced March 2021.

  14. arXiv:2012.04207  [pdf, other]

    cs.LG cs.CL cs.CV

    Efficient Estimation of Influence of a Training Instance

    Authors: Sosuke Kobayashi, Sho Yokoi, Jun Suzuki, Kentaro Inui

    Abstract: Understanding the influence of a training instance on a neural network model leads to improving interpretability. However, it is difficult and inefficient to evaluate the influence, which shows how a model's prediction would be changed if a training instance were not used. In this paper, we propose an efficient method for estimating the influence. Our method is inspired by dropout, which zero-mask…

    Submitted 19 November, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

    Comments: This is an extended version of the paper presented at SustaiNLP 2020

  15. arXiv:2011.01785  [pdf, other]

    cs.CL

    Modeling Event Salience in Narratives via Barthes' Cardinal Functions

    Authors: Takaki Otake, Sho Yokoi, Naoya Inoue, Ryo Takahashi, Tatsuki Kuribayashi, Kentaro Inui

    Abstract: Events in a narrative differ in salience: some are more important to the story than others. Estimating event salience is useful for tasks such as story generation, and as a tool for text analysis in narratology and folkloristics. To compute event salience without any annotations, we adopt Barthes' definition of event salience and propose several unsupervised methods that require only a pre-trained…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: accepted to COLING 2020

  16. arXiv:2006.04528  [pdf, other]

    cs.LG stat.ML

    Evaluation of Similarity-based Explanations

    Authors: Kazuaki Hanawa, Sho Yokoi, Satoshi Hara, Kentaro Inui

    Abstract: Explaining the predictions made by complex machine learning models helps users to understand and accept the predicted outputs with confidence. One promising way is to use similarity-based explanation that provides similar instances as evidence to support model predictions. Several relevance metrics are used for this purpose. In this study, we investigated relevance metrics that can provide reasona…

    Submitted 22 March, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: ICLR 2021

  17. arXiv:2004.15003  [pdf, other]

    cs.CL

    Word Rotator's Distance

    Authors: Sho Yokoi, Ryo Takahashi, Reina Akama, Jun Suzuki, Kentaro Inui

    Abstract: A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts by considering the word alignment. Such alignment-based approaches are intuitive and interpretable; however, they are empirically inferior to the simple cosine similarity between general-purpose sentence vectors. To address this issue, we focus on and demonstrate the fact that the norm of…

    Submitted 16 November, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

    Comments: 17 pages, accepted at EMNLP 2020

    Journal ref: EMNLP 2020

  18. arXiv:2004.14514  [pdf, other]

    cs.CL cs.LG

    Instance-Based Learning of Span Representations: A Case Study through Named Entity Recognition

    Authors: Hiroki Ouchi, Jun Suzuki, Sosuke Kobayashi, Sho Yokoi, Tatsuki Kuribayashi, Ryuto Konno, Kentaro Inui

    Abstract: Interpretable rationales for model predictions play a critical role in practical applications. In this study, we develop models possessing interpretable inference process for structured prediction. Specifically, we present a method of instance-based learning that learns similarities between spans. At inference time, each span is assigned a class label based on its similar spans in the training set…

    Submitted 29 April, 2020; originally announced April 2020.

    Comments: Accepted by ACL2020

  19. arXiv:2004.14008  [pdf, other]

    cs.CL

    Filtering Noisy Dialogue Corpora by Connectivity and Content Relatedness

    Authors: Reina Akama, Sho Yokoi, Jun Suzuki, Kentaro Inui

    Abstract: Large-scale dialogue datasets have recently become available for training neural dialogue agents. However, these datasets have been reported to contain a non-negligible number of unacceptable utterance pairs. In this paper, we propose a method for scoring the quality of utterance pairs in terms of their connectivity and relatedness. The proposed scoring method is designed based on findings widely…

    Submitted 6 October, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: 18 pages, Accepted at The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)

  20. arXiv:2004.10102  [pdf, other]

    cs.CL

    Attention is Not Only a Weight: Analyzing Transformers with Vector Norms

    Authors: Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui

    Abstract: Attention is a key component of Transformers, which have recently achieved considerable success in natural language processing. Hence, attention is being extensively studied to investigate various linguistic capabilities of Transformers, focusing on analyzing the parallels between attention weights and specific linguistic phenomena. This paper shows that attention weights alone are only one of the…

    Submitted 6 October, 2020; v1 submitted 21 April, 2020; originally announced April 2020.

    Comments: 19 pages, accepted by EMNLP 2020

  21. arXiv:1911.09011  [pdf, other]

    stat.ML cs.LG

    Bayesian interpretation of SGD as Ito process

    Authors: Soma Yokoi, Issei Sato

    Abstract: The current interpretation of stochastic gradient descent (SGD) as a stochastic process lacks generality in that its numerical scheme restricts continuous-time dynamics as well as the loss function and the distribution of gradient noise. We introduce a simplified scheme with milder conditions that flexibly interprets SGD as a discrete-time approximation of an Ito process. The scheme also works as…

    Submitted 20 November, 2019; originally announced November 2019.

  22. arXiv:1903.02750  [pdf, other]

    stat.ML cs.LG

    On Transformations in Stochastic Gradient MCMC

    Authors: Soma Yokoi, Takuma Otsuka, Issei Sato

    Abstract: Stochastic gradient Langevin dynamics (SGLD) is a computationally efficient sampler for Bayesian posterior inference given a large scale dataset. Although SGLD is designed for unbounded random variables, many practical models incorporate variables with boundaries such as non-negative ones or those in a finite interval. To bridge this gap, we consider mapping unbounded samples into the target inter…

    Submitted 20 June, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

  23. Pointwise HSIC: A Linear-Time Kernelized Co-occurrence Norm for Sparse Linguistic Expressions

    Authors: Sho Yokoi, Sosuke Kobayashi, Kenji Fukumizu, Jun Suzuki, Kentaro Inui

    Abstract: In this paper, we propose a new kernel-based co-occurrence measure that can be applied to sparse linguistic expressions (e.g., sentences) with a very short learning time, as an alternative to pointwise mutual information (PMI). As well as deriving PMI from mutual information, we derive this new measure from the Hilbert-Schmidt independence criterion (HSIC); thus, we call the new measure the point…

    Submitted 4 September, 2018; originally announced September 2018.

    Comments: Accepted by EMNLP 2018

    Journal ref: EMNLP 2018

  24. arXiv:1805.05581  [pdf, other]

    cs.CL

    Unsupervised Learning of Style-sensitive Word Vectors

    Authors: Reina Akama, Kento Watanabe, Sho Yokoi, Sosuke Kobayashi, Kentaro Inui

    Abstract: This paper presents the first study aimed at capturing stylistic similarity between words in an unsupervised manner. We propose extending the continuous bag of words (CBOW) model (Mikolov et al., 2013) to learn style-sensitive word vectors using a wider context window under the assumption that the style of all the words in an utterance is consistent. In addition, we introduce a novel task to predi…

    Submitted 15 May, 2018; originally announced May 2018.

    Comments: 7 pages, Accepted at The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018)