
Showing 1–50 of 51 results for author: Inoue, N

  1. arXiv:2406.16535  [pdf, other]

    cs.CL cs.AI cs.LG

    Token-based Decision Criteria Are Suboptimal in In-context Learning

    Authors: Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, Naoya Inoue

    Abstract: In-Context Learning (ICL) typically utilizes classification criteria from probabilities of manually selected label tokens. However, we argue that such token-based classification criteria lead to suboptimal decision boundaries, despite delicate calibrations through translation and constrained rotation. To address this problem, we propose Hidden Calibration, which renounces token probabilities and u…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 21 pages, 14 figures, 8 tables

  2. arXiv:2406.14240  [pdf, other]

    cs.CV cs.AI

    CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information

    Authors: Jungdae Lee, Taiki Miyanishi, Shuhei Kurita, Koya Sakamoto, Daichi Azuma, Yutaka Matsuo, Nakamasa Inoue

    Abstract: Vision-and-language navigation (VLN) aims to guide autonomous agents through real-world environments by integrating visual and linguistic cues. While substantial progress has been made in understanding these interactive modalities in ground-level navigation, aerial navigation remains largely underexplored. This is primarily due to the scarcity of resources suitable for real-world, city-scale aeria…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: The first two authors are equally contributed

  3. arXiv:2406.12402  [pdf, other]

    cs.CL

    Flee the Flaw: Annotating the Underlying Logic of Fallacious Arguments Through Templates and Slot-filling

    Authors: Irfan Robbani, Paul Reisert, Naoya Inoue, Surawat Pothong, Camélia Guerraoui, Wenzhi Wang, Shoichi Naito, Jungmin Choi, Kentaro Inui

    Abstract: Prior research in computational argumentation has mainly focused on scoring the quality of arguments, with less attention on explicating logical errors. In this work, we introduce four sets of explainable templates for common informal logical fallacies designed to explicate a fallacy's implicit logic. Using our templates, we conduct an annotation study on top of 400 fallacious arguments taken from…

    Submitted 18 June, 2024; originally announced June 2024.

  4. arXiv:2406.08232  [pdf, other]

    cs.CV cs.GR

    OpenCOLE: Towards Reproducible Automatic Graphic Design Generation

    Authors: Naoto Inoue, Kento Masui, Wataru Shimoda, Kota Yamaguchi

    Abstract: Automatic generation of graphic designs has recently received considerable attention. However, the state-of-the-art approaches are complex and rely on proprietary datasets, which creates reproducibility barriers. In this paper, we propose an open framework for automatic graphic design called OpenCOLE, where we build a modified version of the pioneering COLE and train our model exclusively on publi…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: To appear as an extended abstract (EA) in Workshop on Graphic Design Understanding and Generation (in CVPR2024), code: https://github.com/CyberAgentAILab/OpenCOLE

  5. arXiv:2406.01468  [pdf, other]

    cs.CL cs.AI cs.LG

    Understanding Token Probability Encoding in Output Embeddings

    Authors: Hakaze Cho, Yoshihiro Sakai, Kenshiro Tanaka, Mariko Kato, Naoya Inoue

    Abstract: In this paper, we investigate the output token probability information in the output embedding of language models. We provide an approximate common log-linear encoding of output token probabilities within the output embedding vectors and demonstrate that it is accurate and sparse when the output space is large and output logits are concentrated. Based on such findings, we edit the encoding in outp…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 15 pages, 17 figures, 3 tables

  6. arXiv:2403.18187  [pdf, other]

    cs.CV

    LayoutFlow: Flow Matching for Layout Generation

    Authors: Julian Jorge Andrade Guerreiro, Naoto Inoue, Kento Masui, Mayu Otani, Hideki Nakayama

    Abstract: Finding a suitable layout represents a crucial task for diverse applications in graphic design. Motivated by simpler and smoother sampling trajectories, we explore the use of Flow Matching as an alternative to current diffusion-based layout generation models. Specifically, we propose LayoutFlow, an efficient flow-based model capable of generating high-quality layouts. Instead of progressively deno…

    Submitted 26 March, 2024; originally announced March 2024.

  7. arXiv:2402.05515  [pdf, other]

    cs.CL cs.AI

    NoisyICL: A Little Noise in Model Parameters Calibrates In-context Learning

    Authors: Yufeng Zhao, Yoshihiro Sakai, Naoya Inoue

    Abstract: In-Context Learning (ICL) is suffering from unsatisfactory performance and under-calibration due to high prior bias and unfaithful confidence. Some previous works fine-tuned language models for better ICL performance with enormous datasets and computing costs. In this paper, we propose NoisyICL, simply perturbing the model parameters by random noises to strive for better performance and calibratio…

    Submitted 15 February, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: 20 pages, 28 figures, 7 tables (5 pages, 4 figures, 1 table in main body). ACL 2024 under review

  8. arXiv:2312.09718  [pdf, other]

    cs.CL

    Discovering Highly Influential Shortcut Reasoning: An Automated Template-Free Approach

    Authors: Daichi Haraguchi, Kiyoaki Shirai, Naoya Inoue, Natthawut Kertkeidkachorn

    Abstract: Shortcut reasoning is an irrational process of inference, which degrades the robustness of an NLP model. While a number of previous works have tackled the identification of shortcut reasoning, there are still two major limitations: (i) a method for quantifying the severity of the discovered shortcut reasoning is not provided; (ii) certain types of shortcut reasoning may be missed. To address these i…

    Submitted 15 December, 2023; originally announced December 2023.

  9. arXiv:2311.13602  [pdf, other]

    cs.CV

    Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation

    Authors: Daichi Horita, Naoto Inoue, Kotaro Kikuchi, Kota Yamaguchi, Kiyoharu Aizawa

    Abstract: Content-aware graphic layout generation aims to automatically arrange visual elements along with a given content, such as an e-commerce product image. In this paper, we argue that the current layout generation approaches suffer from the limited training data for the high-dimensional layout structure. We show that a simple retrieval augmentation can significantly improve the generation quality. Our…

    Submitted 15 April, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: Accepted to CVPR 2024 (Oral), Project website: https://udonda.github.io/RALF/ , GitHub: https://github.com/CyberAgentAILab/RALF

  10. arXiv:2310.18773  [pdf, other]

    cs.CV

    CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale Point Cloud Data

    Authors: Taiki Miyanishi, Fumiya Kitamori, Shuhei Kurita, Jungdae Lee, Motoaki Kawanabe, Nakamasa Inoue

    Abstract: City-scale 3D point cloud is a promising way to express detailed and complicated outdoor structures. It encompasses both the appearance and geometry features of segmented city components, including cars, streets, and buildings, that can be utilized for attractive applications such as user-interactive navigation of autonomous vehicles and drones. However, compared to the extensive text annotations…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: NeurIPS D&B 2023. The first two authors are equally contributed

  11. arXiv:2309.17083  [pdf, other]

    cs.CV

    SegRCDB: Semantic Segmentation via Formula-Driven Supervised Learning

    Authors: Risa Shinoda, Ryo Hayamizu, Kodai Nakashima, Nakamasa Inoue, Rio Yokota, Hirokatsu Kataoka

    Abstract: Pre-training is a strong strategy for enhancing visual models to efficiently train them with a limited number of labeled images. In semantic segmentation, creating annotation masks requires an intensive amount of labor and time, and therefore, a large-scale pre-training dataset with semantic labels is quite difficult to construct. Moreover, what matters in semantic segmentation pre-training has no…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: ICCV2023. Code: https://github.com/dahlian00/SegRCDB, Project page: https://dahlian00.github.io/SegRCDBPage/

  12. arXiv:2307.15341  [pdf, other]

    cs.CL

    Teach Me How to Improve My Argumentation Skills: A Survey on Feedback in Argumentation

    Authors: Camélia Guerraoui, Paul Reisert, Naoya Inoue, Farjana Sultana Mim, Shoichi Naito, Jungmin Choi, Irfan Robbani, Wenzhi Wang, Kentaro Inui

    Abstract: The use of argumentation in education has been shown to improve critical thinking skills for end-users such as students, and computational models for argumentation have been developed to assist in this process. Although these models are useful for evaluating the quality of an argument, they oftentimes cannot explain why a particular argument is considered poor or not, which makes it difficult to p…

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: 14 pages, 4 figures

  13. arXiv:2307.14710  [pdf, other]

    cs.CV

    Pre-training Vision Transformers with Very Limited Synthesized Images

    Authors: Ryo Nakamura, Hirokatsu Kataoka, Sora Takashima, Edgar Josafat Martinez Noriega, Rio Yokota, Nakamasa Inoue

    Abstract: Formula-driven supervised learning (FDSL) is a pre-training method that relies on synthetic images generated from mathematical formulae such as fractals. Prior work on FDSL has shown that pre-training vision transformers on such synthetic datasets can yield competitive accuracy on a wide range of downstream tasks. These synthetic images are categorized according to the parameters in the mathematic…

    Submitted 30 July, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: Accepted to ICCV 2023

  14. arXiv:2305.13844  [pdf, other]

    cs.CL

    Arukikata Travelogue Dataset with Geographic Entity Mention, Coreference, and Link Annotation

    Authors: Shohei Higashiyama, Hiroki Ouchi, Hiroki Teranishi, Hiroyuki Otomo, Yusuke Ide, Aitaro Yamamoto, Hiroyuki Shindo, Yuki Matsuda, Shoko Wakamiya, Naoya Inoue, Ikuya Yamada, Taro Watanabe

    Abstract: Geoparsing is a fundamental technique for analyzing geo-entity information in text. We focus on document-level geoparsing, which considers geographic relatedness among geo-entity mentions, and present a Japanese travelogue dataset designed for evaluating document-level geoparsing systems. Our dataset comprises 200 travelogue documents with rich geo-entity information: 12,171 mentions, 6,339 coref…

    Submitted 23 May, 2023; originally announced May 2023.

  15. arXiv:2305.11444  [pdf, other]

    cs.CL cs.AI cs.DL

    Arukikata Travelogue Dataset

    Authors: Hiroki Ouchi, Hiroyuki Shindo, Shoko Wakamiya, Yuki Matsuda, Naoya Inoue, Shohei Higashiyama, Satoshi Nakamura, Taro Watanabe

    Abstract: We have constructed Arukikata Travelogue Dataset and released it free of charge for academic research. This dataset is a Japanese text dataset with a total of over 31 million words, comprising 4,672 Japanese domestic travelogues and 9,607 overseas travelogues. Before providing our dataset, there was a scarcity of widely available travelogue data for research purposes, and each researcher had to pr…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: The application website for Arukikata Travelogue Dataset: https://www.nii.ac.jp/dsc/idr/arukikata/

  16. arXiv:2303.18248  [pdf, other]

    cs.CV

    Towards Flexible Multi-modal Document Models

    Authors: Naoto Inoue, Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi

    Abstract: Creative workflows for generating graphical documents involve complex inter-related tasks, such as aligning elements, choosing appropriate fonts, or employing aesthetically harmonious colors. In this work, we attempt at building a holistic model that can jointly solve many different design tasks. Our model, which we denote by FlexDM, treats vector graphic documents as a set of multi-modal elements…

    Submitted 31 March, 2023; originally announced March 2023.

    Comments: To be published in CVPR2023 (highlight), project page: https://cyberagentailab.github.io/flex-dm

  17. arXiv:2303.08137  [pdf, other]

    cs.CV cs.GR

    LayoutDM: Discrete Diffusion Model for Controllable Layout Generation

    Authors: Naoto Inoue, Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi

    Abstract: Controllable layout generation aims at synthesizing plausible arrangement of element bounding boxes with optional constraints, such as type or position of a specific element. In this work, we try to solve a broad range of layout generation tasks in a single model that is based on discrete state-space diffusion models. Our model, named LayoutDM, naturally handles the structured layout data in the d…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: To be published in CVPR2023, project page: https://cyberagentailab.github.io/layout-dm/

  18. arXiv:2303.01112  [pdf, other]

    cs.CV cs.AI cs.LG

    Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves

    Authors: Sora Takashima, Ryo Hayamizu, Nakamasa Inoue, Hirokatsu Kataoka, Rio Yokota

    Abstract: Formula-driven supervised learning (FDSL) has been shown to be an effective method for pre-training vision transformers, where ExFractalDB-21k was shown to exceed the pre-training effect of ImageNet-21k. These studies also indicate that contours mattered more than textures when pre-training vision transformers. However, the lack of a systematic investigation as to why these contour-oriented synthe…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  19. arXiv:2212.11541  [pdf, other]

    cs.CV cs.MM

    Generative Colorization of Structured Mobile Web Pages

    Authors: Kotaro Kikuchi, Naoto Inoue, Mayu Otani, Edgar Simo-Serra, Kota Yamaguchi

    Abstract: Color is a critical design factor for web pages, affecting important factors such as viewer emotions and the overall trust and satisfaction of a website. Effective coloring requires design knowledge and expertise, but if this process could be automated through data-driven modeling, efficient exploration and alternative workflows would be possible. However, this direction remains underexplored due…

    Submitted 23 January, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

    Comments: Accepted to WACV 2023

  20. arXiv:2212.10352  [pdf, other]

    cs.NE cs.LG

    Fixed-Weight Difference Target Propagation

    Authors: Tatsukichi Shibuya, Nakamasa Inoue, Rei Kawakami, Ikuro Sato

    Abstract: Target Propagation (TP) is a biologically more plausible algorithm than the error backpropagation (BP) to train deep networks, and improving practicality of TP is an open issue. TP methods require the feedforward and feedback networks to form layer-wise autoencoders for propagating the target values generated at the output layer. However, this causes certain drawbacks; e.g., careful hyperparameter…

    Submitted 19 December, 2022; originally announced December 2022.

    Comments: Accepted at the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23). 9 pages and 3 figures in main manuscript; 11 pages and 5 figures in supplementary material

  21. arXiv:2212.02780  [pdf, ps, other]

    cs.MM cs.SD eess.AS

    Parameter Efficient Transfer Learning for Various Speech Processing Tasks

    Authors: Shinta Otake, Rei Kawakami, Nakamasa Inoue

    Abstract: Fine-tuning of self-supervised models is a powerful transfer learning method in a variety of fields, including speech processing, since it can utilize generic feature representations obtained from large amounts of unlabeled data. Fine-tuning, however, requires a new parameter set for each downstream task, which is parameter inefficient. Adapter architecture is proposed to partially solve this issu…

    Submitted 6 December, 2022; originally announced December 2022.

  22. arXiv:2207.01847  [pdf, other]

    cs.LG

    PoF: Post-Training of Feature Extractor for Improving Generalization

    Authors: Ikuro Sato, Ryota Yamada, Masayuki Tanaka, Nakamasa Inoue, Rei Kawakami

    Abstract: It has been intensively investigated that the local shape, especially flatness, of the loss landscape near a minimum plays an important role for generalization of deep models. We developed a training algorithm called PoF: Post-Training of Feature Extractor that updates the feature extractor part of an already-trained deep model to search a flatter minimum. The characteristics are two-fold: 1) Feat…

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: Accepted to ICML2022. Contains a link to the code

  23. arXiv:2206.09132  [pdf, other]

    cs.CV cs.AI cs.LG

    Replacing Labeled Real-image Datasets with Auto-generated Contours

    Authors: Hirokatsu Kataoka, Ryo Hayamizu, Ryosuke Yamada, Kodai Nakashima, Sora Takashima, Xinyu Zhang, Edgar Josafat Martinez-Noriega, Nakamasa Inoue, Rio Yokota

    Abstract: In the present work, we show that the performance of formula-driven supervised learning (FDSL) can match or even exceed that of ImageNet-21k without the use of real images, human-, and self-supervision during the pre-training of Vision Transformers (ViTs). For example, ViT-Base pre-trained on ImageNet-21k shows 81.8% top-1 accuracy when fine-tuned on ImageNet-1k and FDSL shows 82.7% top-1 accuracy…

    Submitted 18 June, 2022; originally announced June 2022.

    Comments: Accepted to CVPR 2022

  24. arXiv:2204.01512  [pdf, other]

    cs.CL

    LPAttack: A Feasible Annotation Scheme for Capturing Logic Pattern of Attacks in Arguments

    Authors: Farjana Sultana Mim, Naoya Inoue, Shoichi Naito, Keshav Singh, Kentaro Inui

    Abstract: In argumentative discourse, persuasion is often achieved by refuting or attacking others' arguments. Attacking is not always straightforward and often comprises complex rhetorical moves such that arguers might agree with a logic of an argument while attacking another logic. Moreover, arguer might neither deny nor agree with any logics of an argument, instead ignore them and attack the main stance of…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: 14 pages, 8 figures

  25. arXiv:2201.06674  [pdf, other]

    cs.CL

    TYPIC: A Corpus of Template-Based Diagnostic Comments on Argumentation

    Authors: Shoichi Naito, Shintaro Sawada, Chihiro Nakagawa, Naoya Inoue, Kenshi Yamaguchi, Iori Shimizu, Farjana Sultana Mim, Keshav Singh, Kentaro Inui

    Abstract: Providing feedback on the argumentation of the learner is essential for developing critical thinking skills, however, it requires a lot of time and effort. To mitigate the overload on teachers, we aim to automate a process of providing feedback, especially giving diagnostic comments which point out the weaknesses inherent in the argumentation. It is recommended to give specific diagnostic comments…

    Submitted 21 June, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

    Comments: LREC2022. The dataset is available at https://github.com/cl-tohoku/TYPIC

  26. arXiv:2110.13692  [pdf, other]

    cs.CL

    Annotating Implicit Reasoning in Arguments with Causal Links

    Authors: Keshav Singh, Naoya Inoue, Farjana Sultana Mim, Shoichi Naitoh, Kentaro Inui

    Abstract: Most of the existing work that focus on the identification of implicit knowledge in arguments generally represent implicit knowledge in the form of commonsense or factual knowledge. However, such knowledge is not sufficient to understand the implicit reasoning link between individual argumentative components (i.e., claim and premise). In this work, we focus on identifying the implicit knowledge in…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: Accepted to ArgKG:Workshop on Argumentation Knowledge Graphs (AKBC 2021)

  27. arXiv:2110.11934  [pdf, other]

    cs.CL

    Cleaning Dirty Books: Post-OCR Processing for Previously Scanned Texts

    Authors: Allen Kim, Charuta Pethe, Naoya Inoue, Steve Skiena

    Abstract: Substantial amounts of work are required to clean large collections of digitized books for NLP analysis, both because of the presence of errors in the scanned text and the presence of duplicate volumes in the corpora. In this paper, we consider the issue of deduplication in the presence of optical character recognition (OCR) errors. We present methods to handle these errors, evaluated on a collect…

    Submitted 22 October, 2021; originally announced October 2021.

    Comments: Accepted for Findings of EMNLP 2021

  28. arXiv:2109.06853  [pdf, other]

    cs.CL

    Summarize-then-Answer: Generating Concise Explanations for Multi-hop Reading Comprehension

    Authors: Naoya Inoue, Harsh Trivedi, Steven Sinha, Niranjan Balasubramanian, Kentaro Inui

    Abstract: How can we generate concise explanations for multi-hop Reading Comprehension (RC)? The current strategies of identifying supporting sentences can be seen as an extractive question-focused summarization of the input text. However, these extractive explanations are not necessarily concise i.e. not minimally sufficient for answering a question. Instead, we advocate for an abstractive approach, where…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: Accepted to EMNLP2021 Long Paper (Main Track)

  29. arXiv:2104.07924  [pdf, other]

    cs.CL

    A Comparative Study on Collecting High-Quality Implicit Reasonings at a Large-scale

    Authors: Keshav Singh, Paul Reisert, Naoya Inoue, Kentaro Inui

    Abstract: Explicating implicit reasoning (i.e. warrants) in arguments is a long-standing challenge for natural language understanding systems. While recent approaches have focused on explicating warrants via crowdsourcing or expert annotations, the quality of warrants has been questionable due to the extreme complexity and subjectivity of the task. In this paper, we tackle the complex task of warrant explic…

    Submitted 16 April, 2021; originally announced April 2021.

    Comments: 2 figures, 3 tables

  30. arXiv:2103.13023  [pdf, other]

    cs.CV

    Can Vision Transformers Learn without Natural Images?

    Authors: Kodai Nakashima, Hirokatsu Kataoka, Asato Matsumoto, Kenji Iwata, Nakamasa Inoue

    Abstract: Can we complete pre-training of Vision Transformers (ViT) without natural images and human-annotated labels? Although a pre-trained ViT seems to heavily rely on a large-scale dataset and human-annotated labels, recent large-scale datasets contain several problems in terms of privacy violations, inadequate fairness protection, and labor-intensive annotation. In the present paper, we pre-train ViT w…

    Submitted 24 March, 2021; originally announced March 2021.

    Comments: Project page: https://hirokatsukataoka16.github.io/Vision-Transformers-without-Natural-Images/

  31. arXiv:2102.06540  [pdf, other]

    cs.CL cs.LG

    Two Training Strategies for Improving Relation Extraction over Universal Graph

    Authors: Qin Dai, Naoya Inoue, Ryo Takahashi, Kentaro Inui

    Abstract: This paper explores how the Distantly Supervised Relation Extraction (DS-RE) can benefit from the use of a Universal Graph (UG), the combination of a Knowledge Graph (KG) and a large-scale text collection. A straightforward extension of a current state-of-the-art neural model for DS-RE with a UG may lead to degradation in performance. We first report that this degradation is associated with the di…

    Submitted 6 May, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

  32. arXiv:2101.08515  [pdf, other]

    cs.CV cs.LG

    Pre-training without Natural Images

    Authors: Hirokatsu Kataoka, Kazushige Okayasu, Asato Matsumoto, Eisuke Yamagata, Ryosuke Yamada, Nakamasa Inoue, Akio Nakamura, Yutaka Satoh

    Abstract: Is it possible to use convolutional neural networks pre-trained without any natural images to assist natural image understanding? The paper proposes a novel concept, Formula-driven Supervised Learning. We automatically generate image patterns and their category labels by assigning fractals, which are based on a natural law existing in the background knowledge of the real world. Theoretically, the…

    Submitted 21 January, 2021; originally announced January 2021.

    Comments: ACCV 2020 Best Paper Honorable Mention Award, Codes are publicly available: https://github.com/hirokatsukataoka16/FractalDB-Pretrained-ResNet-PyTorch

  33. arXiv:2101.07406  [pdf, ps, other]

    cs.CV

    Initialization Using Perlin Noise for Training Networks with a Limited Amount of Data

    Authors: Nakamasa Inoue, Eisuke Yamagata, Hirokatsu Kataoka

    Abstract: We propose a novel network initialization method using Perlin noise for training image classification networks with a limited amount of data. Our main idea is to initialize the network parameters by solving an artificial noise classification problem, where the aim is to classify Perlin noise samples into their noise categories. Specifically, the proposed method consists of two steps. First, it gen…

    Submitted 18 January, 2021; originally announced January 2021.

    Comments: Accepted to ICPR2020

  34. Learning from Synthetic Shadows for Shadow Detection and Removal

    Authors: Naoto Inoue, Toshihiko Yamasaki

    Abstract: Shadow removal is an essential task in computer vision and computer graphics. Recent shadow removal approaches all train convolutional neural networks (CNN) on real paired shadow/shadow-free or shadow/shadow-free/mask image datasets. However, obtaining a large-scale, diverse, and accurate dataset has been a big challenge, and it limits the performance of the learned models on shadow images with un…

    Submitted 13 February, 2021; v1 submitted 5 January, 2021; originally announced January 2021.

    Comments: Accepted to IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), v2: fixed typos

  35. arXiv:2011.01785  [pdf, other]

    cs.CL

    Modeling Event Salience in Narratives via Barthes' Cardinal Functions

    Authors: Takaki Otake, Sho Yokoi, Naoya Inoue, Ryo Takahashi, Tatsuki Kuribayashi, Kentaro Inui

    Abstract: Events in a narrative differ in salience: some are more important to the story than others. Estimating event salience is useful for tasks such as story generation, and as a tool for text analysis in narratology and folkloristics. To compute event salience without any annotations, we adopt Barthes' definition of event salience and propose several unsupervised methods that require only a pre-trained…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: accepted to COLING 2020

  36. arXiv:2010.06137  [pdf, other]

    cs.CL

    Corruption Is Not All Bad: Incorporating Discourse Structure into Pre-training via Corruption for Essay Scoring

    Authors: Farjana Sultana Mim, Naoya Inoue, Paul Reisert, Hiroki Ouchi, Kentaro Inui

    Abstract: Existing approaches for automated essay scoring and document representation learning typically rely on discourse parsers to incorporate discourse structure into text representation. However, the performance of parsers is not always adequate, especially when they are used on noisy texts, such as student essays. In this paper, we propose an unsupervised pre-training approach to capture discourse str…

    Submitted 12 October, 2020; originally announced October 2020.

  37. arXiv:2006.04326  [pdf, ps, other]

    eess.AS cs.SD

    Semi-Supervised Contrastive Learning with Generalized Contrastive Loss and Its Application to Speaker Recognition

    Authors: Nakamasa Inoue, Keita Goto

    Abstract: This paper introduces a semi-supervised contrastive learning framework and its application to text-independent speaker verification. The proposed framework employs generalized contrastive loss (GCL). GCL unifies losses from two different learning frameworks, supervised metric learning and unsupervised contrastive learning, and thus it naturally determines the loss for semi-supervised learning. In…

    Submitted 7 June, 2020; originally announced June 2020.

  38. arXiv:2004.07992  [pdf, other]

    eess.AS cs.LG cs.SD q-bio.QM

    Speech Paralinguistic Approach for Detecting Dementia Using Gated Convolutional Neural Network

    Authors: Mariana Rodrigues Makiuchi, Tifani Warnita, Nakamasa Inoue, Koichi Shinoda, Michitaka Yoshimura, Momoko Kitazawa, Kei Funaki, Yoko Eguchi, Taishiro Kishimoto

    Abstract: We propose a non-invasive and cost-effective method to automatically detect dementia by utilizing solely speech audio data. We extract paralinguistic features for a short speech segment and use Gated Convolutional Neural Networks (GCNN) to classify it into dementia or healthy. We evaluate our method on the Pitt Corpus and on our own dataset, the PROMPT Database. Our method yields the accuracy of 7…

    Submitted 6 October, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

  39. arXiv:2003.00187  [pdf, other]

    cs.CV cs.LG

    Augmented Cyclic Consistency Regularization for Unpaired Image-to-Image Translation

    Authors: Takehiko Ohkawa, Naoto Inoue, Hirokatsu Kataoka, Nakamasa Inoue

    Abstract: Unpaired image-to-image (I2I) translation has received considerable attention in pattern recognition and computer vision because of recent advancements in generative adversarial networks (GANs). However, due to the lack of explicit supervision, unpaired I2I models often fail to generate realistic images, especially in challenging datasets with different backgrounds and poses. Hence, stabilization…

    Submitted 12 October, 2020; v1 submitted 29 February, 2020; originally announced March 2020.

    Comments: Accepted to ICPR2020

  40. arXiv:1912.07190  [pdf, other]

    cs.CV cs.LG eess.IV

    PixelRL: Fully Convolutional Network with Reinforcement Learning for Image Processing

    Authors: Ryosuke Furuta, Naoto Inoue, Toshihiko Yamasaki

    Abstract: This paper tackles a new problem setting: reinforcement learning with pixel-wise rewards (pixelRL) for image processing. After the introduction of the deep Q-network, deep RL has been achieving great success. However, the applications of deep reinforcement learning (RL) for image processing are still limited. Therefore, we extend deep RL to pixelRL for various image processing applications. In pix…

    Submitted 15 December, 2019; originally announced December 2019.

    Comments: To appear in IEEE Transactions on Multimedia (TMM), Special Issue on Multimedia Computing with Interpretable Machine Learning. Extended version of our paper in AAAI 2019 (arXiv:1811.04323)

  41. arXiv:1911.00225  [pdf, other]

    cs.CL

    When Choosing Plausible Alternatives, Clever Hans can be Clever

    Authors: Pride Kavumba, Naoya Inoue, Benjamin Heinzerling, Keshav Singh, Paul Reisert, Kentaro Inui

    Abstract: Pretrained language models, such as BERT and RoBERTa, have shown large improvements in the commonsense reasoning benchmark COPA. However, recent work found that many improvements in benchmarks of natural language understanding are not due to models learning the task, but due to their increasing ability to exploit superficial cues, such as tokens that occur more often in the correct answer than the…

    Submitted 1 November, 2019; originally announced November 2019.

    Comments: Accepted to the COmmonsense INference in Natural Language Processing workshop (COIN)

  42. arXiv:1910.04601  [pdf, other]

    cs.CL

    R4C: A Benchmark for Evaluating RC Systems to Get the Right Answer for the Right Reason

    Authors: Naoya Inoue, Pontus Stenetorp, Kentaro Inui

    Abstract: Recent studies have revealed that reading comprehension (RC) systems learn to exploit annotation artifacts and other biases in current datasets. This prevents the community from reliably measuring the progress of RC systems. To address this issue, we introduce R4C, a new task for evaluating RC systems' internal reasoning. R4C requires giving not only answers but also derivations: explanations that…

    Submitted 1 May, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

    Comments: Accepted by ACL2020. See https://naoya-i.github.io/r4c/ for more information

  43. arXiv:1910.03246  [pdf, other]

    cs.CL

    Riposte! A Large Corpus of Counter-Arguments

    Authors: Paul Reisert, Benjamin Heinzerling, Naoya Inoue, Shun Kiyono, Kentaro Inui

    Abstract: Constructive feedback is an effective method for improving critical thinking skills. Counter-arguments (CAs), one form of constructive feedback, have been proven to be useful for critical thinking skills. However, little work has been done for constructing a large-scale corpus of them which can drive research on automatic generation of CAs for fallacious micro-level arguments (i.e. a single claim…

    Submitted 8 October, 2019; originally announced October 2019.

  44. arXiv:1811.04531  [pdf, other]

    cs.CL

    Sequence-Level Knowledge Distillation for Model Compression of Attention-based Sequence-to-Sequence Speech Recognition

    Authors: Raden Mu'az Mun'im, Nakamasa Inoue, Koichi Shinoda

    Abstract: We investigate the feasibility of sequence-level knowledge distillation of Sequence-to-Sequence (Seq2Seq) models for Large Vocabulary Continuous Speech Recognition (LVCSR). We first use a pre-trained larger teacher model to generate multiple hypotheses per utterance with beam search. With the same input, we then train the student model using these hypotheses generated from the teacher as pseudo la…

    Submitted 11 November, 2018; originally announced November 2018.

  45. arXiv:1811.04323  [pdf, other]

    cs.CV cs.AI

    Fully Convolutional Network with Multi-Step Reinforcement Learning for Image Processing

    Authors: Ryosuke Furuta, Naoto Inoue, Toshihiko Yamasaki

    Abstract: This paper tackles a new problem setting: reinforcement learning with pixel-wise rewards (pixelRL) for image processing. After the introduction of the deep Q-network, deep RL has been achieving great success. However, the applications of deep RL for image processing are still limited. Therefore, we extend deep RL to pixelRL for various image processing applications. In pixelRL, each pixel has an a…

    Submitted 13 November, 2018; v1 submitted 10 November, 2018; originally announced November 2018.

    Comments: Accepted to AAAI 2019

  46. arXiv:1807.07203  [pdf, ps, other]

    cs.MM cs.CV

    Few-Shot Adaptation for Multimedia Semantic Indexing

    Authors: Nakamasa Inoue, Koichi Shinoda

    Abstract: We propose a few-shot adaptation framework, which bridges zero-shot learning and supervised many-shot learning, for semantic indexing of image and video data. Few-shot adaptation provides robust parameter estimation with few training examples, by optimizing the parameters of zero-shot learning and supervised many-shot learning simultaneously. In this method, first we build a zero-shot detector, an…

    Submitted 18 July, 2018; originally announced July 2018.

  47. arXiv:1805.11790  [pdf, other]

    cs.CV

    A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition

    Authors: Thao Minh Le, Nakamasa Inoue, Koichi Shinoda

    Abstract: This paper presents a new framework for human action recognition from a 3D skeleton sequence. Previous studies do not fully utilize the temporal relationships between video segments in a human action. Some studies successfully used very deep Convolutional Neural Network (CNN) models but often suffer from the data insufficiency problem. In this study, we first segment a skeleton sequence into disti…

    Submitted 18 August, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

    Comments: Camera-ready manuscript for BMVC2018

  48. arXiv:1804.00290  [pdf, other]

    eess.AS cs.LG cs.SD

    I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification

    Authors: Jiacen Zhang, Nakamasa Inoue, Koichi Shinoda

    Abstract: I-vector based text-independent speaker verification (SV) systems often have poor performance with short utterances, as the biased phonetic distribution in a short utterance makes the extracted i-vector unreliable. This paper proposes an i-vector compensation method using a generative adversarial network (GAN), where its generator network is trained to generate a compensated i-vector from a short-…

    Submitted 1 April, 2018; originally announced April 2018.

  49. arXiv:1803.11365  [pdf, other]

    cs.CV

    Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation

    Authors: Naoto Inoue, Ryosuke Furuta, Toshihiko Yamasaki, Kiyoharu Aizawa

    Abstract: Can we detect common objects in a variety of image domains without instance-level annotations? In this paper, we present a framework for a novel task, cross-domain weakly supervised object detection, which addresses this question. For this paper, we have access to images with instance-level annotations in a source domain (e.g., natural image) and images with image-level annotations in a target dom…

    Submitted 30 March, 2018; originally announced March 2018.

    Comments: To appear at CVPR2018 (poster), including supplementary materials

  50. arXiv:1803.11344  [pdf, other]

    eess.AS cs.SD

    Detecting Alzheimer's Disease Using Gated Convolutional Neural Network from Audio Data

    Authors: Tifani Warnita, Nakamasa Inoue, Koichi Shinoda

    Abstract: We propose an automatic detection method for Alzheimer's disease using a gated convolutional neural network (GCNN) from speech data. This GCNN can be trained with a relatively small amount of data and can capture the temporal information in audio paralinguistic features. Since it does not utilize any linguistic features, it can be easily applied to any language. We evaluated our method using Pitt…

    Submitted 30 March, 2018; originally announced March 2018.

    Comments: 5 pages, 3 figures, submitted to INTERSPEECH 2018