Showing 1–37 of 37 results for author: Sugiura, K

  1. arXiv:2407.00985

    cs.RO cs.CV

    Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models

    Authors: Takayuki Nishimura, Katsuyuki Kuyo, Motonari Kambara, Komei Sugiura

    Abstract: We consider the task of generating segmentation masks for the target object from an object manipulation instruction, which allows users to give open vocabulary instructions to domestic service robots. Conventional segmentation generation approaches often fail to account for objects outside the camera's field of view and cases in which the order of vertices differs but still represents the same pol…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted for presentation at IROS2024

  2. arXiv:2404.01710

    cs.DB

    Practical Persistent Multi-Word Compare-and-Swap Algorithms for Many-Core CPUs

    Authors: Kento Sugiura, Manabu Nishimura, Yoshiharu Ishikawa

    Abstract: In the last decade, academic and industrial researchers have focused on persistent memory because of the development of the first practical product, Intel Optane. One of the main challenges of persistent memory programming is to guarantee consistent durability over separate memory addresses, and Wang et al. proposed a persistent multi-word compare-and-swap (PMwCAS) algorithm to solve this problem.…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 8 pages, 14 figures

    ACM Class: H.2.4

  3. arXiv:2404.01237

    cs.RO cs.AR

    FPGA-Accelerated Correspondence-free Point Cloud Registration with PointNet Features

    Authors: Keisuke Sugiura, Hiroki Matsutani

    Abstract: Point cloud registration serves as a basis for vision and robotic applications including 3D reconstruction and mapping. Despite significant improvements on the quality of results, recent deep learning approaches are computationally expensive and power-hungry, making them difficult to deploy on resource-constrained edge devices. To tackle this problem, in this paper, we propose a fast, accurate, an…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 27 pages, 19 figures

  4. arXiv:2402.18091

    cs.CV cs.AI cs.CL

    Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

    Authors: Yuiga Wada, Kanta Kaneda, Daichi Saito, Komei Sugiura

    Abstract: Establishing an automatic evaluation metric that closely aligns with human judgments is essential for effectively developing image captioning models. Recent data-driven metrics have demonstrated a stronger correlation with human judgments than classic metrics such as CIDEr; however they lack sufficient capabilities to handle hallucinations and generalize across diverse images and texts partially b…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: CVPR 2024

  5. arXiv:2401.02721

    cs.LG cs.AR

    A Cost-Efficient FPGA Implementation of Tiny Transformer Model using Neural ODE

    Authors: Ikumi Okubo, Keisuke Sugiura, Hiroki Matsutani

    Abstract: Transformer has been adopted to a wide range of tasks and shown to outperform CNNs and RNNs while it suffers from high training cost and computational complexity. To address these issues, a hybrid approach has become a recent research trend, which replaces a part of ResNet with an MHSA (Multi-Head Self-Attention). In this paper, we propose a lightweight hybrid model which uses Neural ODE (Ordinary…

    Submitted 25 June, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

  6. arXiv:2312.15844

    cs.RO cs.CL cs.CV

    Learning-To-Rank Approach for Identifying Everyday Objects Using a Physical-World Search Engine

    Authors: Kanta Kaneda, Shunya Nagashima, Ryosuke Korekata, Motonari Kambara, Komei Sugiura

    Abstract: Domestic service robots offer a solution to the increasing demand for daily care and support. A human-in-the-loop approach that combines automation and operator intervention is considered to be a realistic approach to their use in society. Therefore, we focus on the task of retrieving target objects from open-vocabulary user instructions in a human-in-the-loop setting, which we define as the learn…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: Accepted for RAL 2023

  7. arXiv:2312.15138

    cs.LG

    An FPGA-Based Accelerator for Graph Embedding using Sequential Training Algorithm

    Authors: Kazuki Sunaga, Keisuke Sugiura, Hiroki Matsutani

    Abstract: A graph embedding is an emerging approach that can represent a graph structure with a fixed-length low-dimensional vector. node2vec is a well-known algorithm to obtain such a graph embedding by sampling neighboring nodes on a given graph with a random walk technique. However, the original node2vec algorithm typically relies on a batch training of graph structures; thus, it is not suited for applic…

    Submitted 29 April, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: RAW'24

  8. arXiv:2311.06855

    cs.CV cs.CL cs.RO

    DialMAT: Dialogue-Enabled Transformer with Moment-Based Adversarial Training

    Authors: Kanta Kaneda, Ryosuke Korekata, Yuiga Wada, Shunya Nagashima, Motonari Kambara, Yui Iioka, Haruka Matsuo, Yuto Imai, Takayuki Nishimura, Komei Sugiura

    Abstract: This paper focuses on the DialFRED task, which is the task of embodied instruction following in a setting where an agent can actively ask questions about the task. To address this task, we propose DialMAT. DialMAT introduces Moment-based Adversarial Training, which incorporates adversarial perturbations into the latent space of language, image, and action. Additionally, it introduces a crossmodal…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: Accepted for presentation at Fourth Annual Embodied AI Workshop at CVPR

  9. arXiv:2311.04260

    cs.RO cs.CL cs.CV

    Fully Automated Task Management for Generation, Execution, and Evaluation: A Framework for Fetch-and-Carry Tasks with Natural Language Instructions in Continuous Space

    Authors: Motonari Kambara, Komei Sugiura

    Abstract: This paper aims to develop a framework that enables a robot to execute tasks based on visual information, in response to natural language instructions for Fetch-and-Carry with Object Grounding (FCOG) tasks. Although there have been many frameworks, they usually rely on manually given instruction sentences. Therefore, evaluations have only been conducted with fixed tasks. Furthermore, many multimod…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: Accepted for presentation at the CVPR 2023 Embodied AI Workshop

  10. arXiv:2311.04192

    cs.CV cs.CL

    JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models

    Authors: Yuiga Wada, Kanta Kaneda, Komei Sugiura

    Abstract: Image captioning studies heavily rely on automatic evaluation metrics such as BLEU and METEOR. However, such n-gram-based metrics have been shown to correlate poorly with human evaluation, leading to the proposal of alternative metrics such as SPICE for English; however, no equivalent metrics have been established for other languages. Therefore, in this study, we propose an automatic evaluation me…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: Accepted at CoNLL 2023. Project page: https://yuiga.dev/jaspice/en

  11. arXiv:2307.08597

    cs.CV cs.CL cs.RO

    Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions

    Authors: Yui Iioka, Yu Yoshida, Yuiga Wada, Shumpei Hatanaka, Komei Sugiura

    Abstract: In this study, we aim to develop a model that comprehends a natural language instruction (e.g., "Go to the living room and get the nearest pillow to the radio art on the wall") and generates a segmentation mask for the target everyday object. The task is challenging because it requires (1) the understanding of the referring expressions for multiple objects in the instruction, (2) the prediction of…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: Accepted for presentation at IROS2023

  12. arXiv:2307.07166

    cs.RO cs.CL cs.CV

    Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks

    Authors: Ryosuke Korekata, Motonari Kambara, Yu Yoshida, Shintaro Ishikawa, Yosuke Kawasaki, Masaki Takahashi, Komei Sugiura

    Abstract: This paper describes a domestic service robot (DSR) that fetches everyday objects and carries them to specified destinations according to free-form natural language instructions. Given an instruction such as "Move the bottle on the left side of the plate to the empty chair," the DSR is expected to identify the bottle and the chair from multiple candidates in the environment and carry the target ob…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Accepted for presentation at IROS2023

  13. arXiv:2307.05942

    cs.RO cs.CL cs.CV

    Prototypical Contrastive Transfer Learning for Multimodal Language Understanding

    Authors: Seitaro Otsuki, Shintaro Ishikawa, Komei Sugiura

    Abstract: Although domestic service robots are expected to assist individuals who require support, they cannot currently interact smoothly with people through natural language. For example, given the instruction "Bring me a bottle from the kitchen," it is difficult for such robots to specify the bottle in an indoor environment. Most conventional models have been trained on real-world datasets that are labor…

    Submitted 12 July, 2023; originally announced July 2023.

    Comments: Accepted for presentation at IROS23

  14. arXiv:2306.17625

    cs.RO cs.AR

    An Integrated FPGA Accelerator for Deep Learning-based 2D/3D Path Planning

    Authors: Keisuke Sugiura, Hiroki Matsutani

    Abstract: Path planning is a crucial component for realizing the autonomy of mobile robots. However, due to limited computational resources on mobile robots, it remains challenging to deploy state-of-the-art methods and achieve real-time performance. To address this, we propose P3Net (PointNet-based Path Planning Networks), a lightweight deep-learning-based method for 2D/3D path planning, and design an IP c…

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: 25 pages, 17 figures

  15. arXiv:2306.13879

    cs.LG

    Action Q-Transformer: Visual Explanation in Deep Reinforcement Learning with Encoder-Decoder Model using Action Query

    Authors: Hidenori Itaya, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, Komei Sugiura

    Abstract: The excellent performance of Transformer in supervised learning has led to growing interest in its potential application to deep reinforcement learning (DRL) to achieve high performance on a wide variety of problems. However, the decision making of a DRL agent is a black box, which greatly hinders the application of the agent to real-world problems. To address this problem, we propose the Action Q…

    Submitted 24 June, 2023; originally announced June 2023.

    Comments: 16 pages, 8 figures, 3 tables

  16. arXiv:2305.12732

    cs.DB

    Z-ordered Range Refinement for Multi-dimensional Range Queries

    Authors: Kento Sugiura, Yoshiharu Ishikawa

    Abstract: The z-order curve is a space-filling curve and is now attracting the interest of developers because of its simple and useful features. In the case of key-value stores, because the z-order curve achieves multi-dimensional range queries in one-dimensional z-ordered space, its use has been proposed for both academic and industrial purposes. However, z-ordered range queries suffer from wasteful query…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: 16 pages, 12 figures

  17. arXiv:2208.08613

    cs.RO

    Visual Explanation of Deep Q-Network for Robot Navigation by Fine-tuning Attention Branch

    Authors: Yuya Maruyama, Hiroshi Fukui, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, Komei Sugiura

    Abstract: Robot navigation with deep reinforcement learning (RL) achieves higher performance and performs well under complex environment. Meanwhile, the interpretation of the decision-making of deep RL models becomes a critical problem for more safety and reliability of autonomous robots. In this paper, we propose a visual explanation method based on an attention branch for deep RL models. We connect attent…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: 8 pages, 8 figures, 1 table

  18. arXiv:2207.09083

    cs.RO cs.CL cs.CV

    Relational Future Captioning Model for Explaining Likely Collisions in Daily Tasks

    Authors: Motonari Kambara, Komei Sugiura

    Abstract: Domestic service robots that support daily tasks are a promising solution for elderly or disabled people. It is crucial for domestic service robots to explain the collision risk before they perform actions. In this paper, our aim is to generate a caption about a future event. We propose the Relational Future Captioning Model (RFCM), a crossmodal language generation model for the future captioning…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: Accepted for presentation at ICIP2022

  19. arXiv:2204.00889

    cs.RO cs.CL cs.CV

    Moment-based Adversarial Training for Embodied Language Comprehension

    Authors: Shintaro Ishikawa, Komei Sugiura

    Abstract: In this paper, we focus on a vision-and-language task in which a robot is instructed to execute household tasks. Given an instruction such as "Rinse off a mug and place it in the coffee maker," the robot is required to locate the mug, wash it, and put it in the coffee maker. This is challenging because the robot needs to break down the instruction sentences into subgoals and execute them in the co…

    Submitted 2 April, 2022; originally announced April 2022.

    Comments: Accepted for presentation at ICPR2022

  20. arXiv:2203.05763

    cs.RO cs.AR

    An Efficient Accelerator for Deep Learning-based Point Cloud Registration on FPGAs

    Authors: Keisuke Sugiura, Hiroki Matsutani

    Abstract: Point cloud registration is the basis for many robotic applications such as odometry and Simultaneous Localization And Mapping (SLAM), which are increasingly important for autonomous mobile robots. Computational resources and power budgets are limited on these robots, thereby motivating the development of resource-efficient registration method on low-cost FPGAs. In this paper, we propose a novel a…

    Submitted 11 March, 2022; originally announced March 2022.

    Comments: 6 pages, 11 figures

  21. LatteGAN: Visually Guided Language Attention for Multi-Turn Text-Conditioned Image Manipulation

    Authors: Shoya Matsumori, Yuki Abe, Kosuke Shingyouchi, Komei Sugiura, Michita Imai

    Abstract: Text-guided image manipulation tasks have recently gained attention in the vision-and-language community. While most of the prior studies focused on single-turn manipulation, our goal in this paper is to address the more challenging multi-turn image manipulation (MTIM) task. Previous models for this task successfully generate images iteratively, given a sequence of instructions and a previously ge…

    Submitted 2 June, 2022; v1 submitted 27 December, 2021; originally announced December 2021.

    Journal ref: IEEE Access, 9, 160521-160532 (2021)

  22. A Low-Cost Neural ODE with Depthwise Separable Convolution for Edge Domain Adaptation on FPGAs

    Authors: Hiroki Kawakami, Hirohisa Watanabe, Keisuke Sugiura, Hiroki Matsutani

    Abstract: High-performance deep neural network (DNN)-based systems are in high demand in edge environments. Due to its high computational complexity, it is challenging to deploy DNNs on edge devices with strict limitations on computational resources. In this paper, we derive a compact while highly-accurate DNN model, termed dsODENet, by combining recently-proposed parameter reduction techniques: Neural ODE…

    Submitted 17 March, 2023; v1 submitted 27 July, 2021; originally announced July 2021.

    Journal ref: IEICE Trans on Information and Systems (2023)

  23. arXiv:2107.00811

    cs.RO cs.CL cs.CV

    Target-dependent UNITER: A Transformer-Based Multimodal Language Comprehension Model for Domestic Service Robots

    Authors: Shintaro Ishikawa, Komei Sugiura

    Abstract: Currently, domestic service robots have an insufficient ability to interact naturally through language. This is because understanding human instructions is complicated by various ambiguities and missing information. In existing methods, the referring expressions that specify the relationships between objects are insufficiently modeled. In this paper, we propose Target-dependent UNITER, which learn…

    Submitted 1 July, 2021; originally announced July 2021.

    Comments: Accepted for presentation at IROS2021

  24. arXiv:2107.00789

    cs.RO cs.CL cs.CV

    Case Relation Transformer: A Crossmodal Language Generation Model for Fetching Instructions

    Authors: Motonari Kambara, Komei Sugiura

    Abstract: There have been many studies in robotics to improve the communication skills of domestic service robots. Most studies, however, have not fully benefited from recent advances in deep neural networks because the training datasets are not large enough. In this paper, our aim is to augment the datasets based on a crossmodal language generation model. We propose the Case Relation Transformer (CRT), whi…

    Submitted 1 July, 2021; originally announced July 2021.

    Comments: Accepted for presentation at IROS2021

  25. arXiv:2106.15550

    cs.CV

    Unified Questioner Transformer for Descriptive Question Generation in Goal-Oriented Visual Dialogue

    Authors: Shoya Matsumori, Kosuke Shingyouchi, Yuki Abe, Yosuke Fukuchi, Komei Sugiura, Michita Imai

    Abstract: Building an interactive artificial intelligence that can ask questions about the real world is one of the biggest challenges for vision and language problems. In particular, goal-oriented visual dialogue, where the aim of the agent is to seek information by asking questions during a turn-taking dialogue, has been gaining scholarly attention recently. While several existing models based on the Gues…

    Submitted 29 June, 2021; originally announced June 2021.

  26. arXiv:2103.09523

    cs.RO cs.AR

    A Universal LiDAR SLAM Accelerator System on Low-cost FPGA

    Authors: Keisuke Sugiura, Hiroki Matsutani

    Abstract: LiDAR (Light Detection and Ranging) SLAM (Simultaneous Localization and Mapping) serves as a basis for indoor cleaning, navigation, and many other useful applications in both industry and household. From a series of LiDAR scans, it constructs an accurate, globally consistent model of the environment and estimates a robot position inside it. SLAM is inherently computationally intensive; it is a cha…

    Submitted 30 December, 2021; v1 submitted 17 March, 2021; originally announced March 2021.

  27. arXiv:2103.04067

    cs.LG

    Visual Explanation using Attention Mechanism in Actor-Critic-based Deep Reinforcement Learning

    Authors: Hidenori Itaya, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, Komei Sugiura

    Abstract: Deep reinforcement learning (DRL) has great potential for acquiring the optimal action in complex environments such as games and robot control. However, it is difficult to analyze the decision-making of the agent, i.e., the reasons it selects the action acquired by learning. In this work, we propose Mask-Attention A3C (Mask A3C), which introduces an attention mechanism into Asynchronous Advantage…

    Submitted 6 March, 2021; originally announced March 2021.

    Comments: 20 pages, 19 figures

  28. arXiv:2103.00852

    cs.RO cs.CV

    CrossMap Transformer: A Crossmodal Masked Path Transformer Using Double Back-Translation for Vision-and-Language Navigation

    Authors: Aly Magassouba, Komei Sugiura, Hisashi Kawai

    Abstract: Navigation guided by natural language instructions is particularly suitable for Domestic Service Robots that interacts naturally with users. This task involves the prediction of a sequence of actions that leads to a specified destination given a natural language navigation instruction. The task thus requires the understanding of instructions, such as "Walk out of the bathroom and wait on the stai…

    Submitted 21 August, 2023; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: 8 pages, 5 figures, 5 tables. Submitted to IEEE Robotics and Automation Letters

  29. arXiv:2102.06507

    cs.RO cs.CV

    Predicting and Attending to Damaging Collisions for Placing Everyday Objects in Photo-Realistic Simulations

    Authors: Aly Magassouba, Komei Sugiura, Angelica Nakayama, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, Hisashi Kawai

    Abstract: Placing objects is a fundamental task for domestic service robots (DSRs). Thus, inferring the collision-risk before a placing motion is crucial for achieving the requested task. This problem is particularly challenging because it is necessary to predict what happens if an object is placed in a cluttered designated area. We show that a rule-based approach that uses plane detection, to detect free a…

    Submitted 12 February, 2021; originally announced February 2021.

    Comments: 18 pages, 7 figures, 5 tables. Submitted to Advanced Robotics

  30. arXiv:2007.04557

    cs.CV cs.RO

    Alleviating the Burden of Labeling: Sentence Generation by Attention Branch Encoder-Decoder Network

    Authors: Tadashi Ogura, Aly Magassouba, Komei Sugiura, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, Hisashi Kawai

    Abstract: Domestic service robots (DSRs) are a promising solution to the shortage of home care workers. However, one of the main limitations of DSRs is their inability to interact naturally through language. Recently, data-driven approaches have been shown to be effective for tackling this limitation; however, they often require large-scale datasets, which is costly. Based on this background, we aim to perf…

    Submitted 9 July, 2020; originally announced July 2020.

    Comments: 9 pages, 8 figures. Accepted for IEEE Robotics and Automation Letters (RA-L) with presentation at IROS 2020

  31. An FPGA Acceleration and Optimization Techniques for 2D LiDAR SLAM Algorithm

    Authors: Keisuke Sugiura, Hiroki Matsutani

    Abstract: An efficient hardware implementation for Simultaneous Localization and Mapping (SLAM) methods is of necessity for mobile autonomous robots with limited computational resources. In this paper, we propose a resource-efficient FPGA implementation for accelerating scan matching computations, which typically cause a major bottleneck in 2D LiDAR SLAM methods. Scan matching is a process of correcting a r…

    Submitted 31 August, 2020; v1 submitted 29 May, 2020; originally announced June 2020.

  32. arXiv:1912.10675

    cs.RO cs.CV

    A Multimodal Target-Source Classifier with Attention Branches to Understand Ambiguous Instructions for Fetching Daily Objects

    Authors: Aly Magassouba, Komei Sugiura, Hisashi Kawai

    Abstract: In this study, we focus on multimodal language understanding for fetching instructions in the domestic service robots context. This task consists of predicting a target object, as instructed by the user, given an image and an unstructured sentence, such as "Bring me the yellow box (from the wooden cabinet)." This is challenging because of the ambiguity of natural language, i.e., the relevant infor…

    Submitted 24 December, 2019; v1 submitted 23 December, 2019; originally announced December 2019.

    Comments: 9 pages, 5 figures, accepted for IEEE Robotics and Automation Letters

  33. arXiv:1909.05664

    cs.CV cs.AI cs.RO

    Multimodal Attention Branch Network for Perspective-Free Sentence Generation

    Authors: Aly Magassouba, Komei Sugiura, Hisashi Kawai

    Abstract: In this paper, we address the automatic sentence generation of fetching instructions for domestic service robots. Typical fetching commands such as "bring me the yellow toy from the upper part of the white shelf" includes referring expressions, i.e., "from the white upper part of the white shelf". To solve this task, we propose a multimodal attention branch network (Multi-ABN) which generates natu…

    Submitted 9 September, 2019; originally announced September 2019.

    Comments: 10 pages, 4 figures. Accepted for CoRL 2019

  34. arXiv:1906.06830

    cs.RO

    Understanding Natural Language Instructions for Fetching Daily Objects Using GAN-Based Multimodal Target-Source Classification

    Authors: Aly Magassouba, Komei Sugiura, Anh Trinh Quoc, Hisashi Kawai

    Abstract: In this paper, we address multimodal language understanding for unconstrained fetching instruction in domestic service robots context. A typical fetching instruction such as "Bring me the yellow toy from the white shelf" requires to infer the user intention, that is what object (target) to fetch and from where (source). To solve the task, we propose a Multimodal Target-source Classifier Model (MTC…

    Submitted 16 June, 2019; originally announced June 2019.

    Comments: 8 pages, 6 figures, 5 tables. Accepted for RA-L with presentation at IROS 2019

  35. arXiv:1806.03847

    cs.RO cs.CL

    A Multimodal Classifier Generative Adversarial Network for Carry and Place Tasks from Ambiguous Language Instructions

    Authors: Aly Magassouba, Komei Sugiura, Hisashi Kawai

    Abstract: This paper focuses on a multimodal language understanding method for carry-and-place tasks with domestic service robots. We address the case of ambiguous instructions, that is, when the target area is not specified. For instance "put away the milk and cereal" is a natural instruction where there is ambiguity regarding the target area, considering environments in daily life. Conventionally, this in…

    Submitted 11 June, 2018; originally announced June 2018.

    Comments: 9 pages, 7 figures, accepted for IEEE Robotics and Automation Letters (RA-L)

  36. arXiv:1806.01065

    cs.RO

    SuMo-SS: Submodular Optimization Sensor Scattering for Deploying Sensor Networks by Drones

    Authors: Komei Sugiura

    Abstract: To meet the immediate needs of environmental monitoring or hazardous event detection, we consider the automatic deployment of a group of low-cost or disposable sensors by a drone. Introducing sensors by drones to an environment instead of humans has advantages in terms of worker safety and time requirements. In this study, we define "sensor scattering (SS)" as the problem of maximizing the informa…

    Submitted 4 June, 2018; originally announced June 2018.

    Comments: Accepted to IEEE Robotics and Automation Letters

  37. arXiv:1801.05096

    cs.RO cs.CL cs.HC

    Grounded Language Understanding for Manipulation Instructions Using GAN-Based Classification

    Authors: Komei Sugiura, Hisashi Kawai

    Abstract: The target task of this study is grounded language understanding for domestic service robots (DSRs). In particular, we focus on instruction understanding for short sentences where verbs are missing. This task is of critical importance to build communicative DSRs because manipulation is essential for DSRs. Existing instruction understanding methods usually estimate missing information only from non…

    Submitted 15 January, 2018; originally announced January 2018.

    Comments: 6 pages, 3 figures, published at IEEE ASRU 2017