Showing 1–22 of 22 results for author: Nishimura, T

Searching in archive cs.
  1. arXiv:2407.00985  [pdf, other]

    cs.RO cs.CV

    Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models

    Authors: Takayuki Nishimura, Katsuyuki Kuyo, Motonari Kambara, Komei Sugiura

    Abstract: We consider the task of generating segmentation masks for the target object from an object manipulation instruction, which allows users to give open vocabulary instructions to domestic service robots. Conventional segmentation generation approaches often fail to account for objects outside the camera's field of view and cases in which the order of vertices differs but still represents the same pol…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted for presentation at IROS2024

  2. arXiv:2404.03161  [pdf, other]

    cs.CV cs.CL cs.MM

    BioVL-QR: Egocentric Biochemical Video-and-Language Dataset Using Micro QR Codes

    Authors: Taichi Nishimura, Koki Yamamoto, Yuto Haneji, Keiya Kajimura, Chihiro Nishiwaki, Eriko Daikoku, Natsuko Okuda, Fumihito Ono, Hirotaka Kameko, Shinsuke Mori

    Abstract: This paper introduces a biochemical vision-and-language dataset, which consists of 24 egocentric experiment videos, corresponding protocols, and video-and-language alignments. The key challenge in the wet-lab domain is that detecting equipment, reagents, and containers is difficult because the lab environment is cluttered with objects on the table and some objects are indistinguishable. Therefore…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 6 pages

  3. arXiv:2404.02523  [pdf, other]

    cs.CV cs.AI

    Text-driven Affordance Learning from Egocentric Vision

    Authors: Tomoya Yoshida, Shuhei Kurita, Taichi Nishimura, Shinsuke Mori

    Abstract: Visual affordance learning is a key component for robots to understand how to interact with objects. Conventional approaches in this field rely on pre-defined objects and actions, falling short of capturing diverse interactions in real-world scenarios. The key idea of our approach is employing textual instruction, targeting various affordances for a wide range of objects. This approach covers both…

    Submitted 3 April, 2024; originally announced April 2024.

  4. arXiv:2403.16483  [pdf, other]

    cs.CL

    Automatic Construction of a Large-Scale Corpus for Geoparsing Using Wikipedia Hyperlinks

    Authors: Keyaki Ohno, Hirotaka Kameko, Keisuke Shirai, Taichi Nishimura, Shinsuke Mori

    Abstract: Geoparsing is the task of estimating the latitude and longitude (coordinates) of location expressions in texts. Geoparsing must deal with the ambiguity of the expressions that indicate multiple locations with the same notation. For evaluating geoparsing systems, several corpora have been proposed in previous work. However, these corpora are small-scale and suffer from the coverage of location expr…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: LREC-COLING 2024

  5. Single-Motor Robotic Gripper with Multi-Surface Fingers for Variable Grasping Configurations

    Authors: Toshihiro Nishimura, Yosuke Suzuki, Tokuo Tsuji, Tetsuyou Watanabe

    Abstract: This study proposes a novel robotic gripper with variable grasping configurations for grasping various objects. The fingers of the developed gripper incorporate multiple different surfaces. The gripper possesses the function of altering the finger surfaces facing a target object by rotating the fingers in its longitudinal direction. In the proposed design equipped with two fingers, the two fingers…

    Submitted 24 March, 2024; originally announced March 2024.

    Journal ref: in IEEE Robotics and Automation Letters, vol. 9, no. 5, pp. 4114-4121, May 2024

  6. arXiv:2401.09774  [pdf, other]

    cs.MM cs.CL cs.CV cs.SD eess.AS

    On the Audio Hallucinations in Large Audio-Video Language Models

    Authors: Taichi Nishimura, Shota Nakada, Masayoshi Kondo

    Abstract: Large audio-video language models can generate descriptions for both video and audio. However, they sometimes ignore audio content, producing audio descriptions solely reliant on visual information. This paper refers to this as audio hallucinations and analyzes them in large audio-video language models. We gather 1,000 sentences by inquiring about audio information and annotate them whether they c…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: 6 pages

  7. arXiv:2312.00414  [pdf, other]

    cs.CV cs.MM

    Vision-Language Models Learn Super Images for Efficient Partially Relevant Video Retrieval

    Authors: Taichi Nishimura, Shota Nakada, Masayoshi Kondo

    Abstract: In this paper, we propose an efficient and high-performance method for partially relevant video retrieval, which aims to retrieve long videos that contain at least one moment relevant to the input text query. The challenge lies in encoding dense frames using visual backbones. This requires models to handle the increased frames, resulting in significant computation costs for long videos. To mitigat…

    Submitted 11 March, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: 24 pages

  8. arXiv:2311.16444  [pdf, other]

    cs.CV cs.CL

    Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos

    Authors: Takehiko Ohkawa, Takuma Yagi, Taichi Nishimura, Ryosuke Furuta, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato

    Abstract: We propose a novel benchmark for cross-view knowledge transfer of dense video captioning, adapting models from web instructional videos with exocentric views to an egocentric view. While dense video captioning (predicting time segments and their captions) is primarily studied with exocentric videos (e.g., YouCook2), benchmarks with egocentric videos are restricted due to data scarcity. To overcome…

    Submitted 29 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

  9. arXiv:2311.06855  [pdf, other]

    cs.CV cs.CL cs.RO

    DialMAT: Dialogue-Enabled Transformer with Moment-Based Adversarial Training

    Authors: Kanta Kaneda, Ryosuke Korekata, Yuiga Wada, Shunya Nagashima, Motonari Kambara, Yui Iioka, Haruka Matsuo, Yuto Imai, Takayuki Nishimura, Komei Sugiura

    Abstract: This paper focuses on the DialFRED task, which is the task of embodied instruction following in a setting where an agent can actively ask questions about the task. To address this task, we propose DialMAT. DialMAT introduces Moment-based Adversarial Training, which incorporates adversarial perturbations into the latent space of language, image, and action. Additionally, it introduces a crossmodal…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: Accepted for presentation at Fourth Annual Embodied AI Workshop at CVPR

  10. Lightweight High-Speed and High-Force Gripper for Assembly

    Authors: Toshihiro Nishimura, Takeshi Takaki, Yosuke Suzuki, Tokuo Tsuji, Tetsuyou Watanabe

    Abstract: This paper presents a novel industrial robotic gripper with a high grasping speed (maximum: 1396 mm/s), high tip force (maximum: 80 N) for grasping, large motion range, and lightweight design (0.3 kg). To realize these features, the high-speed section of the quick-return mechanism and load-sensitive continuously variable transmission mechanism are installed in the gripper. The gripper is also equi…

    Submitted 26 October, 2023; originally announced October 2023.

  11. Single-Motor Robotic Gripper With Three Functional Modes for Grasping in Confined Spaces

    Authors: Toshihiro Nishimura, Tetsuyou Watanabe

    Abstract: This study proposes a novel robotic gripper driven by a single motor. The main task is to pick up objects in confined spaces. For this purpose, the developed gripper has three operating modes: grasping, finger-bending, and pull-in modes. Using these three modes, the developed gripper can rotate and translate a grasped object, i.e., can perform in-hand manipulation. This in-hand manipulation is eff…

    Submitted 26 October, 2023; originally announced October 2023.

  12. arXiv:2306.13894  [pdf, other]

    cs.RO eess.SY

    OUXT Polaris: Autonomous Navigation System for the 2022 Maritime RobotX Challenge

    Authors: Kenta Okamoto, Akihisa Nagata, Kyoma Arai, Yusei Nagao, Tatsuki Nishimura, Kento Hirogaki, Shunya Tanaka, Masato Kobayashi, Tatsuya Sanada, Masaya Kataoka

    Abstract: OUXT-Polaris has been developing an autonomous navigation system by participating in the Maritime RobotX Challenge 2014, 2016, and 2018. In this paper, we describe the improvement of the previous vessel system. We also indicate the advantage of the improved design. Moreover, we describe the developing method under Covid-19 using simulation / miniature-size hardware and the feature components for th…

    Submitted 24 June, 2023; originally announced June 2023.

    Comments: Technical Design Paper of 2022 Maritime RobotX Challenge

  13. Flexible and slim device switching air blowing and suction by a single airflow control

    Authors: Seita Nojiri, Toshihiro Nishimura, Kenjiro Tadakuma, Tetsuyou Watanabe

    Abstract: This study proposes a soft robotic device with a slim and flexible body that switches between air blowing and suction with a single airflow control. Suction is achieved by jet flow entraining surrounding air, and blowing is achieved by blocking and reversing jet flow. The thin and flexible flap gate enables the switching. Air flow is blocked while the gate is closed and passes through while the ga…

    Submitted 8 March, 2023; originally announced March 2023.

  14. High-payload and self-adaptive robotic hand with 1-degree-of-freedom translation/rotation switching mechanism

    Authors: Toshihiro Nishimura, Tsubasa Muryoe, Tetsuyou Watanabe

    Abstract: This study proposes a novel robotic hand that can achieve self-adaptive grasping and a large payload (over 20 kg) with a single actuator. Accordingly, two novel mechanisms, an actuation system with self-motion switching and a self-adaptive finger with a self-locking mechanism, are installed in a 1-degree-of-freedom robotic hand. The actuation system switches the output motion from translational to…

    Submitted 8 March, 2023; originally announced March 2023.

  15. Near-optimal stochastic MIMO signal detection with a mixture of t-distribution prior

    Authors: Junichiro Hagiwara, Kazushi Matsumura, Hiroki Asumi, Yukiko Kasuga, Toshihiko Nishimura, Takanori Sato, Yasutaka Ogawa, Takeo Ohgane

    Abstract: Multiple-input multiple-output (MIMO) systems will play a crucial role in future wireless communication, but improving their signal detection performance to increase transmission efficiency remains a challenge. To address this issue, we propose extending the discrete signal detection problem in MIMO systems to a continuous one and applying the Hamiltonian Monte Carlo method, an efficient Markov ch…

    Submitted 7 March, 2024; v1 submitted 9 January, 2023; originally announced January 2023.

    Comments: Published in the 2023 IEEE Global Communications Conference (GLOBECOM)

  16. 1-degree-of-freedom Robotic Gripper With Infinite Self-Twist Function

    Authors: Toshihiro Nishimura, Yosuke Suzuki, Tokuo Tsuji, Tetsuyou Watanabe

    Abstract: This study proposed a novel robotic gripper that can achieve grasping and infinite wrist twisting motions using a single actuator. The gripper is equipped with a differential gear mechanism that allows switching between the grasping and twisting motions according to the magnitude of the tip force applied to the finger. The grasping motion is activated when the tip force is below a set value, and t…

    Submitted 9 November, 2022; originally announced November 2022.

  17. Single-Fingered Reconfigurable Robotic Gripper With a Folding Mechanism for Narrow Working Spaces

    Authors: Toshihiro Nishimura, Tsubasa Muryoe, Yoshitatsu Asama, Hiroki Ikeuchi, Ryo Toshima, Tetsuyou Watanabe

    Abstract: This letter proposes a novel single-fingered reconfigurable robotic gripper for grasping objects in narrow working spaces. The finger of the developed gripper realizes two configurations, namely, the insertion and grasping modes, using only a single motor. In the insertion mode, the finger assumes a thin shape such that it can insert its tip into a narrow space. The grasping mode of the finger is…

    Submitted 21 November, 2022; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: This study was presented at IROS 2022

    Journal ref: IEEE Robotics and Automation Letters, Vol.7, No.4 (2022) 10192-10199

  18. arXiv:2209.10134  [pdf, other]

    cs.MM cs.CL cs.CV

    Recipe Generation from Unsegmented Cooking Videos

    Authors: Taichi Nishimura, Atsushi Hashimoto, Yoshitaka Ushiku, Hirotaka Kameko, Shinsuke Mori

    Abstract: This paper tackles recipe generation from unsegmented cooking videos, a task that requires agents to (1) extract key events in completing the dish and (2) generate sentences for the extracted events. Our task is similar to dense video captioning (DVC), which aims at detecting events thoroughly and generating sentences for them. However, unlike DVC, in recipe generation, recipe story awareness is c…

    Submitted 18 February, 2024; v1 submitted 21 September, 2022; originally announced September 2022.

    Comments: Accepted at ACM TOMM; ACM Transactions on Multimedia Computing, Communications, and Applications

  19. arXiv:2209.05840  [pdf, other]

    cs.CL cs.AI

    Visual Recipe Flow: A Dataset for Learning Visual State Changes of Objects with Recipe Flows

    Authors: Keisuke Shirai, Atsushi Hashimoto, Taichi Nishimura, Hirotaka Kameko, Shuhei Kurita, Yoshitaka Ushiku, Shinsuke Mori

    Abstract: We present a new multimodal dataset called Visual Recipe Flow, which enables us to learn each cooking action result in a recipe text. The dataset consists of object state changes and the workflow of the recipe text. The state change is represented as an image pair, while the workflow is represented as a recipe flow graph (r-FG). The image pairs are grounded in the r-FG, which provides the cross-mo…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: COLING 2022

  20. Soft robotic hand with finger-bending/friction-reduction switching mechanism through 1-degree-of-freedom flow control

    Authors: Toshihiro Nishimura, Kensuke Shimizu, Seita Nojiri, Kenjiro Tadakuma, Yosuke Suzuki, Tokuo Tsuji, Tetsuyou Watanabe

    Abstract: This paper proposes a novel pneumatic soft robotic hand that incorporates a mechanism that can switch the airflow path using a single airflow control. The developed hand can control the finger motion and operate the surface friction variable mechanism. In the friction variable mechanism, a lubricant is injected onto the high-friction finger surface to reduce surface friction. To inject the lubrica…

    Submitted 27 March, 2022; originally announced March 2022.

    Journal ref: IEEE Robotics and Automation Letters (2022)(Early Access)

  21. arXiv:1909.11274  [pdf, other]

    cs.LG stat.ML

    Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network

    Authors: Taiji Suzuki, Hiroshi Abe, Tomoaki Nishimura

    Abstract: One of the biggest issues in deep learning theory is the generalization ability of networks with huge model size. The classical learning theory suggests that overparameterized models cause overfitting. However, practically used large deep models avoid overfitting, which is not well explained by the classical approaches. To resolve this issue, several attempts have been made. Among them, the compre…

    Submitted 21 June, 2020; v1 submitted 24 September, 2019; originally announced September 2019.

    Comments: published in ICLR2020

  22. arXiv:1808.08558  [pdf, other]

    stat.ML cs.LG

    Spectral Pruning: Compressing Deep Neural Networks via Spectral Analysis and its Generalization Error

    Authors: Taiji Suzuki, Hiroshi Abe, Tomoya Murata, Shingo Horiuchi, Kotaro Ito, Tokuma Wachi, So Hirai, Masatoshi Yukishima, Tomoaki Nishimura

    Abstract: Compression techniques for deep neural network models are becoming very important for the efficient execution of high-performance deep learning systems on edge-computing devices. The concept of model compression is also important for analyzing the generalization error of deep learning, known as the compression-based error bound. However, there is still a huge gap between a practically effective comp…

    Submitted 13 July, 2020; v1 submitted 26 August, 2018; originally announced August 2018.

    Comments: 17 pages, 4 figures. Accepted in IJCAI-PRICAI 2020. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, pages 2839--2846