Showing 1–26 of 26 results for author: Morishima, S

  1. arXiv:2405.07060  [pdf, other]

    cs.RO

    Memory-Maze: Scenario Driven Benchmark and Visual Language Navigation Model for Guiding Blind People

    Authors: Masaki Kuribayashi, Kohei Uehara, Allan Wang, Daisuke Sato, Simon Chu, Shigeo Morishima

    Abstract: Visual Language Navigation (VLN) powered navigation robots have the potential to guide blind people by understanding and executing route instructions provided by sighted passersby. This capability allows robots to operate in environments that are often unknown a priori. Existing VLN models are insufficient for the scenario of navigation guidance for blind people, as they need to understand routes…

    Submitted 11 May, 2024; originally announced May 2024.

  2. Gaze-Driven Sentence Simplification for Language Learners: Enhancing Comprehension and Readability

    Authors: Taichi Higasa, Keitaro Tanaka, Qi Feng, Shigeo Morishima

    Abstract: Language learners should regularly engage in reading challenging materials as part of their study routine. Nevertheless, constantly referring to dictionaries is time-consuming and distracting. This paper presents a novel gaze-driven sentence simplification system designed to enhance reading comprehension while maintaining their focus on the content. Our system incorporates machine learning models…

    Submitted 30 September, 2023; originally announced October 2023.

    Comments: Accepted by ACM ICMI 2023 workshops (Multimodal, Interactive Interfaces for Education)

  3. arXiv:2309.10375  [pdf, other]

    cs.CV

    Pointing out Human Answer Mistakes in a Goal-Oriented Visual Dialogue

    Authors: Ryosuke Oshima, Seitaro Shinagawa, Hideki Tsunashima, Qi Feng, Shigeo Morishima

    Abstract: Effective communication between humans and intelligent agents has promising applications for solving complex problems. One such approach is visual dialogue, which leverages multimodal context to assist humans. However, real-world scenarios occasionally involve human mistakes, which can cause intelligent agents to fail. While most prior research assumes perfect answers from human interlocutors, we…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted at ICCVW 2023

  4. arXiv:2308.13042  [pdf, other]

    cs.CV cs.HC

    Enhancing Perception and Immersion in Pre-Captured Environments through Learning-Based Eye Height Adaptation

    Authors: Qi Feng, Hubert P. H. Shum, Shigeo Morishima

    Abstract: Pre-captured immersive environments using omnidirectional cameras provide a wide range of virtual reality applications. Previous research has shown that manipulating the eye height in egocentric virtual environments can significantly affect distance perception and immersion. However, the influence of eye height in pre-captured real environments has received less attention due to the difficulty of…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: 10 pages, 13 figures, 3 tables, submitted to ISMAR 2023

  5. arXiv:2306.06495  [pdf, other]

    eess.AS cs.SD

    Audio-Visual Speech Enhancement With Selective Off-Screen Speech Extraction

    Authors: Tomoya Yoshinaga, Keitaro Tanaka, Shigeo Morishima

    Abstract: This paper describes an audio-visual speech enhancement (AV-SE) method that estimates from noisy input audio a mixture of the speech of the speaker appearing in an input video (on-screen target speech) and of a selected speaker not appearing in the video (off-screen target speech). Although conventional AV-SE methods have suppressed all off-screen sounds, it is necessary to listen to a specific pr…

    Submitted 10 June, 2023; originally announced June 2023.

    Comments: Accepted by EUSIPCO 2023

  6. Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning

    Authors: Sara Kashiwagi, Keitaro Tanaka, Qi Feng, Shigeo Morishima

    Abstract: This paper presents a novel metric learning approach to address the performance gap between normal and silent speech in visual speech recognition (VSR). The difference in lip movements between the two poses a challenge for existing VSR models, which exhibit degraded accuracy when applied to silent speech. To solve this issue and tackle the scarcity of training data for silent speech, we propose to…

    Submitted 16 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted by INTERSPEECH 2023

  7. arXiv:2304.07087  [pdf, other]

    cs.CV cs.LG

    Memory Efficient Diffusion Probabilistic Models via Patch-based Generation

    Authors: Shinei Arakawa, Hideki Tsunashima, Daichi Horita, Keitaro Tanaka, Shigeo Morishima

    Abstract: Diffusion probabilistic models have been successful in generating high-quality and diverse images. However, traditional models, whose input and output are high-resolution images, suffer from excessive memory requirements, making them less practical for edge devices. Previous approaches for generative adversarial networks proposed a patch-based method that uses positional encoding and global conten…

    Submitted 14 April, 2023; originally announced April 2023.

    Comments: Accepted to the Generative Models for Computer Vision workshop at CVPR 2023

  8. arXiv:2303.02930  [pdf, other]

    cs.CV

    Scapegoat Generation for Privacy Protection from Deepfake

    Authors: Gido Kato, Yoshihiro Fukuhara, Mariko Isogawa, Hideki Tsunashima, Hirokatsu Kataoka, Shigeo Morishima

    Abstract: To protect privacy and prevent malicious use of deepfake, current studies propose methods that interfere with the generation process, such as detection and destruction approaches. However, these methods suffer from sub-optimal generalization performance to unseen models and add undesirable noise to the original image. To address these problems, we propose a new problem formulation for deepfake pre…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: 5 pages, 5 figures

    MSC Class: 68T07

  9. Event-based Camera Simulation using Monte Carlo Path Tracing with Adaptive Denoising

    Authors: Yuta Tsuji, Tatsuya Yatagawa, Hiroyuki Kubo, Shigeo Morishima

    Abstract: This paper presents an algorithm to obtain an event-based video from noisy frames given by physics-based Monte Carlo path tracing over a synthetic 3D scene. Given the nature of dynamic vision sensor (DVS), rendering event-based video can be viewed as a process of detecting the changes from noisy brightness values. We extend a denoising method based on a weighted local regression (WLR) to detect th…

    Submitted 22 August, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

    Comments: 8 pages, 6 figures, 3 tables

    Journal ref: Proceedings of the IEEE International Conference on Image Processing (ICIP) 2023

  10. arXiv:2301.06816  [pdf, other]

    cs.GR

    A Combined Finite Element and Finite Volume Method for Liquid Simulation

    Authors: Tatsuya Koike, Shigeo Morishima, Ryoichi Ando

    Abstract: We introduce a new Eulerian simulation framework for liquid animation that leverages both finite element and finite volume methods. In contrast to previous methods where the whole simulation domain is discretized either using the finite volume method or finite element method, our method spatially merges them together using two types of discretization being tightly coupled on its seams while enforc…

    Submitted 17 January, 2023; originally announced January 2023.

    Comments: 13 pages, 8 figures

  11. arXiv:2207.09425  [pdf, other]

    cs.CV

    Geometric Features Informed Multi-person Human-object Interaction Recognition in Videos

    Authors: Tanqiu Qiao, Qianhui Men, Frederick W. B. Li, Yoshiki Kubotani, Shigeo Morishima, Hubert P. H. Shum

    Abstract: Human-Object Interaction (HOI) recognition in videos is important for analyzing human activity. Most existing work focusing on visual features usually suffer from occlusion in the real-world scenarios. Such a problem will be further complicated when multiple people and objects are involved in HOIs. Consider that geometric features such as human pose and object position provide meaningful informati…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: Accepted by ECCV 2022

  12. arXiv:2203.15991  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    The Sound of Bounding-Boxes

    Authors: Takashi Oya, Shohei Iwase, Shigeo Morishima

    Abstract: In the task of audio-visual sound source separation, which leverages visual information for sound source separation, identifying objects in an image is a crucial step prior to separating the sound source. However, existing methods that assign sound on detected bounding boxes suffer from a problem that their approach heavily relies on pre-trained object detectors. Specifically, when using these exi…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: 6 pages, 5 figures, ICPR (International Conference on Pattern Recognition) 2022

  13. arXiv:2203.09109  [pdf, other]

    cs.CV cs.CL

    Community-Driven Comprehensive Scientific Paper Summarization: Insight from cvpaper.challenge

    Authors: Shintaro Yamamoto, Hirokatsu Kataoka, Ryota Suzuki, Seitaro Shinagawa, Shigeo Morishima

    Abstract: The present paper introduces a group activity involving writing summaries of conference proceedings by volunteer participants. The rapid increase in scientific papers is a heavy burden for researchers, especially non-native speakers, who need to survey scientific literature. To alleviate this problem, we organized a group of non-native English speakers to write summaries of papers presented at a c…

    Submitted 17 March, 2022; originally announced March 2022.

  14. arXiv:2202.08010  [pdf, other]

    cs.CV

    360 Depth Estimation in the Wild -- The Depth360 Dataset and the SegFuse Network

    Authors: Qi Feng, Hubert P. H. Shum, Shigeo Morishima

    Abstract: Single-view depth estimation from omnidirectional images has gained popularity with its wide range of applications such as autonomous driving and scene reconstruction. Although data-driven learning-based methods demonstrate significant potential in this field, scarce training data and ineffective 360 estimation algorithms are still two key limitations hindering accurate estimation across diverse d…

    Submitted 16 February, 2022; originally announced February 2022.

    Comments: 10 pages, 10 figures, 5 tables, submitted to IEEE VR 2022

    ACM Class: I.2.10

  15. arXiv:2108.00268  [pdf, other]

    cs.AI cs.CY cs.LG

    RLTutor: Reinforcement Learning Based Adaptive Tutoring System by Modeling Virtual Student with Fewer Interactions

    Authors: Yoshiki Kubotani, Yoshihiro Fukuhara, Shigeo Morishima

    Abstract: A major challenge in the field of education is providing review schedules that present learned items at appropriate intervals to each student so that memory is retained over time. In recent years, attempts have been made to formulate item reviews as sequential decision-making problems to realize adaptive instruction based on the knowledge state of students. It has been reported previously that rei…

    Submitted 31 July, 2021; originally announced August 2021.

    Comments: Accepted in AI4EDU workshop at IJCAI2021. The official code is available on https://github.com/YoshikiKubotani/rltutor

  16. arXiv:2102.00845  [pdf, other]

    cs.CL

    LSTM-SAKT: LSTM-Encoded SAKT-like Transformer for Knowledge Tracing

    Authors: Takashi Oya, Shigeo Morishima

    Abstract: This paper introduces the 2nd place solution for the Riiid! Answer Correctness Prediction in Kaggle, the world's largest data science competition website. This competition was held from October 16, 2020, to January 7, 2021, with 3395 teams and 4387 competitors. The main insights and contributions of this paper are as follows. (i) We pointed out existing Transformer-based models are suffering from…

    Submitted 10 February, 2021; v1 submitted 28 January, 2021; originally announced February 2021.

    Comments: 4 pages, 3 figures, the paper at AAAI 2021 Workshop on AI Education https://sites.google.com/view/tipce-2021/home

  17. arXiv:2012.11213  [pdf, ps, other]

    cs.IR cs.CL

    Self-Supervised Learning for Visual Summary Identification in Scientific Publications

    Authors: Shintaro Yamamoto, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš, Shigeo Morishima

    Abstract: Providing visual summaries of scientific publications can increase information access for readers and thereby help deal with the exponential growth in the number of scientific publications. Nonetheless, efforts in providing visual publication summaries have been few and far apart, primarily focusing on the biomedical domain. This is primarily because of the limited availability of annotated gold s…

    Submitted 14 January, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

  18. arXiv:2007.05722  [pdf, other]

    cs.CV cs.SD eess.AS

    Do We Need Sound for Sound Source Localization?

    Authors: Takashi Oya, Shohei Iwase, Ryota Natsume, Takahiro Itazuri, Shugo Yamaguchi, Shigeo Morishima

    Abstract: During the performance of sound source localization which uses both visual and aural information, it presently remains unclear how much either image or sound modalities contribute to the result, i.e. do we need both image and sound for sound source localization? To address this question, we develop an unsupervised learning system that solves sound source localization by decomposing this task into…

    Submitted 11 July, 2020; originally announced July 2020.

    Comments: Paper: 14 pages, 6 figures. Supplementary Material: 6 pages, 3 figures. Videos and Codes will be released later

  19. arXiv:2004.03811  [pdf, other]

    cs.CV

    MirrorNet: A Deep Bayesian Approach to Reflective 2D Pose Estimation from Human Images

    Authors: Takayuki Nakatsuka, Kazuyoshi Yoshii, Yuki Koyama, Satoru Fukayama, Masataka Goto, Shigeo Morishima

    Abstract: This paper proposes a statistical approach to 2D pose estimation from human images. The main problems with the standard supervised approach, which is based on a deep recognition (image-to-pose) model, are that it often yields anatomically implausible poses, and its performance is limited by the amount of paired data. To solve these problems, we propose a semi-supervised method that can make effect…

    Submitted 8 April, 2020; originally announced April 2020.

    Comments: 19 pages

  20. arXiv:1905.07666  [pdf, other]

    cs.CV

    What Do Adversarially Robust Models Look At?

    Authors: Takahiro Itazuri, Yoshihiro Fukuhara, Hirokatsu Kataoka, Shigeo Morishima

    Abstract: In this paper, we address the open question: "What do adversarially robust models look at?" Recently, it has been reported in many works that there exists the trade-off between standard accuracy and adversarial robustness. According to prior works, this trade-off is rooted in the fact that adversarially robust and standard accurate models might depend on very different sets of features. However, i…

    Submitted 18 May, 2019; originally announced May 2019.

  21. arXiv:1905.05172  [pdf, other]

    cs.CV cs.GR

    PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization

    Authors: Shunsuke Saito, Zeng Huang, Ryota Natsume, Shigeo Morishima, Angjoo Kanazawa, Hao Li

    Abstract: We introduce Pixel-aligned Implicit Function (PIFu), a highly effective implicit representation that locally aligns pixels of 2D images with the global context of their corresponding 3D object. Using PIFu, we propose an end-to-end deep learning method for digitizing highly detailed clothed humans that can infer both 3D surface and texture from a single image, and optionally, multiple input images.…

    Submitted 3 December, 2019; v1 submitted 13 May, 2019; originally announced May 2019.

    Comments: project page: https://shunsukesaito.github.io/PIFu

    Journal ref: The IEEE International Conference on Computer Vision (ICCV), 2019, pp. 2304-2314

  22. arXiv:1901.00049  [pdf, other]

    cs.CV

    SiCloPe: Silhouette-Based Clothed People

    Authors: Ryota Natsume, Shunsuke Saito, Zeng Huang, Weikai Chen, Chongyang Ma, Hao Li, Shigeo Morishima

    Abstract: We introduce a new silhouette-based representation for modeling clothed human bodies using deep generative models. Our method can reconstruct a complete and textured 3D model of a person wearing clothes from a single input picture. Inspired by the visual hull algorithm, our implicit representation uses 2D silhouettes and 3D joints of a body pose to describe the immense shape complexity and variati…

    Submitted 10 April, 2019; v1 submitted 31 December, 2018; originally announced January 2019.

  23. FSNet: An Identity-Aware Generative Model for Image-based Face Swapping

    Authors: Ryota Natsume, Tatsuya Yatagawa, Shigeo Morishima

    Abstract: This paper presents FSNet, a deep generative model for image-based face swapping. Traditionally, face-swapping methods are based on three-dimensional morphable models (3DMMs), and facial textures are replaced between the estimated three-dimensional (3D) geometries in two images of different individuals. However, the estimation of 3D geometries along with different lighting conditions using 3DMMs i…

    Submitted 30 November, 2018; originally announced November 2018.

    Comments: 20 pages, Asian Conference on Computer Vision 2018

  24. arXiv:1811.06943  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    Automatic Paper Summary Generation from Visual and Textual Information

    Authors: Shintaro Yamamoto, Yoshihiro Fukuhara, Ryota Suzuki, Shigeo Morishima, Hirokatsu Kataoka

    Abstract: Due to the recent boom in artificial intelligence (AI) research, including computer vision (CV), it has become impossible for researchers in these fields to keep up with the exponentially increasing number of manuscripts. In response to this situation, this paper proposes the paper summary generation (PSG) task using a simple but effective method to automatically generate an academic paper summary…

    Submitted 16 November, 2018; originally announced November 2018.

    Comments: International Conference on Machine Vision 2018, Munich, Germany

  25. arXiv:1809.08391  [pdf, other]

    cs.CV cs.AI

    Understanding Fake Faces

    Authors: Ryota Natsume, Kazuki Inoue, Yoshihiro Fukuhara, Shintaro Yamamoto, Shigeo Morishima, Hirokatsu Kataoka

    Abstract: Face recognition research is one of the most active topics in computer vision (CV), and deep neural networks (DNN) are now filling the gap between human-level and computer-driven performance levels in face verification algorithms. However, although the performance gap appears to be narrowing in terms of accuracy-based expectations, a curious question has arisen; specifically, "Face understanding o…

    Submitted 22 September, 2018; originally announced September 2018.

    Comments: 11 pages, 3 figures, ECCV 2018 Workshop on Brain-Driven Computer Vision (BDCV)

  26. RSGAN: Face Swapping and Editing using Face and Hair Representation in Latent Spaces

    Authors: Ryota Natsume, Tatsuya Yatagawa, Shigeo Morishima

    Abstract: In this paper, we present an integrated system for automatically generating and editing face images through face swapping, attribute-based editing, and random face parts synthesis. The proposed system is based on a deep neural network that variationally learns the face and hair regions with large-scale face image datasets. Different from conventional variational methods, the proposed network repre…

    Submitted 18 April, 2018; v1 submitted 10 April, 2018; originally announced April 2018.