Search | arXiv e-print repository

Privacy Protection and Video Manipulation in Immersive Media

Authors: Leslie Wöhler, Satoshi Ikehata, Kiyoharu Aizawa

Abstract: In comparison to traditional footage, 360° videos can convey engaging, immersive experiences and even be utilized to create interactive virtual environments. Like regular recordings, these videos need to consider the privacy of recorded people and could be targets for video manipulations. However, due to their properties like enhanced presence, the effects on users might differ from traditional, n… ▽ More In comparison to traditional footage, 360° videos can convey engaging, immersive experiences and even be utilized to create interactive virtual environments. Like regular recordings, these videos need to consider the privacy of recorded people and could be targets for video manipulations. However, due to their properties like enhanced presence, the effects on users might differ from traditional, non-immersive content. Therefore, we are interested in how changes of real-world footage like adding privacy protection or applying video manipulations could mitigate or introduce harm in the resulting immersive media. △ Less

Submitted 23 April, 2024; originally announced May 2024.

Comments: This is an accepted position statement of CHI 2024 Workshop (Novel Approaches for Understanding and Mitigating Emerging New Harms in Immersive and Embodied Virtual Spaces: A Workshop at CHI 2024)

arXiv:2403.16141 [pdf, other]

Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes

Authors: Takashi Otonari, Satoshi Ikehata, Kiyoharu Aizawa

Abstract: Recent advancements in the study of Neural Radiance Fields (NeRF) for dynamic scenes often involve explicit modeling of scene dynamics. However, this approach faces challenges in modeling scene dynamics in urban environments, where moving objects of various categories and scales are present. In such settings, it becomes crucial to effectively eliminate moving objects to accurately reconstruct stat… ▽ More Recent advancements in the study of Neural Radiance Fields (NeRF) for dynamic scenes often involve explicit modeling of scene dynamics. However, this approach faces challenges in modeling scene dynamics in urban environments, where moving objects of various categories and scales are present. In such settings, it becomes crucial to effectively eliminate moving objects to accurately reconstruct static backgrounds. Our research introduces an innovative method, termed here as Entity-NeRF, which combines the strengths of knowledge-based and statistical strategies. This approach utilizes entity-wise statistics, leveraging entity segmentation and stationary entity classification through thing/stuff segmentation. To assess our methodology, we created an urban scene dataset masked with moving objects. Our comprehensive experiments demonstrate that Entity-NeRF notably outperforms existing techniques in removing moving objects and reconstructing static urban backgrounds, both quantitatively and qualitatively. △ Less

Submitted 24 March, 2024; originally announced March 2024.

Comments: Accepted by IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024), Project website: https://otonari726.github.io/entitynerf/

arXiv:2303.15724 [pdf, other]

Scalable, Detailed and Mask-Free Universal Photometric Stereo

Authors: Satoshi Ikehata

Abstract: In this paper, we introduce SDM-UniPS, a groundbreaking Scalable, Detailed, Mask-free, and Universal Photometric Stereo network. Our approach can recover astonishingly intricate surface normal maps, rivaling the quality of 3D scanners, even when images are captured under unknown, spatially-varying lighting conditions in uncontrolled environments. We have extended previous universal photometric ste… ▽ More In this paper, we introduce SDM-UniPS, a groundbreaking Scalable, Detailed, Mask-free, and Universal Photometric Stereo network. Our approach can recover astonishingly intricate surface normal maps, rivaling the quality of 3D scanners, even when images are captured under unknown, spatially-varying lighting conditions in uncontrolled environments. We have extended previous universal photometric stereo networks to extract spatial-light features, utilizing all available information in high-resolution input images and accounting for non-local interactions among surface points. Moreover, we present a new synthetic training dataset that encompasses a diverse range of shapes, materials, and illumination scenarios found in real-world scenes. Through extensive evaluation, we demonstrate that our method not only surpasses calibrated, lighting-specific techniques on public benchmarks, but also excels with a significantly smaller number of input images even without object masks. △ Less

Submitted 28 March, 2023; originally announced March 2023.

Comments: CVPR 2023 (Highlight). The source code will be available at https://github.com/satoshi-ikehata/SDM-UniPS-CVPR2023

arXiv:2212.03635 [pdf, other]

Non-uniform Sampling Strategies for NeRF on 360{\textdegree} images

Authors: Takashi Otonari, Satoshi Ikehata, Kiyoharu Aizawa

Abstract: In recent years, the performance of novel view synthesis using perspective images has dramatically improved with the advent of neural radiance fields (NeRF). This study proposes two novel techniques that effectively build NeRF for 360{\textdegree} omnidirectional images. Due to the characteristics of a 360{\textdegree} image of ERP format that has spatial distortion in their high latitude regions… ▽ More In recent years, the performance of novel view synthesis using perspective images has dramatically improved with the advent of neural radiance fields (NeRF). This study proposes two novel techniques that effectively build NeRF for 360{\textdegree} omnidirectional images. Due to the characteristics of a 360{\textdegree} image of ERP format that has spatial distortion in their high latitude regions and a 360{\textdegree} wide viewing angle, NeRF's general ray sampling strategy is ineffective. Hence, the view synthesis accuracy of NeRF is limited and learning is not efficient. We propose two non-uniform ray sampling schemes for NeRF to suit 360{\textdegree} images - distortion-aware ray sampling and content-aware ray sampling. We created an evaluation dataset Synth360 using Replica and SceneCity models of indoor and outdoor scenes, respectively. In experiments, we show that our proposal successfully builds 360{\textdegree} image NeRF in terms of both accuracy and efficiency. The proposal is widely applicable to advanced variants of NeRF. DietNeRF, AugNeRF, and NeRF++ combined with the proposed techniques further improve the performance. Moreover, we show that our proposed method enhances the quality of real-world scenes in 360{\textdegree} images. Synth360: https://drive.google.com/drive/folders/1suL9B7DO2no21ggiIHkH3JF3OecasQLb. △ Less

Submitted 7 December, 2022; originally announced December 2022.

Comments: Accepted at the 33rd British Machine Vision Conference (BMVC) 2022

arXiv:2211.11386 [pdf, other]

PS-Transformer: Learning Sparse Photometric Stereo Network using Self-Attention Mechanism

Authors: Satoshi Ikehata

Abstract: Existing deep calibrated photometric stereo networks basically aggregate observations under different lights based on the pre-defined operations such as linear projection and max pooling. While they are effective with the dense capture, simple first-order operations often fail to capture the high-order interactions among observations under small number of different lights. To tackle this issue, th… ▽ More Existing deep calibrated photometric stereo networks basically aggregate observations under different lights based on the pre-defined operations such as linear projection and max pooling. While they are effective with the dense capture, simple first-order operations often fail to capture the high-order interactions among observations under small number of different lights. To tackle this issue, this paper presents a deep sparse calibrated photometric stereo network named {\it PS-Transformer} which leverages the learnable self-attention mechanism to properly capture the complex inter-image interactions. PS-Transformer builds upon the dual-branch design to explore both pixel-wise and image-wise features and individual feature is trained with the intermediate surface normal supervision to maximize geometric feasibility. A new synthetic dataset named CyclesPS+ is also presented with the comprehensive analysis to successfully train the photometric stereo networks. Extensive results on the publicly available benchmark datasets demonstrate that the surface normal prediction accuracy of the proposed method significantly outperforms other state-of-the-art algorithms with the same number of input images and is even comparable to that of dense algorithms which input 10$\times$ larger number of images. △ Less

Submitted 21 November, 2022; originally announced November 2022.

Comments: BMVC2021. Code and Supplementary are available at https://github.com/satoshi-ikehata/PS-Transformer-BMVC2021

Journal ref: BMVC. Vol. 2. No. 4. 2021

arXiv:2209.03656 [pdf, other]

doi 10.1109/ACCESS.2022.3200486

Saliency-based Multiple Region of Interest Detection from a Single 360° image

Authors: Yuuki Sawabe, Satoshi Ikehata, Kiyoharu Aizawa

Abstract: 360° images are informative -- it contains omnidirectional visual information around the camera. However, the areas that cover a 360° image is much larger than the human's field of view, therefore important information in different view directions is easily overlooked. To tackle this issue, we propose a method for predicting the optimal set of Region of Interest (RoI) from a single 360° image usin… ▽ More 360° images are informative -- it contains omnidirectional visual information around the camera. However, the areas that cover a 360° image is much larger than the human's field of view, therefore important information in different view directions is easily overlooked. To tackle this issue, we propose a method for predicting the optimal set of Region of Interest (RoI) from a single 360° image using the visual saliency as a clue. To deal with the scarce, strongly biased training data of existing single 360° image saliency prediction dataset, we also propose a data augmentation method based on the spherical random data rotation. From the predicted saliency map and redundant candidate regions, we obtain the optimal set of RoIs considering both the saliency within a region and the Interaction-Over-Union (IoU) between regions. We conduct the subjective evaluation to show that the proposed method can select regions that properly summarize the input 360° image. △ Less

Submitted 8 September, 2022; originally announced September 2022.

Journal ref: in IEEE Access, vol. 10, pp. 89124-89133, 2022

arXiv:2206.02452 [pdf, other]

Universal Photometric Stereo Network using Global Lighting Contexts

Authors: Satoshi Ikehata

Abstract: This paper tackles a new photometric stereo task, named universal photometric stereo. Unlike existing tasks that assumed specific physical lighting models; hence, drastically limited their usability, a solution algorithm of this task is supposed to work for objects with diverse shapes and materials under arbitrary lighting variations without assuming any specific models. To solve this extremely ch… ▽ More This paper tackles a new photometric stereo task, named universal photometric stereo. Unlike existing tasks that assumed specific physical lighting models; hence, drastically limited their usability, a solution algorithm of this task is supposed to work for objects with diverse shapes and materials under arbitrary lighting variations without assuming any specific models. To solve this extremely challenging task, we present a purely data-driven method, which eliminates the prior assumption of lighting by replacing the recovery of physical lighting parameters with the extraction of the generic lighting representation, named global lighting contexts. We use them like lighting parameters in a calibrated photometric stereo network to recover surface normal vectors pixelwisely. To adapt our network to a wide variety of shapes, materials and lightings, it is trained on a new synthetic dataset which simulates the appearance of objects in the wild. Our method is compared with other state-of-the-art uncalibrated photometric stereo methods on our test data to demonstrate the significance of our method. △ Less

Submitted 6 June, 2022; originally announced June 2022.

Comments: Accepted to CVPR2022. Code and Dataset at https://satoshi-ikehata.github.io/cvpr2022/univps_cvpr2022.html

arXiv:2204.04634 [pdf, other]

Intersection Prediction from Single 360° Image via Deep Detection of Possible Direction of Travel

Authors: Naoki Sugimoto, Satoshi Ikehata, Kiyoharu Aizawa

Abstract: Movie-Map, an interactive first-person-view map that engages the user in a simulated walking experience, comprises short 360° video segments separated by traffic intersections that are seamlessly connected according to the viewer's direction of travel. However, in wide urban-scale areas with numerous intersecting roads, manual intersection segmentation requires significant human effort. Therefore,… ▽ More Movie-Map, an interactive first-person-view map that engages the user in a simulated walking experience, comprises short 360° video segments separated by traffic intersections that are seamlessly connected according to the viewer's direction of travel. However, in wide urban-scale areas with numerous intersecting roads, manual intersection segmentation requires significant human effort. Therefore, automatic identification of intersections from 360° videos is an important problem for scaling up Movie-Map. In this paper, we propose a novel method that identifies an intersection from individual frames in 360° videos. Instead of formulating the intersection identification as a standard binary classification task with a 360° image as input, we identify an intersection based on the number of the possible directions of travel (PDoT) in perspective images projected in eight directions from a single 360° image detected by the neural network for handling various types of intersections. We constructed a large-scale 360° Image Intersection Identification (iii360) dataset for training and evaluation where 360° videos were collected from various areas such as school campus, downtown, suburb, and china town and demonstrate that our PDoT-based method achieves 88\% accuracy, which is significantly better than that achieved by the direct naive binary classification based method. The source codes and a partial dataset will be shared in the community after the paper is published. △ Less

Submitted 10 April, 2022; originally announced April 2022.

Comments: Accepted for publication in BMVC

arXiv:2202.03176 [pdf, other]

Field-of-View IoU for Object Detection in 360° Images

Authors: Miao Cao, Satoshi Ikehata, Kiyoharu Aizawa

Abstract: 360° cameras have gained popularity over the last few years. In this paper, we propose two fundamental techniques -- Field-of-View IoU (FoV-IoU) and 360Augmentation for object detection in 360° images. Although most object detection neural networks designed for the perspective images are applicable to 360° images in equirectangular projection (ERP) format, their performance deteriorates owing to t… ▽ More 360° cameras have gained popularity over the last few years. In this paper, we propose two fundamental techniques -- Field-of-View IoU (FoV-IoU) and 360Augmentation for object detection in 360° images. Although most object detection neural networks designed for the perspective images are applicable to 360° images in equirectangular projection (ERP) format, their performance deteriorates owing to the distortion in ERP images. Our method can be readily integrated with existing perspective object detectors and significantly improves the performance. The FoV-IoU computes the intersection-over-union of two Field-of-View bounding boxes in a spherical image which could be used for training, inference, and evaluation while 360Augmentation is a data augmentation technique specific to 360° object detection task which randomly rotates a spherical image and solves the bias due to the sphere-to-plane projection. We conduct extensive experiments on the 360indoor dataset with different types of perspective object detectors and show the consistent effectiveness of our method. △ Less

Submitted 22 September, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

arXiv:1808.10093 [pdf, other]

CNN-PS: CNN-based Photometric Stereo for General Non-Convex Surfaces

Authors: Satoshi Ikehata

Abstract: Most conventional photometric stereo algorithms inversely solve a BRDF-based image formation model. However, the actual imaging process is often far more complex due to the global light transport on the non-convex surfaces. This paper presents a photometric stereo network that directly learns relationships between the photometric stereo input and surface normals of a scene. For handling unordered,… ▽ More Most conventional photometric stereo algorithms inversely solve a BRDF-based image formation model. However, the actual imaging process is often far more complex due to the global light transport on the non-convex surfaces. This paper presents a photometric stereo network that directly learns relationships between the photometric stereo input and surface normals of a scene. For handling unordered, arbitrary number of input images, we merge all the input data to the intermediate representation called {\it observation map} that has a fixed shape, is able to be fed into a CNN. To improve both training and prediction, we take into account the rotational pseudo-invariance of the observation map that is derived from the isotropic constraint. For training the network, we create a synthetic photometric stereo dataset that is generated by a physics-based renderer, therefore the global light transport is considered. Our experimental results on both synthetic and real datasets show that our method outperforms conventional BRDF-based photometric stereo algorithms especially when scenes are highly non-convex. △ Less

Submitted 29 August, 2018; originally announced August 2018.

Comments: Accepted in ECCV 2018 (ECCV2018). Source code and supplementary are available at https://github.com/satoshi-ikehata/CNN-PS

arXiv:1808.08544 [pdf, other]

Scale Drift Correction of Camera Geo-Localization using Geo-Tagged Images

Authors: Kazuya Iwami, Satoshi Ikehata, Kiyoharu Aizawa

Abstract: Camera geo-localization from a monocular video is a fundamental task for video analysis and autonomous navigation. Although 3D reconstruction is a key technique to obtain camera poses, monocular 3D reconstruction in a large environment tends to result in the accumulation of errors in rotation, translation, and especially in scale: a problem known as scale drift. To overcome these errors, we propos… ▽ More Camera geo-localization from a monocular video is a fundamental task for video analysis and autonomous navigation. Although 3D reconstruction is a key technique to obtain camera poses, monocular 3D reconstruction in a large environment tends to result in the accumulation of errors in rotation, translation, and especially in scale: a problem known as scale drift. To overcome these errors, we propose a novel framework that integrates incremental structure from motion (SfM) and a scale drift correction method utilizing geo-tagged images, such as those provided by Google Street View. Our correction method begins by obtaining sparse 6-DoF correspondences between the reconstructed 3D map coordinate system and the world coordinate system, by using geo-tagged images. Then, it corrects scale drift by applying pose graph optimization over Sim(3) constraints and bundle adjustment. Experimental evaluations on large-scale datasets show that the proposed framework not only sufficiently corrects scale drift, but also achieves accurate geo-localization in a kilometer-scale environment. △ Less

Submitted 26 August, 2018; originally announced August 2018.

Comments: ECCV Workshop CVRSUAD

arXiv:1612.01256 [pdf, other]

Panoramic Structure from Motion via Geometric Relationship Detection

Authors: Satoshi Ikehata, Ivaylo Boyadzhiev, Qi Shan, Yasutaka Furukawa

Abstract: This paper addresses the problem of Structure from Motion (SfM) for indoor panoramic image streams, extremely challenging even for the state-of-the-art due to the lack of textures and minimal parallax. The key idea is the fusion of single-view and multi-view reconstruction techniques via geometric relationship detection (e.g., detecting 2D lines as coplanar in 3D). Rough geometry suffices to perfo… ▽ More This paper addresses the problem of Structure from Motion (SfM) for indoor panoramic image streams, extremely challenging even for the state-of-the-art due to the lack of textures and minimal parallax. The key idea is the fusion of single-view and multi-view reconstruction techniques via geometric relationship detection (e.g., detecting 2D lines as coplanar in 3D). Rough geometry suffices to perform such detection, and our approach utilizes rough surface normal estimates from an image-to-normal deep network to discover geometric relationships among lines. The detected relationships provide exact geometric constraints in our line-based linear SfM formulation. A constrained linear least squares is used to reconstruct a 3D model and camera motions, followed by the bundle adjustment. We have validated our algorithm on challenging datasets, outperforming various state-of-the-art reconstruction techniques. △ Less

Submitted 5 December, 2016; originally announced December 2016.

Showing 1–12 of 12 results for author: Ikehata, S