Search | arXiv e-print repository

arXiv:2405.12006 [pdf, other]

Depth Reconstruction with Neural Signed Distance Fields in Structured Light Systems

Authors: Rukun Qiao, Hiroshi Kawasaki, Hongbin Zha

Abstract: We introduce a novel depth estimation technique for multi-frame structured light setups using neural implicit representations of 3D space. Our approach employs a neural signed distance field (SDF), trained through self-supervised differentiable rendering. Unlike passive vision, where joint estimation of radiance and geometry fields is necessary, we capitalize on known radiance fields from projected patterns in structured light systems. This enables isolated optimization of the geometry field, ensuring convergence and network efficacy with fixed device positioning. To enhance geometric fidelity, we incorporate an additional color loss based on object surfaces during training. Real-world experiments demonstrate our method's superiority in geometric performance for few-shot scenarios, while achieving comparable results with increased pattern availability.

Submitted 20 May, 2024; originally announced May 2024.

Comments: 10 pages, 8 figures, accepted by 3DV 2024

arXiv:2310.08934 [pdf, other]

Online Adaptive Disparity Estimation for Dynamic Scenes in Structured Light Systems

Authors: Rukun Qiao, Hiroshi Kawasaki, Hongbin Zha

Abstract: In recent years, deep neural networks have shown remarkable progress in dense disparity estimation from dynamic scenes in monocular structured light systems. However, their performance significantly drops when applied in unseen environments. To address this issue, self-supervised online adaptation has been proposed as a solution to bridge this performance gap. Unlike traditional fine-tuning processes, online adaptation performs test-time optimization to adapt networks to new domains. Therefore, achieving fast convergence during the adaptation process is critical for attaining satisfactory accuracy. In this paper, we propose an unsupervised loss function based on long sequential inputs. It ensures better gradient directions and faster convergence. Our loss function is designed using a multi-frame pattern flow, which comprises a set of sparse trajectories of the projected pattern along the sequence. We estimate the sparse pseudo ground truth with a confidence mask using a filter-based method, which guides the online adaptation process. Our proposed framework significantly improves the online adaptation speed and achieves superior performance on unseen data.

Submitted 13 October, 2023; originally announced October 2023.

Comments: Accepted by 36th IEEE/RSJ International Conference on Intelligent Robots and Systems, 2023

arXiv:2310.08932 [pdf, other]

doi 10.1109/LRA.2022.3150029

TIDE: Temporally Incremental Disparity Estimation via Pattern Flow in Structured Light System

Authors: Rukun Qiao, Hiroshi Kawasaki, Hongbin Zha

Abstract: We introduce the Temporally Incremental Disparity Estimation Network (TIDE-Net), a learning-based technique for disparity computation in mono-camera structured light systems. In our hardware setting, a static pattern is projected onto a dynamic scene and captured by a monocular camera. Unlike most former disparity estimation methods, which operate in a frame-wise manner, our network acquires disparity maps in a temporally incremental way. Specifically, we exploit the deformation of projected patterns (named pattern flow) on captured image sequences to model the temporal information. Notably, this newly proposed pattern flow formulation reflects the disparity changes along the epipolar line, which is a special form of optical flow. Tailored for pattern flow, the TIDE-Net, a recurrent architecture, is proposed and implemented. For each incoming frame, our model fuses correlation volumes (from the current frame) and disparity (from the former frame) warped by pattern flow. From the fused features, the final stage of TIDE-Net estimates the residual disparity rather than the full disparity, as conducted by many previous methods. Interestingly, this design brings clear empirical advantages in terms of efficiency and generalization ability. Using only synthetic data for training, our extensive evaluation results (w.r.t. both accuracy and efficiency metrics) show superior performance to several SOTA models on unseen real data. The code is available at https://github.com/CodePointer/TIDENet.

Submitted 13 October, 2023; originally announced October 2023.

ACM Class: I.4.5; I.5.1

Journal ref: IEEE Robotics and Automation Letters (Volume: 7, Issue: 2, April 2022), pp. 5111-5118

arXiv:2309.16162 [pdf, other]

ACT2G: Attention-based Contrastive Learning for Text-to-Gesture Generation

Authors: Hitoshi Teshima, Naoki Wake, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi

Abstract: The recent increase in remote work, online meetings, and tele-operation tasks has made people realize that gestures for avatars and communication robots are more important than previously thought. Gesture is one of the key factors in achieving smooth and natural communication between humans and AI systems and has been intensively researched. Current gesture generation methods are mostly based on deep neural networks using text, audio, and other information as input; however, they generate gestures mainly based on audio, which are called beat gestures. Although beat gestures account for more than 70% of actual human gestures, content-based gestures sometimes play an important role in making avatars more realistic and human-like. In this paper, we propose attention-based contrastive learning for text-to-gesture (ACT2G), where generated gestures represent the content of the text by estimating an attention weight for each word of the input text. In the method, since text and gesture features calculated by the attention weight are mapped to the same latent space by contrastive learning, once text is given as input, the network outputs a feature vector which can be used to generate gestures related to the content. A user study confirmed that the gestures generated by ACT2G were better than those of existing methods. In addition, it was demonstrated that a wide variation of gestures could be generated from the same text when creators changed the attention weights.

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.14824 [pdf, other]

Generalization of pixel-wise phase estimation by CNN and improvement of phase-unwrapping by MRF optimization for one-shot 3D scan

Authors: Hiroto Harada, Michihiro Mikamo, Ryo Furukawa, Ryushuke Sagawa, Hiroshi Kawasaki

Abstract: Active stereo techniques using single pattern projection, a.k.a. one-shot 3D scan, have drawn wide attention from industry, medicine, etc. One severe drawback of one-shot 3D scan is sparse reconstruction. In addition, since the spatial pattern becomes complicated for the purpose of efficient embedding, it is easily affected by noise, which results in unstable decoding. To solve these problems, we propose a pixel-wise interpolation technique for one-shot scan, which is applicable to any type of static pattern if the pattern is regular and periodic. This is achieved by a U-net which is pre-trained by CG with an efficient data augmentation algorithm. In the paper, to further overcome the decoding instability, we propose a robust correspondence finding algorithm based on Markov random field (MRF) optimization. We also propose a shape refinement algorithm based on b-spline and Gaussian kernel interpolation using explicitly detected laser curves. Experiments are conducted to show the effectiveness of the proposed method using real data with strong noise and textures.

Submitted 26 September, 2023; originally announced September 2023.

Comments: MVA2023

arXiv:2309.03445 [pdf, other]

Underwater Image Enhancement by Transformer-based Diffusion Model with Non-uniform Sampling for Skip Strategy

Authors: Yi Tang, Takafumi Iwaguchi, Hiroshi Kawasaki

Abstract: In this paper, we present an approach to image enhancement with a diffusion model in underwater scenes. Our method adapts conditional denoising diffusion probabilistic models to generate the corresponding enhanced images by using the underwater images and Gaussian noise as the inputs. Additionally, in order to improve the efficiency of the reverse process in the diffusion model, we adopt two different approaches. First, we propose a lightweight transformer-based denoising network, which effectively reduces the network's forward time per iteration. Second, we introduce a skip sampling strategy to reduce the number of iterations. Based on the skip sampling strategy, we further propose two different non-uniform sampling methods for the sequence of time steps, namely piecewise sampling and searching with an evolutionary algorithm. Both are effective and further improve performance over the previous uniform sampling when using the same number of steps. Finally, we conduct a comparative evaluation on widely used underwater enhancement datasets between recent state-of-the-art methods and the proposed approach. The experimental results show that our approach achieves both competitive performance and high efficiency. Our code is available at https://github.com/piggy2009/DM_underwater.

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2308.13748 [pdf, ps, other]

This paper presents a new application of Borsuk-Ulam's theorem to nonlinear programming

Authors: Hidefumi Kawasaki

Abstract: Borsuk-Ulam's theorem is a useful tool of algebraic topology. It states that for any continuous mapping $f$ from the $n$-sphere to the $n$-dimensional Euclidean space, there exists a pair of antipodal points such that $f(x)=f(-x)$. Among its applications, the ham-sandwich theorem, the necklace theorem, and Lovász's coloring of Kneser graphs are well known. This paper attempts to apply Borsuk-Ulam's theorem to nonlinear programming.

Submitted 25 August, 2023; originally announced August 2023.

Comments: 9 pages, 1 figure

MSC Class: 90C31; 55M20

arXiv:2210.06790 [pdf, other]

Deep Gesture Generation for Social Robots Using Type-Specific Libraries

Authors: Hitoshi Teshima, Naoki Wake, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi

Abstract: Body language such as conversational gesture is a powerful way to ease communication. Conversational gestures not only make a speech more lively but also contain semantic meaning that helps to stress important information in the discussion. In the field of robotics, giving conversational agents (humanoid robots or virtual avatars) the ability to properly use gestures is critical, yet remains a task of extraordinary difficulty. This is because, given only a text as input, there are many possibilities and ambiguities in generating an appropriate gesture. Unlike previous works, we propose a new method that explicitly takes into account the gesture types to reduce these ambiguities and generate human-like conversational gestures. Key to our proposed system is a new gesture database built on the TED dataset that allows us to map a word to one of three types of gestures: "Imagistic" gestures, which express the content of the speech, "Beat" gestures, which emphasize words, and "No gestures." We propose a system that first maps the words in the input text to their corresponding gesture type, generates type-specific gestures, and combines the generated gestures into one final smooth gesture. In our comparative experiments, the effectiveness of the proposed method was confirmed in user studies for both an avatar and a humanoid robot.

Submitted 13 October, 2022; originally announced October 2022.

arXiv:2210.02038 [pdf, other]

MOTSLAM: MOT-assisted monocular dynamic SLAM using single-view depth estimation

Authors: Hanwei Zhang, Hideaki Uchiyama, Shintaro Ono, Hiroshi Kawasaki

Abstract: Visual SLAM systems targeting static scenes have been developed with satisfactory accuracy and robustness. Dynamic 3D object tracking has since become a significant capability in visual SLAM, driven by the requirement of understanding dynamic surroundings in various scenarios including autonomous driving and augmented and virtual reality. However, performing dynamic SLAM solely with monocular images remains a challenging problem due to the difficulty of associating dynamic features and estimating their positions. In this paper, we present MOTSLAM, a dynamic visual SLAM system with a monocular configuration that tracks both poses and bounding boxes of dynamic objects. MOTSLAM first performs multiple object tracking (MOT) with associated 2D and 3D bounding box detection to create initial 3D objects. Then, neural-network-based monocular depth estimation is applied to fetch the depth of dynamic features. Finally, camera poses, object poses, and both static and dynamic map points are jointly optimized using a novel bundle adjustment. Our experiments on the KITTI dataset demonstrate that our system achieves the best performance on both camera ego-motion and object tracking in monocular dynamic SLAM.

Submitted 5 October, 2022; originally announced October 2022.

arXiv:2109.10524 [pdf, other]

doi 10.1109/ICIP42928.2021.9506443

A Method For Adding Motion-Blur on Arbitrary Objects By using Auto-Segmentation and Color Compensation Techniques

Authors: Michihiro Mikamo, Ryo Furukawa, Hiroshi Kawasaki

Abstract: When dynamic objects are captured by a camera, motion blur inevitably occurs. Such blur is sometimes considered mere noise; however, it can also provide an important effect that adds dynamism to the scene in photographs or videos. Unlike similar effects such as defocus blur, which is now easily controlled even on smartphones, motion blur is still uncontrollable and produces undesired effects in photographs. In this paper, a unified framework to add motion blur on a per-object basis is proposed. In the method, multiple frames are captured without motion blur and accumulated to create motion blur on target objects. To capture images without motion blur, the shutter speed must be short; however, this makes captured images dark, and thus the sensor gain should be increased to compensate. Since sensor gain causes severe noise in the image, we propose a color compensation algorithm based on a non-linear filtering technique as a solution. Another contribution is that our technique can be used to make HDR images of fast-moving objects by using multi-exposure images. In the experiments, the effectiveness of the method is confirmed by an ablation study using several data sets.

Submitted 22 September, 2021; originally announced September 2021.

Comments: This paper was accepted at ICIP 2021

Journal ref: 2021 IEEE International Conference on Image Processing (ICIP)

arXiv:2108.02937 [pdf, other]

High-frequency shape recovery from shading by CNN and domain adaptation

Authors: Kodai Tokieda, Takafumi Iwaguchi, Hiroshi Kawasaki

Abstract: The importance of structured-light-based one-shot scanning techniques is increasing because of their simple system configuration and ability to capture moving objects. One severe limitation of the technique is that it can capture only sparse shapes, not high-frequency shapes, because a certain area of the projection pattern is required to encode spatial information. In this paper, we propose a technique to recover high-frequency shapes by using shading information, which is captured by a one-shot RGB-D sensor based on structured light with a single camera. Since the color image comprises shading information of the object surface, high-frequency shapes can be recovered by shape-from-shading techniques. Although multiple images with different lighting positions are required for shape-from-shading techniques, we propose a learning-based approach to recover shape from a single image. In addition, to overcome the problem of preparing a sufficient amount of data for training, we propose a new data augmentation method for high-frequency shapes using synthetic data and domain adaptation. Experimental results are shown to confirm the effectiveness of the proposed method.

Submitted 6 August, 2021; originally announced August 2021.

arXiv:2107.03000 [pdf, other]

PoseRN: A 2D pose refinement network for bias-free multi-view 3D human pose estimation

Authors: Akihiko Sayo, Diego Thomas, Hiroshi Kawasaki, Yuta Nakashima, Katsushi Ikeuchi

Abstract: We propose a new 2D pose refinement network that learns to predict the human bias in the estimated 2D pose. There are biases in 2D pose estimations that are due to differences between annotations of 2D joint locations based on annotators' perception and those defined by motion capture (MoCap) systems. These biases are crafted into publicly available 2D pose datasets and cannot be removed with existing error reduction approaches. Our proposed pose refinement network allows us to efficiently remove the human bias in the estimated 2D poses and achieve highly accurate multi-view 3D human pose estimation.

Submitted 6 July, 2021; originally announced July 2021.

arXiv:2011.00174 [pdf, other]

Dense Pixel-wise Micro-motion Estimation of Object Surface by using Low Dimensional Embedding of Laser Speckle Pattern

Authors: Ryusuke Sagawa, Yusuke Higuchi, Hiroshi Kawasaki, Ryo Furukawa, Takahiro Ito

Abstract: This paper proposes a method of estimating micro-motion of an object at each pixel that is too small to detect under a common setup of camera and illumination. The method introduces an active-lighting approach to make the motion visually detectable. The approach is based on the speckle pattern, which is produced by the mutual interference of laser light on the object's surface and continuously changes its appearance according to the out-of-plane motion of the surface. In addition, the speckle pattern becomes uncorrelated with large motion. To compensate for such micro- and large motion, the method estimates the motion parameters up to scale at each pixel by nonlinear embedding of the speckle pattern into a low-dimensional space. The out-of-plane motion is calculated by making the motion parameters spatially consistent across the image. In the experiments, the proposed method is compared with other measuring devices to prove the effectiveness of the method.

Submitted 30 October, 2020; originally announced November 2020.

Comments: to be published in ACCV2020

arXiv:1909.03583 [pdf, other]

Unified Underwater Structure-from-Motion

Authors: Kazuto Ichimaru, Yuichi Taguchi, Hiroshi Kawasaki

Abstract: This paper shows that accurate underwater 3D shape reconstruction is possible using a single camera, observing a target through a refractive interface. We provide unified reconstruction techniques for a variety of scenarios such as single static camera and moving refractive interface, single moving camera and static refractive interface, and single moving camera and moving refractive interface. In our basic setup, we assume that the refractive interface is planar, and simultaneously estimate the unknown transformations of the planar interface and the camera, and the unknown target shape using bundle adjustment. We also extend it to relax the planarity assumption, which enables us to use waves of the refractive interface for the reconstruction task. Experiments with real data show the superiority of our method to existing methods.

Submitted 8 September, 2019; originally announced September 2019.

Comments: Accepted in International Conference on 3D Vision (3DV 2019)

arXiv:1905.09588 [pdf, other]

Underwater Stereo using Refraction-free Image Synthesized from Light Field Camera

Authors: Kazuto Ichimaru, Hiroshi Kawasaki

Abstract: There is a strong demand for capturing underwater scenes without distortions caused by refraction. Since a light field camera can capture several light rays at each point of an image plane from various directions, if geometrically correct rays are chosen, it is possible to synthesize a refraction-free image. In this paper, we propose a novel technique to efficiently select such rays to synthesize a refraction-free image from an underwater image captured by a light field camera. In addition, we propose a stereo technique to reconstruct 3D shapes using a pair of our refraction-free images, which are central projections. In the experiment, we captured several underwater scenes with two light field cameras, synthesized refraction-free images, and applied the stereo technique to reconstruct 3D shapes. The results are compared with previous techniques which are based on approximation, showing the strength of our method.

Submitted 23 May, 2019; originally announced May 2019.

Comments: Accepted in 2019 IEEE International Conference on Image Processing (ICIP)

arXiv:1811.09675 [pdf, other]

CNN based dense underwater 3D scene reconstruction by transfer learning using bubble database

Authors: Kazuto Ichimaru, Ryo Furukawa, Hiroshi Kawasaki

Abstract: Dense 3D shape acquisition of swimming humans or live fish is an important research topic for sports, biological science, and so on. For this purpose, an active stereo sensor is usually used in the air; however, it cannot be applied to the underwater environment because of refraction, strong light attenuation, and severe interference from bubbles. Passive stereo is a simple solution for capturing dynamic scenes in underwater environments; however, shapes with textureless surfaces or irregular reflections cannot be recovered. Recently, a stereo camera pair with a pattern projector for adding artificial textures on the objects has been proposed. However, to use the system in underwater environments, several problems should be compensated for, i.e., disturbance by fluctuation and bubbles. A simple solution is to use a convolutional neural network for stereo to cancel the effects of bubbles and/or water fluctuation. Since it is not easy to train a CNN with a small database with large variation, we develop a special bubble generation device to efficiently create a real bubble database of multiple sizes and densities. In addition, we propose a transfer learning technique for multi-scale CNN to effectively remove bubbles and projected patterns on the object. Further, we develop a real system and actually capture a live swimming human, which has not been done before. Experiments are conducted to show the effectiveness of our method compared with state-of-the-art techniques.

Submitted 20 November, 2018; originally announced November 2018.

Comments: IEEE Winter Conference on Applications of Computer Vision. arXiv admin note: text overlap with arXiv:1808.08348

arXiv:1808.08348 [pdf, other]

Multi-scale CNN stereo and pattern removal technique for underwater active stereo system

Authors: Kazuto Ichimaru, Ryo Furukawa, Hiroshi Kawasaki

Abstract: Demand for capturing dynamic scenes of underwater environments is rapidly growing. Passive stereo is applicable to capturing dynamic scenes; however, shapes with textureless surfaces or irregular reflections cannot be recovered by the technique. In our system, we add a pattern projector to the stereo camera pair so that artificial textures are augmented on the objects. To use the system in underwater environments, several problems should be compensated for, i.e., refraction and disturbance by fluctuation and bubbles. Further, since the surfaces of the objects are interfered with by bubbles, projected patterns, etc., such noise and patterns should be removed from captured images to recover the original texture. To solve these problems, we propose three approaches: a depth-dependent calibration, a Convolutional Neural Network (CNN) stereo method, and a CNN-based texture recovery method. The depth-dependent calibration is our analysis to find the acceptable depth range for approximation by center projection to find the certain target depth for calibration. In terms of CNN stereo, unlike common CNN-based stereo methods which do not consider strong disturbances like refraction or bubbles, we designed a novel CNN architecture for stereo matching using multi-scale information, which is intended to be robust against such disturbances. Finally, we propose a multi-scale method for bubble and projected-pattern removal using CNNs to recover original textures. Experimental results are shown to prove the effectiveness of our method compared with state-of-the-art techniques. Furthermore, reconstruction of a live swimming fish is demonstrated to confirm the feasibility of our techniques.

Submitted 24 August, 2018; originally announced August 2018.

Comments: International Conference on 3D Vision 2018

arXiv:1807.02632 [pdf, other]

Representing a Partially Observed Non-Rigid 3D Human Using Eigen-Texture and Eigen-Deformation

Authors: Ryosuke Kimura, Akihiko Sayo, Fabian Lorenzo Dayrit, Yuta Nakashima, Hiroshi Kawasaki, Ambrosio Blanco, Katsushi Ikeuchi

Abstract: Reconstruction of the shape and motion of humans from RGB-D is a challenging problem that has received much attention in recent years. Recent approaches for full-body reconstruction use a statistical shape model, which is built upon accurate full-body scans of people in skin-tight clothes, to complete invisible parts due to occlusion. Such a statistical model may still be fitted to an RGB-D measurement with loose clothes but cannot describe its deformations, such as clothing wrinkles. Observed surfaces may be reconstructed precisely from actual measurements, while we have no cues for unobserved surfaces. For full-body reconstruction with loose clothes, we propose to use lower-dimensional embeddings of texture and deformation, referred to as eigen-texturing and eigen-deformation, to reproduce views of even unobserved surfaces. Provided a full-body reconstruction from a sequence of partial measurements as 3D meshes, the texture and deformation of each triangle are then embedded using eigen-decomposition. Combined with neural-network-based coefficient regression, our method synthesizes the texture and deformation from arbitrary viewpoints. We evaluate our method using simulated data and visually demonstrate how our method works on real data.

Submitted 7 July, 2018; originally announced July 2018.

Comments: 6 pages, accepted to ICPR

arXiv:1710.00517 [pdf, other]

Temporal shape super-resolution by intra-frame motion encoding using high-fps structured light

Authors: Yuki Shiba, Satoshi Ono, Ryo Furukawa, Shinsaku Hiura, Hiroshi Kawasaki

Abstract: One solution for depth imaging of a moving scene is to project a static pattern on the object and use just a single image for reconstruction. However, if the motion of the object is too fast with respect to the exposure time of the image sensor, patterns on the captured image are blurred and reconstruction fails. In this paper, we impose multiple projection patterns into each single captured image to realize temporal super-resolution of the depth image sequences. With our method, multiple patterns are projected onto the object at a higher fps than is possible with a camera. In this case, the observed pattern varies depending on the depth and motion of the object, so we can extract temporal information of the scene from each single image. The decoding process is realized using a learning-based approach where no geometric calibration is needed. Experiments confirm the effectiveness of our method, where sequential shapes are reconstructed from a single image. Quantitative evaluations and comparisons with recent techniques were also conducted.

Submitted 2 October, 2017; originally announced October 2017.

Comments: 9 pages, Published at the International Conference on Computer Vision (ICCV 2017)

arXiv:1710.00513 [pdf, other]

Depth estimation using structured light flow -- analysis of projected pattern flow on an object's surface --

Authors: Ryo Furukawa, Ryusuke Sagawa, Hiroshi Kawasaki

Abstract: Shape reconstruction techniques using structured light have been widely researched and developed due to their robustness, high precision, and density. Because the techniques are based on decoding a pattern to find correspondences, they implicitly require that the projected patterns be clearly captured by an image sensor, i.e., that defocus and motion blur of the projected pattern be avoided. Although intensive research has been conducted on solving defocus blur, little research addresses motion blur, and the only solution is to capture with an extremely fast shutter speed. In this paper, unlike the previous approaches, we actively utilize motion blur, which we refer to as a light flow, to estimate depth. Analysis reveals that a minimum of two light flows, which are retrieved from two projected patterns on the object, are required for depth estimation. To retrieve two light flows at the same time, two sets of parallel line patterns are illuminated from two video projectors and the size of the motion blur of each line is precisely measured. By analyzing the light flows, i.e., the lengths of the blurs, scene depth information is estimated. In the experiments, 3D shapes of fast-moving objects, which are inevitably captured with motion blur, are successfully reconstructed by our technique.

Submitted 2 October, 2017; originally announced October 2017.

Comments: 9 pages, Published at the International Conference on Computer Vision (ICCV 2017)

arXiv:1705.10435 [pdf, other]

doi 10.1016/j.neuroimage.2018.02.033

The Bispectrum and Its Relationship to Phase-Amplitude Coupling

Authors: Christopher K. Kovach, Hiroyuki Oya, Hiroto Kawasaki

Abstract: Most biological signals are non-Gaussian, reflecting their origins in highly nonlinear physiological systems. A versatile set of techniques for studying non-Gaussian signals relies on the spectral representations of higher moments, known as polyspectra, which describe forms of cross-frequency dependence that do not arise in time-invariant Gaussian signals. The most commonly used of these employ the bispectrum. Recently, other measures of cross-frequency dependence have drawn interest in EEG literature, in particular those which address phase-amplitude coupling (PAC). Here we demonstrate a close relationship between the bispectrum and popular measures of PAC, which we relate to smoothings of the signal bispectrum, making them fundamentally bispectral estimators. Viewed this way, however, conventional PAC measures exhibit some unfavorable qualities, including poor bias properties, lack of correct symmetry and artificial constraints on the spectral range and resolution of the estimate. Moreover, information obscured by smoothing in measures of PAC, but preserved in standard bispectral estimators, may be critical for distinguishing nested oscillations from transient signal features and other non-oscillatory causes of "spurious" PAC. We propose guidelines for gauging the nature and origin of cross-frequency coupling with bispectral statistics. Beyond clarifying the relationship between PAC and the bispectrum, the present work lays out a general framework for the interpretation of the bispectrum, which extends to other higher-order spectra.
In particular, this framework holds promise for the detailed identification of signal features related to both nested oscillations and transient phenomena. We conclude with a discussion of some broader theoretical implications of this framework and highlight promising directions for future development.

Submitted 1 March, 2018; v1 submitted 29 May, 2017; originally announced May 2017.

Journal ref: NeuroImage 173, 2018, 518 - 539

arXiv:1609.02994 [pdf, other]

Simultaneous independent image display technique on multiple 3D objects

Authors: Takuto Hirukawa, Marco Visentini-Scarzanella, Hiroshi Kawasaki, Ryo Furukawa, Shinsaku Hiura

Abstract: We propose a new system to visualize depth-dependent patterns and images on solid objects with complex geometry using multiple projectors. The system, despite consisting of conventional passive LCD projectors, is able to project different images and patterns depending on the spatial location of the object. The technique is based on the simple principle that multiple patterns projected from multiple projectors interfere constructively with each other when their patterns are projected on the same object. Previous techniques based on the same principle can only achieve 1) low resolution volume colorization or 2) high resolution images but only on a limited number of flat planes. In this paper, we discretize a 3D object into a number of 3D points so that high resolution images can be projected onto complex shapes. We also propose a dynamic range expansion technique as well as an efficient optimization procedure based on epipolar constraints. Such a technique can be used to extend projection mapping to have spatial dependency, which is desirable for practical applications. We also demonstrate the system's potential as a visual instructor for object placement and assembly. Experiments prove the effectiveness of our method.

Submitted 9 September, 2016; originally announced September 2016.

Comments: Accepted to ACCV 2016

arXiv:nucl-ex/0309022 [pdf, ps, other]

doi 10.1103/PhysRevLett.92.062501

Anomalously hindered E2 strength B(E2;2_1^+ -> 0^+) in 16C

Authors: N. Imai, N. Aoi, N. Fukuda, T. Kishida, T. Kubo, T. Minemura, T. Motobayashi, S. Takeuchi, K. Yoneda, H. Watanabe, M. Ishihara, H. J. Ong, H. Sakurai, H. Iwasaki, T. K. Ohnishi, M. K. Suzuki, K. Demichi, H. Kawasaki, H. Baba, T. Gomi, H. Hasegawa, E. Kaneko, S. Kanno, K. Kurita, E. Takeshita, et al. (14 additional authors not shown)

Abstract: The electric quadrupole transition from the first 2^+ state to the ground 0^+ state in 16C is studied through measurement of the lifetime by a recoil shadow method applied to inelastically scattered radioactive 16C nuclei. The measured lifetime is 75 ± 23 ps, corresponding to a B(E2;2_1^+ -> 0^+) value of 0.63 ± 0.19 e^2 fm^4, or 0.26 ± 0.08 Weisskopf units. The transition strength is found to be anomalously small compared to the empirically predicted value.

Submitted 30 September, 2003; originally announced September 2003.

Comments: 4 pages, 4 figures, submitted to Physical Review Letters

Journal ref: Phys.Rev.Lett. 92 (2004) 062501

Showing 1–23 of 23 results for author: Kawasaki, H