
Quantum Curriculum Learning

Authors: Quoc Hoan Tran, Yasuhiro Endo, Hirotaka Oshima

Abstract: Quantum machine learning (QML) requires significant quantum resources to achieve quantum advantage. Research should prioritize both the efficient design of quantum architectures and the development of learning strategies to optimize resource usage. We propose a framework called quantum curriculum learning (Q-CurL) for quantum data, where the curriculum introduces simpler tasks or data to the learning model before progressing to more challenging ones. We define the curriculum criteria based on the data density ratio between tasks to determine the curriculum order. We also implement a dynamic learning schedule to emphasize the significance of quantum data in optimizing the loss function. Empirical evidence shows that Q-CurL enhances the training convergence and the generalization for unitary learning tasks and improves the robustness of quantum phase recognition tasks. Our framework provides a general learning strategy, bringing QML closer to realizing practical advantages.

Submitted 2 July, 2024; originally announced July 2024.

Comments: main 5 pages, supplementary materials 6 pages

arXiv:2406.18316

Trade-off between Gradient Measurement Efficiency and Expressivity in Deep Quantum Neural Networks

Authors: Koki Chinzei, Shinichiro Yamano, Quoc Hoan Tran, Yasuhiro Endo, Hirotaka Oshima

Abstract: Quantum neural networks (QNNs) require an efficient training algorithm to achieve practical quantum advantages. A promising approach is the use of gradient-based optimization algorithms, where gradients are estimated through quantum measurements. However, it is generally difficult to efficiently measure gradients in QNNs because the quantum state collapses upon measurement. In this work, we prove a general trade-off between gradient measurement efficiency and expressivity in a wide class of deep QNNs, elucidating the theoretical limits and possibilities of efficient gradient estimation. This trade-off implies that a more expressive QNN requires a higher measurement cost in gradient estimation, whereas we can increase gradient measurement efficiency by reducing the QNN expressivity to suit a given task. We further propose a general QNN ansatz called the stabilizer-logical product ansatz (SLPA), which can reach the upper limit of the trade-off inequality by leveraging the symmetric structure of the quantum circuit. In learning an unknown symmetric function, the SLPA drastically reduces the quantum resources required for training while maintaining accuracy and trainability compared to a well-designed symmetric circuit based on the parameter-shift method. Our results not only reveal a theoretical understanding of efficient training in QNNs but also provide a standard and broadly applicable efficient QNN design.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 32 pages, 11 figures
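
The abstract compares the SLPA with a symmetric circuit trained by the parameter-shift method. As a minimal sketch of that baseline rule only (not of the SLPA), the Python below simulates one qubit prepared as RY(θ)|0⟩ and measured in Z, so f(θ) = ⟨Z⟩ = cos θ; two evaluations shifted by ±π/2 recover the exact gradient −sin θ. The one-qubit circuit and the function names are illustrative assumptions.

```python
import math


def expectation_z(theta: float) -> float:
    """<Z> for the one-qubit state RY(theta)|0>, computed from its amplitudes."""
    amp0 = math.cos(theta / 2)  # amplitude of |0>
    amp1 = math.sin(theta / 2)  # amplitude of |1>
    return amp0 ** 2 - amp1 ** 2  # equals cos(theta)


def parameter_shift_grad(theta: float) -> float:
    """d<Z>/dtheta from two circuit evaluations shifted by +/- pi/2."""
    shift = math.pi / 2
    return (expectation_z(theta + shift) - expectation_z(theta - shift)) / 2


if __name__ == "__main__":
    theta = 0.3
    print("parameter shift:", parameter_shift_grad(theta))
    print("analytic -sin  :", -math.sin(theta))
```

Because each parameter needs its own pair of shifted circuit runs (each estimated from many measurement shots), the measurement cost of this rule grows with the number of parameters.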

arXiv:2405.16443

3D View Optimization for Improving Image Aesthetics

Authors: Taichi Uchida, Yoshihiro Kanamori, Yuki Endo

Abstract: Achieving aesthetically pleasing photography necessitates attention to multiple factors, including composition and capture conditions, which pose challenges to novices. Prior research has explored the enhancement of photo aesthetics post-capture through 2D manipulation techniques; however, these approaches offer limited search space for aesthetics. We introduce a pioneering method that employs 3D operations to simulate the conditions at the moment of capture retrospectively. Our approach extrapolates the input image and then reconstructs the 3D scene from the extrapolated image, followed by an optimization to identify camera parameters and image aspect ratios that yield the best 3D view with enhanced aesthetics. Comparative qualitative and quantitative assessments reveal that our method surpasses traditional 2D editing techniques with superior aesthetics.

Submitted 26 May, 2024; originally announced May 2024.

Comments: 10 pages

arXiv:2403.17761

Makeup Prior Models for 3D Facial Makeup Estimation and Applications

Authors: Xingchao Yang, Takafumi Taketomi, Yuki Endo, Yoshihiro Kanamori

Abstract: In this work, we introduce two types of makeup prior models to extend existing 3D face prior models: PCA-based and StyleGAN2-based priors. The PCA-based prior model is a linear model that is easy to construct and is computationally efficient. However, it retains only low-frequency information. Conversely, the StyleGAN2-based model can represent high-frequency information with relatively higher computational cost than the PCA-based model. Although there is a trade-off between the two models, both are applicable to 3D facial makeup estimation and related applications. By leveraging makeup prior models and designing a makeup consistency module, we effectively address the challenges that previous methods faced in robustly estimating makeup, particularly in the context of handling self-occluded faces. In experiments, we demonstrate that our approach reduces computational costs by several orders of magnitude, achieving speeds up to 180 times faster. In addition, by improving the accuracy of the estimated makeup, we confirm that our methods are highly advantageous for various 3D facial makeup applications such as 3D makeup face reconstruction, user-friendly makeup editing, makeup transfer, and interpolation.

Submitted 26 March, 2024; originally announced March 2024.

Comments: CVPR2024. Project: https://yangxingchao.github.io/makeup-priors-page
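
The PCA-based prior in this abstract is a linear model: a makeup texture is approximated as a mean plus a weighted sum of orthonormal basis vectors. The sketch below assumes a toy 4-element texture and a hand-picked two-vector basis (a real prior would learn its basis from data). It shows that encoding is just a few dot products, and that a component outside the basis, here an alternating pattern standing in for high-frequency detail, is lost on reconstruction.

```python
# Toy 4-element "makeup texture" model: mean plus two orthonormal basis vectors.
MEAN = [0.5, 0.5, 0.5, 0.5]
BASIS = [
    [0.5, 0.5, 0.5, 0.5],    # overall brightness (lowest frequency)
    [0.5, 0.5, -0.5, -0.5],  # coarse two-level split
]


def project(x, mean=MEAN, basis=BASIS):
    """PCA coefficients: dot products of (x - mean) with each basis vector."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(b * c for b, c in zip(vec, centered)) for vec in basis]


def reconstruct(coeffs, mean=MEAN, basis=BASIS):
    """Linear decoder: mean + sum_k coeffs[k] * basis[k]."""
    out = list(mean)
    for c, vec in zip(coeffs, basis):
        out = [o + c * b for o, b in zip(out, vec)]
    return out


smooth = [1.0, 1.0, 0.0, 0.0]    # lies in the span of the basis
detailed = [1.0, 0.0, 1.0, 0.0]  # alternating, "high-frequency" pattern
print(reconstruct(project(smooth)))    # recovered exactly
print(reconstruct(project(detailed)))  # collapses to the mean
```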

arXiv:2401.02804

DiffBody: Diffusion-based Pose and Shape Editing of Human Images

Authors: Yuta Okuyama, Yuki Endo, Yoshihiro Kanamori

Abstract: Pose and body shape editing in a human image has received increasing attention. However, current methods often struggle with dataset biases and deteriorate realism and the person's identity when users make large edits. We propose a one-shot approach that enables large edits with identity preservation. To enable large edits, we fit a 3D body model, project the input image onto the 3D model, and change the body's pose and shape. Because this initial textured body model has artifacts due to occlusion and the inaccurate body shape, the rendered image undergoes a diffusion-based refinement, in which strong noise destroys body structure and identity whereas insufficient noise does not help. We thus propose an iterative refinement with weak noise, applied first for the whole body and then for the face. We further enhance the realism by fine-tuning text embeddings via self-supervised learning. Our quantitative and qualitative evaluations demonstrate that our method outperforms other existing methods across various datasets.

Submitted 7 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: Accepted to WACV 2024, project page: https://www.cgg.cs.tsukuba.ac.jp/~okuyama/pub/diffbody/

arXiv:2312.08809

doi 10.1162/neco_a_01628

Performance evaluation of matrix factorization for fMRI data

Authors: Yusuke Endo, Kou** Takeda

Abstract: In the study of the brain, there is a hypothesis that sparse coding is realized in the information representation of external stimuli, which has recently been confirmed experimentally for visual stimuli. However, unlike in specific functional regions of the brain, sparse coding in information processing across the whole brain has not been sufficiently clarified. In this study, we investigate the validity of sparse coding in the whole human brain by applying various matrix factorization (MF) methods to functional magnetic resonance imaging data of neural activities in the whole human brain. The results suggest that the sparse coding hypothesis holds for information representation in the whole human brain, because features extracted by sparse MF methods (SparsePCA or MOD under high-sparsity settings) or by an approximately sparse MF method (FastICA) can classify external visual stimuli more accurately than features from non-sparse MF methods or from sparse MF methods under low-sparsity settings.

Submitted 14 December, 2023; originally announced December 2023.

Comments: 22 pages, 8 figures

Journal ref: Neural Computation (2024) 36 (1) 128-150

arXiv:2308.06027

Masked-Attention Diffusion Guidance for Spatially Controlling Text-to-Image Generation

Authors: Yuki Endo

Abstract: Text-to-image synthesis has achieved high-quality results with recent advances in diffusion models. However, text input alone has high spatial ambiguity and limited user controllability. Most existing methods allow spatial control through additional visual guidance (e.g., sketches and semantic masks) but require additional training with annotated images. In this paper, we propose a method for spatially controlling text-to-image generation without further training of diffusion models. Our method is based on the insight that the cross-attention maps reflect the positional relationship between words and pixels. Our aim is to control the attention maps according to given semantic masks and text prompts. To this end, we first explore a simple approach of directly swapping the cross-attention maps with constant maps computed from the semantic regions. Some prior works also allow training-free spatial control of text-to-image diffusion models by directly manipulating cross-attention maps. However, these approaches still suffer from misalignment with given masks because manipulated attention maps are far from actual ones learned by diffusion models. To address this issue, we propose masked-attention guidance, which can generate images more faithful to semantic masks via indirect control of attention to each word and pixel by manipulating noise images fed to diffusion models. Masked-attention guidance can be easily integrated into pre-trained off-the-shelf diffusion models (e.g., Stable Diffusion) and applied to text-guided image editing tasks. Experiments show that our method enables more accurate spatial control than baselines qualitatively and quantitatively.

Submitted 30 October, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

Comments: Accepted to The Visual Computer, code: https://github.com/endo-yuki-t/MAG
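
The first, simple approach mentioned in this abstract replaces each word's cross-attention map with a constant map derived from its semantic region. A minimal sketch of that baseline only (not of masked-attention guidance itself), assuming masks are given as sets of flattened pixel indices and that pixels covered by no mask attend uniformly to all tokens (a hypothetical fallback):

```python
def constant_attention(masks, tokens, num_pixels):
    """Constant cross-attention maps from semantic masks.

    masks: token -> set of flattened pixel indices covered by that token.
    Returns one row per pixel: attention weights over `tokens`, summing to 1.
    """
    rows = []
    for px in range(num_pixels):
        covering = [t for t in tokens if px in masks.get(t, set())]
        active = covering or tokens  # hypothetical fallback: uniform over all tokens
        weight = 1.0 / len(active)
        rows.append([weight if t in active else 0.0 for t in tokens])
    return rows


maps = constant_attention({"cat": {0, 1}, "dog": {1, 2}}, ["cat", "dog"], 4)
for px, row in enumerate(maps):
    print(px, row)
```

In a real diffusion model each row would be one pixel query's softmax attention over the prompt tokens. According to the abstract, overwriting it this way drifts from the maps the model actually learned, which is the misalignment masked-attention guidance is designed to avoid.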

arXiv:2305.16759

StyleHumanCLIP: Text-guided Garment Manipulation for StyleGAN-Human

Authors: Takato Yoshikawa, Yuki Endo, Yoshihiro Kanamori

Abstract: This paper tackles text-guided control of StyleGAN for editing garments in full-body human images. Existing StyleGAN-based methods struggle to handle the rich diversity of garments, body shapes, and poses. We propose a framework for text-guided full-body human image synthesis via an attention-based latent code mapper, which enables more disentangled control of StyleGAN than existing mappers. Our latent code mapper adopts an attention mechanism that adaptively manipulates individual latent codes on different StyleGAN layers under text guidance. In addition, we introduce feature-space masking at inference time to avoid unwanted changes caused by text inputs. Our quantitative and qualitative evaluations reveal that our method can control generated images more faithfully to given texts than existing methods.

Submitted 20 March, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: VISIAPP 2024, project page: https://www.cgg.cs.tsukuba.ac.jp/~yoshikawa/pub/style_human_clip/

arXiv:2208.12408

User-Controllable Latent Transformer for StyleGAN Image Layout Editing

Authors: Yuki Endo

Abstract: Latent space exploration is a technique that discovers interpretable latent directions and manipulates latent codes to edit various attributes in images generated by generative adversarial networks (GANs). However, in previous work, spatial control is limited to simple transformations (e.g., translation and rotation), and it is laborious to identify appropriate latent directions and adjust their parameters. In this paper, we tackle the problem of editing the StyleGAN image layout by annotating the image directly. To do so, we propose an interactive framework for manipulating latent codes in accordance with the user inputs. In our framework, the user annotates a StyleGAN image with locations they want to move or not and specifies a movement direction by mouse dragging. From these user inputs and initial latent codes, our latent transformer based on a transformer encoder-decoder architecture estimates the output latent codes, which are fed to the StyleGAN generator to obtain a result image. To train our latent transformer, we utilize synthetic data and pseudo-user inputs generated by off-the-shelf StyleGAN and optical flow models, without manual supervision. Quantitative and qualitative evaluations demonstrate the effectiveness of our method over existing methods.

Submitted 25 August, 2022; originally announced August 2022.

Comments: Accepted to Pacific Graphics 2022, project page: http://www.cgg.cs.tsukuba.ac.jp/~endo/projects/UserControllableLT

arXiv:2206.05433

doi 10.1364/OE.460681

Gigahertz-rate random speckle projection for high-speed single-pixel image classification

Authors: **sei Hanawa, Tomoaki Niiyama, Yutaka Endo, Satoshi Sunada

Abstract: Imaging techniques based on single-pixel detection, such as ghost imaging, can reconstruct or recognize a target scene from multiple measurements using a sequence of random mask patterns. However, the processing speed is limited by the low rate of the pattern generation. In this study, we propose an ultrafast method for random speckle pattern generation, which has the potential to overcome the limited processing speed. The proposed approach is based on multimode fiber speckles induced by fast optical phase modulation. We experimentally demonstrate dynamic speckle projection with phase modulation at 10 GHz rates, which is five to six orders of magnitude higher than conventional modulation approaches using spatial light modulators. Moreover, we combine the proposed generation approach with a wavelength-division multiplexing technique and apply it to image classification. As a proof-of-concept demonstration, we show that 28x28-pixel images of digits acquired at GHz rates can be accurately classified using a simple neural network. The proposed approach opens a novel pathway for an all-optical image processor.

Submitted 11 June, 2022; originally announced June 2022.

Comments: 7 pages, 7 figures

Journal ref: Optics Express Vol. 30, Issue 13, pp. 22911-22921 (2022)

arXiv:2110.07272

Relighting Humans in the Wild: Monocular Full-Body Human Relighting with Domain Adaptation

Authors: Daichi Tajima, Yoshihiro Kanamori, Yuki Endo

Abstract: Modern supervised approaches for human image relighting rely on training data generated from 3D human models. However, such datasets are often small (e.g., Light Stage data with a small number of individuals) or limited to diffuse materials (e.g., commercial 3D scanned human models). Thus, human relighting techniques suffer from poor generalization and a synthetic-to-real domain gap. In this paper, we propose a two-stage method for single-image human relighting with domain adaptation. In the first stage, we train a neural network for diffuse-only relighting. In the second stage, we train another network for enhancing non-diffuse reflection by learning residuals between real photos and images reconstructed by the diffuse-only network. Thanks to the second stage, we can achieve higher generalization capability across various cloth textures, while reducing the domain gap. Furthermore, to handle input videos, we integrate an illumination-aware deep video prior to greatly reduce flickering artifacts even in challenging settings with dynamic illumination.

Submitted 14 October, 2021; v1 submitted 14 October, 2021; originally announced October 2021.

Comments: Accepted to Pacific Graphics 2021, project page: http://www.cgg.cs.tsukuba.ac.jp/~tajima/pub/relighting_in_the_wild/

arXiv:2106.13416

doi 10.1111/cgf.14164

Diversifying Semantic Image Synthesis and Editing via Class- and Layer-wise VAEs

Authors: Yuki Endo, Yoshihiro Kanamori

Abstract: Semantic image synthesis is a process for generating photorealistic images from a single semantic mask. To enrich the diversity of multimodal image synthesis, previous methods have controlled the global appearance of an output image by learning a single latent space. However, a single latent code is often insufficient for capturing various object styles because object appearance depends on multiple factors. To handle individual factors that determine object styles, we propose a class- and layer-wise extension to the variational autoencoder (VAE) framework that allows flexible control over each object class at the local to global levels by learning multiple latent spaces. Furthermore, we demonstrate that our method generates images that are both plausible and more diverse compared to state-of-the-art methods via extensive experiments with real and synthetic datasets in three different domains. We also show that our method enables a wide range of applications in image synthesis and editing tasks.

Submitted 29 June, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

Comments: Accepted to Pacific Graphics 2020, codes available at https://github.com/endo-yuki-t/DiversifyingSMIS

arXiv:2103.14877

Few-shot Semantic Image Synthesis Using StyleGAN Prior

Authors: Yuki Endo, Yoshihiro Kanamori

Abstract: This paper tackles the challenging problem of generating photorealistic images from semantic layouts in few-shot scenarios where annotated training pairs are scarce and pixel-wise annotation is quite costly. We present a training strategy that performs pseudo labeling of semantic masks using the StyleGAN prior. Our key idea is to construct a simple mapping between the StyleGAN feature and each semantic class from a few examples of semantic masks. With such mappings, we can generate an unlimited number of pseudo semantic masks from random noise to train an encoder for controlling a pre-trained StyleGAN generator. Although the pseudo semantic masks might be too coarse for previous approaches that require pixel-aligned masks, our framework can synthesize high-quality images from not only dense semantic masks but also sparse inputs such as landmarks and scribbles. Qualitative and quantitative results with various datasets demonstrate improvement over previous approaches with respect to layout fidelity and visual quality in as few as one- or five-shot settings.

Submitted 12 May, 2021; v1 submitted 27 March, 2021; originally announced March 2021.

Comments: The source codes are available at https://github.com/endo-yuki-t/Fewshot-SMIS
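
One way to picture a "simple mapping between the StyleGAN feature and each semantic class" built from a few annotated masks is a nearest-centroid classifier: average the generator features of the annotated pixels per class, then label every other pixel by its closest class mean. This is a hypothetical stand-in, not necessarily the mapping the authors use; features here are plain Python lists, and the class names and values are made up.

```python
def class_centroids(features, labels):
    """Mean feature vector of the few annotated pixels of each class."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(feat))
        sums[label] = [a + f for a, f in zip(acc, feat)]
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def pseudo_label(features, centroids):
    """Assign each feature vector to the class with the nearest centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(centroids, key=lambda c: sq_dist(feat, centroids[c])) for feat in features]


# Toy 2-D "generator features" for a one-shot annotation (hypothetical values).
annotated = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
labels = ["sky", "sky", "tree", "tree"]
centroids = class_centroids(annotated, labels)
print(pseudo_label([[0.05, 0.1], [1.0, 0.8]], centroids))
```

Applied to features of freshly sampled generator outputs, a rule like this yields as many pseudo masks as one cares to sample, which is the property the abstract relies on.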

arXiv:1910.07192

Animating Landscape: Self-Supervised Learning of Decoupled Motion and Appearance for Single-Image Video Synthesis

Authors: Yuki Endo, Yoshihiro Kanamori, Shigeru Kuriyama

Abstract: Automatic generation of a high-quality video from a single image remains a challenging task despite the recent advances in deep generative models. This paper proposes a method that can create a high-resolution, long-term animation using convolutional neural networks (CNNs) from a single landscape image where we mainly focus on skies and waters. Our key observation is that the motion (e.g., moving clouds) and appearance (e.g., time-varying colors in the sky) in natural scenes have different time scales. We thus learn them separately and predict them with decoupled control while handling future uncertainty in both predictions by introducing latent codes. Unlike previous methods that infer output frames directly, our CNNs predict spatially smooth intermediate data (flow fields for warping to model motion, and color transfer maps to model appearance) via self-supervised learning, i.e., without explicitly provided ground truth. These intermediate data are applied not to each previous output frame, but to the input image only once for each output frame. This design is crucial to alleviate error accumulation in long-term predictions, which is the essential problem in previous recurrent approaches. The output frames can be looped like a cinemagraph and can also be controlled directly by specifying latent codes or indirectly via visual annotations. We demonstrate the effectiveness of our method through comparisons with state-of-the-art methods on video prediction as well as appearance manipulation.

Submitted 16 October, 2019; originally announced October 2019.

Comments: Published at SIGGRAPH Asia 2019 (ACM Transactions on Graphics)
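
Why applying the predicted motion to the input image once per output frame, rather than to the previous output frame, limits error accumulation can be seen in a toy 1D example. The sketch assumes a constant flow of 0.4 pixels per frame, nearest-neighbour backward warping, and an offset of t × flow for frame t (all assumptions; the paper predicts spatially varying flow fields with CNNs). Recursive warping rounds the sub-pixel motion away at every step, so the content never moves, whereas warping the input by the accumulated offset moves it two pixels after five frames.

```python
import math


def warp(signal, shift):
    """Backward-warp a 1D signal by `shift` pixels (nearest neighbour, clamped edges)."""
    n = len(signal)
    out = []
    for i in range(n):
        src = int(math.floor(i - shift + 0.5))  # nearest source index
        src = min(max(src, 0), n - 1)           # clamp to the image border
        out.append(signal[src])
    return out


def animate_from_input(image, flow_per_frame, num_frames):
    """Each frame warps the *input* once by the accumulated offset."""
    return [warp(image, flow_per_frame * t) for t in range(1, num_frames + 1)]


def animate_recursively(image, flow_per_frame, num_frames):
    """Each frame warps the *previous output*, so rounding errors compound."""
    frames, prev = [], image
    for _ in range(num_frames):
        prev = warp(prev, flow_per_frame)
        frames.append(prev)
    return frames


image = [0, 0, 1, 0, 0, 0, 0, 0]
print(animate_from_input(image, 0.4, 5)[-1])   # bright pixel moved right
print(animate_recursively(image, 0.4, 5)[-1])  # bright pixel stuck in place
```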

arXiv:1908.02714

doi 10.1145/3272127.3275104

Relighting Humans: Occlusion-Aware Inverse Rendering for Full-Body Human Images

Authors: Yoshihiro Kanamori, Yuki Endo

Abstract: Relighting of human images has various applications in image synthesis. For relighting, we must infer albedo, shape, and illumination from a human portrait. Previous techniques rely on human faces for this inference, based on spherical harmonics (SH) lighting. However, because they often ignore light occlusion, inferred shapes are biased and relit images are unnaturally bright, particularly in hollowed regions such as armpits, crotches, or garment wrinkles. This paper introduces the first attempt to infer light occlusion in the SH formulation directly. Based on supervised learning using convolutional neural networks (CNNs), we infer not only an albedo map and illumination but also a light transport map that encodes occlusion as nine SH coefficients per pixel. The main difficulty in this inference is the lack of training datasets compared to the unlimited variations of human portraits. Surprisingly, geometric information including occlusion can be inferred plausibly even with a small dataset of synthesized human figures, by carefully preparing the dataset so that the CNNs can exploit the data coherency. Our method accomplishes more realistic relighting than the formulation that ignores occlusion.

Submitted 7 August, 2019; originally announced August 2019.

Comments: Published at SIGGRAPH Asia 2018 (ACM Transactions on Graphics). Project page with codes, pretrained models, and human model lists is at http://kanamori.cs.tsukuba.ac.jp/projects/relighting_human/
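
In the SH formulation this abstract describes, shading at a pixel is the dot product of its nine light transport coefficients with the nine lighting coefficients, and the relit value is albedo times shading. The sketch below assumes real SH basis functions for bands 0 to 2 with the standard normalization constants. For an unoccluded Lambertian point, the transport reduces to the basis evaluated at the normal, scaled per band by π, 2π/3 and π/4 (the clamped-cosine kernel); an occlusion-aware method would instead predict the transport vector per pixel. Normalization conventions (for example a 1/π factor) vary and are ignored here, and the "half the light" occluded pixel is hypothetical.

```python
import math

# Per-band scale of the clamped-cosine kernel in SH (bands 0, 1, 2).
BAND_SCALE = [math.pi] + [2 * math.pi / 3] * 3 + [math.pi / 4] * 5


def sh9(normal):
    """Real SH basis, bands 0-2, at a unit direction (x, y, z)."""
    x, y, z = normal
    return [
        0.282095,
        0.488603 * y,
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,
        1.092548 * y * z,
        0.315392 * (3 * z * z - 1),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ]


def unoccluded_transport(normal):
    """Transport vector of a Lambertian point with no occlusion."""
    return [a * y for a, y in zip(BAND_SCALE, sh9(normal))]


def shade_pixel(albedo, transport, light):
    """Relit value = albedo * (transport . light), with 9 coefficients each."""
    return albedo * sum(t * l for t, l in zip(transport, light))


ambient = [1.0] + [0.0] * 8  # uniform lighting: only the DC coefficient
up = (0.0, 0.0, 1.0)
open_value = shade_pixel(0.8, unoccluded_transport(up), ambient)
# Hypothetical occluded pixel (e.g. an armpit) whose transport is halved.
occluded_value = shade_pixel(0.8, [0.5 * t for t in unoccluded_transport(up)], ambient)
print(open_value, occluded_value)
```

Ignoring occlusion amounts to always using the unoccluded transport, which is why hollow regions come out too bright.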

arXiv:1810.09444

doi 10.1364/AO.58.001900

Digital holographic particle volume reconstruction using a deep neural network

Authors: Tomoyoshi Shimobaba, Takayuki Takahashi, Yota Yamamoto, Yutaka Endo, Atsushi Shiraki, Takashi Nishitsuji, Naoto Hoshikawa, Takashi Kakue, Tomoyoshi Ito

Abstract: This paper proposes a method for reconstructing particle volumes directly from an in-line hologram using a deep neural network. Digital holographic volume reconstruction conventionally uses multiple diffraction calculations to obtain sectional reconstructed images from an in-line hologram, followed by detection of the lateral and axial positions and the sizes of particles by using focus metrics. However, the axial resolution is limited by the numerical aperture of the optical system, and the processes are time-consuming. The method proposed here can simultaneously detect the lateral and axial positions and the particle sizes via a deep neural network (DNN). We numerically investigated the performance of the DNN in terms of the errors in the detected positions and sizes. The calculation is faster than conventional diffraction-based approaches.

Submitted 21 October, 2018; originally announced October 2018.

arXiv:1710.08343

doi 10.1016/j.optcom.2017.12.041

Computational ghost imaging using deep learning

Authors: Tomoyoshi Shimobaba, Yutaka Endo, Takashi Nishitsuji, Takayuki Takahashi, Yuki Nagahama, Satoki Hasegawa, Marie Sano, Ryuji Hirayama, Takashi Kakue, Atsushi Shiraki, Tomoyoshi Ito

Abstract: Computational ghost imaging (CGI) is a single-pixel imaging technique that exploits the correlation between known random patterns and the measured intensity of light transmitted (or reflected) by an object. Although CGI can obtain two- or three-dimensional images with a single or a few bucket detectors, the quality of the reconstructed images is reduced by noise due to the reconstruction of images from random patterns. In this study, we improve the quality of CGI images using deep learning. A deep neural network is used to automatically learn the features of noise-contaminated CGI images. After training, the network is able to predict low-noise images from new noise-contaminated CGI images.

Submitted 18 October, 2017; originally announced October 2017.
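
The correlation that CGI exploits can be written as G(x) = ⟨B·P(x)⟩ − ⟨B⟩⟨P(x)⟩, where P(x) is the value of a known random pattern at pixel x and B is the bucket-detector reading. A minimal pure-Python sketch of this conventional reconstruction (the noisy input that the paper's network is trained to clean up, not the network itself), assuming binary patterns, a toy 8-pixel transmissive object and noise-free detection:

```python
import random


def bucket_signal(obj, pattern):
    """Single-pixel (bucket) reading: total light through pattern and object."""
    return sum(p * o for p, o in zip(pattern, obj))


def ghost_image(patterns, buckets):
    """Correlation reconstruction G(x) = <B P(x)> - <B><P(x)>."""
    n = len(patterns)
    mean_b = sum(buckets) / n
    image = []
    for x in range(len(patterns[0])):
        mean_p = sum(p[x] for p in patterns) / n
        mean_bp = sum(b * p[x] for b, p in zip(buckets, patterns)) / n
        image.append(mean_bp - mean_b * mean_p)
    return image


rng = random.Random(0)           # fixed seed for reproducibility
obj = [1, 0, 0, 1, 1, 0, 0, 0]   # toy transmissive object (1 = transparent)
patterns = [[rng.randint(0, 1) for _ in obj] for _ in range(5000)]
buckets = [bucket_signal(obj, p) for p in patterns]
image = ghost_image(patterns, buckets)
print([round(v, 3) for v in image])
```

With 5,000 patterns, pixels inside the object converge toward Var(P) = 0.25 and pixels outside toward 0; with fewer patterns the estimate keeps more residual noise, which is the degradation the abstract describes.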

arXiv:1612.03959

doi 10.1364/AO.56.000F27

Autoencoder-based holographic image restoration

Authors: Tomoyoshi Shimobaba, Yutaka Endo, Ryuji Hirayama, Yuki Nagahama, Takayuki Takahashi, Takashi Nishitsuji, Takashi Kakue, Atsushi Shiraki, Naoki Takada, Nobuyuki Masuda, Tomoyoshi Ito

Abstract: We propose a holographic image restoration method using an autoencoder, which is an artificial neural network. Because reconstructed holographic images are often contaminated by direct light, conjugate light, and speckle noise, they can be difficult to discriminate. In this paper, we use the proposed method to restore images reconstructed from holograms that record holographic-memory page data and QR codes.

Submitted 12 December, 2016; originally announced December 2016.

arXiv:1504.01424

Improvement of the image quality of random phase-free holography using an iterative method

Authors: Tomoyoshi Shimobaba, Takashi Kakue, Yutaka Endo, Ryuji Hirayama, Daisuke Hiyama, Satoki Hasegawa, Yuki Nagahama, Marie Sano, Minoru Oikawa, Takashige Sugie, Tomoyoshi Ito

Abstract: Our proposed method of random phase-free holography using virtual convergence light can obtain large reconstructed images exceeding the size of the hologram without the assistance of a random phase. The reconstructed images have low speckle noise for both amplitude holograms and phase-only holograms (kinoforms); however, low-resolution holograms yield degraded image quality compared with the original image. We propose an iterative random phase-free method with virtual convergence light to address this problem.

Submitted 6 April, 2015; originally announced April 2015.

arXiv:1503.00360

Optical encryption for large-sized images using random phase-free method

Authors: Tomoyoshi Shimobaba, Takashi Kakue, Yutaka Endo, Ryuji Hirayama, Daisuke Hiyama, Satoki Hasegawa, Yuki Nagahama, Marie Sano, Takashige Sugie, Tomoyoshi Ito

Abstract: We propose an optical encryption framework that can encrypt and decrypt large-sized images beyond the size of the encrypted image using two of our methods: the random phase-free method and scaled diffraction. To record the entire image information in the encrypted image, large-sized images require a random phase that diffuses the object light widely over the encrypted image; however, the random phase introduces speckle noise into the decrypted images, which may make them difficult to recognize. To reduce the speckle noise, we apply our random phase-free method to the framework. In addition, we employ scaled diffraction, which calculates light propagation between planes of different sizes by changing the sampling rates.

Submitted 1 March, 2015; originally announced March 2015.

arXiv:1407.2971

doi 10.1016/j.optcom.2014.07.081

Numerical investigation of lensless zoomable holographic multiple projections to tilted planes

Authors: Tomoyoshi Shimobaba, Michal Makowski, Takashi Kakue, Naohisa Okada, Yutaka Endo, Ryuji Hirayama, Daisuke Hiyama, Satoki Hasegawa, Yuki Nagahama, Tomoyoshi Ito

Abstract: This paper numerically investigates the feasibility of lensless zoomable holographic multiple projections onto tilted planes. We previously developed a lensless zoomable holographic single projection using scaled diffraction, which calculates diffraction between parallel planes with different sampling pitches. This zoomable holographic projection has a very simple structure because it does not need a lens; however, it only projects a single image onto a plane parallel to the hologram. The lensless zoomable holographic projection in this paper can project multiple images onto tilted planes simultaneously.

Submitted 10 July, 2014; originally announced July 2014.

arXiv:1308.0376

doi 10.1117/1.OE.53.2.024108

Calculation reduction method for color computer-generated hologram using color space conversion

Authors: Tomoyoshi Shimobaba, Takashi Kakue, Minoru Oikawa, Naoki Takada, Naohisa Okada, Yutaka Endo, Ryuji Hirayama, Tomoyoshi Ito

Abstract: We report a calculation reduction method for color computer-generated holograms (CGHs) using color space conversion. Color CGHs are generally calculated in RGB space. In this paper, we calculate color CGHs in other color spaces, for example, YCbCr color space. In YCbCr color space, an RGB image is converted into the luminance component (Y) and the blue-difference chroma (Cb) and red-difference chroma (Cr) components. The human eye readily perceives even small differences in the luminance component but is much less sensitive to differences in the chroma components. In this method, the luminance component is sampled at full resolution and the chroma components are down-sampled. The down-sampling allows us to accelerate the calculation of the color CGHs. We perform diffraction calculations on the components and then convert the diffracted results from YCbCr color space back to RGB color space.
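The sample-count saving behind this approach can be sketched as follows. The sketch assumes the full-range ITU-R BT.601 (JPEG) conversion matrix with zero-centred chroma and 2x2 block averaging for the chroma channels; the paper may use other coefficients or down-sampling factors, and the diffraction step itself is omitted. If CGH cost scales with the number of object samples (an assumption here, as for point-based CGH), full-resolution Y plus 2x2-down-sampled Cb and Cr give H*W + 2*(H*W/4) = 1.5*H*W samples instead of 3*H*W. All function names are hypothetical.

```python
import numpy as np

# Full-range BT.601 RGB -> YCbCr matrix (chroma centred at zero, no +128 offset)
RGB_TO_YCBCR = np.array([
    [0.299, 0.587, 0.114],
    [-0.168736, -0.331264, 0.5],
    [0.5, -0.418688, -0.081312],
])
YCBCR_TO_RGB = np.linalg.inv(RGB_TO_YCBCR)

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) float RGB image to (H, W, 3) YCbCr."""
    return rgb @ RGB_TO_YCBCR.T

def ycbcr_to_rgb(ycc: np.ndarray) -> np.ndarray:
    """Inverse conversion, (H, W, 3) YCbCr back to RGB."""
    return ycc @ YCBCR_TO_RGB.T

def downsample2(channel: np.ndarray) -> np.ndarray:
    """Average non-overlapping 2x2 blocks (H and W must be even)."""
    h, w = channel.shape
    return channel.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(channel: np.ndarray) -> np.ndarray:
    """Nearest-neighbour upsampling back to full resolution."""
    return np.repeat(np.repeat(channel, 2, axis=0), 2, axis=1)

def split_components(rgb: np.ndarray):
    """Full-resolution Y plus 2x down-sampled Cb and Cr (the inputs to diffraction)."""
    ycc = rgb_to_ycbcr(rgb)
    return ycc[..., 0], downsample2(ycc[..., 1]), downsample2(ycc[..., 2])

def merge_components(y, cb, cr):
    """Upsample the chroma channels and convert back to RGB."""
    return ycbcr_to_rgb(np.stack([y, upsample2(cb), upsample2(cr)], axis=-1))

# Usage: count samples for a random 64x64 RGB image
rng = np.random.default_rng(1)
rgb = rng.random((64, 64, 3))
y, cb, cr = split_components(rgb)
ratio = (y.size + cb.size + cr.size) / rgb.size
```

Chroma detail finer than a 2x2 block is lost in `merge_components`, which is acceptable only because, as the abstract notes, the eye is less sensitive to chroma than to luminance.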

Submitted 1 August, 2013; originally announced August 2013.

Showing 1–22 of 22 results for author: Endo, Y