Search | arXiv e-print repository

doi 10.1109/WACV57701.2024.00288

Controlling Rate, Distortion, and Realism: Towards a Single Comprehensive Neural Image Compression Model

Authors: Shoma Iwai, Tomo Miyazaki, Shinichiro Omachi

Abstract: In recent years, neural network-driven image compression (NIC) has gained significant attention. Some works adopt deep generative models such as GANs and diffusion models to enhance perceptual quality (realism). A critical obstacle of these generative NIC methods is that each model is optimized for a single bit rate. Consequently, multiple models are required to compress images to different bit ra… ▽ More In recent years, neural network-driven image compression (NIC) has gained significant attention. Some works adopt deep generative models such as GANs and diffusion models to enhance perceptual quality (realism). A critical obstacle of these generative NIC methods is that each model is optimized for a single bit rate. Consequently, multiple models are required to compress images to different bit rates, which is impractical for real-world applications. To tackle this issue, we propose a variable-rate generative NIC model. Specifically, we explore several discriminator designs tailored for the variable-rate approach and introduce a novel adversarial loss. Moreover, by incorporating the newly proposed multi-realism technique, our method allows the users to adjust the bit rate, distortion, and realism with a single model, achieving ultra-controllability. Unlike existing variable-rate generative NIC models, our method matches or surpasses the performance of state-of-the-art single-rate generative NIC models while covering a wide range of bit rates using just one model. Code will be available at https://github.com/iwa-shi/CRDR △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: WACV2024 Oral. Code is at https://github.com/iwa-shi/CRDR

arXiv:2405.09873 [pdf, other]

IRSRMamba: Infrared Image Super-Resolution via Mamba-based Wavelet Transform Feature Modulation Model

Authors: Yongsong Huang, Tomo Miyazaki, Xiaofeng Liu, Shinichiro Omachi

Abstract: Infrared (IR) image super-resolution faces challenges from homogeneous background pixel distributions and sparse target regions, requiring models that effectively handle long-range dependencies and capture detailed local-global information. Recent advancements in Mamba-based (Selective Structured State Space Model) models, employing state space models, have shown significant potential in visual ta… ▽ More Infrared (IR) image super-resolution faces challenges from homogeneous background pixel distributions and sparse target regions, requiring models that effectively handle long-range dependencies and capture detailed local-global information. Recent advancements in Mamba-based (Selective Structured State Space Model) models, employing state space models, have shown significant potential in visual tasks, suggesting their applicability for IR enhancement. In this work, we introduce IRSRMamba: Infrared Image Super-Resolution via Mamba-based Wavelet Transform Feature Modulation Model, a novel Mamba-based model designed specifically for IR image super-resolution. This model enhances the restoration of context-sparse target details through its advanced dependency modeling capabilities. Additionally, a new wavelet transform feature modulation block improves multi-scale receptive field representation, capturing both global and local information efficiently. Comprehensive evaluations confirm that IRSRMamba outperforms existing models on multiple benchmarks. This research advances IR super-resolution and demonstrates the potential of Mamba-based models in IR image processing. Code are available at \url{https://github.com/yongsongH/IRSRMamba}. △ Less

Submitted 16 May, 2024; originally announced May 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2403.16553 [pdf, other]

Imaging quantum interference in a monolayer Kitaev quantum spin liquid candidate

Authors: Y. Kohsaka, S. Akutagawa, S. Omachi, Y. Iwamichi, T. Ono, I. Tanaka, S. Tateishi, H. Murayama, S. Suetsugu, K. Hashimoto, T. Shibauchi, M. O. Takahashi, M. G. Yamada, S. Nikolaev, T. Mizushima, S. Fujimoto, T. Terashima, T. Asaba, Y. Kasahara, Y. Matsuda

Abstract: Single atomic defects are prominent windows to look into host quantum states because collective responses from the host states emerge as localized states around the defects. Friedel oscillations and Kondo clouds in Fermi liquids are quintessential examples. However, the situation is quite different for quantum spin liquid (QSL), an exotic state of matter with fractionalized quasiparticles and topo… ▽ More Single atomic defects are prominent windows to look into host quantum states because collective responses from the host states emerge as localized states around the defects. Friedel oscillations and Kondo clouds in Fermi liquids are quintessential examples. However, the situation is quite different for quantum spin liquid (QSL), an exotic state of matter with fractionalized quasiparticles and topological order arising from a profound impact of quantum entanglement. Elucidating the underlying local electronic property has been challenging due to the charge neutrality of fractionalized quasiparticles and the insulating nature of QSLs. Here, using spectroscopic-imaging scanning tunneling microscopy, we report atomically resolved images of monolayer $α$-RuCl$_3$, the most promising Kitaev QSL candidate, on metallic substrates. We find quantum interference in the insulator manifesting as incommensurate and decaying spatial oscillations of the local density of states around defects with a characteristic bias dependence. The oscillation differs from any known spatial structures in its nature and does not exist in other Mott insulators, implying it is an exotic oscillation involved with excitations unique to $α$-RuCl$_3$. Numerical simulations can reproduce the observed oscillation by assuming that itinerant Majorana fermions of Kitaev QSL are scattered across the Majorana Fermi surface. The oscillation provides a new approach to exploring Kitaev QSLs through the local response against defects like Friedel oscillations in metals. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2312.16455 [pdf, other]

Learn From Orientation Prior for Radiograph Super-Resolution: Orientation Operator Transformer

Authors: Yongsong Huang, Tomo Miyazaki, Xiaofeng Liu, Kaiyuan Jiang, Zhengmi Tang, Shinichiro Omachi

Abstract: Background and objective: High-resolution radiographic images play a pivotal role in the early diagnosis and treatment of skeletal muscle-related diseases. It is promising to enhance image quality by introducing single-image super-resolution (SISR) model into the radiology image field. However, the conventional image pipeline, which can learn a mixed map** between SR and denoising from the color… ▽ More Background and objective: High-resolution radiographic images play a pivotal role in the early diagnosis and treatment of skeletal muscle-related diseases. It is promising to enhance image quality by introducing single-image super-resolution (SISR) model into the radiology image field. However, the conventional image pipeline, which can learn a mixed map** between SR and denoising from the color space and inter-pixel patterns, poses a particular challenge for radiographic images with limited pattern features. To address this issue, this paper introduces a novel approach: Orientation Operator Transformer - $O^{2}$former. Methods: We incorporate an orientation operator in the encoder to enhance sensitivity to denoising map** and to integrate orientation prior. Furthermore, we propose a multi-scale feature fusion strategy to amalgamate features captured by different receptive fields with the directional prior, thereby providing a more effective latent representation for the decoder. Based on these innovative components, we propose a transformer-based SISR model, i.e., $O^{2}$former, specifically designed for radiographic images. Results: The experimental results demonstrate that our method achieves the best or second-best performance in the objective metrics compared with the competitors at $\times 4$ upsampling factor. For qualitative, more objective details are observed to be recovered. Conclusions: In this study, we propose a novel framework called $O^{2}$former for radiological image super-resolution tasks, which improves the reconstruction model's performance by introducing an orientation operator and multi-scale feature fusion strategy. Our approach is promising to further promote the radiographic image enhancement field. △ Less

Submitted 27 December, 2023; originally announced December 2023.

Comments: Accepted by Computer Methods and Programs in Biomedicine

arXiv:2312.00689 [pdf, other]

Infrared Image Super-Resolution via GAN

Authors: Yongsong Huang, Shinichiro Omachi

Abstract: The ability of generative models to accurately fit data distributions has resulted in their widespread adoption and success in fields such as computer vision and natural language processing. In this chapter, we provide a brief overview of the application of generative models in the domain of infrared (IR) image super-resolution, including a discussion of the various challenges and adversarial trai… ▽ More The ability of generative models to accurately fit data distributions has resulted in their widespread adoption and success in fields such as computer vision and natural language processing. In this chapter, we provide a brief overview of the application of generative models in the domain of infrared (IR) image super-resolution, including a discussion of the various challenges and adversarial training methods employed. We propose potential areas for further investigation and advancement in the application of generative models for IR image super-resolution. △ Less

Submitted 1 December, 2023; originally announced December 2023.

Comments: Applications of Generative AI, Chapter 28

arXiv:2311.08816 [pdf, other]

Target-oriented Domain Adaptation for Infrared Image Super-Resolution

Authors: Yongsong Huang, Tomo Miyazaki, Xiaofeng Liu, Yafei Dong, Shinichiro Omachi

Abstract: Recent efforts have explored leveraging visible light images to enrich texture details in infrared (IR) super-resolution. However, this direct adaptation approach often becomes a double-edged sword, as it improves texture at the cost of introducing noise and blurring artifacts. To address these challenges, we propose the Target-oriented Domain Adaptation SRGAN (DASRGAN), an innovative framework sp… ▽ More Recent efforts have explored leveraging visible light images to enrich texture details in infrared (IR) super-resolution. However, this direct adaptation approach often becomes a double-edged sword, as it improves texture at the cost of introducing noise and blurring artifacts. To address these challenges, we propose the Target-oriented Domain Adaptation SRGAN (DASRGAN), an innovative framework specifically engineered for robust IR super-resolution model adaptation. DASRGAN operates on the synergy of two key components: 1) Texture-Oriented Adaptation (TOA) to refine texture details meticulously, and 2) Noise-Oriented Adaptation (NOA), dedicated to minimizing noise transfer. Specifically, TOA uniquely integrates a specialized discriminator, incorporating a prior extraction branch, and employs a Sobel-guided adversarial loss to align texture distributions effectively. Concurrently, NOA utilizes a noise adversarial loss to distinctly separate the generative and Gaussian noise pattern distributions during adversarial training. Our extensive experiments confirm DASRGAN's superiority. Comparative analyses against leading methods across multiple benchmarks and upsampling factors reveal that DASRGAN sets new state-of-the-art performance standards. Code are available at \url{https://github.com/yongsongH/DASRGAN}. △ Less

Submitted 15 November, 2023; originally announced November 2023.

Comments: 11 pages, 9 figures

arXiv:2305.11373 [pdf, other]

doi 10.1016/j.patcog.2023.109696

Deep Image Compression Using Scene Text Quality Assessment

Authors: Shohei Uchigasaki, Tomo Miyazaki, Shinichiro Omachi

Abstract: Image compression is a fundamental technology for Internet communication engineering. However, a high compression rate with general methods may degrade images, resulting in unreadable texts. In this paper, we propose an image compression method for maintaining text quality. We developed a scene text image quality assessment model to assess text quality in compressed images. The assessment model it… ▽ More Image compression is a fundamental technology for Internet communication engineering. However, a high compression rate with general methods may degrade images, resulting in unreadable texts. In this paper, we propose an image compression method for maintaining text quality. We developed a scene text image quality assessment model to assess text quality in compressed images. The assessment model iteratively searches for the best-compressed image holding high-quality text. Objective and subjective results showed that the proposed method was superior to existing methods. Furthermore, the proposed assessment model outperformed other deep-learning regression models. △ Less

Submitted 18 May, 2023; originally announced May 2023.

Comments: Accepted by Pattern Recognition, 2023

Journal ref: Pattern Recognition, 2023

arXiv:2212.12322 [pdf, other]

Infrared Image Super-Resolution: Systematic Review, and Future Trends

Authors: Yongsong Huang, Tomo Miyazaki, Xiaofeng Liu, Shinichiro Omachi

Abstract: Image Super-Resolution (SR) is essential for a wide range of computer vision and image processing tasks. Investigating infrared (IR) image (or thermal images) super-resolution is a continuing concern within the development of deep learning. This survey aims to provide a comprehensive perspective of IR image super-resolution, including its applications, hardware imaging system dilemmas, and taxonom… ▽ More Image Super-Resolution (SR) is essential for a wide range of computer vision and image processing tasks. Investigating infrared (IR) image (or thermal images) super-resolution is a continuing concern within the development of deep learning. This survey aims to provide a comprehensive perspective of IR image super-resolution, including its applications, hardware imaging system dilemmas, and taxonomy of image processing methodologies. In addition, the datasets and evaluation metrics in IR image super-resolution tasks are also discussed. Furthermore, the deficiencies in current technologies and possible promising directions for the community to explore are highlighted. To cope with the rapid development in this field, we intend to regularly update the relevant excellent work at \url{https://github.com/yongsongH/Infrared_Image_SR_Survey △ Less

Submitted 15 November, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

Comments: Submitted to IEEE TNNLS

arXiv:2209.02397 [pdf, other]

A Scene-Text Synthesis Engine Achieved Through Learning from Decomposed Real-World Data

Authors: Zhengmi Tang, Tomo Miyazaki, Shinichiro Omachi

Abstract: Scene-text image synthesis techniques that aim to naturally compose text instances on background scene images are very appealing for training deep neural networks due to their ability to provide accurate and comprehensive annotation information. Prior studies have explored generating synthetic text images on two-dimensional and three-dimensional surfaces using rules derived from real-world observa… ▽ More Scene-text image synthesis techniques that aim to naturally compose text instances on background scene images are very appealing for training deep neural networks due to their ability to provide accurate and comprehensive annotation information. Prior studies have explored generating synthetic text images on two-dimensional and three-dimensional surfaces using rules derived from real-world observations. Some of these studies have proposed generating scene-text images through learning; however, owing to the absence of a suitable training dataset, unsupervised frameworks have been explored to learn from existing real-world data, which might not yield reliable performance. To ease this dilemma and facilitate research on learning-based scene text synthesis, we introduce DecompST, a real-world dataset prepared from some public benchmarks, containing three types of annotations: quadrilateral-level BBoxes, stroke-level text masks, and text-erased images. Leveraging the DecompST dataset, we propose a Learning-Based Text Synthesis engine (LBTS) that includes a text location proposal network (TLPNet) and a text appearance adaptation network (TAANet). TLPNet first predicts the suitable regions for text embedding, after which TAANet adaptively adjusts the geometry and color of the text instance to match the background context. After training, those networks can be integrated and utilized to generate the synthetic dataset for scene text analysis tasks. Comprehensive experiments were conducted to validate the effectiveness of the proposed LBTS along with existing methods, and the experimental results indicate the proposed LBTS can generate better pretraining data for scene text detectors. △ Less

Submitted 17 October, 2023; v1 submitted 6 September, 2022; originally announced September 2022.

arXiv:2208.03008 [pdf, other]

doi 10.1007/978-3-031-21014-3_5

Rethinking Degradation: Radiograph Super-Resolution via AID-SRGAN

Authors: Yongsong Huang, Qingzhong Wang, Shinichiro Omachi

Abstract: In this paper, we present a medical AttentIon Denoising Super Resolution Generative Adversarial Network (AID-SRGAN) for diographic image super-resolution. First, we present a medical practical degradation model that considers various degradation factors beyond downsampling. To the best of our knowledge, this is the first composite degradation model proposed for radiographic images. Furthermore, we… ▽ More In this paper, we present a medical AttentIon Denoising Super Resolution Generative Adversarial Network (AID-SRGAN) for diographic image super-resolution. First, we present a medical practical degradation model that considers various degradation factors beyond downsampling. To the best of our knowledge, this is the first composite degradation model proposed for radiographic images. Furthermore, we propose AID-SRGAN, which can simultaneously denoise and generate high-resolution (HR) radiographs. In this model, we introduce an attention mechanism into the denoising module to make it more robust to complicated degradation. Finally, the SR module reconstructs the HR radiographs using the "clean" low-resolution (LR) radiographs. In addition, we propose a separate-joint training approach to train the model, and extensive experiments are conducted to show that the proposed method is superior to its counterparts. e.g., our proposed method achieves $31.90$ of PSNR with a scale factor of $4 \times$, which is $7.05 \%$ higher than that obtained by recent work, SPSR [16]. Our dataset and code will be made available at: https://github.com/yongsongH/AIDSRGAN-MICCAI2022. △ Less

Submitted 5 August, 2022; originally announced August 2022.

Comments: Accepted to MICCAI 2022 Workshop. Code: https://github.com/yongsongH/AIDSRGAN-MICCAI2022

arXiv:2104.11493 [pdf, other]

doi 10.1109/TIP.2021.3125260

Stroke-Based Scene Text Erasing Using Synthetic Data for Training

Authors: Zhengmi Tang, Tomo Miyazaki, Yoshihiro Sugaya, Shinichiro Omachi

Abstract: Scene text erasing, which replaces text regions with reasonable content in natural images, has drawn significant attention in the computer vision community in recent years. There are two potential subtasks in scene text erasing: text detection and image inpainting. Both subtasks require considerable data to achieve better performance; however, the lack of a large-scale real-world scene-text remova… ▽ More Scene text erasing, which replaces text regions with reasonable content in natural images, has drawn significant attention in the computer vision community in recent years. There are two potential subtasks in scene text erasing: text detection and image inpainting. Both subtasks require considerable data to achieve better performance; however, the lack of a large-scale real-world scene-text removal dataset does not allow existing methods to realize their potential. To compensate for the lack of pairwise real-world data, we made considerable use of synthetic text after additional enhancement and subsequently trained our model only on the dataset generated by the improved synthetic text engine. Our proposed network contains a stroke mask prediction module and background inpainting module that can extract the text stroke as a relatively small hole from the cropped text image to maintain more background content for better inpainting results. This model can partially erase text instances in a scene image with a bounding box or work with an existing scene-text detector for automatic scene text erasing. The experimental results from the qualitative and quantitative evaluation on the SCUT-Syn, ICDAR2013, and SCUT-EnsText datasets demonstrate that our method significantly outperforms existing state-of-the-art methods even when they are trained on real-world data. △ Less

Submitted 3 December, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

Journal ref: IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 30, 2021, 9306-9320

arXiv:2008.10314 [pdf, other]

doi 10.1109/ICPR48806.2021.9412185

Fidelity-Controllable Extreme Image Compression with Generative Adversarial Networks

Authors: Shoma Iwai, Tomo Miyazaki, Yoshihiro Sugaya, Shinichiro Omachi

Abstract: We propose a GAN-based image compression method working at extremely low bitrates below 0.1bpp. Most existing learned image compression methods suffer from blur at extremely low bitrates. Although GAN can help to reconstruct sharp images, there are two drawbacks. First, GAN makes training unstable. Second, the reconstructions often contain unpleasing noise or artifacts. To address both of the draw… ▽ More We propose a GAN-based image compression method working at extremely low bitrates below 0.1bpp. Most existing learned image compression methods suffer from blur at extremely low bitrates. Although GAN can help to reconstruct sharp images, there are two drawbacks. First, GAN makes training unstable. Second, the reconstructions often contain unpleasing noise or artifacts. To address both of the drawbacks, our method adopts two-stage training and network interpolation. The two-stage training is effective to stabilize the training. Moreover, the network interpolation utilizes the models in both stages and reduces undesirable noise and artifacts, while maintaining important edges. Hence, we can control the trade-off between perceptual quality and fidelity without re-training models. The experimental results show that our model can reconstruct high quality images. Furthermore, our user study confirms that our reconstructions are preferable to state-of-the-art GAN-based image compression model. The code will be available. △ Less

Submitted 24 August, 2020; originally announced August 2020.

Comments: 8 pages, 11 figures

Journal ref: ICPR, 2020

arXiv:2004.07967 [pdf, other]

doi 10.3390/app11073214

Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence

Authors: Huy Manh Nguyen, Tomo Miyazaki, Yoshihiro Sugaya, Shinichiro Omachi

Abstract: Visual-semantic embedding aims to learn a joint embedding space where related video and sentence instances are located close to each other. Most existing methods put instances in a single embedding space. However, they struggle to embed instances due to the difficulty of matching visual dynamics in videos to textual features in sentences. A single space is not enough to accommodate various videos… ▽ More Visual-semantic embedding aims to learn a joint embedding space where related video and sentence instances are located close to each other. Most existing methods put instances in a single embedding space. However, they struggle to embed instances due to the difficulty of matching visual dynamics in videos to textual features in sentences. A single space is not enough to accommodate various videos and sentences. In this paper, we propose a novel framework that maps instances into multiple individual embedding spaces so that we can capture multiple relationships between instances, leading to compelling video retrieval. We propose to produce a final similarity between instances by fusing similarities measured in each embedding space using a weighted sum strategy. We determine the weights according to a sentence. Therefore, we can flexibly emphasize an embedding space. We conducted sentence-to-video retrieval experiments on a benchmark dataset. The proposed method achieved superior performance, and the results are competitive to state-of-the-art methods. These experimental results demonstrated the effectiveness of the proposed multiple embedding approach compared to existing methods. △ Less

Submitted 16 April, 2020; originally announced April 2020.

Comments: 8 pages, 5 figures

Journal ref: Applied Sciences, 2021

arXiv:1703.02662 [pdf, ps, other]

doi 10.1109/ACCESS.2018.2876860

Structural Data Recognition with Graph Model Boosting

Authors: Tomo Miyazaki, Shinichiro Omachi

Abstract: This paper presents a novel method for structural data recognition using a large number of graph models. In general, prevalent methods for structural data recognition have two shortcomings: 1) Only a single model is used to capture structural variation. 2) Naive recognition methods are used, such as the nearest neighbor method. In this paper, we propose strengthening the recognition performance of… ▽ More This paper presents a novel method for structural data recognition using a large number of graph models. In general, prevalent methods for structural data recognition have two shortcomings: 1) Only a single model is used to capture structural variation. 2) Naive recognition methods are used, such as the nearest neighbor method. In this paper, we propose strengthening the recognition performance of these models as well as their ability to capture structural variation. The proposed method constructs a large number of graph models and trains decision trees using the models. This paper makes two main contributions. The first is a novel graph model that can quickly perform calculations, which allows us to construct several models in a feasible amount of time. The second contribution is a novel approach to structural data recognition: graph model boosting. Comprehensive structural variations can be captured with a large number of graph models constructed in a boosting framework, and a sophisticated classifier can be formed by aggregating the decision trees. Consequently, we can carry out structural data recognition with powerful recognition capability in the face of comprehensive structural variation. The experiments shows that the proposed method achieves impressive results and outperforms existing methods on datasets of IAM graph database repository. △ Less

Submitted 7 March, 2017; originally announced March 2017.

Comments: 8 pages

Journal ref: IEEE Access, 2018

arXiv:1701.05703 [pdf, ps, other]

doi 10.1109/MCG.2019.2931431

Automatic Generation of Typographic Font from a Small Font Subset

Authors: Tomo Miyazaki, Tatsunori Tsuchiya, Yoshihiro Sugaya, Shinichiro Omachi, Masakazu Iwamura, Seiichi Uchida, Koichi Kise

Abstract: This paper addresses the automatic generation of a typographic font from a subset of characters. Specifically, we use a subset of a typographic font to extrapolate additional characters. Consequently, we obtain a complete font containing a number of characters sufficient for daily use. The automated generation of Japanese fonts is in high demand because a Japanese font requires over 1,000 characte… ▽ More This paper addresses the automatic generation of a typographic font from a subset of characters. Specifically, we use a subset of a typographic font to extrapolate additional characters. Consequently, we obtain a complete font containing a number of characters sufficient for daily use. The automated generation of Japanese fonts is in high demand because a Japanese font requires over 1,000 characters. Unfortunately, professional typographers create most fonts, resulting in significant financial and time investments for font generation. The proposed method can be a great aid for font creation because designers do not need to create the majority of the characters for a new font. The proposed method uses strokes from given samples for font generation. The strokes, from which we construct characters, are extracted by exploiting a character skeleton dataset. This study makes three main contributions: a novel method of extracting strokes from characters, which is applicable to both standard fonts and their variations; a fully automated approach for constructing characters; and a selection method for sample characters. We demonstrate our proposed method by generating 2,965 characters in 47 fonts. Objective and subjective evaluations verify that the generated characters are similar to handmade characters. △ Less

Submitted 20 January, 2017; originally announced January 2017.

Comments: 12 pages, 17 figures

Journal ref: IEEE Computer Graphics and Applications, 2019

Showing 1–15 of 15 results for author: Omachi, S