Search | arXiv e-print repository

Extreme Compression of Adaptive Neural Images

Authors: Leo Hoshikawa, Marcos V. Conde, Takeshi Ohashi, Atsushi Irie

Abstract: Implicit Neural Representations (INRs) and Neural Fields are a novel paradigm for signal representation, from images and audio to 3D scenes and videos. The fundamental idea is to represent a signal as a continuous and differentiable neural network. This idea offers unprecedented benefits such as continuous resolution and memory efficiency, enabling new compression techniques. However, representing data as neural networks poses new challenges. For instance, given a 2D image as a neural network, how can we further compress such a neural image? In this work, we present a novel analysis on compressing neural fields, with the focus on images. We also introduce Adaptive Neural Images (ANI), an efficient neural representation that enables adaptation to different inference or transmission requirements. Our proposed method allows reducing the bits-per-pixel (bpp) of the neural image by 4x, without losing sensitive details or harming fidelity. We achieve this thanks to our successful implementation of 4-bit neural representations. Our work offers a new framework for developing compressed neural fields.

Submitted 4 June, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

Comments: Technical Report. Work in progress

arXiv:2404.03256 [pdf, other]

Multi Positive Contrastive Learning with Pose-Consistent Generated Images

Authors: Sho Inayoshi, Aji Resindra Widya, Satoshi Ozaki, Junji Otsuka, Takeshi Ohashi

Abstract: Model pre-training has become essential in various recognition tasks. Meanwhile, with the remarkable advancements in image generation models, pre-training methods utilizing generated images have also emerged given their ability to produce unlimited training data. However, while existing methods utilizing generated images excel in classification, they fall short in more practical tasks, such as human pose estimation. In this paper, we demonstrate this experimentally and propose the generation of visually distinct images with identical human poses. We then propose a novel multi-positive contrastive learning, which optimally utilizes the previously generated images to learn structural features of the human body. We term the entire learning pipeline GenPoCCL. Despite using less than 1% of the data used by the current state-of-the-art method, GenPoCCL captures structural features of the human body more effectively, surpassing existing methods in a variety of human-centric perception tasks.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.20080 [pdf, other]

Mixed-precision Supernet Training from Vision Foundation Models using Low Rank Adapter

Authors: Yuiko Sakuma, Masakazu Yoshimura, Junji Otsuka, Atsushi Irie, Takeshi Ohashi

Abstract: Compression of large and performant vision foundation models (VFMs) into arbitrary bit-wise operations (BitOPs) allows their deployment on various hardware. We propose to fine-tune a VFM to a mixed-precision quantized supernet. The supernet-based neural architecture search (NAS) can be adopted for this purpose, which trains a supernet, and then subnets within arbitrary hardware budgets can be extracted. However, existing methods face difficulties in optimizing the mixed-precision search space and incur large memory costs during training. To tackle these challenges, first, we study the effective search space design for fine-tuning a VFM by comparing different operators (such as resolution, feature size, width, depth, and bit-widths) in terms of performance and BitOPs reduction. Second, we propose memory-efficient supernet training using a low-rank adapter (LoRA) and a progressive training strategy. The proposed method is evaluated for the recently proposed VFM, Segment Anything Model, fine-tuned on segmentation tasks. The searched model yields about a 95% reduction in BitOPs without incurring performance degradation.

Submitted 29 March, 2024; originally announced March 2024.

arXiv:2403.10091 [pdf, other]

PQDynamicISP: Dynamically Controlled Image Signal Processor for Any Image Sensors Pursuing Perceptual Quality

Authors: Masakazu Yoshimura, Junji Otsuka, Takeshi Ohashi

Abstract: Full DNN-based image signal processors (ISPs) have been actively studied and have achieved superior image quality compared to conventional ISPs. In contrast to this trend, we propose a lightweight ISP that consists of simple conventional ISP functions but achieves high image quality by increasing expressiveness. Specifically, instead of tuning the parameters of the ISP, we propose to control them dynamically for each environment and even locally. As a result, state-of-the-art accuracy is achieved on various datasets, including other tasks like tone mapping and image enhancement, even though ours is lighter than DNN-based ISPs. Additionally, our method can process different image sensors with a single ISP through dynamic control, whereas conventional methods require training for each sensor.

Submitted 15 March, 2024; originally announced March 2024.

Comments: Keywords: image signal processor, ISP, image enhancement, tone mapping

arXiv:2307.03338 [pdf]

From Conservatism to Innovation: The Sequential and Iterative Process of Smart Livestock Technology Adoption in Japanese Small-Farm Systems

Authors: Takumi Ohashi, Miki Saijo, Kento Suzuki, Shinsuke Arafuka

Abstract: As global demand for animal products is projected to increase significantly by 2050, driven by population growth and increased incomes, smart livestock technologies are essential for improving efficiency, animal welfare, and environmental sustainability. Conducted within the unique agricultural context of Japan, characterized by small-scale, family-run farms and strong government protection policies, our study builds upon traditional theoretical frameworks that often oversimplify farmers' decision-making processes. By employing a scoping review, expert interviews, and a Modified Grounded Theory Approach, our research uncovers the intricate interplay between individual farmer values, farm management policies, social relations, agricultural policies, and livestock industry trends. We particularly highlight the unique dynamics within family-owned businesses, noting the tension between an "advanced management mindset" and "conservatism." Our study reveals that technology adoption is a sequential and iterative process, influenced by technology availability, farmers' digital literacy, technology implementation support, and observable technology impacts on animal health and productivity. These insights highlight the need for tailored support mechanisms and policies to enhance technology uptake, thereby promoting sustainable and efficient livestock production systems.

Submitted 17 June, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: 58 pages, 3 figures

MSC Class: 91C99 ACM Class: J.4

arXiv:2303.13916 [pdf, other]

Self-Supervised Reversed Image Signal Processing via Reference-Guided Dynamic Parameter Selection

Authors: Junji Otsuka, Masakazu Yoshimura, Takeshi Ohashi

Abstract: Unprocessed sensor outputs (RAW images) potentially improve both low-level and high-level computer vision algorithms, but the lack of large-scale RAW image datasets is a barrier to research. Thus, reversed Image Signal Processing (ISP), which converts existing RGB images into RAW images, has been studied. However, most existing methods require camera-specific metadata or paired RGB and RAW images to model the conversion, and they are not always available. In addition, there are issues in handling diverse ISPs and recovering global illumination. To tackle these limitations, we propose a self-supervised reversed ISP method that does not require metadata and paired images. The proposed method converts an RGB image into a RAW-like image taken in the same environment with the same sensor as a reference RAW image by dynamically selecting parameters of the reversed ISP pipeline based on the reference RAW image. The parameter selection is trained via pseudo paired data created from unpaired RGB and RAW images. We show that the proposed method is able to learn various reversed ISPs with comparable accuracy to other state-of-the-art supervised methods and convert unknown RGB images from COCO and Flickr1M to target RAW-like images more accurately in terms of pixel distribution. We also demonstrate that our generated RAW images improve performance on a real RAW image object detection task.

Submitted 24 March, 2023; originally announced March 2023.

Comments: 19 pages, 12 figures

arXiv:2303.12293 [pdf]

Designing the Metaverse: A Scoping Review to Map Current Research Effort on Ethical Implications

Authors: Matteo Zallio, Takumi Ohashi, P. John Clarkson

Abstract: The metaverse and digital, virtual environments have been part of recent history as places in which people can socialize, work and spend time playing games. However, the infancy of the development of these digital, virtual environments brings some challenges that are still not fully depicted. With this article, we seek to identify and map the currently available knowledge and scientific effort to discover what principles, guidelines, laws, policies, and practices are currently in place to allow for the design of digital, virtual environments, and the metaverse. Through a scoping review, we aimed to systematically survey the existing literature and discern gaps in knowledge within the domain of metaverse research from sociological, anthropological, cultural, and experiential perspectives. The objective of this review was twofold: (1) to examine the focus of the literature studying the metaverse from various angles and (2) to formulate a research agenda for the design and development of ethical digital, virtual environments. With this paper, we identified several works and articles detailing experiments and research on the design of digital, virtual environments and metaverses. We found an increased number of publications in the year 2022. This finding, together with the fact that only a few articles were focused on the domain of ethics, culture and society shows that there is still a vast amount of work to be done to create awareness, principles and policies that could help to design safe, secure and inclusive digital, virtual environments and metaverses.

Submitted 21 March, 2023; originally announced March 2023.

Comments: 9 pages, 2 figures

arXiv:2211.05654 [pdf, other]

Efficient Joint Detection and Multiple Object Tracking with Spatially Aware Transformer

Authors: Siddharth Sagar Nijhawan, Leo Hoshikawa, Atsushi Irie, Masakazu Yoshimura, Junji Otsuka, Takeshi Ohashi

Abstract: We propose a light-weight and highly efficient Joint Detection and Tracking pipeline for the task of Multi-Object Tracking using a fully-transformer architecture. It is a modified version of TransTrack, which overcomes the computational bottleneck associated with its design, and at the same time, achieves a state-of-the-art MOTA score of 73.20%. The model design is driven by a transformer-based backbone instead of a CNN, which is highly scalable with the input resolution. We also propose a drop-in replacement for the Feed Forward Network of the transformer encoder layer, by using Butterfly Transform Operation to perform channel fusion and depth-wise convolution to learn spatial context within the feature maps, otherwise missing within the attention maps of the transformer. As a result of our modifications, we reduce the overall model size of TransTrack by 58.73% and the complexity by 78.72%. Therefore, we expect our design to provide novel perspectives for architecture optimization in future research related to multi-object tracking.

Submitted 9 November, 2022; originally announced November 2022.

arXiv:2211.01146 [pdf, other]

DynamicISP: Dynamically Controlled Image Signal Processor for Image Recognition

Authors: Masakazu Yoshimura, Junji Otsuka, Atsushi Irie, Takeshi Ohashi

Abstract: Image Signal Processors (ISPs) play important roles in image recognition tasks as well as in the perceptual quality of captured images. In most cases, experts make a lot of effort to manually tune many parameters of ISPs, but the parameters are sub-optimal. In the literature, two types of techniques have been actively studied: a machine learning-based parameter tuning technique and a DNN-based ISP technique. The former is lightweight but lacks expressive power. The latter has expressive power, but the computational cost is too heavy on edge devices. To solve these problems, we propose "DynamicISP," which consists of multiple classical ISP functions and dynamically controls the parameters of each frame according to the recognition result of the previous frame. We show our method successfully controls the parameters of multiple ISP functions and achieves state-of-the-art accuracy with low computational cost in single and multi-category object detection tasks.

Submitted 27 August, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: Accepted to ICCV2023. Several updates from v2 including additional experiments and modification of typos in Auto Gain equation

arXiv:2210.16046 [pdf, other]

Rawgment: Noise-Accounted RAW Augmentation Enables Recognition in a Wide Variety of Environments

Authors: Masakazu Yoshimura, Junji Otsuka, Atsushi Irie, Takeshi Ohashi

Abstract: Image recognition models that work in challenging environments (e.g., extremely dark, blurry, or high dynamic range conditions) must be useful. However, creating training datasets for such environments is expensive and hard due to the difficulties of data collection and annotation. It would be desirable to obtain a robust model without the need for hard-to-obtain datasets. One simple approach is to apply data augmentation such as color jitter and blur to standard RGB (sRGB) images in simple scenes. Unfortunately, this approach struggles to yield realistic images in terms of pixel intensity and noise distribution due to not considering the non-linearity of Image Signal Processors (ISPs) and noise characteristics of image sensors. Instead, we propose a noise-accounted RAW image augmentation method. In essence, color jitter and blur augmentation are applied to a RAW image before applying non-linear ISP, resulting in realistic intensity. Furthermore, we introduce a noise amount alignment method that calibrates the domain gap in the noise property caused by the augmentation. We show that our proposed noise-accounted RAW augmentation method doubles the image recognition accuracy in challenging environments only with simple training data.

Submitted 27 March, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

Comments: Accepted to CVPR2023

arXiv:2201.07152 [pdf]

The Evolution of Assistive Technology: A Literature Review of Technology Developments and Applications

Authors: Matteo Zallio, Takumi Ohashi

Abstract: The term Assistive Technology has evolved over the years and identifies equipment or product systems, whether acquired, modified, or customized, that are used to increase, maintain, or improve functional capabilities of individuals with disabilities. Considering the advances that have been made, what trends can be identified to provide evidence of the evolution of AT as devices that foster accessibility and empower users with different abilities? Through a systematic literature review we identify research items that offer evidence of the evolution of the meaning, purpose, and applications of AT throughout history. This paper provides evidence that AT evolved from products to improve functional capabilities of individuals with disabilities toward enabling technologies that facilitate tasks for people with different needs, abilities, gender, age, and culture. This evolution will lead to a positive demystification of the meaning and applications of AT toward broad usage acceptance among mainstream users.

Submitted 18 January, 2022; originally announced January 2022.

Comments: 9 pages, 4 figures

arXiv:2103.12993 [pdf, other]

Analysis of QoS in Heterogeneous Networks with Clustered Deployment and Caching Aware Capacity Allocation

Authors: Takehiro Ohashi

Abstract: In cellular networks, the densification of connected devices and base stations engenders ever-growing traffic intensity, and caching popular contents with smart management is a promising way to alleviate such consequences. Our research extends the previously proposed analysis of three-tier cache-enabled Heterogeneous Networks (HetNets). The main contributions are threefold. We consider a more realistic assumption; that is, the distribution of small base stations follows Poisson-Poisson cluster processes, which reflects the real situations of geographic restriction, user dense areas, and coverage-holes. We propose the allocation of downlink data transmission capacity according to the cases of requested contents which are either cached or non-cached in nearby nodes and elucidate the traffic efficiency of the allocation under the effect of clustered deployment of small base stations. The throughput and delay of the allocation system are derived based on the approximated sojourn time of the Discriminatory Processor Sharing (DPS) queue. We present the results of achievable efficiency and such a system's performance for a better caching solution to the challenges of future cellular networks.

Submitted 24 March, 2021; originally announced March 2021.

arXiv:2001.05613 [pdf, other]

doi 10.1016/j.imavis.2020.104028

Synergetic Reconstruction from 2D Pose and 3D Motion for Wide-Space Multi-Person Video Motion Capture in the Wild

Authors: Takuya Ohashi, Yosuke Ikegami, Yoshihiko Nakamura

Abstract: Although many studies have investigated markerless motion capture, the technology has not been applied to real sports or concerts. In this paper, we propose a markerless motion capture method with spatiotemporal accuracy and smoothness from multiple cameras in wide-space and multi-person environments. The proposed method predicts each person's 3D pose and determines the bounding box of multi-camera images small enough. This prediction and spatiotemporal filtering based on human skeletal model enables 3D reconstruction of the person and demonstrates high-accuracy. The accurate 3D reconstruction is then used to predict the bounding box of each camera image in the next frame. This is feedback from the 3D motion to 2D pose, and provides a synergetic effect on the overall performance of video motion capture. We evaluated the proposed method using various datasets and a real sports field. The experimental results demonstrate that the mean per joint position error (MPJPE) is 31.5 mm and the percentage of correct parts (PCP) is 99.5% for five people dynamically moving while satisfying the range of motion (RoM). Video demonstration, datasets, and additional materials are posted on our project page.

Submitted 14 October, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

Journal ref: Image and Vision Computing, Volume 104, 2020

arXiv:1912.03880 [pdf, other]

Video Motion Capture from the Part Confidence Maps of Multi-Camera Images by Spatiotemporal Filtering Using the Human Skeletal Model

Authors: Takuya Ohashi, Yosuke Ikegami, Kazuki Yamamoto, Wataru Takano, Yoshihiko Nakamura

Abstract: This paper discusses video motion capture, namely, 3D reconstruction of human motion from multi-camera images. After the Part Confidence Maps are computed from each camera image, the proposed spatiotemporal filter is applied to deliver the human motion data with accuracy and smoothness for human motion analysis. The spatiotemporal filter uses the human skeleton and mixes temporal smoothing in two-time inverse kinematics computations. The experimental results show that the mean per joint position error was 26.1 mm for regular motions and 38.8 mm for inverted motions.

Submitted 10 December, 2019; v1 submitted 9 December, 2019; originally announced December 2019.

Comments: International Conference on Intelligent Robots and Systems (IROS), 2018

arXiv:1901.09792 [pdf, other]

Sensorimotor learning for artificial body perception

Authors: German Diez-Valencia, Takuya Ohashi, Pablo Lanillos, Gordon Cheng

Abstract: Artificial self-perception is the machine ability to perceive its own body, i.e., the mastery of modal and intermodal contingencies of performing an action with a specific sensors/actuators body configuration. In other words, the spatio-temporal patterns that relate its sensors (e.g. visual, proprioceptive, tactile, etc.), its actions and its body latent variables are responsible for the distinction between its own body and the rest of the world. This paper describes some of the latest approaches for modelling artificial body self-perception: from Bayesian estimation to deep learning. Results show the potential of these free-model unsupervised or semi-supervised crossmodal/intermodal learning approaches. However, there are still challenges that should be overcome before we achieve artificial multisensory body perception.

Submitted 15 January, 2019; originally announced January 2019.

Comments: Workshop on Crossmodal Learning for Intelligent Robotics. IEEE Int. Conference on Intelligent Robots and Systems (IROS 2018)

Showing 1–15 of 15 results for author: Ohashi, T