-
MagicItem: Dynamic Behavior Design of Virtual Objects with Large Language Models in a Consumer Metaverse Platform
Authors:
Ryutaro Kurai,
Takefumi Hiraki,
Yuichi Hiroi,
Yutaro Hirao,
Monica Perusquia-Hernandez,
Hideaki Uchiyama,
Kiyoshi Kiyokawa
Abstract:
To create rich experiences in virtual reality (VR) environments, it is essential to define the behavior of virtual objects through programming. However, programming in 3D spaces requires a wide range of background knowledge and programming skills. Although Large Language Models (LLMs) have provided programming support, they are still primarily aimed at programmers. In metaverse platforms, where many users inhabit VR spaces, most users are unfamiliar with programming, which makes it difficult for them to modify the behavior of objects in the VR environment. Existing LLM-based script generation methods for VR spaces require multiple lengthy iterations to implement the desired behaviors and are difficult to integrate into the operation of metaverse platforms. To address this issue, we propose a tool that generates behaviors for objects in VR spaces from natural language within Cluster, a metaverse platform with a large user base. By integrating LLMs with the Cluster Script provided by this platform, we enable users with limited programming experience to freely define object behaviors within the platform. We have also integrated our tool into a commercial metaverse platform and are conducting online experiments with 63 general users of the platform. The experiments show that even users with no programming background can successfully generate behaviors for objects in VR spaces, and that users find the system highly satisfying. Our research contributes to democratizing VR content creation by enabling non-programmers to design dynamic behaviors for virtual objects in metaverse platforms.
Submitted 19 June, 2024;
originally announced June 2024.
-
Learning Multiple Representations with Inconsistency-Guided Detail Regularization for Mask-Guided Matting
Authors:
Weihao Jiang,
Zhaozhi Xie,
Yuxiang Lu,
Longjie Qi,
Jingyong Cai,
Hiroyuki Uchiyama,
Bin Chen,
Yue Ding,
Hongtao Lu
Abstract:
Mask-guided matting networks have achieved significant improvements and have shown great potential in practical applications in recent years. However, because they learn matting representations only from synthetic matting data that lacks real-world diversity, these approaches tend to overfit low-level details in the wrong regions, generalize poorly to objects with complex structures and to real-world scenes such as shadows, and suffer from interference from background lines or textures. To address these challenges, we propose a novel auxiliary learning framework for mask-guided matting models that incorporates three auxiliary tasks besides matting (semantic segmentation, edge detection, and background line detection) to learn different and effective representations from different types of data and annotations. Our framework and model introduce the following key aspects: (1) to learn real-world adaptive semantic representations for objects with diverse and complex structures in real-world scenes, we introduce extra semantic segmentation and edge detection tasks on more diverse real-world data with segmentation annotations; (2) to avoid overfitting on low-level details, we propose a module that uses the inconsistency between the learned segmentation and matting representations to regularize detail refinement; (3) we introduce a novel background line detection task into our auxiliary learning framework to suppress interference from background lines or textures. In addition, we propose a high-quality matting benchmark, Plant-Mat, to evaluate matting methods on complex structures. Extensive quantitative and qualitative results show that our approach outperforms state-of-the-art mask-guided methods.
Submitted 28 March, 2024;
originally announced March 2024.
-
Ensemble Learning to Assess Dynamics of Affective Experience Ratings and Physiological Change
Authors:
Felix Dollack,
Kiyoshi Kiyokawa,
Huakun Liu,
Monica Perusquia-Hernandez,
Chirag Raman,
Hideaki Uchiyama,
Xin Wei
Abstract:
The congruence between affective experiences and physiological changes has been a debated topic for centuries. Recent technological advances in measurement and data analysis offer hope of solving this epic challenge. Open science and open data practices, together with data analysis challenges open to the academic community, are also promising tools for solving this problem. In this entry to the Emotion Physiology and Experience Collaboration (EPiC) challenge, we propose a data analysis solution that combines theoretical assumptions with data-driven methodologies. We used feature engineering and ensemble selection. Each predictor was trained on subsets of the training data that would maximize the information available for training. Late fusion was used with an averaging step. We chose averaging following a "wisdom of crowds" strategy. This strategy yielded an overall RMSE of 1.19 on the test set. Future work should carefully explore whether our assumptions are correct and the potential of weighted fusion.
Submitted 26 December, 2023;
originally announced December 2023.
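The late-fusion averaging step and the RMSE metric mentioned in the abstract above can be illustrated with a minimal sketch. The function names, the two toy predictors, and the rating values are hypothetical and are not taken from the paper; the sketch only shows unweighted averaging of per-sample predictions followed by an RMSE computation.

```python
import math

def late_fusion_average(predictions):
    """Unweighted late fusion: average the per-sample outputs of several predictors."""
    n_models = len(predictions)
    n_samples = len(predictions[0])
    return [sum(p[i] for p in predictions) / n_models for i in range(n_samples)]

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical ratings predicted by two independently trained predictors.
model_a = [5.0, 6.0, 4.0]
model_b = [7.0, 6.0, 6.0]
truth = [6.0, 7.0, 5.0]

fused = late_fusion_average([model_a, model_b])
error = rmse(truth, fused)
```

A weighted fusion, which the abstract names as future work, would replace the plain mean with per-predictor weights.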
-
Toward unlabeled multi-view 3D pedestrian detection by generalizable AI: techniques and performance analysis
Authors:
João Paulo Lima,
Diego Thomas,
Hideaki Uchiyama,
Veronica Teichrieb
Abstract:
We unveil how generalizable AI can be used to improve multi-view 3D pedestrian detection in unlabeled target scenes. One way to increase generalization to new scenes is to automatically label target data, which can then be used for training a detector model. In this context, we investigate two approaches for automatically labeling target data: pseudo-labeling using a supervised detector and automatic labeling using an untrained detector (that can be applied out of the box without any training). We adopt a training framework for optimizing detector models using automatic labeling procedures. This framework encompasses different training sets/modes and multi-round automatic labeling strategies. We conduct our analyses on the publicly available WILDTRACK and MultiviewX datasets. We show that the automatic labeling approach based on an untrained detector yields better results than directly using the untrained detector or a detector trained with an existing labeled source dataset. It achieves a MODA about 4% and 1% better than the best existing unlabeled method when using WILDTRACK and MultiviewX as target datasets, respectively.
Submitted 8 August, 2023;
originally announced August 2023.
-
Smart Dimming Sunglasses for Photophobia Using Spatial Light Modulator
Authors:
Xiaodan Hu,
Yan Zhang,
Hideaki Uchiyama,
Naoya Isoyama,
Nobuchika Sakata,
Kiyoshi Kiyokawa
Abstract:
We present a smart sunglasses system engineered to assist individuals experiencing photophobia, particularly those highly sensitive to light intensity. The system integrates a high dynamic range (HDR) camera and a liquid crystal spatial light modulator (SLM) to dynamically regulate light, adapting to environmental scenes by modifying pixel transmittance through a specialized control algorithm, thereby offering adaptable light management to meet the users' visual needs. Nonetheless, a conventional occlusion mask on the SLM, intended to block incoming light, emerges blurred and insufficient due to a misaligned focal plane. To address the challenge of imprecise light filtering, we introduce an optimization algorithm that meticulously adjusts the light attenuation process, effectively diminishing excessive brightness in targeted areas without adversely impacting regions with acceptable levels of luminance.
Submitted 10 October, 2023; v1 submitted 14 April, 2023;
originally announced April 2023.
-
Scheduling Space Expander: An Extension of Concurrency Control for Data Ingestion Queries
Authors:
Sho Nakazono,
Hiroyuki Uchiyama,
Yasuhiro Fujiwara,
Hideyuki Kawashima
Abstract:
With the continuing advances of sensing devices and IoT/Telecom applications, database systems need to process data ingestion queries that update sensor data frequently. However, as the rate of data ingestion queries increases, existing protocols exhibit degraded performance because concurrent updates need to acquire a lock to update the latest versions. To reduce the load that data ingestion queries place on the system, we focus on the theory of version order: using the version order of data items, we can test whether a write produces an old and unnecessary version. In this paper, we propose a novel protocol extension method, scheduling space expander (SSE). SSE adds another control flow to conventional protocols to omit updates in data ingestion queries. It generates an erasing version order, which assumes that a transaction processes outdated, unnecessary versions. SSE also tests the correctness of this version order efficiently and independently of conventional protocols. In addition, we present an optimization of SSE called epoch-based SSE (ESSE), which tests and maintains an erasing version order more efficiently than SSE. We extend two state-of-the-art 1VCC and MVCC protocols, Silo and MVTO, with ESSE. Experimental results demonstrate that the extensions of Silo and MVTO improve performance by 2.7x and 2.5x on the TATP benchmark on a 144-core machine, and that the extensions achieve performance comparable to that of the original protocols on the TPC-C benchmark.
Submitted 25 January, 2023;
originally announced January 2023.
-
MOTSLAM: MOT-assisted monocular dynamic SLAM using single-view depth estimation
Authors:
Hanwei Zhang,
Hideaki Uchiyama,
Shintaro Ono,
Hiroshi Kawasaki
Abstract:
Visual SLAM systems targeting static scenes have been developed with satisfactory accuracy and robustness. Dynamic 3D object tracking has since become a significant capability in visual SLAM, driven by the need to understand dynamic surroundings in scenarios including autonomous driving and augmented and virtual reality. However, performing dynamic SLAM solely with monocular images remains a challenging problem due to the difficulty of associating dynamic features and estimating their positions. In this paper, we present MOTSLAM, a dynamic visual SLAM system with a monocular configuration that tracks both poses and bounding boxes of dynamic objects. MOTSLAM first performs multiple object tracking (MOT) together with 2D and 3D bounding box detection to create initial 3D objects. Then, neural-network-based monocular depth estimation is applied to obtain the depth of dynamic features. Finally, camera poses, object poses, and both static and dynamic map points are jointly optimized using a novel bundle adjustment. Our experiments on the KITTI dataset demonstrate that our system achieves the best performance in both camera ego-motion and object tracking among monocular dynamic SLAM systems.
Submitted 5 October, 2022;
originally announced October 2022.
-
TetraTSDF: 3D human reconstruction from a single image with a tetrahedral outer shell
Authors:
Hayato Onizuka,
Zehra Hayirci,
Diego Thomas,
Akihiro Sugimoto,
Hideaki Uchiyama,
Rin-ichiro Taniguchi
Abstract:
Recovering the 3D shape of a person from their 2D appearance is ill-posed due to ambiguities. Nevertheless, with the help of convolutional neural networks (CNN) and prior knowledge of the 3D human body, it is possible to overcome such ambiguities and recover detailed 3D shapes of human bodies from single images. Current solutions, however, fail to reconstruct all the details of a person wearing loose clothes. This is because of either (a) a huge memory requirement that cannot be met even on modern GPUs or (b) a compact 3D representation that cannot encode all the details. In this paper, we propose the tetrahedral outer shell volumetric truncated signed distance function (TetraTSDF) model for the human body, and its corresponding part connection network (PCN) for 3D human body shape regression. Our proposed model is compact, dense, accurate, and yet well suited for CNN-based regression tasks. Our proposed PCN allows us to learn the distribution of the TSDF in the tetrahedral volume from a single image in an end-to-end manner. Results show that our proposed method can reconstruct detailed shapes of humans wearing loose clothes from single RGB images.
Submitted 22 April, 2020;
originally announced April 2020.
-
NWR: Rethinking Thomas Write Rule for Omittable Write Operations
Authors:
Sho Nakazono,
Hiroyuki Uchiyama,
Yasuhiro Fujiwara,
Yasuhiro Nakamura,
Hideyuki Kawashima
Abstract:
Concurrency control protocols are the key to scaling the performance of current DBMSs. They efficiently interleave read and write operations in transactions, but occasionally they restrict concurrency by using coordination such as exclusive locking. Although exclusive locking ensures the correctness of a DBMS, it incurs serious performance penalties in multi-core environments. In particular, existing protocols generally suffer under emerging highly write-contended workloads, since they use innumerable locks for write operations. In this paper, we rethink the Thomas write rule (TWR), which allows the timestamp ordering (T/O) protocol to omit write operations without any locking. We formalize the notion of omitting and decouple it from the T/O protocol implementation in order to define a new rule named the non-visible write rule (NWR). When the rules of NWR are satisfied, any protocol can in theory generate omittable write operations while preserving correctness without any locking. In the experiments, we implement three NWR-extended protocols: Silo+NWR, TicToc+NWR, and MVTO+NWR. Experimental results demonstrate the efficiency and the low-overhead property of the extended protocols. We confirm that NWR-extended protocols are more than 11x faster than the originals in the best case of highly write-contended YCSB-A and achieve comparable performance to the originals in the other workloads.
Submitted 6 March, 2020; v1 submitted 17 April, 2019;
originally announced April 2019.
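For readers unfamiliar with the Thomas write rule that the abstract above takes as its starting point, the following is a minimal single-threaded sketch of the textbook rule under basic timestamp ordering. It is not the paper's NWR and not any of the NWR-extended protocols; the `Item` class and the `read`/`write` helpers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Item:
    """A data item with the timestamps tracked by basic timestamp ordering."""
    value: Any = None
    read_ts: int = 0   # largest timestamp of any transaction that read the item
    write_ts: int = 0  # timestamp of the transaction that wrote the current value

def read(item: Item, ts: int):
    """Read under timestamp ordering: reject reads that arrive too late."""
    if ts < item.write_ts:
        return "abort", None
    item.read_ts = max(item.read_ts, ts)
    return "ok", item.value

def write(item: Item, ts: int, value: Any) -> str:
    """Write under timestamp ordering with the Thomas write rule."""
    if ts < item.read_ts:
        # A younger transaction already read the item, so this write is too late.
        return "abort"
    if ts < item.write_ts:
        # Thomas write rule: a newer write already exists, so this outdated
        # write is skipped instead of aborting the transaction.
        return "omit"
    item.value = value
    item.write_ts = ts
    return "apply"

# Hypothetical interleaving on one item.
x = Item()
first = write(x, 5, "new")    # applied
second = write(x, 3, "old")   # omitted: older than the current version
status, seen = read(x, 7)     # reads "new" and raises read_ts to 7
third = write(x, 6, "late")   # aborted: a transaction with timestamp 7 already read x
```

The omitted write is the case the abstract describes as omitting a write operation without locking: the older value would be overwritten immediately in timestamp order, so skipping it leaves the final state unchanged.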