Search | arXiv e-print repository

Neural Radiance Fields for Novel View Synthesis in Monocular Gastroscopy

Authors: Zijie Jiang, Yusuke Monno, Masatoshi Okutomi, Sho Suzuki, Kenji Miki

Abstract: Enabling the synthesis of arbitrarily novel viewpoint images within a patient's stomach from pre-captured monocular gastroscopic images is a promising topic in stomach diagnosis. Typical methods to achieve this objective integrate traditional 3D reconstruction techniques, including structure-from-motion (SfM) and Poisson surface reconstruction. These methods produce explicit 3D representations, such as point clouds and meshes, thereby enabling the rendering of the images from novel viewpoints. However, the existence of low-texture and non-Lambertian regions within the stomach often results in noisy and incomplete reconstructions of point clouds and meshes, hindering the attainment of high-quality image rendering. In this paper, we apply the emerging technique of neural radiance fields (NeRF) to monocular gastroscopic data for synthesizing photo-realistic images for novel viewpoints. To address the performance degradation due to view sparsity in local regions of monocular gastroscopy, we incorporate geometry priors from a pre-reconstructed point cloud into the training of NeRF, which introduces a novel geometry-based loss to both pre-captured observed views and generated unobserved views. Compared to other recent NeRF methods, our approach showcases high-fidelity image renderings from novel viewpoints within the stomach both qualitatively and quantitatively.

Submitted 29 May, 2024; originally announced May 2024.

Comments: Accepted for EMBC 2024

arXiv:2404.10999 [pdf]

Machine-Learning-Enhanced Soft Robotic System Inspired by Rectal Functions for Investigating Fecal incontinence

Authors: Zebing Mao, Sota Suzuki, Hiroyuki Nabae, Shoko Miyagawa, Koichi Suzumori, Shingo Maeda

Abstract: Fecal incontinence, arising from a myriad of pathogenic mechanisms, has attracted considerable global attention. Despite its significance, the replication of the defecatory system for studying fecal incontinence mechanisms remains limited largely due to social stigma and taboos. Inspired by the rectum's functionalities, we have developed a soft robotic system, encompassing a power supply, pressure sensing, data acquisition systems, a flushing mechanism, a stage, and a rectal module. The innovative soft rectal module includes actuators inspired by sphincter muscles, both soft and rigid covers, and a soft rectum mold. The rectal mold, fabricated from materials that closely mimic human rectal tissue, is produced using the mold replication fabrication method. Both the soft and rigid components of the mold are realized through the application of 3D-printing technology. The sphincter muscles-inspired actuators featuring double-layer pouch structures are modeled and optimized based on multilayer perceptron methods aiming to obtain high contraction ratios (100%), high generated pressure (9.8 kPa), and small recovery time (3 s). Upon assembly, this defecation robot is capable of smoothly expelling liquid faeces, performing controlled solid fecal cutting, and defecating extremely solid long faeces, thus closely replicating the human rectum and anal canal's functions. This defecation robot has the potential to assist humans in understanding the complex defecation system and contribute to the development of well-being devices related to defecation.

Submitted 1 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

arXiv:2403.17423 [pdf, other]

Test-time Adaptation Meets Image Enhancement: Improving Accuracy via Uncertainty-aware Logit Switching

Authors: Shohei Enomoto, Naoya Hasegawa, Kazuki Adachi, Taku Sasaki, Shin'ya Yamaguchi, Satoshi Suzuki, Takeharu Eda

Abstract: Deep neural networks have achieved remarkable success in a variety of computer vision applications. However, there is a problem of degrading accuracy when the data distribution shifts between training and testing. As a solution to this problem, Test-time Adaptation (TTA) has been well studied because of its practicality. Although TTA methods increase accuracy under distribution shift by updating the model at test time, using high-uncertainty predictions is known to degrade accuracy. Since the input image is the root of the distribution shift, we incorporate a new perspective on enhancing the input image into TTA methods to reduce the prediction's uncertainty. We hypothesize that enhancing the input image reduces the prediction's uncertainty and increases the accuracy of TTA methods. On the basis of our hypothesis, we propose a novel method: Test-time Enhancer and Classifier Adaptation (TECA). In TECA, the classification model is combined with the image enhancement model that transforms input images into recognition-friendly ones, and these models are updated by existing TTA methods. Furthermore, we found that the prediction from the enhanced image does not always have lower uncertainty than the prediction from the original image. Thus, we propose logit switching, which compares the uncertainty measure of these predictions and outputs the lower one. In our experiments, we evaluate TECA with various TTA methods and show that TECA reduces the prediction's uncertainty and increases the accuracy of TTA methods despite having no hyperparameters and little parameter overhead.

Submitted 26 March, 2024; originally announced March 2024.

Comments: Accepted to IJCNN2024
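
Note: the logit switching step described in the abstract above (compare the uncertainty of the two predictions and output the less uncertain one) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which uncertainty measure is used, so softmax entropy is assumed here, and the function names `softmax`, `entropy`, and `logit_switch` are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; ASSUMED here as the uncertainty measure."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def logit_switch(logits_original, logits_enhanced):
    """Return whichever logit vector has the lower assumed uncertainty.

    Ties fall back to the prediction from the original image.
    """
    h_original = entropy(softmax(logits_original))
    h_enhanced = entropy(softmax(logits_enhanced))
    if h_enhanced < h_original:
        return logits_enhanced
    return logits_original
```

With this sketch, a confident prediction such as `[5.0, 0.0, 0.0]` is selected over a uniform one such as `[0.0, 0.0, 0.0]` regardless of which image produced it.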

arXiv:2311.13460 [pdf, other]

Multi-Objective Bayesian Optimization with Active Preference Learning

Authors: Ryota Ozaki, Kazuki Ishikawa, Youhei Kanzaki, Shinya Suzuki, Shion Takeno, Ichiro Takeuchi, Masayuki Karasuyama

Abstract: There are a lot of real-world black-box optimization problems that need to optimize multiple criteria simultaneously. However, in a multi-objective optimization (MOO) problem, identifying the whole Pareto front requires a prohibitive search cost, while in many practical scenarios, the decision maker (DM) only needs a specific solution among the set of the Pareto optimal solutions. We propose a Bayesian optimization (BO) approach to identifying the most preferred solution in the MOO with expensive objective functions, in which a Bayesian preference model of the DM is adaptively estimated in an interactive manner based on two types of supervision called the pairwise preference and improvement request. To explore the most preferred solution, we define an acquisition function in which the uncertainty both in the objective functions and the DM preference is incorporated. Further, to minimize the interaction cost with the DM, we also propose an active learning strategy for the preference estimation. We empirically demonstrate the effectiveness of our proposed method through the benchmark function optimization and the hyper-parameter optimization problems for machine learning models.

Submitted 22 November, 2023; originally announced November 2023.

arXiv:2308.16454 [pdf, other]

Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff

Authors: Satoshi Suzuki, Shin'ya Yamaguchi, Shoichiro Takeda, Sekitoshi Kanai, Naoki Makishima, Atsushi Ando, Ryo Masumura

Abstract: This paper addresses the tradeoff between standard accuracy on clean examples and robustness against adversarial examples in deep neural networks (DNNs). Although adversarial training (AT) improves robustness, it degrades the standard accuracy, thus yielding the tradeoff. To mitigate this tradeoff, we propose a novel AT method called ARREST, which comprises three components: (i) adversarial finetuning (AFT), (ii) representation-guided knowledge distillation (RGKD), and (iii) noisy replay (NR). AFT trains a DNN on adversarial examples by initializing its parameters with a DNN that is standardly pretrained on clean examples. RGKD and NR respectively entail a regularization term and an algorithm to preserve latent representations of clean examples during AFT. RGKD penalizes the distance between the representations of the standardly pretrained and AFT DNNs. NR switches input adversarial examples to nonadversarial ones when the representation changes significantly during AFT. By combining these components, ARREST achieves both high standard accuracy and robustness. Experimental results demonstrate that ARREST mitigates the tradeoff more effectively than previous AT-based methods do.

Submitted 31 August, 2023; originally announced August 2023.

Comments: Accepted by International Conference on Computer Vision (ICCV) 2023

arXiv:2306.02273 [pdf, ps, other]

End-to-End Joint Target and Non-Target Speakers ASR

Authors: Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando

Abstract: This paper proposes a novel automatic speech recognition (ASR) system that can transcribe individual speakers' speech while identifying whether they are target or non-target speakers from multi-talker overlapped speech. Target-speaker ASR systems are a promising way to only transcribe a target speaker's speech by enrolling the target speaker's information. However, in conversational ASR applications, transcribing both the target speaker's speech and that of non-target speakers is often required to understand interactive information. To naturally consider both target and non-target speakers in a single ASR model, our idea is to extend autoregressive modeling-based multi-talker ASR systems to utilize the enrollment speech of the target speaker. Our proposed ASR is performed by recursively generating both textual tokens and tokens that represent target or non-target speakers. Our experiments demonstrate the effectiveness of our proposed method.

Submitted 4 June, 2023; originally announced June 2023.

Comments: Accepted at Interspeech 2023

arXiv:2304.11413 [pdf, other]

Three-dimensional hand guidance by midair haptic display

Authors: Koya Hiura, Shun Suzuki, Tao Morisaki, Masahiro Fujiwara, Yasutoshi Makino, Hiroyuki Shinoda

Abstract: Guiding human movements using tactile information is one of the promising applications of haptics. Using midair ultrasonic haptic stimulation, it is possible to guide a hand without visual information. However, the information of movement shown by conventional methods was partial. No method has been shown to guide a hand to an arbitrary point in three-dimensional space. In this study, we propose a method of guiding the hand to the top of a virtual cone presented haptically and evaluate the effectiveness of the method through experiments. As a result, the method guided the participant's hand to the goal in a 30 cm cube workspace with an error of 64.34 mm.

Submitted 22 April, 2023; originally announced April 2023.

arXiv:2210.15937 [pdf, other]

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis

Authors: Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato

Abstract: This paper investigates the effectiveness and implementation of modality-specific large-scale pre-trained encoders for multimodal sentiment analysis (MSA). Although the effectiveness of pre-trained encoders in various fields has been reported, conventional MSA methods employ them for only linguistic modality, and their application has not been investigated. This paper compares the features yielded by large-scale pre-trained encoders with conventional heuristic features. One of the largest publicly available pre-trained encoders is used for each modality: CLIP-ViT, WavLM, and BERT for visual, acoustic, and linguistic modalities, respectively. Experiments on two datasets reveal that methods with domain-specific pre-trained encoders attain better performance than those with conventional features in both unimodal and multimodal scenarios. We also find it better to use the outputs of the intermediate layers of the encoders than those of the output layer. The codes are available at https://github.com/ando-hub/MSA_Pretrain.

Submitted 28 October, 2022; originally announced October 2022.

Comments: Accepted to SLT 2022

arXiv:2207.04659 [pdf, other]

Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data

Authors: Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura

Abstract: In this paper, we investigate the semi-supervised joint training of text to speech (TTS) and automatic speech recognition (ASR), where a small amount of paired data and a large amount of unpaired text data are available. Conventional studies form a cycle called the TTS-ASR pipeline, where the multispeaker TTS model synthesizes speech from text with a reference speech and the ASR model reconstructs the text from the synthesized speech, after which both models are trained with a cycle-consistency loss. However, the synthesized speech does not reflect the speaker characteristics of the reference speech and the synthesized speech becomes overly easy for the ASR model to recognize after training. This not only decreases the TTS model quality but also limits the ASR model improvement. To solve this problem, we propose improving the cycle-consistency-based training with a speaker consistency loss and step-wise optimization. The speaker consistency loss brings the speaker characteristics of the synthesized speech closer to those of the reference speech. In the step-wise optimization, we first freeze the parameter of the TTS model before both models are trained to avoid over-adaptation of the TTS model to the ASR model. Experimental results demonstrate the efficacy of the proposed method.

Submitted 11 July, 2022; originally announced July 2022.

Comments: Accepted to INTERSPEECH 2022

arXiv:2203.03119 [pdf, other]

Fabchain: Managing Audit-able 3D Print Job over Blockchain

Authors: Ryosuke Abe, Shigeya Suzuki, Kenji Saito, Hiroya Tanaka, Osamu Nakamura, Jun Murai

Abstract: Improvements in fabrication devices such as 3D printers are making it possible for personal fabrication to freely produce any product. To clarify who is liable for the product, the fabricator should keep the fabrication history in an immutable and sustainably accessible manner. In this paper, we propose a new scheme, "Fabchain," that can record the fabrication history in such a manner. By utilizing a scheme that employs a blockchain as an audit-able communication channel, Fabchain manages print jobs for the fabricator's 3D printer over the blockchain, while maintaining a history of a print job. We implemented Fabchain on Ethereum and evaluated the performance for recording a print job. Our results demonstrate that Fabchain can complete communication of a print job sequence in less than 1 minute on the Ethereum test network. We conclude that Fabchain can manage a print job in a reasonable duration for 3D printing, while satisfying the requirements for immutability and sustainability.

Submitted 6 March, 2022; originally announced March 2022.

arXiv:2112.07093 [pdf, other]

doi 10.1109/QCE53715.2022.00056

QuISP: a Quantum Internet Simulation Package

Authors: Ryosuke Satoh, Michal Hajdušek, Naphan Benchasattabuse, Shota Nagayama, Kentaro Teramoto, Takaaki Matsuo, Sara Ayman Metwalli, Takahiko Satoh, Shigeya Suzuki, Rodney Van Meter

Abstract: We present an event-driven simulation package called QuISP for large-scale quantum networks built on top of the OMNeT++ discrete event simulation framework. Although the behavior of quantum networking devices has been revealed by recent research, it is still an open question how they will work in networks of a practical size. QuISP is designed to simulate large-scale quantum networks to investigate their behavior under realistic, noisy and heterogeneous configurations. The protocol architecture we propose enables studies of different choices for error management and other key decisions. Our confidence in the simulator is supported by comparing its output to analytic results for a small network. A key reason for simulation is to look for emergent behavior when large numbers of individually characterized devices are combined. QuISP can handle thousands of qubits in dozens of nodes on a laptop computer, preparing for full Quantum Internet simulation. This simulator promotes the development of protocols for larger and more complex quantum networks.

Submitted 13 December, 2021; originally announced December 2021.

Comments: 17 pages, 12 figures

Journal ref: 2022 IEEE International Conference on Quantum Computing and Engineering (QCE), pp. 353-364 (2022)

arXiv:2112.07092 [pdf, other]

doi 10.1109/QCE53715.2022.00055

A Quantum Internet Architecture

Authors: Rodney Van Meter, Ryosuke Satoh, Naphan Benchasattabuse, Takaaki Matsuo, Michal Hajdušek, Takahiko Satoh, Shota Nagayama, Shigeya Suzuki

Abstract: Entangled quantum communication is advancing rapidly, with laboratory and metropolitan testbeds under development, but to date there is no unifying Quantum Internet architecture. We propose a Quantum Internet architecture centered around the Quantum Recursive Network Architecture (QRNA), using RuleSet-based connections established using a two-pass connection setup. Scalability and internetworking (for both technological and administrative boundaries) are achieved using recursion in naming and connection control. In the near term, this architecture will support end-to-end, two-party entanglement on minimal hardware, and it will extend smoothly to multi-party entanglement and the use of quantum error correction on advanced hardware in the future. For a network internal gateway protocol, we recommend (but do not require) qDijkstra with seconds per Bell pair as link cost for routing; the external gateway protocol is designed to build recursively. The strength of our architecture is shown by assessing extensibility and demonstrating how robust protocol operation can be confirmed using the RuleSet paradigm.

Submitted 13 December, 2021; originally announced December 2021.

Comments: 17 pages, 7 numbered figures

Journal ref: 2022 IEEE International Conference on Quantum Computing and Engineering (QCE), pp. 341-352 (2022)

arXiv:2108.11018 [pdf, other]

A Scaling Law for Synthetic-to-Real Transfer: How Much Is Your Pre-training Effective?

Authors: Hiroaki Mikami, Kenji Fukumizu, Shogo Murai, Shuji Suzuki, Yuta Kikuchi, Taiji Suzuki, Shin-ichi Maeda, Kohei Hayashi

Abstract: Synthetic-to-real transfer learning is a framework in which a synthetically generated dataset is used to pre-train a model to improve its performance on real vision tasks. The most significant advantage of using synthetic images is that the ground-truth labels are automatically available, enabling unlimited expansion of the data size without human cost. However, synthetic data may have a huge domain gap, in which case increasing the data size does not improve the performance. How can we know that? In this study, we derive a simple scaling law that predicts the performance from the amount of pre-training data. By estimating the parameters of the law, we can judge whether we should increase the data or change the setting of image synthesis. Further, we analyze the theory of transfer learning by considering learning dynamics and confirm that the derived generalization bound is consistent with our empirical findings. We empirically validated our scaling law on various experimental settings of benchmark tasks, model sizes, and complexities of synthetic images.

Submitted 8 October, 2021; v1 submitted 24 August, 2021; originally announced August 2021.

arXiv:2107.13263 [pdf, other]

Learning-Based Depth and Pose Estimation for Monocular Endoscope with Loss Generalization

Authors: Aji Resindra Widya, Yusuke Monno, Masatoshi Okutomi, Sho Suzuki, Takuji Gotoda, Kenji Miki

Abstract: Gastroendoscopy has been a clinical standard for diagnosing and treating conditions that affect a part of a patient's digestive system, such as the stomach. Despite the fact that gastroendoscopy has a lot of advantages for patients, there exist some challenges for practitioners, such as the lack of 3D perception, including the depth and the endoscope pose information. Such challenges make navigating the endoscope and localizing any found lesion in a digestive tract difficult. To tackle these problems, deep learning-based approaches have been proposed to provide monocular gastroendoscopy with additional yet important depth and pose information. In this paper, we propose a novel supervised approach to train depth and pose estimation networks using consecutive endoscopy images to assist the endoscope navigation in the stomach. We first generate real depth and pose training data using our previously proposed whole stomach 3D reconstruction pipeline to avoid poor generalization ability between computer-generated (CG) models and real data for the stomach. In addition, we propose a novel generalized photometric loss function to avoid the complicated process of finding proper weights for balancing the depth and the pose loss terms, which is required for existing direct depth and pose supervision approaches. We then experimentally show that our proposed generalized loss performs better than existing direct supervision losses.

Submitted 28 July, 2021; originally announced July 2021.

Comments: Accepted for EMBC 2021

arXiv:2008.01523 [pdf, other]

A System for Worldwide COVID-19 Information Aggregation

Authors: Akiko Aizawa, Frederic Bergeron, Junjie Chen, Fei Cheng, Katsuhiko Hayashi, Kentaro Inui, Hiroyoshi Ito, Daisuke Kawahara, Masaru Kitsuregawa, Hirokazu Kiyomaru, Masaki Kobayashi, Takashi Kodama, Sadao Kurohashi, Qianying Liu, Masaki Matsubara, Yusuke Miyao, Atsuyuki Morishima, Yugo Murawaki, Kazumasa Omura, Haiyue Song, Eiichiro Sumita, Shinji Suzuki, Ribeka Tanaka, Yu Tanaka, Masashi Toyoda, et al. (4 additional authors not shown)

Abstract: The global pandemic of COVID-19 has made the public pay close attention to related news, covering various domains, such as sanitation, treatment, and effects on education. Meanwhile, the COVID-19 condition is very different among the countries (e.g., policies and development of the epidemic), and thus citizens would be interested in news in foreign countries. We build a system for worldwide COVID-19 information aggregation containing reliable articles from 10 regions in 7 languages sorted by topics. Our reliable COVID-19 related website dataset collected through crowdsourcing ensures the quality of the articles. A neural machine translation module translates articles in other languages into Japanese and English. A BERT-based topic-classifier trained on our article-topic pair dataset helps users find information of interest efficiently by putting articles into different categories.

Submitted 11 October, 2020; v1 submitted 27 July, 2020; originally announced August 2020.

Comments: Accepted to EMNLP 2020 Workshop NLP-COVID

arXiv:2005.04906 [pdf, other]

doi 10.1145/3375923.3375948

An Inductive Transfer Learning Approach using Cycle-consistent Adversarial Domain Adaptation with Application to Brain Tumor Segmentation

Authors: Yuta Tokuoka, Shuji Suzuki, Yohei Sugawara

Abstract: With recent advances in supervised machine learning for medical image analysis applications, the annotated medical image datasets of various domains are being shared extensively. Given that the annotation labelling requires medical expertise, such labels should be applied to as many learning tasks as possible. However, the multi-modal nature of each annotated image renders it difficult to share the annotation label among diverse tasks. In this work, we provide an inductive transfer learning (ITL) approach to adopt the annotation label of the source domain datasets to tasks of the target domain datasets using Cycle-GAN based unsupervised domain adaptation (UDA). To evaluate the applicability of the ITL approach, we adopted the brain tissue annotation label on the source domain dataset of Magnetic Resonance Imaging (MRI) images to the task of brain tumor segmentation on the target domain dataset of MRI. The results confirm that the segmentation accuracy of brain tumor segmentation improved significantly. The proposed ITL approach can make a significant contribution to the field of medical image analysis, as we develop a fundamental tool to improve and promote various tasks using medical images.

Submitted 11 May, 2020; originally announced May 2020.

Journal ref: Proceedings of the 2019 6th International Conference on Biomedical and Bioinformatics Engineering, November 2019, Pages 44-48

arXiv:2004.12288 [pdf, other]

Stomach 3D Reconstruction Based on Virtual Chromoendoscopic Image Generation

Authors: Aji Resindra Widya, Yusuke Monno, Masatoshi Okutomi, Sho Suzuki, Takuji Gotoda, Kenji Miki

Abstract: Gastric endoscopy is a standard clinical process that enables medical practitioners to diagnose various lesions inside a patient's stomach. If any lesion is found, it is very important to perceive the location of the lesion relative to the global view of the stomach. Our previous research showed that this could be addressed by reconstructing the whole stomach shape from chromoendoscopic images using a structure-from-motion (SfM) pipeline, in which indigo carmine (IC) blue dye sprayed images were used to increase feature matches for SfM by enhancing the stomach surface's textures. However, spraying the IC dye to the whole stomach requires additional time, labor, and cost, which is not desirable for patients and practitioners. In this paper, we propose an alternative way to achieve whole stomach 3D reconstruction without the need for the IC dye by generating virtual IC-sprayed (VIC) images based on image-to-image style translation trained on unpaired real no-IC and IC-sprayed images. We have specifically investigated the effect of input and output color channel selection for generating the VIC images and found that translating no-IC green-channel images to IC-sprayed red-channel images gives the best SfM reconstruction result.

Submitted 26 April, 2020; originally announced April 2020.

Comments: Accepted for main conference in EMBC 2020

arXiv:2002.02635 [pdf, other]

Noncontact Thermal and Vibrotactile Display Using Focused Airborne Ultrasound

Authors: Takaaki Kamigaki, Shun Suzuki, Hiroyuki Shinoda

Abstract: In a typical mid-air haptics system, focused airborne ultrasound provides vibrotactile sensations to localized areas on a bare skin. Herein, a method for displaying thermal sensations to hands where mesh fabric gloves are worn is proposed. The gloves employed in this study are commercially available mesh fabric gloves with sound absorption characteristics, such as cotton work gloves without any additional devices such as Peltier elements. The method proposed in this study can also provide vibrotactile sensations by changing the ultrasonic irradiation pattern. In this paper, we report basic experimental investigations on the proposed method. By performing thermal measurements, we evaluate the local heat generation on the surfaces of both the glove and the skin by focused airborne ultrasound irradiation. In addition, we performed perceptual experiments, thereby confirming that the proposed method produced both thermal and vibrotactile sensations. Furthermore, these sensations were selectively provided to a certain extent by changing the ultrasonic irradiation pattern. These results validate the effectiveness of our method and its feasibility in mid-air haptics applications.

Submitted 7 February, 2020; originally announced February 2020.

Comments: 6 pages

arXiv:1910.11534 [pdf, other]

Team PFDet's Methods for Open Images Challenge 2019

Authors: Yusuke Niitani, Toru Ogawa, Shuji Suzuki, Takuya Akiba, Tommi Kerola, Kohei Ozaki, Shotaro Sano

Abstract: We present the instance segmentation and the object detection method used by team PFDet for Open Images Challenge 2019. We tackle a massive dataset size, huge class imbalance and federated annotations. Using this method, the team PFDet achieved 3rd and 4th place in the instance segmentation and the object detection track, respectively.

Submitted 25 October, 2019; originally announced October 2019.

arXiv:1908.00213 [pdf, other]

Chainer: A Deep Learning Framework for Accelerating the Research Cycle

Authors: Seiya Tokui, Ryosuke Okuta, Takuya Akiba, Yusuke Niitani, Toru Ogawa, Shunta Saito, Shuji Suzuki, Kota Uenishi, Brian Vogel, Hiroyuki Yamazaki Vincent

Abstract: Software frameworks for neural networks play a key role in the development and application of deep learning methods. In this paper, we introduce the Chainer framework, which intends to provide a flexible, intuitive, and high performance means of implementing the full range of deep learning models needed by researchers and practitioners. Chainer provides acceleration using Graphics Processing Units with a familiar NumPy-like API through CuPy, supports general and dynamic models in Python through Define-by-Run, and also provides add-on packages for state-of-the-art computer vision models as well as distributed training.

Submitted 1 August, 2019; originally announced August 2019.

Comments: Accepted for Applied Data Science Track in KDD'19

arXiv:1906.00127 [pdf, other]

Multi-objective Bayesian Optimization using Pareto-frontier Entropy

Authors: Shinya Suzuki, Shion Takeno, Tomoyuki Tamura, Kazuki Shitara, Masayuki Karasuyama

Abstract: This paper studies an entropy-based multi-objective Bayesian optimization (MBO). The entropy search is a successful approach to Bayesian optimization. However, for MBO, existing entropy-based methods ignore the trade-off among objectives or introduce unreliable approximations. We propose a novel entropy-based MBO called Pareto-frontier entropy search (PFES) by considering the entropy of the Pareto-frontier, which is an essential notion of the optimality of the multi-objective problem. Our entropy can incorporate the trade-off relation of the optimal values, and further, we derive an analytical formula without introducing additional approximations or simplifications to the standard entropy search setting. We also show that our entropy computation is practically feasible by using a recursive decomposition technique which has been known in studies of the Pareto hyper-volume computation. Besides the usual MBO setting, in which all the objectives are simultaneously observed, we also consider the "decoupled" setting, in which the objective functions can be observed separately. PFES can easily adapt to the decoupled setting by considering the entropy of the marginal density for each output dimension. This approach incorporates dependency among objectives conditioned on the Pareto-frontier, which is ignored by the existing method. Our numerical experiments show the effectiveness of PFES through several benchmark datasets.

Submitted 10 February, 2020; v1 submitted 31 May, 2019; originally announced June 2019.

arXiv:1905.12988 [pdf, other]

3D Reconstruction of Whole Stomach from Endoscope Video Using Structure-from-Motion

Authors: Aji Resindra Widya, Yusuke Monno, Kosuke Imahori, Masatoshi Okutomi, Sho Suzuki, Takuji Gotoda, Kenji Miki

Abstract: Gastric endoscopy is a common clinical practice that enables medical doctors to diagnose the stomach inside a body. In order to identify the location of a gastric lesion, such as early gastric cancer, within the stomach, this work addresses reconstructing the 3D shape of a whole stomach with color texture information generated from a standard monocular endoscope video. Previous works have tried to reconstruct the 3D structures of various organs from endoscope images. However, they are mainly focused on a partial surface. In this work, we investigated how to enable structure-from-motion (SfM) to reconstruct the whole shape of a stomach from a standard endoscope video. We specifically investigated the combined effect of chromo-endoscopy and color channel selection on SfM. Our study found that 3D reconstruction of the whole stomach can be achieved by using red channel images captured under chromo-endoscopy by spreading indigo carmine (IC) dye on the stomach surface.

Submitted 30 May, 2019; originally announced May 2019.

Comments: 5 pages, 4 figures, accepted in EMBC 2019

arXiv:1811.10862 [pdf, other]

Sampling Techniques for Large-Scale Object Detection from Sparsely Annotated Objects

Authors: Yusuke Niitani, Takuya Akiba, Tommi Kerola, Toru Ogawa, Shotaro Sano, Shuji Suzuki

Abstract: Efficient and reliable methods for training of object detectors are in higher demand than ever, and more and more data relevant to the field is becoming available. However, large datasets like Open Images Dataset v4 (OID) are sparsely annotated, and some measure must be taken in order to ensure the training of a reliable detector. In order to take the incompleteness of these datasets into account, one possibility is to use pretrained models to detect the presence of the unverified objects. However, the performance of such a strategy depends largely on the power of the pretrained model. In this study, we propose part-aware sampling, a method that uses human intuition for the hierarchical relation between objects. In terse terms, our method works by making assumptions like "a bounding box for a car should contain a bounding box for a tire". We demonstrate the power of our method on OID and compare the performance against a method based on a pretrained model. Our method also won the first and second place on the public and private test sets of the Google AI Open Images Competition 2018.

Submitted 21 April, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

Comments: CVPR2019 oral

arXiv:1809.00778 [pdf, other]

PFDet: 2nd Place Solution to Open Images Challenge 2018 Object Detection Track

Authors: Takuya Akiba, Tommi Kerola, Yusuke Niitani, Toru Ogawa, Shotaro Sano, Shuji Suzuki

Abstract: We present a large-scale object detection system by team PFDet. Our system enables training with huge datasets using 512 GPUs, handles sparsely verified classes, and massive class imbalance. Using our method, we achieved 2nd place in the Google AI Open Images Object Detection Track 2018 on Kaggle.

Submitted 3 September, 2018; originally announced September 2018.

Comments: Technical report for Open Images Challenge 2018 Object Detection Track

arXiv:1801.00464 [pdf]

Comparative Analysis of Human Movement Prediction: Space Syntax and Inverse Reinforcement Learning

Authors: Soma Suzuki

Abstract: Space syntax matrix has been the main approach for human movement prediction in the urban environment. An alternative, relatively new methodology is an agent-based pedestrian model constructed using machine learning techniques. Even though both approaches have been studied intensively, the quantitative comparison between them has not been conducted. In this paper, comparative analysis of space syntax metrics and maximum entropy inverse reinforcement learning (MEIRL) is performed. The experimental result on trajectory data of artificially generated pedestrian agents shows that MEIRL outperforms space syntax matrix. The possibilities for combining two methods are drawn out as conclusions, and the relative challenges with the data collection are highlighted.

Submitted 25 January, 2018; v1 submitted 1 January, 2018; originally announced January 2018.

arXiv:1712.07887 [pdf]

Multiagent-based Participatory Urban Simulation through Inverse Reinforcement Learning

Authors: Soma Suzuki

Abstract: The multiagent-based participatory simulation features prominently in urban planning as the acquired model is considered as the hybrid system of the domain and the local knowledge. However, the key problem of generating realistic agents for particular social phenomena invariably remains. The existing models have attempted to dictate the factors involving human behavior, which appeared to be intractable. In this paper, Inverse Reinforcement Learning (IRL) is introduced to address this problem. IRL is developed for computational modeling of human behavior and has achieved great successes in robotics, psychology and machine learning. The possibilities presented by this new style of modeling are drawn out as conclusions, and the relative challenges with this modeling are highlighted.

Submitted 21 December, 2017; originally announced December 2017.

arXiv:1711.04325 [pdf, other]

Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes

Authors: Takuya Akiba, Shuji Suzuki, Keisuke Fukuda

Abstract: We demonstrate that training ResNet-50 on ImageNet for 90 epochs can be achieved in 15 minutes with 1024 Tesla P100 GPUs. This was made possible by using a large minibatch size of 32k. To maintain accuracy with this large minibatch size, we employed several techniques such as RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule. This paper also describes the details of the hardware and software of the system used to achieve the above performance.

Submitted 12 November, 2017; originally announced November 2017.

Comments: NIPS'17 Workshop: Deep Learning at Supercomputer Scale

arXiv:1710.11351 [pdf, other]

ChainerMN: Scalable Distributed Deep Learning Framework

Authors: Takuya Akiba, Keisuke Fukuda, Shuji Suzuki

Abstract: One of the keys for deep learning to have made a breakthrough in various fields was to utilize high computing powers centering around GPUs. Enabling the use of further computing abilities by distributed processing is essential not only to make the deep learning bigger and faster but also to tackle unsolved challenges. We present the design, implementation, and evaluation of ChainerMN, the distributed deep learning framework we have developed. We demonstrate that ChainerMN can scale the learning process of the ResNet-50 model to the ImageNet dataset up to 128 GPUs with the parallel efficiency of 90%.

Submitted 31 October, 2017; originally announced October 2017.

arXiv:1109.4357 [pdf, ps, other]

Argument filterings and usable rules in higher-order rewrite systems

Authors: Sho Suzuki, Keiichirou Kusakari, Frédéric Blanqui

Abstract: The static dependency pair method is a method for proving the termination of higher-order rewrite systems a la Nipkow. It combines the dependency pair method introduced for first-order rewrite systems with the notion of strong computability introduced for typed lambda-calculi. Argument filterings and usable rules are two important methods of the dependency pair framework used by current state-of-the-art first-order automated termination provers. In this paper, we extend the class of higher-order systems on which the static dependency pair method can be applied. Then, we extend argument filterings and usable rules to higher-order rewriting, hence providing the basis for a powerful automated termination prover for higher-order rewrite systems.

Submitted 20 September, 2011; originally announced September 2011.

Journal ref: IPSJ Transactions on Programming 4, 2 (2011) 1-12

Showing 1–29 of 29 results for author: Suzuki, S