Cross-Cultural Validation of Partner Models for Voice User Interfaces

Authors: Katie Seaborn, Iona Gessinger, Suzuka Yoshida, Benjamin R. Cowan, Philip R. Doyle

Abstract: Recent research has begun to assess people's perceptions of voice user interfaces (VUIs) as dialogue partners, termed partner models. Current self-report measures are only available in English, limiting research to English-speaking users. To improve the diversity of user samples and contexts that inform partner modelling research, we translated, localized, and evaluated the Partner Modelling Questionnaire (PMQ) for non-English speaking Western (German, n=185) and East Asian (Japanese, n=198) cohorts where VUI use is popular. Through confirmatory factor analysis (CFA), we find that the scale produces equivalent levels of goodness-to-fit for both our German and Japanese translations, confirming its cross-cultural validity. Still, the structure of the communicative flexibility factor did not replicate directly across Western and East Asian cohorts. We discuss how our translations can open up critical research on cultural similarities and differences in partner model use and design, whilst highlighting the challenges for ensuring accurate translation across cultural contexts.

Submitted 14 May, 2024; originally announced May 2024.

Comments: Accepted at ACM CUI '24

arXiv:2405.08831 [pdf, other]

doi 10.1145/3613905.3651099

Deceptive, Disruptive, No Big Deal: Japanese People React to Simulated Dark Commercial Patterns

Authors: Katie Seaborn, Tatsuya Itagaki, Mizuki Watanabe, Yijia Wang, ** Geng, Takao Fujii, Yuto Mandai, Miu Kojima, Suzuka Yoshida

Abstract: Dark patterns and deceptive designs (DPs) are user interface elements that trick people into taking actions that benefit the purveyor. Such designs are widely deployed, with special varieties found in certain nations like Japan that can be traced to global power hierarchies and the local socio-linguistic context of use. In this breaking work, we report on the first user study involving Japanese people (n=30) experiencing a mock shopping website injected with simulated DPs. We found that Alphabet Soup and Misleading Reference Pricing were the most deceptive and least noticeable. Social Proofs, Sneaking in Items, and Untranslation were the least deceptive but Untranslation prevented most from cancelling their account. Mood significantly worsened after experiencing the website. We contribute the first empirical findings on a Japanese consumer base alongside a scalable approach to evaluating user attitudes, perceptions, and behaviours towards DPs in an interactive context. We urge for more human participant research and ideally collaborations with industry to assess real designs in the wild.

Submitted 13 May, 2024; originally announced May 2024.

Journal ref: CHI EA '24: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (2024), Article No.: 95, 1-8

arXiv:2404.16309 [pdf, other]

Robot Swarm Control Based on Smoothed Particle Hydrodynamics for Obstacle-Unaware Navigation

Authors: Michikuni Eguchi, Mai Nishimura, Shigeo Yoshida, Takefumi Hiraki

Abstract: Robot swarms hold immense potential for performing complex tasks far beyond the capabilities of individual robots. However, the challenge in unleashing this potential is the robots' limited sensory capabilities, which hinder their ability to detect and adapt to unknown obstacles in real-time. To overcome this limitation, we introduce a novel robot swarm control method with an indirect obstacle detector using a smoothed particle hydrodynamics (SPH) model. The indirect obstacle detector can predict the collision with an obstacle and its collision point solely from the robot's velocity information. This approach enables the swarm to effectively and accurately navigate environments without the need for explicit obstacle detection, significantly enhancing their operational robustness and efficiency. Our method's superiority is quantitatively validated through a comparative analysis, showcasing its significant navigation and pattern formation improvements under obstacle-unaware conditions.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2402.15830 [pdf, other]

doi 10.1145/3613904.3642870

Swarm Body: Embodied Swarm Robots

Authors: Sosuke Ichihashi, So Kuroki, Mai Nishimura, Kazumi Kasaura, Takefumi Hiraki, Kazutoshi Tanaka, Shigeo Yoshida

Abstract: The human brain's plasticity allows for the integration of artificial body parts into the human body. Leveraging this, embodied systems realize intuitive interactions with the environment. We introduce a novel concept: embodied swarm robots. Swarm robots constitute a collective of robots working in harmony to achieve a common objective, in our case, serving as functional body parts. Embodied swarm robots can dynamically alter their shape, density, and the correspondences between body parts and individual robots. We contribute an investigation of the influence on embodiment of swarm robot-specific factors derived from these characteristics, focusing on a hand. Our paper is the first to examine these factors through virtual reality (VR) and real-world robot studies to provide essential design considerations and applications of embodied swarm robots. Through quantitative and qualitative analysis, we identified a system configuration to achieve the embodiment of swarm robots.

Submitted 29 February, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

arXiv:2312.01707 [pdf, other]

Perceptual Dimensions of Physical Properties of Handheld Objects Induced by Impedance Changes

Authors: Takeru Hashimoto, Shigeo Yoshida, Takuji Narumi

Abstract: Haptics in virtual reality is the emerging dimension after audiovisual experiences. Researchers designed several handheld VR controllers to simulate haptic experiences in virtual reality environments. Some of these devices, equipped to deliver active force, can dynamically alter the timing and intensity of force feedback, potentially offering a wide array of haptic sensations. Past research primarily used a single index to evaluate how users perceive physical property parameters, potentially limiting the assessment to the designer's intended scope and neglecting other potential perceptual experiences. Therefore, this study evaluates not how much but how humans feel a physical property when stimuli are changed. We conducted interviews to investigate how people feel when a haptic device changes motion impedance. We used thematic analysis to abstract the results of the interviews and gain an understanding of how humans attribute force feedback to a phenomenon. We also generated a vocabulary from the themes obtained from the interviews and asked users to evaluate force feedback using the semantic difference method. A factor analysis was used to investigate how changing the basic elements of motion, such as inertia, viscosity, and stiffness of the motion system, affects haptic perception. As a result, we obtained four critical factors: size, viscosity, weight, and flexibility factor, and clarified the correspondence between these factors and the change of impedance.

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2310.10093 [pdf, other]

High-speed full-color computer-generated holography using a digital micromirror device and fiber-coupled RGB laser diode

Authors: Shuhei Yoshida

Abstract: Computer-generated holography (CGH) can be used to display three-dimensional (3D) images and has a special feature that no other technology possesses: it can reconstruct arbitrary object wavefronts. In this study, we investigated a high-speed full-color reconstruction method for improving the realism of 3D images produced using CGH. The proposed method uses a digital micromirror device (DMD) with a high-speed switching capability as the hologram display device. It produces 3D video by time-division multiplexing using an optical system incorporating fiber-coupled laser diodes (LDs) operating in red, green, and blue wavelengths. The wavelength dispersion of the DMD is compensated for by superimposing plane waves on the hologram. Fourier transform optics are used to separate the object, conjugate, and zeroth-order light, thus eliminating the need for an extensive 4f system. The resources used in this research, such as the programs used for the hologram generation and the schematics of the LD driver, are available on GitHub.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2307.04427 [pdf, other]

doi 10.1126/science.adc9818

Observation of high-energy neutrinos from the Galactic plane

Authors: R. Abbasi, M. Ackermann, J. Adams, J. A. Aguilar, M. Ahlers, M. Ahrens, J. M. Alameddine, A. A. Alves Jr., N. M. Amin, K. Andeen, T. Anderson, G. Anton, C. Argüelles, Y. Ashida, S. Athanasiadou, S. Axani, X. Bai, A. Balagopal V., S. W. Barwick, V. Basu, S. Baur, R. Bay, J. J. Beatty, K. -H. Becker, J. Becker Tjus , et al. (364 additional authors not shown)

Abstract: The origin of high-energy cosmic rays, atomic nuclei that continuously impact Earth's atmosphere, has been a mystery for over a century. Due to deflection in interstellar magnetic fields, cosmic rays from the Milky Way arrive at Earth from random directions. However, near their sources and during propagation, cosmic rays interact with matter and produce high-energy neutrinos. We search for neutrino emission using machine learning techniques applied to ten years of data from the IceCube Neutrino Observatory. We identify neutrino emission from the Galactic plane at the 4.5$σ$ level of significance, by comparing diffuse emission models to a background-only hypothesis. The signal is consistent with modeled diffuse emission from the Galactic plane, but could also arise from a population of unresolved point sources.

Submitted 10 July, 2023; originally announced July 2023.

Comments: Submitted on May 12th, 2022; Accepted on May 4th, 2023

Journal ref: Science 380, 6652, 1338-1343 (2023)
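
For readers less used to significance quoted in units of σ, the sketch below converts a Gaussian significance into a tail probability. It assumes a one-sided test against the background-only hypothesis, which the abstract does not state, and the function name `one_sided_p_value` is made up for this illustration.

```python
import math


def one_sided_p_value(z_sigma):
    """Upper-tail probability of a standard normal beyond z_sigma.

    Assumption: the quoted significance is one-sided. For a two-sided
    convention the result would be doubled.
    """
    # P(Z > z) = 0.5 * erfc(z / sqrt(2)) for a standard normal Z.
    return 0.5 * math.erfc(z_sigma / math.sqrt(2.0))


if __name__ == "__main__":
    # Under this assumption, 4.5 sigma is a background-only chance
    # probability of a few in a million.
    print(f"{one_sided_p_value(4.5):.2e}")
```

Using `math.erfc` rather than `1 - erf` avoids loss of precision in the far tail.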

arXiv:2210.10221 [pdf, other]

doi 10.1109/ICIP46576.2022.9898014

Non-iterative optimization of pseudo-labeling thresholds for training object detection models from multiple datasets

Authors: Yuki Tanaka, Shuhei M. Yoshida, Makoto Terao

Abstract: We propose a non-iterative method to optimize pseudo-labeling thresholds for learning object detection from a collection of low-cost datasets, each of which is annotated for only a subset of all the object classes. A popular approach to this problem is first to train teacher models and then to use their confident predictions as pseudo ground-truth labels when training a student model. To obtain the best result, however, thresholds for prediction confidence must be adjusted. This process typically involves iterative search and repeated training of student models and is time-consuming. Therefore, we develop a method to optimize the thresholds without iterative optimization by maximizing the $F_β$-score on a validation dataset, which measures the quality of pseudo labels and can be measured without training a student model. We experimentally demonstrate that our proposed method achieves an mAP comparable to that of grid search on the COCO and VOC datasets.

Submitted 18 October, 2022; originally announced October 2022.

Comments: ICIP2022

Journal ref: 2022 IEEE International Conference on Image Processing (ICIP), 2022, pp. 1676-1680
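
The threshold selection described in the abstract above can be sketched as a single pass over the teacher's validation predictions. This is a generic illustration, not the authors' implementation: the abstract does not say how predictions are matched to ground truth or how β is chosen, so the inputs `scores`, `is_correct`, and `num_ground_truth`, and the function names, are assumptions. `num_ground_truth` is assumed to be positive.

```python
def f_beta(precision, recall, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); 0 when undefined."""
    denom = beta * beta * precision + recall
    if denom == 0.0:
        return 0.0
    return (1.0 + beta * beta) * precision * recall / denom


def best_threshold(scores, is_correct, num_ground_truth, beta=1.0):
    """Return (threshold, F_beta) maximizing F_beta on validation data.

    scores: confidence of each teacher prediction (hypothetical input)
    is_correct: True where the prediction matches a ground-truth object
    num_ground_truth: number of annotated objects in the validation set
    """
    # Sort by descending confidence: each prefix of this order is the set
    # of pseudo labels kept when the threshold equals its last score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    best = (1.0, 0.0)
    true_pos = 0
    for kept, i in enumerate(order, start=1):
        true_pos += 1 if is_correct[i] else 0
        # Only evaluate once all predictions with the same score are in.
        if kept < len(order) and scores[order[kept]] == scores[i]:
            continue
        precision = true_pos / kept
        recall = true_pos / num_ground_truth
        value = f_beta(precision, recall, beta)
        if value > best[1]:
            best = (scores[i], value)
    return best


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6]
    is_correct = [True, True, False, True]
    print(best_threshold(scores, is_correct, num_ground_truth=3, beta=1.0))
    print(best_threshold(scores, is_correct, num_ground_truth=3, beta=0.5))
```

In the toy data, β = 1 keeps all four predictions (threshold 0.6), while β = 0.5 weights precision more heavily and stops at 0.8. No student model is trained at any point, which is the property the abstract emphasizes.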

arXiv:2209.03042 [pdf, other]

doi 10.1088/1748-0221/17/11/P11003

Graph Neural Networks for Low-Energy Event Classification & Reconstruction in IceCube

Authors: R. Abbasi, M. Ackermann, J. Adams, N. Aggarwal, J. A. Aguilar, M. Ahlers, M. Ahrens, J. M. Alameddine, A. A. Alves Jr., N. M. Amin, K. Andeen, T. Anderson, G. Anton, C. Argüelles, Y. Ashida, S. Athanasiadou, S. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, V. Basu, R. Bay, J. J. Beatty, K. -H. Becker , et al. (359 additional authors not shown)

Abstract: IceCube, a cubic-kilometer array of optical sensors built to detect atmospheric and astrophysical neutrinos between 1 GeV and 1 PeV, is deployed 1.45 km to 2.45 km below the surface of the ice sheet at the South Pole. The classification and reconstruction of events from the in-ice detectors play a central role in the analysis of data from IceCube. Reconstructing and classifying events is a challenge due to the irregular detector geometry, inhomogeneous scattering and absorption of light in the ice and, below 100 GeV, the relatively low number of signal photons produced per event. To address this challenge, it is possible to represent IceCube events as point cloud graphs and use a Graph Neural Network (GNN) as the classification and reconstruction method. The GNN is capable of distinguishing neutrino events from cosmic-ray backgrounds, classifying different neutrino event types, and reconstructing the deposited energy, direction and interaction vertex. Based on simulation, we provide a comparison in the 1-100 GeV energy range to the current state-of-the-art maximum likelihood techniques used in current IceCube analyses, including the effects of known systematic uncertainties. For neutrino event classification, the GNN increases the signal efficiency by 18% at a fixed false positive rate (FPR), compared to current IceCube methods. Alternatively, the GNN offers a reduction of the FPR by over a factor 8 (to below half a percent) at a fixed signal efficiency.
For the reconstruction of energy, direction, and interaction vertex, the resolution improves by an average of 13%-20% compared to current maximum likelihood techniques in the energy range of 1-30 GeV. The GNN, when run on a GPU, is capable of processing IceCube events at a rate nearly double of the median IceCube trigger rate of 2.7 kHz, which opens the possibility of using low energy neutrinos in online searches for transient events.

Submitted 11 October, 2022; v1 submitted 7 September, 2022; originally announced September 2022.

Comments: Prepared for submission to JINST

arXiv:2207.13370 [pdf, other]

MagGlove: A Haptic Glove with Movable Magnetic Force for Manipulation Learning

Authors: Mikiya Kusunoki, Shogo Yoshida, Haoran Xie

Abstract: Recently, haptic gloves have been extensively explored for various practical applications, such as manipulation learning. Previous glove devices have different force-driven systems, such as shape memory alloys, servo motors and pneumatic actuators; however, these proposed devices may have difficulty in fast finger movement, easy reproduction, and safety issues. In this study, we propose MagGlove, a novel haptic glove with a movable magnet mechanism that has a linear motor, to solve these issues. The proposed MagGlove device is a compact system on the back of the wearer's hand with high responsiveness, ease of use, and good safety. The proposed device is adaptive with the modification of the magnitude of the current flowing through the coil. Based on our evaluation study, it is verified that the proposed device can achieve finger motion in the given tasks. Therefore, MagGlove can provide flexible support tailored to the wearers' learning levels in manipulation learning tasks.

Submitted 27 July, 2022; originally announced July 2022.

Comments: 4 pages, 8 figures, accepted in the proceedings of Cyberworlds 2022

arXiv:2111.09029 [pdf, other]

doi 10.1109/IJCNN52387.2021.9534370

Towards Interpretable and Reliable Reading Comprehension: A Pipeline Model with Unanswerability Prediction

Authors: Kosuke Nishida, Kyosuke Nishida, Itsumi Saito, Sen Yoshida

Abstract: Multi-hop QA with annotated supporting facts, which is the task of reading comprehension (RC) considering the interpretability of the answer, has been extensively studied. In this study, we define an interpretable reading comprehension (IRC) model as a pipeline model with the capability of predicting unanswerable queries. The IRC model justifies the answer prediction by establishing consistency between the predicted supporting facts and the actual rationale for interpretability. The IRC model detects unanswerable questions, instead of outputting the answer forcibly based on the insufficient information, to ensure the reliability of the answer. We also propose an end-to-end training method for the pipeline RC model. To evaluate the interpretability and the reliability, we conducted the experiments considering unanswerability in a multi-hop question for a given passage. We show that our end-to-end trainable pipeline model outperformed a non-interpretable model on our modified HotpotQA dataset. Experimental results also show that the IRC model achieves comparable results to the previous non-interpretable models in spite of the trade-off between prediction performance and interpretability.

Submitted 18 November, 2021; v1 submitted 17 November, 2021; originally announced November 2021.

Comments: IJCNN 2021 (https://ieeexplore.ieee.org/abstract/document/9534370)

Journal ref: International Joint Conference on Neural Networks (IJCNN), 2021, pp. 1-8

arXiv:2109.12776 [pdf, other]

Joint Multimedia Event Extraction from Video and Article

Authors: Brian Chen, Xudong Lin, Christopher Thomas, Manling Li, Shoya Yoshida, Lovish Chum, Heng Ji, Shih-Fu Chang

Abstract: Visual and textual modalities contribute complementary information about events described in multimedia documents. Videos contain rich dynamics and detailed unfoldings of events, while text describes more high-level and abstract concepts. However, existing event extraction methods either do not handle video or solely target video while ignoring other modalities. In contrast, we propose the first approach to jointly extract events from video and text articles. We introduce the new task of Video MultiMedia Event Extraction (Video M2E2) and propose two novel components to build the first system towards this task. First, we propose the first self-supervised multimodal event coreference model that can determine coreference between video events and text events without any manually annotated pairs. Second, we introduce the first multimodal transformer which extracts structured event information jointly from both videos and text documents. We also construct and will publicly release a new benchmark of video-article pairs, consisting of 860 video-article pairs with extensive annotations for evaluating methods on this task. Our experimental results demonstrate the effectiveness of our proposed method on our new benchmark dataset. We achieve 6.0% and 5.8% absolute F-score gain on multimodal event coreference resolution and multimedia event extraction.

Submitted 26 September, 2021; originally announced September 2021.

Comments: To be presented at EMNLP 2021 findings

arXiv:2109.08354 [pdf, other]

Task-adaptive Pre-training of Language Models with Word Embedding Regularization

Authors: Kosuke Nishida, Kyosuke Nishida, Sen Yoshida

Abstract: Pre-trained language models (PTLMs) acquire domain-independent linguistic knowledge through pre-training with massive textual resources. Additional pre-training is effective in adapting PTLMs to domains that are not well covered by the pre-training corpora. Here, we focus on the static word embeddings of PTLMs for domain adaptation to teach PTLMs domain-specific meanings of words. We propose a novel fine-tuning process: task-adaptive pre-training with word embedding regularization (TAPTER). TAPTER runs additional pre-training by making the static word embeddings of a PTLM close to the word embeddings obtained in the target domain with fastText. TAPTER requires no additional corpus except for the training data of the downstream task. We confirmed that TAPTER improves the performance of the standard fine-tuning and the task-adaptive pre-training on BioASQ (question answering in the biomedical domain) and on SQuAD (the Wikipedia domain) when their pre-training corpora were not dominated by in-domain data.

Submitted 17 September, 2021; originally announced September 2021.

Comments: ACL Findings 2021

arXiv:2109.04719 [pdf, other]

doi 10.1145/3460881.3460937

NaviChoker: Augmenting Pressure Sensation via Pneumatic Actuator

Authors: Shogo Yoshida, Haoran Xie, Kazunori Miyata

Abstract: Many technologies have been developed in recent years to present audiovisual information in new ways, but developing an information presentation interface to convey tactile information is still a challenge. We propose a tactile device using wearable technology that is an all-around pressure presentation system using pneumatic actuators. Specifically, we develop a system in which a choker equipped with a pneumatic actuator is worn around the neck, that applies pressure in any direction to indicate to the user the direction in which to walk and also when to start and stop walking. In this paper, we describe the construction of the device, evaluation experiments, our assessment of the prototype, and future plans for the device.

Submitted 10 September, 2021; originally announced September 2021.

Comments: Proceedings of AH 2021. 4 pages, 7 figures

arXiv:2109.00133 [pdf, other]

AugLimb: Compact Robotic Limb for Human Augmentation

Authors: Zeyu Ding, Shogo Yoshida, Toby Chong, Tsukasa Fukusato, Takuma Torii, Haoran Xie

Abstract: This work proposes a compact robotic limb, AugLimb, that can augment our body functions and support the daily activities. AugLimb adopts the double-layer scissor unit for the extendable mechanism which can achieve 2.5 times longer than the forearm length. The proposed device can be mounted on the user's upper arm, and transform into compact state without obstruction to wearers. The proposed device is lightweight with low burden exerted on the wearer. We developed the prototype of AugLimb to demonstrate the proposed mechanisms. We believe that the design methodology of AugLimb can facilitate human augmentation research for practical use. see http://www.jaist.ac.jp/~xie/auglimb.html

Submitted 31 August, 2021; originally announced September 2021.

Comments: 2 pages, 3 figures

arXiv:2103.02893 [pdf, other]

Lower-Bounded Proper Losses for Weakly Supervised Classification

Authors: Shuhei M. Yoshida, Takashi Takenouchi, Masashi Sugiyama

Abstract: This paper discusses the problem of weakly supervised classification, in which instances are given weak labels that are produced by some label-corruption process. The goal is to derive conditions under which loss functions for weak-label learning are proper and lower-bounded -- two essential requirements for the losses used in class-probability estimation. To this end, we derive a representation theorem for proper losses in supervised learning, which dualizes the Savage representation. We use this theorem to characterize proper weak-label losses and find a condition for them to be lower-bounded. From these theoretical findings, we derive a novel regularization scheme called generalized logit squeezing, which makes any proper weak-label loss bounded from below, without losing properness. Furthermore, we experimentally demonstrate the effectiveness of our proposed approach, as compared to improper or unbounded losses. The results highlight the importance of properness and lower-boundedness.

Submitted 11 June, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

Comments: ICML2021 camera ready, code available at https://github.com/yoshum/lower-bounded-proper-losses

arXiv:2103.01409 [pdf, other]

doi 10.1109/RoboSoft51838.2021.9479188

BPActuators: Lightweight and Low-Cost Soft Actuators by Balloons and Plastics

Authors: Qiukai Qi, Shogo Yoshida, Genki Kakihana, Takuma Torii, Van Anh Ho, Haoran Xie

Abstract: To increase the awareness and impact, soft robotics needs to go beyond the lab environment and should be readily accessible to those even with no robotic expertise. However, most prevailing manufacturing methodologies require either professional equipment or materials that are not usually available to common people, thereby constraining the accessibility of soft robotics. In this communication, we propose a lightweight and low-cost soft bending actuator, called BPActuator, that can be easily fabricated with plastics and balloons. We fabricated a range of actuators with various morphology for characterization in terms of deformation and load-bearing capacity, and demonstrated that they can bend up to 35 degrees and exert force at the tip around 0.070$\pm$0.015N, which is over 5 times higher than their average gravity. We further implemented a gripper with three fingers using the proposed actuators, and found that the gripper can realize human-like grasp of a range of daily objects. The gripper can lift objects at least 8 times heavier than its own weight. Furthermore, the BPActuator is cost effective and each costs about 0.22 USD. Given these advantages, the BPActuators are expected to significantly improve the accessibility of soft robotics to a wider group without robotic expertise.

Submitted 4 March, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

Comments: Accepted to the 4th IEEE International Conference on Soft Robotics (RoboSoft), IEEE copyright

arXiv:2101.11589 [pdf, other]

doi 10.1088/1748-0221/16/07/P07041

A Convolutional Neural Network based Cascade Reconstruction for the IceCube Neutrino Observatory

Authors: R. Abbasi, M. Ackermann, J. Adams, J. A. Aguilar, M. Ahlers, M. Ahrens, C. Alispach, A. A. Alves Jr., N. M. Amin, R. An, K. Andeen, T. Anderson, I. Ansseau, G. Anton, C. Argüelles, S. Axani, X. Bai, A. Balagopal V., A. Barbano, S. W. Barwick, B. Bastian, V. Basu, V. Baum, S. Baur, R. Bay, et al. (343 additional authors not shown)

Abstract: Continued improvements on existing reconstruction methods are vital to the success of high-energy physics experiments, such as the IceCube Neutrino Observatory. In IceCube, further challenges arise as the detector is situated at the geographic South Pole where computational resources are limited. However, to perform real-time analyses and to issue alerts to telescopes around the world, powerful and fast reconstruction methods are desired. Deep neural networks can be extremely powerful, and their usage is computationally inexpensive once the networks are trained. These characteristics make a deep learning-based approach an excellent candidate for the application in IceCube. A reconstruction method based on convolutional architectures and hexagonally shaped kernels is presented. The presented method is robust towards systematic uncertainties in the simulation and has been tested on experimental data. In comparison to standard reconstruction methods in IceCube, it can improve upon the reconstruction accuracy, while reducing the time necessary to run the reconstruction by two to three orders of magnitude.

Submitted 26 July, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

Comments: 39 pages, 15 figures, submitted to Journal of Instrumentation; added references

Journal ref: JINST 16 (2021) P07041

arXiv:2101.11272 [pdf, other]

VisualMRC: Machine Reading Comprehension on Document Images

Authors: Ryota Tanaka, Kyosuke Nishida, Sen Yoshida

Abstract: Recent studies on machine reading comprehension have focused on text-level understanding but have not yet reached the level of human understanding of the visual layout and content of real-world documents. In this study, we introduce a new visual machine reading comprehension dataset, named VisualMRC, wherein given a question and a document image, a machine reads and comprehends texts in the image to answer the question in natural language. Compared with existing visual question answering (VQA) datasets that contain texts in images, VisualMRC focuses more on developing natural language understanding and generation abilities. It contains 30,000+ pairs of a question and an abstractive answer for 10,000+ document images sourced from multiple domains of webpages. We also introduce a new model that extends existing sequence-to-sequence models, pre-trained with large-scale text corpora, to take into account the visual layout and content of documents. Experiments with VisualMRC show that this model outperformed the base sequence-to-sequence models and a state-of-the-art VQA model. However, its performance is still below that of humans on most automatic evaluation metrics. The dataset will facilitate research aimed at connecting vision and language understanding.

Submitted 10 May, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

Comments: Accepted as a full paper at AAAI 2021. The first two authors have equal contribution

arXiv:1708.04750 [pdf, other]

doi 10.1109/VTCFall.2013.6692147

Coordinated Linear Precoding in Downlink Multicell MU-MISO OFDMA Networks

Authors: Mirza Golam Kibria, Hidekazu Murata, Susumu Yoshida

Abstract: This paper considers coordinated linear precoding in downlink multicell multiuser orthogonal frequency-division multiple access (OFDMA) network. A less-complex, fast and provably convergent algorithm that maximizes the weighted sum-rate with per base station (BS) transmit power constraint is formulated. We approximate the nonconvex weighted sum-rate maximization (WSRM) problem with a solvable convex form by means of sequential parametric convex approximation (SPCA) approach. The second order cone program (SOCP) formulations of the objective function and constraints of the optimization problem are derived through proper change of variables, first order linear approximation and hyperbolic constraints transformation, etc. The algorithm converges to the suboptimal solution taking fewer number of iterations in comparison to other known iterative WSRM algorithms. Finally, numerical results are presented to justify the effectiveness and superiority of the proposed algorithm.

Submitted 15 August, 2017; originally announced August 2017.

Comments: arXiv admin note: substantial text overlap with arXiv:1309.4203

Journal ref: Proc. IEEE Vehicular Technology Conference (IEEE VTC), Las Vegas, USA, Sep. 2013

arXiv:1707.04173 [pdf]

Review: Modeling and Classical Controller Of Quad-rotor

Authors: Tarek N. Dief, Shigeo Yoshida

Abstract: This paper presents an overview of the most effective ideas for the Quad-rotor project. The concept of modeling using different methods is presented. The modeling part presented the nonlinear model, and the concept of linearization using small disturbance theory. Parameter identifications part explained the most important parameters that affect the system stability and tried to get suitable solutions for these problems and identify some parameters experimentally. Data filtration, Kalman filter, Structure design, motor distribution, aerodynamic effect, analysis of shroud and its effect on the resultant thrust were explained. The control part incorporates different classical schemes such as PD and PID controllers to stabilize the Quad-rotor. Also, different ideas are presented to stabilize the quad rotor using PID controllers with some modification to get high maneuverability and better performance.

Submitted 15 July, 2017; v1 submitted 13 July, 2017; originally announced July 2017.

Comments: IRACST - International Journal of Computer Science and Information Technology & Security (IJCSITS), ISSN: 2249-9555 Vol. 5, No4, August 2015

arXiv:1612.05000 [pdf, other]

Development of a Real-time Colorectal Tumor Classification System for Narrow-band Imaging zoom-videoendoscopy

Authors: Tsubasa Hirakawa, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Tetsushi Koide, Shigeto Yoshida, Hiroshi Mieno, Shinji Tanaka

Abstract: Colorectal endoscopy is important for the early detection and treatment of colorectal cancer and is used worldwide. A computer-aided diagnosis (CAD) system that provides an objective measure to endoscopists during colorectal endoscopic examinations would be of great value. In this study, we describe a newly developed CAD system that provides real-time objective measures. Our system captures the video stream from an endoscopic system and transfers it to a desktop computer. The captured video stream is then classified by a pretrained classifier and the results are displayed on a monitor. The experimental results show that our developed system works efficiently in actual endoscopic examinations and is medically significant.

Submitted 20 December, 2016; v1 submitted 15 December, 2016; originally announced December 2016.

Comments: 9 pages, 8 figures

arXiv:1611.02443 [pdf, other]

Domain Adaptation with L2 constraints for classifying images from different endoscope systems

Authors: Toru Tamaki, Shoji Sonoyama, Takio Kurita, Tsubasa Hirakawa, Bisser Raytchev, Kazufumi Kaneda, Tetsushi Koide, Shigeto Yoshida, Hiroshi Mieno, Shinji Tanaka, Kazuaki Chayama

Abstract: This paper proposes a method for domain adaptation that extends the maximum margin domain transfer (MMDT) proposed by Hoffman et al., by introducing L2 distance constraints between samples of different domains; thus, our method is denoted as MMDTL2. Motivated by the differences between the images taken by narrow band imaging (NBI) endoscopic devices, we utilize different NBI devices as different domains and estimate the transformations between samples of different domains, i.e., image samples taken by different NBI endoscope systems. We first formulate the problem in the primal form, and then derive the dual form with much lesser computational costs as compared to the naive approach. From our experimental results using NBI image datasets from two different NBI endoscopic devices, we find that MMDTL2 is better than MMDT and also support vector machines without adaptation, especially when NBI image features are high-dimensional and the per-class training samples are greater than 20.

Submitted 2 February, 2018; v1 submitted 8 November, 2016; originally announced November 2016.

Comments: 15 pages

arXiv:1608.06713 [pdf, other]

Transfer Learning for Endoscopic Image Classification

Authors: Shoji Sonoyama, Toru Tamaki, Tsubasa Hirakawa, Bisser Raytchev, Kazufumi Kaneda, Tetsushi Koide, Shigeto Yoshida, Hiroshi Mieno, Shinji Tanaka

Abstract: In this paper we propose a method for transfer learning of endoscopic images. For transferring between features obtained from images taken by different (old and new) endoscopes, we extend the Max-Margin Domain Transfer (MMDT) proposed by Hoffman et al. in order to use L2 distance constraints as regularization, called Max-Margin Domain Transfer with L2 Distance Constraints (MMDTL2). Furthermore, we develop the dual formulation of the optimization problem in order to reduce the computation cost. Experimental results demonstrate that the proposed MMDTL2 outperforms MMDT for real data sets taken by different endoscopes.

Submitted 24 August, 2016; originally announced August 2016.

Comments: 5 pages, FCV2016

arXiv:1608.06709 [pdf, other]

Computer-Aided Colorectal Tumor Classification in NBI Endoscopy Using CNN Features

Authors: Toru Tamaki, Shoji Sonoyama, Tsubasa Hirakawa, Bisser Raytchev, Kazufumi Kaneda, Tetsushi Koide, Shigeto Yoshida, Hiroshi Mieno, Shinji Tanaka

Abstract: In this paper we report results for recognizing colorectal NBI endoscopic images by using features extracted from convolutional neural network (CNN). In this comparative study, we extract features from different layers from different CNN models, and then train linear SVM classifiers. Experimental results with 10-fold cross validations show that features from first few convolution layers are enough to achieve similar performance (i.e., recognition rate of 95%) with non-CNN local features such as Bag-of-Visual words, Fisher vector, and VLAD.

Submitted 24 August, 2016; originally announced August 2016.

Comments: 5 pages, FCV2016

arXiv:1503.03964 [pdf, ps, other]

doi 10.1007/s00354-016-0306-y

Interactive Restless Multi-armed Bandit Game and Swarm Intelligence Effect

Authors: Shunsuke Yoshida, Masato Hisakado, Shintaro Mori

Abstract: We obtain the conditions for the emergence of the swarm intelligence effect in an interactive game of restless multi-armed bandit (rMAB). A player competes with multiple agents. Each bandit has a payoff that changes with a probability $p_{c}$ per round. The agents and player choose one of three options: (1) Exploit (a good bandit), (2) Innovate (asocial learning for a good bandit among $n_{I}$ randomly chosen bandits), and (3) Observe (social learning for a good bandit). Each agent has two parameters $(c,p_{obs})$ to specify the decision: (i) $c$, the threshold value for Exploit, and (ii) $p_{obs}$, the probability for Observe in learning. The parameters $(c,p_{obs})$ are uniformly distributed. We determine the optimal strategies for the player using complete knowledge about the rMAB. We show whether or not social or asocial learning is more optimal in the $(p_{c},n_{I})$ space and define the swarm intelligence effect. We conduct a laboratory experiment (67 subjects) and observe the swarm intelligence effect only if $(p_{c},n_{I})$ are chosen so that social learning is far more optimal than asocial learning.

Submitted 13 March, 2015; originally announced March 2015.

Comments: 18 pages, 4 figures

Journal ref: New generation computing, vol.34, No. 3, 291-306, 2016

arXiv:1311.5904 [pdf, ps, other]

doi 10.1016/j.jpdc.2014.08.001

The IceProd Framework: Distributed Data Processing for the IceCube Neutrino Observatory

Authors: M. G. Aartsen, R. Abbasi, M. Ackermann, J. Adams, J. A. Aguilar, M. Ahlers, D. Altmann, C. Arguelles, J. Auffenberg, X. Bai, M. Baker, S. W. Barwick, V. Baum, R. Bay, J. J. Beatty, J. Becker Tjus, K. -H. Becker, S. BenZvi, P. Berghaus, D. Berley, E. Bernardini, A. Bernhard, D. Z. Besson, G. Binder, D. Bindig, et al. (262 additional authors not shown)

Abstract: IceCube is a one-gigaton instrument located at the geographic South Pole, designed to detect cosmic neutrinos, identify the particle nature of dark matter, and study high-energy neutrinos themselves. Simulation of the IceCube detector and processing of data require a significant amount of computational resources. IceProd is a distributed management system based on Python, XML-RPC and GridFTP. It is driven by a central database in order to coordinate and administer production of simulations and processing of data produced by the IceCube detector. IceProd runs as a separate layer on top of other middleware and can take advantage of a variety of computing resources, including grids and batch systems such as CREAM, Condor, and PBS. This is accomplished by a set of dedicated daemons that process job submission in a coordinated fashion through the use of middleware plugins that serve to abstract the details of job submission and job management from the framework.

Submitted 22 August, 2014; v1 submitted 22 November, 2013; originally announced November 2013.

Journal ref: Journal of Parallel & Distributed Computing 75:198,2015

Showing 1–27 of 27 results for author: Yoshida, S