-
Inkjet printed intelligent reflecting surface (IRS) for indoor applications
Authors:
Kairi Takimoto,
Kazutomo Nakamura,
Peter Njogu,
Kota Suzuki,
Masato Sugimoto,
Ashif Fathnan,
Takashi Kondo,
Masayuki Mori,
Daisuke Anzai,
Hiroki Wakatsuchi
Abstract:
A passive, low-cost, paper-based intelligent reflecting surface (IRS) is designed to reflect a signal in a desired direction to overcome non-line-of-sight scenarios in indoor environments. The IRS is fabricated using conductive silver ink printed on paper with a specific nanoparticle arrangement, yielding a cost-effective paper-based IRS that can easily be mass-produced. Full-wave numerical simulation results were consistent with measurement results, demonstrating the IRS's ability to reflect an incident wave into a desired nonspecular direction based on the inkjet-printed design and materials.
Submitted 14 January, 2024;
originally announced January 2024.
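Nonspecular reflection of this kind can be illustrated with the generalized law of reflection for a linear phase gradient. The sketch below assumes an idealized surface whose reflection phase increases by 2π across one supercell of period D, so that sin θr = sin θi + m·λ/D. The helper names and the 28 GHz frequency are hypothetical, and this is not the authors' design procedure.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def anomalous_reflection_angle(theta_i_deg, freq_hz, period_m, order=1):
    """Reflection angle in degrees for a linear 2*pi phase gradient per period.

    Implements sin(theta_r) = sin(theta_i) + order * wavelength / period.
    Returns None when the requested order is evanescent (|sin| > 1).
    """
    wavelength = C0 / freq_hz
    s = math.sin(math.radians(theta_i_deg)) + order * wavelength / period_m
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))


def period_for_angle(theta_i_deg, theta_r_deg, freq_hz):
    """Supercell period that steers the first order from theta_i to theta_r."""
    wavelength = C0 / freq_hz
    delta = math.sin(math.radians(theta_r_deg)) - math.sin(math.radians(theta_i_deg))
    return wavelength / delta


def phase_profile_deg(cells_per_period):
    """Ideal reflection phase of each unit cell within one supercell (degrees)."""
    return [360.0 * k / cells_per_period for k in range(cells_per_period)]


if __name__ == "__main__":
    f = 28e9  # hypothetical operating frequency
    d = period_for_angle(0.0, 30.0, f)
    print(f"period = {d * 1e3:.2f} mm")
    print(anomalous_reflection_angle(0.0, f, d))
    print(phase_profile_deg(6))
```

A printed design would then quantize this ideal phase ramp onto a finite set of unit-cell geometries.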
-
End-to-End Joint Target and Non-Target Speakers ASR
Authors:
Ryo Masumura,
Naoki Makishima,
Taiga Yamane,
Yoshihiko Yamazaki,
Saki Mizuno,
Mana Ihori,
Mihiro Uchida,
Keita Suzuki,
Hiroshi Sato,
Tomohiro Tanaka,
Akihiko Takashima,
Satoshi Suzuki,
Takafumi Moriya,
Nobukatsu Hojo,
Atsushi Ando
Abstract:
This paper proposes a novel automatic speech recognition (ASR) system that can transcribe each speaker's speech from multi-talker overlapped speech while identifying whether that speaker is a target or non-target speaker. Target-speaker ASR systems are a promising way to transcribe only a target speaker's speech by enrolling the target speaker's information. However, in conversational ASR applications, transcribing the speech of both the target speaker and the non-target speakers is often required to understand interactive information. To naturally handle both target and non-target speakers in a single ASR model, our idea is to extend autoregressive modeling-based multi-talker ASR systems to utilize the enrollment speech of the target speaker. Our proposed ASR recursively generates both textual tokens and tokens that indicate whether the speaker is a target or non-target speaker. Our experiments demonstrate the effectiveness of our proposed method.
Submitted 4 June, 2023;
originally announced June 2023.
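The abstract does not specify the token inventory, so the following sketch assumes hypothetical role tokens `<tgt>`, `<ntgt>`, and `<eos>` interleaved with words in a single serialized output. It shows how such an autoregressively generated sequence could be split back into target and non-target transcripts.

```python
from typing import Dict, List, Tuple

# Hypothetical special tokens; the paper's actual vocabulary may differ.
ROLE_TOKENS: Dict[str, str] = {"<tgt>": "target", "<ntgt>": "non-target"}
EOS = "<eos>"


def split_by_role(tokens: List[str]) -> List[Tuple[str, str]]:
    """Split one serialized hypothesis into (role, transcript) pairs."""
    utterances: List[Tuple[str, str]] = []
    role = None
    words: List[str] = []

    def flush() -> None:
        # Emit the current utterance if a role has been opened and has words.
        if role is not None and words:
            utterances.append((role, " ".join(words)))

    for tok in tokens:
        if tok == EOS:
            break  # decoding stops at end-of-sequence
        if tok in ROLE_TOKENS:
            flush()
            role, words = ROLE_TOKENS[tok], []
        elif role is not None:
            words.append(tok)  # words before any role token are ignored
    flush()
    return utterances


def target_transcript(tokens: List[str]) -> str:
    """Concatenate only the utterances attributed to the enrolled target."""
    return " ".join(text for role, text in split_by_role(tokens) if role == "target")
```

For example, `["<tgt>", "hello", "<ntgt>", "hi", "<eos>"]` would be split into one target utterance and one non-target utterance.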
-
Hybrid Life: Integrating Biological, Artificial, and Cognitive Systems
Authors:
Manuel Baltieri,
Hiroyuki Iizuka,
Olaf Witkowski,
Lana Sinapayen,
Keisuke Suzuki
Abstract:
Artificial life is a research field studying what processes and properties define life, based on a multidisciplinary approach spanning the physical, natural and computational sciences. Artificial life aims to foster a comprehensive study of life beyond "life as we know it" and towards "life as it could be", with theoretical, synthetic and empirical models of the fundamental properties of living systems. While still a relatively young field, artificial life has flourished as an environment for researchers with different backgrounds, welcoming ideas and contributions from a wide range of subjects. Hybrid Life is an attempt to bring attention to some of the most recent developments within the artificial life community, rooted in more traditional artificial life studies but looking at new challenges emerging from interactions with other fields. In particular, Hybrid Life focuses on three complementary themes: 1) theories of systems and agents, 2) hybrid augmentation, with augmented architectures combining living and artificial systems, and 3) hybrid interactions among artificial and biological systems. After discussing some of the major sources of inspiration for these themes, we will focus on an overview of the works that appeared in Hybrid Life special sessions, hosted by the annual Artificial Life Conference between 2018 and 2022.
Submitted 1 December, 2022;
originally announced December 2022.
-
On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis
Authors:
Atsushi Ando,
Ryo Masumura,
Akihiko Takashima,
Satoshi Suzuki,
Naoki Makishima,
Keita Suzuki,
Takafumi Moriya,
Takanori Ashihara,
Hiroshi Sato
Abstract:
This paper investigates the effectiveness and implementation of modality-specific large-scale pre-trained encoders for multimodal sentiment analysis (MSA). Although the effectiveness of pre-trained encoders has been reported in various fields, conventional MSA methods employ them only for the linguistic modality, and their application to the other modalities has not been investigated. This paper compares the features yielded by large-scale pre-trained encoders with conventional heuristic features. For each modality, one of the largest publicly available pre-trained encoders is used: CLIP-ViT, WavLM, and BERT for the visual, acoustic, and linguistic modalities, respectively. Experiments on two datasets reveal that methods with domain-specific pre-trained encoders attain better performance than those with conventional features in both unimodal and multimodal scenarios. We also find it better to use the outputs of the intermediate layers of the encoders than those of the output layer. The code is available at https://github.com/ando-hub/MSA_Pretrain.
Submitted 28 October, 2022;
originally announced October 2022.
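As a hedged illustration of using intermediate-layer outputs, the sketch below assumes that each encoder's per-layer frame outputs are already available as nested lists. Hugging Face Transformers models, for instance, return these when called with `output_hidden_states=True`. The sketch mean-pools a chosen layer into an utterance vector and concatenates the modalities. The pooling, the middle-layer heuristic, and the concatenation are illustrative choices, not necessarily those of the paper.

```python
from typing import List, Sequence

Frames = List[List[float]]  # [time][feature] outputs of one encoder layer


def mean_pool(frames: Frames) -> List[float]:
    """Average frame vectors over time into one utterance-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def layer_embedding(hidden_states: Sequence[Frames], layer: int) -> List[float]:
    """Pool the chosen (possibly intermediate) layer instead of the last one."""
    return mean_pool(hidden_states[layer])


def middle_layer(hidden_states: Sequence[Frames]) -> int:
    """One simple heuristic choice of intermediate layer (an assumption)."""
    return len(hidden_states) // 2


def fuse(*embeddings: List[float]) -> List[float]:
    """Concatenate per-modality vectors (e.g. visual, acoustic, linguistic)."""
    fused: List[float] = []
    for emb in embeddings:
        fused.extend(emb)
    return fused
```

In practice, the best layer would be chosen on validation data rather than fixed at the middle.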
-
Compressing Sign Information in DCT-based Image Coding via Deep Sign Retrieval
Authors:
Kei Suzuki,
Chihiro Tsutake,
Keita Takahashi,
Toshiaki Fujii
Abstract:
Compressing the sign information of discrete cosine transform (DCT) coefficients is an intractable problem in image coding schemes because the signs are equiprobable. To overcome this difficulty, we propose an efficient compression method for the sign information called "sign retrieval." This method is inspired by phase retrieval, a classical signal restoration problem of finding the phase information of discrete Fourier transform coefficients from their magnitudes. The sign information of all DCT coefficients is excluded from the bitstream at the encoder and is complemented at the decoder through our sign retrieval method. We show through experiments that our method outperforms previous ones in terms of the number of bits spent on the signs and the computation cost. Our method, implemented in Python, is available from https://github.com/ctsutake/dsr.
Submitted 10 May, 2024; v1 submitted 21 September, 2022;
originally announced September 2022.
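To see why signs are costly, the sketch below computes an orthonormal 1-D DCT-II in plain Python and separates the coefficient magnitudes from their signs. It counts one bit per nonzero sign, which is the cost when signs are equiprobable. It does not implement the paper's sign retrieval. It only shows the magnitude/sign split that the decoder must undo, and it omits the quantized 2-D blocks that real codecs use.

```python
import math
from typing import List, Sequence, Tuple


def _basis(n: int, k: int, i: int) -> float:
    """Orthonormal DCT-II basis value for frequency k at sample i."""
    scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))


def dct(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II."""
    n = len(x)
    return [sum(x[i] * _basis(n, k, i) for i in range(n)) for k in range(n)]


def idct(c: Sequence[float]) -> List[float]:
    """Inverse of the orthonormal DCT-II (a DCT-III)."""
    n = len(c)
    return [sum(c[k] * _basis(n, k, i) for k in range(n)) for i in range(n)]


def split_signs(coeffs: Sequence[float], tol: float = 1e-9) -> Tuple[List[float], List[int]]:
    """Magnitudes for every coefficient, one sign (+1/-1) per nonzero coefficient."""
    magnitudes = [abs(c) for c in coeffs]
    signs = [1 if c > 0 else -1 for c in coeffs if abs(c) > tol]
    return magnitudes, signs


def merge_signs(magnitudes: Sequence[float], signs: Sequence[int], tol: float = 1e-9) -> List[float]:
    """Re-attach signs; this is the information a sign-retrieval decoder must estimate."""
    it = iter(signs)
    return [m * next(it) if m > tol else 0.0 for m in magnitudes]
```

Here `len(signs)` is the number of raw sign bits that sign retrieval would aim to avoid transmitting.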
-
A Transponder Aggregator with Efficient Use of Filtering Function for Transponder Noise Suppression
Authors:
Kenya Suzuki,
Osamu Moriwaki,
Koichi Hadama,
Keita Yamaguchi,
Hiroki Taniguchi,
Yoshiaki Kisaka,
Daisuke Ogawa,
Makoto Takeshita,
Stefano Camatel,
Yiran Ma,
Mitsunori Fukutoku
Abstract:
Colorless, directionless, and contentionless reconfigurable optical add/drop multiplexing (CDC-ROADM) provides highly flexible physical-layer network configuration. Such a CDC-ROADM must operate in the multiple wavelength bands that are increasingly being implemented in optical transmission systems. Operation in the C+L bands requires the switch devices used in CDC-ROADM to also be capable of multiband operation. Recent studies on wavelength division multiplexing (WDM) systems have pointed out the impact of amplified spontaneous emission (ASE) noise generated by signals of different wavelengths, which causes OSNR degradation. Therefore, it is desirable to filter out the ASE noise from different transponders when multiplexing multiple wavelengths at the transmitter side, especially in a system with non-wavelength-selective combiners such as directional couplers and multicast switches. Transponder aggregators with filtering functions, such as the M × N wavelength selective switch (WSS), are preferable for this filtering. However, the downside of these devices is that it is difficult to provide economical multiband support. Therefore, we propose an economical transponder aggregator configuration that allows a certain amount of ASE superposition and reduces the number of filtering functions. In this paper, we fabricated a prototype of the proposed transponder aggregator by combining silica-based planar lightwave circuit technology and a C+L band WSS, both commercially available, and verified its feasibility through transmission experiments. The novel transponder aggregator is a practical solution for a multiband CDC-ROADM system with improved OSNR performance.
Submitted 3 October, 2022; v1 submitted 20 September, 2022;
originally announced September 2022.
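To make the trade-off between ASE superposition and filter count concrete, the sketch below uses the standard rule that independent noise contributions add in linear units. It assumes, as a simplification not taken from the paper, that each unfiltered transponder contributes an equal-power ASE floor inside the channel. Under that assumption, sharing a combiner among n unfiltered transponders costs about 10·log10(n) dB of OSNR. The 35 dB value in the usage note is hypothetical.

```python
import math
from typing import Iterable


def db_to_lin(db: float) -> float:
    return 10.0 ** (db / 10.0)


def lin_to_db(x: float) -> float:
    return 10.0 * math.log10(x)


def combined_osnr_db(contributions_db: Iterable[float]) -> float:
    """Independent noise powers add, so 1/OSNR_total = sum_i 1/OSNR_i (linear units)."""
    inv = sum(1.0 / db_to_lin(o) for o in contributions_db)
    return lin_to_db(1.0 / inv)


def superposition_penalty_db(n_unfiltered: int) -> float:
    """OSNR penalty when n equal ASE floors overlap one channel without filtering."""
    return lin_to_db(n_unfiltered)
```

Under these assumptions, eight unfiltered transponders at 35 dB each would combine to roughly 26.0 dB. Limiting the superposition to groups of two would keep the penalty near 3 dB.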
-
Speak Like a Dog: Human to Non-human creature Voice Conversion
Authors:
Kohei Suzuki,
Shoki Sakamoto,
Tadahiro Taniguchi,
Hirokazu Kameoka
Abstract:
This paper proposes a new voice conversion (VC) task, converting human speech to dog-like speech while preserving linguistic information, as an example of human to non-human creature voice conversion (H2NH-VC) tasks. Although most VC studies deal with human-to-human VC, H2NH-VC aims to convert human speech into non-human creature-like speech. Non-parallel VC allows us to develop H2NH-VC, because we cannot collect a parallel dataset in which non-human creatures speak human language. In this study, we use dogs as an example of a non-human creature target domain and define the "speak like a dog" task. To clarify the possibilities and characteristics of the "speak like a dog" task, we conducted a comparative experiment using existing representative non-parallel VC methods, varying the acoustic features (Mel-cepstral coefficients and Mel-spectrograms), network architectures (five different kernel-size settings), and training criteria (variational autoencoder (VAE)-based and generative adversarial network-based). Finally, the converted voices were evaluated using mean opinion scores (dog-likeness, sound quality, and intelligibility) and character error rate (CER). The experiment showed that using the Mel-spectrogram improved the dog-likeness of the converted speech, while preserving linguistic information remained challenging. Challenges and limitations of the current VC methods for H2NH-VC are highlighted.
Submitted 9 June, 2022;
originally announced June 2022.
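The character error rate used in the evaluation is conventionally defined as the Levenshtein edit distance between the reference and the recognized text, divided by the reference length. A minimal sketch follows. Text normalization (punctuation, spacing, case) is left out and would need to match whatever the authors applied.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of r
                cur[j - 1] + 1,          # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if equal)
            )
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of a hypothesis against a non-empty reference."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Because insertions are counted, CER can exceed 1.0 when the converted speech is recognized as much longer gibberish.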
-
Personal Mobility With Synchronous Trunk-Knee Passive Exoskeleton: Optimizing Human-Robot Energy Transfer
Authors:
Diego Paez-Granados,
Hideki Kadone,
Modar Hassan,
Yang Chen,
Kenji Suzuki
Abstract:
We present a personal mobility device for lower-body impaired users: a lightweight exoskeleton on wheels. At its core, a novel passive exoskeleton provides postural transition assistance that leverages natural body postures, supporting the trunk during sit-to-stand and stand-to-sit (STS) transitions with a single gas spring as an energy storage unit. We propose a direction-dependent coupling of the knee and hip joints through a double-pulley wire system, transferring energy from the torso motion towards balancing the moment load at the knee joint actuator. In this way, the exoskeleton maximizes energy transfer and the naturalness of the user's movement. We introduce an embodied user interface for hands-free navigation through torso pressure sensing with minimal trunk rotations, averaging $19^{\circ} \pm 13^{\circ}$ across six unimpaired users. We evaluated the design for STS assistance on 11 unimpaired users, observing motions and muscle activity during the transitions. Results comparing assisted and unassisted STS transitions validated a significant reduction in the activity of the involved muscle groups (up to $68\%$, $p<0.01$). Moreover, we showed that the transitions are feasible through natural torso leaning movements of $+12^{\circ}\pm 6.5^{\circ}$ and $-13.7^{\circ} \pm 6.1^{\circ}$ for standing and sitting, respectively. Passive postural transition assistance warrants further work on increasing its applicability and broadening the user population.
Submitted 10 January, 2022;
originally announced January 2022.
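Figures such as "up to 68% reduction" and "mean ± SD" lean angles are commonly computed as below. This sketch assumes that muscle activity is summarized by the RMS amplitude of an already filtered EMG segment and that the reduction is relative to the unassisted trial. These are common conventions, not necessarily the authors' exact processing, which would typically also include normalization.

```python
import math
import statistics
from typing import Sequence, Tuple


def rms(signal: Sequence[float]) -> float:
    """Root-mean-square amplitude of an (already filtered) EMG segment."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def percent_reduction(unassisted: Sequence[float], assisted: Sequence[float]) -> float:
    """Relative drop in RMS activity from the unassisted to the assisted trial."""
    u, a = rms(unassisted), rms(assisted)
    return 100.0 * (u - a) / u


def mean_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation, as in 'mean ± SD' angle reports."""
    return statistics.mean(values), statistics.stdev(values)
```

Significance (the reported p < 0.01) would come from a separate paired test across participants, which this sketch does not cover.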
-
Linearly-involved Moreau-Enhanced-over-Subspace Model: Debiased Sparse Modeling and Stable Outlier-Robust Regression
Authors:
Masahiro Yukawa,
Hiroyuki Kaneko,
Kyohei Suzuki,
Isao Yamada
Abstract:
We present an efficient mathematical framework based on the linearly-involved Moreau-enhanced-over-subspace (LiMES) model. Two concrete applications are considered: sparse modeling and robust regression. The popular minimax concave (MC) penalty for sparse modeling subtracts, from the $\ell_1$ norm, its Moreau envelope, inducing nearly unbiased estimates and thus yielding remarkable performance enhancements. To extend it to underdetermined linear systems, we propose the projective minimax concave penalty using the projection onto the input subspace, where the Moreau-enhancement effect is restricted to the subspace to preserve the overall convexity. We also present a novel concept of stable outlier-robust regression, which explicitly distinguishes noise from outliers. The LiMES model encompasses these two specific examples as well as two other applications: stable principal component pursuit and robust classification. The LiMES function involved in the model is an ``additively nonseparable'' weakly convex function, but it is defined with the Moreau envelope returning the minimum of a ``separable'' convex function. This mixed nature of separability and nonseparability allows an application of the LiMES model to the underdetermined case with an efficient algorithmic implementation. Two linear/affine operators play key roles in the model: one corresponds to the projection mentioned above, and the other takes care of robust regression/classification. A necessary and sufficient condition for convexity of the smooth part of the objective function is studied. Numerical examples show the efficacy of LiMES in applications to sparse modeling and robust regression.
Submitted 1 April, 2023; v1 submitted 10 January, 2022;
originally announced January 2022.
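The scalar minimax concave penalty can be checked numerically. The Moreau envelope of |x| with parameter γ is the Huber function. Subtracting it from |x| leaves a penalty that saturates at γ/2, which is why large coefficients are barely shrunk. The sketch evaluates the envelope through the proximity operator of |x| (soft thresholding) and compares it with the closed form. It covers only the classical scalar MC penalty, not the projective or LiMES formulations.

```python
import math


def soft_threshold(x: float, gamma: float) -> float:
    """Proximity operator of gamma * |.|."""
    return math.copysign(max(abs(x) - gamma, 0.0), x)


def moreau_envelope_abs(x: float, gamma: float) -> float:
    """min_y |y| + (x - y)^2 / (2 * gamma), evaluated at the minimizer (the prox)."""
    y = soft_threshold(x, gamma)
    return abs(y) + (x - y) ** 2 / (2.0 * gamma)


def mc_penalty(x: float, gamma: float) -> float:
    """MC penalty: the l1 term minus its Moreau envelope."""
    return abs(x) - moreau_envelope_abs(x, gamma)


def mc_penalty_closed_form(x: float, gamma: float) -> float:
    """|x| - x^2 / (2 gamma) inside [-gamma, gamma], constant gamma / 2 outside."""
    if abs(x) <= gamma:
        return abs(x) - x * x / (2.0 * gamma)
    return gamma / 2.0
```

The constant tail is the source of the near-unbiasedness mentioned above. Unlike the ℓ1 norm, the penalty stops growing once |x| exceeds γ.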
-
First demonstration of C + L band CDCROADM with simple node configuration using multiband switching devices
Authors:
Shuto Yamamoto,
Hiroki Taniguchi,
Yoshiaki Kisaka,
Stefano Camatel,
Yiran Ma,
Daisuke Ogawa,
Koichi Hadama,
Mitsunori Fukutoku,
Takashi Goh,
Kenya Suzuki
Abstract:
While ultrahigh-baud-rate optical signals are effective for extending the transmission distance of large-capacity signals, they also reduce the number of wavelengths that can be arranged in a band because of their wider bandwidth. This reduces the flexibility of optical path configuration in reconfigurable optical add/drop multiplexing (ROADM) networks. In colorless, directionless, and contentionless (CDC) ROADMs in particular, this effect reduces the add/drop ratio at a node. Multiband ROADM systems are an effective countermeasure for overcoming this issue, but they make the node configuration more complicated and its operation more difficult. In this paper, we analyze the challenges of C + L band CDC-ROADM and show that optical switch devices that operate over multiple bands are effective in meeting them. For this purpose, we built a C + L band CDC-ROADM node based on C + L band wavelength selective switches (WSSs) and multicast switches (MCSs) and confirmed its effectiveness experimentally. In particular, to simplify the node configuration, we propose reducing the number of optical amplifiers used for node loss compensation and experimentally verify its feasibility.
Submitted 4 June, 2021;
originally announced June 2021.
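Node loss compensation is essentially a power-budget question. The sketch below tallies the loss of a hypothetical drop path: a WSS followed by a multicast switch, whose broadcast splitting loss grows as 10·log10(N). It then checks whether an amplifier is needed. Every loss and sensitivity value used with it is an assumed placeholder, not a figure from the paper.

```python
import math
from typing import Iterable


def splitter_loss_db(n_ports: int, excess_db: float = 0.0) -> float:
    """Intrinsic 1xN splitting loss of a broadcast-type device plus excess loss."""
    return 10.0 * math.log10(n_ports) + excess_db


def received_power_dbm(launch_dbm: float, losses_db: Iterable[float]) -> float:
    """Per-channel power after a chain of passive losses (all in dB)."""
    return launch_dbm - sum(losses_db)


def needs_amplifier(launch_dbm: float, losses_db: Iterable[float], rx_min_dbm: float) -> bool:
    """True if the passive path alone falls below the receiver's minimum input."""
    return received_power_dbm(launch_dbm, losses_db) < rx_min_dbm
```

With a hypothetical 7 dB WSS, a 1×16 MCS with 1 dB excess loss, 0 dBm launch power, and a -18 dBm receiver limit, the drop path lands near -20 dBm and needs amplification. A 1×4 MCS would not. This illustrates why MCS port count drives the amplifier count.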
-
Facial movement synergies and Action Unit detection from distal wearable Electromyography and Computer Vision
Authors:
Monica Perusquia-Hernandez,
Felix Dollack,
Chun Kwang Tan,
Shushi Namba,
Saho Ayabe-Kanamura,
Kenji Suzuki
Abstract:
Distal facial Electromyography (EMG) can be used to detect smiles and frowns with reasonable accuracy. It capitalizes on volume conduction to detect relevant muscle activity, even when the electrodes are not placed directly on the source muscle. The main advantage of this method is that it avoids occluding or obstructing facial expression production while still allowing EMG measurements. However, measuring EMG distally means that the exact source of the facial movement is unknown. We propose a novel method to estimate specific Facial Action Units (AUs) from distal facial EMG and Computer Vision (CV). This method is based on Independent Component Analysis (ICA), Non-Negative Matrix Factorization (NNMF), and sorting of the resulting components to determine which is most likely to correspond to each CV-labeled action unit (AU). Performance on the detection of AU6 (Orbicularis Oculi) and AU12 (Zygomaticus Major) was estimated by calculating the agreement with human coders. Our proposed algorithm achieved an accuracy of 81% and a Cohen's Kappa of 0.49 for AU6, and an accuracy of 82% and a Cohen's Kappa of 0.53 for AU12. This demonstrates the potential of distal EMG to detect individual facial movements. Using this multimodal method, several AU synergies were identified. We quantified the co-occurrence and timing of AU6 and AU12 in posed and spontaneous smiles using the human-coded labels and, for comparison, using the continuous CV-labels. The co-occurrence analysis was also performed on the EMG-based labels to uncover the relationship between muscle synergies and the kinematics of visible facial movement.
Submitted 20 August, 2020;
originally announced August 2020.
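Agreement figures like those reported above (accuracy and Cohen's kappa) can be computed from two label sequences, for example algorithm output versus a human coder. Kappa corrects the observed agreement p_o for the agreement p_e expected by chance from each rater's label frequencies: κ = (p_o - p_e) / (1 - p_e). The minimal sketch below assumes that the two label sequences are already time-aligned frame by frame.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of frames on which the two raters give the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters."""
    p_o = accuracy(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: both raters independently pick the same label.
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_o - p_e) / (1.0 - p_e)
```

This is why a detector can reach around 80% accuracy yet only a moderate kappa: when most frames are "AU absent", much of the raw agreement is expected by chance.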
-
Control Interface for Hands-free Navigation of Standing Mobility Vehicles based on Upper-Body Natural Movements
Authors:
Yang Chen,
Diego Paez-Granados,
Hideki Kadone,
Kenji Suzuki
Abstract:
In this paper, we propose and evaluate a novel human-machine interface (HMI) for controlling a standing mobility vehicle or person carrier robot, aiming at hands-free control through natural upper-body postures derived from gaze tracking while walking. We target users with lower-body impairment who retain upper-body motion capabilities. The developed HMI is based on a sensing array for capturing body postures, an intent recognition algorithm for continuous mapping of body motions to the robot control space, and a personalizing system for multiple body sizes and shapes. We performed two user studies: first, an analysis of the body muscles involved in navigating with the proposed control; and second, an assessment of the HMI compared with a standard joystick through quantitative and qualitative metrics in a narrow circuit task. We concluded that the main user control contribution comes from the Rectus Abdominis and Erector Spinae muscle groups at different levels. Finally, the comparative study showed that a joystick still outperforms the proposed HMI in usability perceptions and controllability metrics; however, the smoothness of user control was similar in terms of jerk and fluency. Moreover, users' perceptions showed that hands-free control made it more anthropomorphic, animated, and even safer.
Submitted 3 August, 2020;
originally announced August 2020.
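The abstract does not give the mapping function from body motion to robot commands. The sketch below therefore assumes a common design for posture-based control: a dead zone that ignores small, unintended lean, a linear ramp, and saturation at a maximum command. The `AxisMap` class and its `from_calibration` helper are hypothetical. The helper stands in for the personalization step by scaling the ramp to a user's own comfortable lean range, and all numeric parameters are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AxisMap:
    dead_zone: float   # input magnitude treated as "no intent" (e.g. degrees of lean)
    max_input: float   # input magnitude that produces the full command
    max_output: float  # command at saturation (e.g. m/s or rad/s)

    def __call__(self, value: float) -> float:
        mag = abs(value)
        if mag <= self.dead_zone:
            return 0.0
        # Linear ramp from the dead-zone edge, clipped at full scale.
        frac = min((mag - self.dead_zone) / (self.max_input - self.dead_zone), 1.0)
        return frac * self.max_output if value > 0 else -frac * self.max_output

    @classmethod
    def from_calibration(cls, samples: Sequence[float], dead_zone: float,
                         max_output: float) -> "AxisMap":
        """Use the largest lean a user produced during calibration as full scale."""
        max_input = max(abs(v) for v in samples)
        if max_input <= dead_zone:
            raise ValueError("calibration range must exceed the dead zone")
        return cls(dead_zone, max_input, max_output)
```

A vehicle would typically use one such map for forward/backward lean (linear velocity) and another for lateral lean or twist (angular velocity). The dead zone trades responsiveness against accidental motion.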
-
Visualizing intestines for diagnostic assistance of ileus based on intestinal region segmentation from 3D CT images
Authors:
Hirohisa Oda,
Kohei Nishio,
Takayuki Kitasaka,
Hizuru Amano,
Aitaro Takimoto,
Hiroo Uchida,
Kojiro Suzuki,
Hayato Itoh,
Masahiro Oda,
Kensaku Mori
Abstract:
This paper presents a method for visualizing intestine (small and large intestine) regions and their stenosed parts caused by ileus from CT volumes. Since it is difficult for non-expert clinicians to find stenosed parts, the intestine and its stenosed parts should be visualized intuitively. However, the intestine regions of ileus cases are quite hard to segment, because the inside of the intestine is filled with fluids whose intensities on 3D CT volumes are similar to those of the intestinal wall. The proposed method segments the intestine regions using a 3D FCN (3D U-Net) trained with a weak annotation approach. Weak annotation makes it possible to train the 3D U-Net with small, manually traced label images of the intestine, avoiding the need to prepare many annotation labels for the intestine, which has a long and winding shape. Each intestine segment is volume-rendered and colored based on the distance from its endpoint. Stenosed parts (disjoint points of an intestine segment) can be easily identified in such a visualization. In the experiments, we showed that stenosed parts were intuitively visualized as endpoints of segmented regions, which are colored red or blue.
Submitted 2 March, 2020;
originally announced March 2020.
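Coloring each segment by distance from its endpoint could be sketched as follows. The sketch assumes that distance is measured along the segment as the number of 6-connected voxel steps (a breadth-first search), which follows the winding shape rather than straight-line distance. It locates an endpoint with the farthest-voxel heuristic and applies a linear red-to-blue ramp. The paper's exact distance definition and color map may differ, and each disconnected segment would be processed separately.

```python
from collections import deque
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def geodesic_distances(region: Set[Voxel], start: Voxel) -> Dict[Voxel, int]:
    """Breadth-first search: number of 6-connected steps inside the region."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in NEIGHBORS:
            nb = (x + dx, y + dy, z + dz)
            if nb in region and nb not in dist:
                dist[nb] = dist[(x, y, z)] + 1
                queue.append(nb)
    return dist


def find_endpoint(region: Set[Voxel]) -> Voxel:
    """Farthest voxel from a fixed seed, a common endpoint heuristic."""
    d = geodesic_distances(region, min(region))
    return max(d, key=lambda v: (d[v], v))  # ties broken deterministically


def distance_colors(dist: Dict[Voxel, int]) -> Dict[Voxel, Tuple[float, float, float]]:
    """Linear RGB ramp: red at the endpoint, blue at the far end of the segment."""
    far = max(dist.values()) or 1  # avoid division by zero for a single voxel
    return {v: (1.0 - d / far, 0.0, d / far) for v, d in dist.items()}
```

For a straight five-voxel tube, the endpoint is rendered pure red and the opposite end pure blue, so a segment that ends abruptly (a stenosis) stands out as a red or blue tip.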