-
Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes
Authors:
Hyeonuk Nam,
Deokki Min,
Seungdeok Choi,
Inhan Choi,
Yong-Hwa Park
Abstract:
To tackle the sound event detection (SED) task, we propose frequency dependent networks (FreDNets), which heavily leverage frequency-dependent methods. We apply frequency warping and FilterAugment, which are frequency-dependent data augmentation methods. The model architecture consists of three branches: an audio teacher-student transformer (ATST) branch, a BEATs branch, and a CNN branch that includes either partial dilated frequency dynamic convolution (PDFD) or squeeze-and-excitation (SE) with time-frame frequency-wise SE (tfwSE). To train on MAESTRO labels, which have coarse temporal resolution, we apply max pooling to the predictions for the MAESTRO dataset. Using the best ensemble model, we apply self training to obtain pseudo labels from the DESED weak set, the DESED unlabeled set, and AudioSet. AudioSet labels are filtered to focus on high-confidence pseudo labels, and AudioSet pseudo labels are used to train on DESED labels only. We used change-detection-based sound event bounding boxes (cSEBBs) as post-processing for the ensemble models in both self training and the submission models.
Submitted 22 June, 2024;
originally announced June 2024.
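The coarse prediction pooling mentioned in this abstract can be illustrated with a minimal sketch. It assumes frame-level class probabilities are max-pooled over non-overlapping time windows so that they line up with coarser labels; the function name `coarse_max_pool` and the parameter `frames_per_segment` are hypothetical, and the authors' actual pooling configuration is not given in the abstract.

```python
from typing import List


def coarse_max_pool(frame_probs: List[List[float]],
                    frames_per_segment: int) -> List[List[float]]:
    """Max-pool frame-level predictions over time into coarse segments.

    frame_probs: [num_frames][num_classes] probabilities.
    frames_per_segment: assumed number of frames covered by one coarse label.
    """
    if frames_per_segment <= 0:
        raise ValueError("frames_per_segment must be positive")
    pooled = []
    for start in range(0, len(frame_probs), frames_per_segment):
        segment = frame_probs[start:start + frames_per_segment]
        # Per-class maximum over the frames in this segment
        # (the last segment may be shorter than the others).
        pooled.append([max(column) for column in zip(*segment)])
    return pooled


# Five frames, two classes, pooled to segments of two frames each.
example = [[0.1, 0.9], [0.4, 0.2], [0.3, 0.5], [0.8, 0.1], [0.2, 0.2]]
print(coarse_max_pool(example, 2))
```

The pooled output can then be compared against the coarse labels with the usual loss, while the frame-level predictions stay available for datasets with fine-grained labels.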
-
Sleep Model -- A Sequence Model for Predicting the Next Sleep Stage
Authors:
Iksoo Choi,
Wonyong Sung
Abstract:
As sleep disorders are becoming more prevalent, there is an urgent need to classify sleep stages in a less disturbing way. In particular, sleep-stage classification using simple sensors, such as single-channel electroencephalography (EEG), electrooculography (EOG), electromyography (EMG), or electrocardiography (ECG), has gained substantial interest. In this study, we proposed a sleep model that predicts the next sleep stage and used it to improve sleep classification accuracy. The sleep models were built using sleep-sequence data and employed either statistical $n$-gram or deep neural network-based models. We developed beam-search decoding to combine the information from the sensor and the sleep models. Furthermore, we evaluated the performance of the $n$-gram and long short-term memory (LSTM) recurrent neural network (RNN)-based sleep models and demonstrated the improvement of sleep-stage classification using an EOG sensor. The developed sleep models significantly improved the accuracy of sleep-stage classification, particularly in the absence of an EEG sensor.
Submitted 17 February, 2023;
originally announced February 2023.
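The idea of combining sensor evidence with a sleep model through beam-search decoding can be sketched as follows. This is only an illustration under stated assumptions: the sleep model is reduced to a bigram (2-gram) transition table, scores are combined as a weighted sum of log-probabilities, and the names `beam_search`, `lm_weight`, and the stage labels `"W"` and `"N2"` are hypothetical rather than taken from the paper.

```python
import math
from typing import Dict, List, Tuple


def beam_search(sensor_probs: List[Dict[str, float]],
                bigram: Dict[Tuple[str, str], float],
                beam_width: int = 3,
                lm_weight: float = 1.0) -> List[str]:
    """Decode a stage sequence from per-epoch sensor posteriors and a bigram model."""
    floor = 1e-12  # avoids log(0) for missing or zero probabilities
    beams: List[Tuple[List[str], float]] = [([], 0.0)]
    for probs in sensor_probs:
        candidates = []
        for seq, score in beams:
            for stage, p in probs.items():
                new_score = score + math.log(max(p, floor))
                if seq:
                    # Add the sleep-model score for the transition seq[-1] -> stage.
                    trans = bigram.get((seq[-1], stage), floor)
                    new_score += lm_weight * math.log(max(trans, floor))
                candidates.append((seq + [stage], new_score))
        # Keep only the best `beam_width` partial hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]


# Toy example: the sensor is slightly unsure at the middle epoch.
sensor = [{"W": 0.9, "N2": 0.1}, {"W": 0.45, "N2": 0.55}, {"W": 0.9, "N2": 0.1}]
transitions = {("W", "W"): 0.9, ("W", "N2"): 0.1,
               ("N2", "W"): 0.1, ("N2", "N2"): 0.9}
print(beam_search(sensor, transitions))                 # with the sleep model
print(beam_search(sensor, transitions, lm_weight=0.0))  # sensor only
```

In this toy case the transition model discourages an isolated one-epoch stage change, so the decoded sequence differs from the sensor-only result at the ambiguous epoch.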
-
HDR Video Reconstruction with Tri-Exposure Quad-Bayer Sensors
Authors:
Yitong Jiang,
Inchang Choi,
Jun Jiang,
Jinwei Gu
Abstract:
We propose a novel high dynamic range (HDR) video reconstruction method with new tri-exposure quad-Bayer sensors. Thanks to the larger number of exposure sets and their spatially uniform deployment over a frame, they are more robust to noise and spatial artifacts than previous spatially varying exposure (SVE) HDR video methods. Nonetheless, the motion blur from longer exposures, the noise from short exposures, and the inherent spatial artifacts of SVE methods remain huge obstacles. Additionally, temporal coherence must be taken into account for the stability of video reconstruction. To tackle these challenges, we introduce a novel network architecture that divides and conquers these problems. To better adapt the network to the large dynamic range, we also propose an LDR-reconstruction loss that takes equal contributions from both the highlighted and the shaded pixels of HDR frames. Through a series of comparisons and ablation studies, we show that the tri-exposure quad-Bayer sensor with our solution is a better capture option than previous reconstruction methods, particularly for scenes with larger dynamic range and objects in motion.
Submitted 19 March, 2021;
originally announced March 2021.
-
Quantifying the Effects of Prosody Modulation on User Engagement and Satisfaction in Conversational Systems
Authors:
Jason Ingyu Choi,
Eugene Agichtein
Abstract:
As voice-based assistants such as Alexa, Siri, and Google Assistant become ubiquitous, users increasingly expect to maintain natural and informative conversations with such systems. However, for an open-domain conversational system to be coherent and engaging, it must be able to maintain the user's interest for extended periods, without sounding boring or annoying. In this paper, we investigate one natural approach to this problem: modulating response prosody, i.e., changing the pitch and cadence of the response to indicate delight, sadness, or other common emotions, as well as using pre-recorded interjections. Intuitively, this approach should improve the naturalness of the conversation, but quantifying the effects of prosodic modulation on user satisfaction and engagement remains challenging. To accomplish this, we report results obtained from a large-scale empirical study that measures the effects of prosodic modulation on user behavior and engagement across multiple conversation domains, both immediately after each turn and at the overall conversation level. Our results indicate that prosody modulation significantly increases both immediate and overall user satisfaction. However, since the effects vary across different domains, we verify that prosody modulations do not substitute for coherent, informative content of the responses. Together, our results provide useful tools and insights for improving the naturalness of responses in conversational systems.
Submitted 2 June, 2020;
originally announced June 2020.
-
A Passivity-based Nonlinear Admittance Control with Application to Powered Upper-limb Control under Unknown Environmental Interactions
Authors:
Min Jun Kim,
Woongyong Lee,
Jae Yeon Choi,
Goobong Chung,
Kyung-Lyong Han,
Il Seop Choi,
Christian Ott,
Wan Kyun Chung
Abstract:
This paper presents an admittance controller based on passivity theory for a powered upper-limb exoskeleton robot that is governed by a nonlinear equation of motion. Passivity allows us to include a human operator and environmental interaction in the control loop. The robot interacts with the human operator via an F/T sensor and interacts with the environment mainly via end-effectors. Although the environmental interaction cannot be detected by any sensors (hence unknown), passivity allows us to have natural interaction. An analysis shows that the behavior of the actual system mimics that of a nominal model as the control gain goes to infinity, which implies that the proposed approach is an admittance controller. However, because the control gain cannot grow infinitely in practice, the performance limitation according to the achievable control gain is also analyzed. The result of this analysis indicates that the performance in the sense of the infinity norm increases linearly with the control gain. In the experiments, the proposed properties were verified using a 1-degree-of-freedom testbench, and an actual powered upper-limb exoskeleton was used to lift and maneuver an unknown payload.
Submitted 18 April, 2019;
originally announced April 2019.