-
Teacher-Student Learning based Low Complexity Relay Selection in Wireless Powered Communications
Authors:
Aysun Gurur Onalan,
Berkay Kopru,
Sinem Coleri
Abstract:
Radio Frequency Energy Harvesting (RF-EH) networks are key enablers of the massive Internet of Things by providing controllable and long-distance energy transfer to energy-limited devices. Relays, helping either energy or information transfer, have been demonstrated to significantly improve the performance of these networks. This paper studies the joint relay selection, scheduling, and power control problem in multiple-source-multiple-relay RF-EH networks under nonlinear EH conditions. We first obtain the optimal solution to the scheduling and power control problem for the given relay selection. Then, the relay selection problem is formulated as a classification problem, for which two convolutional neural network (CNN) based architectures are proposed. While the first architecture employs conventional 2D convolution blocks and benefits from skip connections between layers, the second architecture replaces them with inception blocks to decrease trainable parameter size without sacrificing accuracy for memory-constrained applications. To decrease the runtime complexity further, teacher-student learning is employed such that the teacher network is larger, and the student is a smaller CNN-based architecture distilling the teacher's knowledge. A novel dichotomous search-based algorithm is employed to determine the best architecture for the student network. Our simulation results demonstrate that the proposed solutions provide lower complexity than the state-of-the-art iterative approaches without compromising optimality.
Submitted 3 February, 2024;
originally announced February 2024.
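The teacher-student learning mentioned in the abstract above can be illustrated with the standard soft-target distillation loss. This is a minimal sketch, not the authors' formulation: the function names, the temperature, and the weighting `alpha` are assumptions made for illustration, and the paper may use a different objective.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    Hypothetical helper: `temperature` and `alpha` are illustrative defaults.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened class distributions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0.0)
    # Cross-entropy of the student against the true class (here: the relay index).
    hard = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft term's gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * hard
```

With `alpha=1.0` the loss depends only on how closely the student's softened outputs match the teacher's, so it vanishes when both networks produce the same logits.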
-
Role of Audio in Audio-Visual Video Summarization
Authors:
Ibrahim Shoer,
Berkay Kopru,
Engin Erzin
Abstract:
Video summarization attracts attention for efficient video representation, retrieval, and browsing to ease volume and traffic surge problems. Although video summarization mostly uses the visual channel for compaction, the benefits of audio-visual modeling have appeared in recent literature. The information coming from the audio channel can be a result of audio-visual correlation in the video content. In this study, we propose a new audio-visual video summarization framework integrating four ways of audio-visual information fusion with GRU-based and attention-based networks. Furthermore, we investigate a new explainability methodology using audio-visual canonical correlation analysis (CCA) to better understand and explain the role of audio in the video summarization task. Experimental evaluations on the TVSum dataset attain F1 score and Kendall-tau score improvements for the audio-visual video summarization. Furthermore, splitting video content on the TVSum and COGNIMUSE datasets based on audio-visual CCA into positively and negatively correlated videos yields a strong performance improvement over the positively correlated videos for audio-only and audio-visual video summarization.
Submitted 2 December, 2022;
originally announced December 2022.
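The Kendall-tau score named in the abstract above measures how well two rankings agree. The sketch below computes the tau-a variant over a pair of score lists; treating it as a comparison between predicted and annotated frame importance scores is an assumption about how the metric is applied, and the function name is hypothetical.

```python
def kendall_tau_a(predicted, reference):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    n = len(predicted)
    if n != len(reference) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    concordant = 0
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # The sign of the product tells whether both lists order the pair the same way.
            product = (predicted[i] - predicted[j]) * (reference[i] - reference[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A value of 1 means the two lists order every pair identically, and -1 means one ordering is the exact reverse of the other.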
-
Extraction of Medication Names from Twitter Using Augmentation and an Ensemble of Language Models
Authors:
Igor Kulev,
Berkay Köprü,
Raul Rodriguez-Esteban,
Diego Saldana,
Yi Huang,
Alessandro La Torraca,
Elif Ozkirimli
Abstract:
The BioCreative VII Track 3 challenge focused on the identification of medication names in Twitter user timelines. For our submission to this challenge, we expanded the available training data by using several data augmentation techniques. The augmented data was then used to fine-tune an ensemble of language models that had been pre-trained on general-domain Twitter content. The proposed approach outperformed the prior state-of-the-art algorithm Kusuri and ranked high in the competition for our selected objective function, overlapping F1 score.
Submitted 12 November, 2021;
originally announced November 2021.
-
Affective Burst Detection from Speech using Kernel-fusion Dilated Convolutional Neural Networks
Authors:
Berkay Kopru,
Engin Erzin
Abstract:
As speech interfaces are getting richer and more widespread, speech emotion recognition promises more attractive applications. In the continuous emotion recognition (CER) problem, tracking changes across affective states is an important and desired capability. Although CER studies widely use correlation metrics in evaluations, these metrics do not always capture all the high-intensity changes in the affective domain. In this paper, we define a novel affective burst detection problem to accurately capture high-intensity changes of the affective attributes. For this problem, we formulate a two-class classification approach to isolate affective burst regions over the affective state contour. The proposed classifier is a kernel-fusion dilated convolutional neural network (KFDCNN) architecture driven by speech spectral features to segment the affective attribute contour into idle and burst sections. Experimental evaluations are performed on the RECOLA and CreativeIT datasets. The proposed KFDCNN is observed to outperform baseline feedforward neural networks on both datasets.
Submitted 8 October, 2021;
originally announced October 2021.
-
Use of Affective Visual Information for Summarization of Human-Centric Videos
Authors:
Berkay Köprü,
Engin Erzin
Abstract:
The increasing volume of user-generated human-centric video content and its applications, such as video retrieval and browsing, require compact representations that are addressed by the video summarization literature. Current supervised studies formulate video summarization as a sequence-to-sequence learning problem, and the existing solutions often neglect the surge of human-centric view, which inherently contains affective content. In this study, we investigate the affective-information enriched supervised video summarization task for human-centric videos. First, we train a visual input-driven state-of-the-art continuous emotion recognition model (CER-NET) on the RECOLA dataset to estimate emotional attributes. Then, we integrate the estimated emotional attributes and the high-level representations from the CER-NET with the visual information to define the proposed affective video summarization architectures (AVSUM). In addition, we investigate the use of attention to improve the AVSUM architectures and propose two new architectures based on temporal attention (TA-AVSUM) and spatial attention (SA-AVSUM). We conduct video summarization experiments on the TVSum database. The proposed AVSUM-GRU architecture with an early fusion of high-level GRU embeddings and the temporal attention based TA-AVSUM architecture attain competitive video summarization performances by bringing strong performance improvements for the human-centric videos compared to the state-of-the-art in terms of F-score and self-defined face recall metrics.
Submitted 8 July, 2021;
originally announced July 2021.
-
Neural Network Based Sleep Phases Classification for Resource Constraint Environments
Authors:
Berkay Köprü,
Murat Aslan,
Alisher Kholmatov
Abstract:
Sleep is a restoration process of the body. The efficiency of this restoration process is directly correlated to the amount of time spent in each sleep phase. Hence, automatic tracking of sleep via wearable devices has attracted both researchers and industry. Current state-of-the-art sleep tracking solutions are memory- and processing-intensive, and they require cloud or mobile phone connectivity. We propose a memory-efficient sleep tracking architecture which can work in the embedded environment without needing any cloud or mobile phone connection. In this study, a novel architecture is proposed that consists of feature extraction and an Artificial Neural Networks based stacking classifier. In addition, we discuss how to tackle the sequential nature of sleep staging in memory-constrained environments through the proposed framework. To verify the system, a dataset is collected from 24 different subjects for 31 nights with a wrist-worn device having 3-axis accelerometer (ACC) and photoplethysmogram (PPG) sensors. Over the collected dataset, the proposed classification architecture achieves 20% and 14% better F1 scores than its competitors. Apart from the superior performance, the proposed architecture is a promising solution for resource-constrained embedded systems, allocating only 4.2 kilobytes of memory (RAM).
Submitted 25 May, 2021;
originally announced May 2021.
-
Multimodal Continuous Emotion Recognition using Deep Multi-Task Learning with Correlation Loss
Authors:
Berkay Köprü,
Engin Erzin
Abstract:
In this study, we focus on continuous emotion recognition using body motion and speech signals to estimate Activation, Valence, and Dominance (AVD) attributes. A Semi-End-To-End network architecture is proposed where both extracted features and raw signals are fed, and this network is trained using multi-task learning (MTL) rather than the state-of-the-art single task learning (STL). Furthermore, correlation losses, Concordance Correlation Coefficient (CCC) and Pearson Correlation Coefficient (PCC), are used as an optimization objective during the training. Experiments are conducted on the CreativeIT and RECOLA databases, and evaluations are performed using the CCC metric. To highlight the effect of MTL, correlation losses, and multi-modality, we respectively compare the performance of MTL against STL, CCC loss against root mean square error (MSE) loss and PCC loss, and multi-modality against single modality. We observe significant performance improvements with MTL training over STL, especially for estimation of the valence. Furthermore, the CCC loss achieves more than 7% CCC improvements on CreativeIT, and 13% improvements on RECOLA against MSE loss.
Submitted 2 November, 2020;
originally announced November 2020.
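The Concordance Correlation Coefficient used as a loss and metric in the abstract above has a standard closed form, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The sketch below implements that definition with population statistics in plain Python; the function names and the `1 - CCC` loss form are illustrative assumptions, not taken from the paper.

```python
def ccc(pred, target):
    """Concordance Correlation Coefficient with population (1/n) statistics.

    Undefined (division by zero) when both sequences are constant and equal.
    """
    n = len(pred)
    if n != len(target) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target)) / n
    # The squared mean difference penalizes bias, which Pearson correlation ignores.
    return 2.0 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)


def ccc_loss(pred, target):
    """Hypothetical loss form: minimizing it maximizes concordance."""
    return 1.0 - ccc(pred, target)
```

Unlike PCC, which is 1 for any positively scaled or shifted copy of the target, CCC reaches 1 only when the prediction matches the target in both scale and offset, so a constant bias in the predictions lowers the score.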