EEG-SVRec: An EEG Dataset with User Multidimensional Affective Engagement Labels in Short Video Recommendation

Shaorun Zhang ([email protected], ORCID 0009-0006-3287-2956), Zhiyu He (ORCID 0000-0003-1291-2739), Ziyi Ye (ORCID 0000-0002-5622-0235), Peijie Sun (ORCID 0000-0003-1291-2739), Qingyao Ai (ORCID 0000-0002-5030-709X), Min Zhang (ORCID 0000-0003-3158-1920), and Yiqun Liu (ORCID 0000-0002-0140-4512)
DCST, Tsinghua University; Quan Cheng Laboratory; Zhongguancun Laboratory; Beijing, China
(2023)
Abstract.

In recent years, short video platforms have gained widespread popularity, making the quality of video recommendations crucial for retaining users. Existing recommendation systems primarily rely on behavioral data, which faces limitations when inferring user preferences due to issues such as data sparsity and noise from accidental interactions or personal habits. To address these challenges and provide a more comprehensive understanding of user affective experience and cognitive activity, we propose EEG-SVRec, the first EEG dataset with User Multidimensional Affective Engagement Labels in Short Video Recommendation.

The study involves 30 participants and collects 3,657 interactions, offering a rich dataset for deeper exploration of user preference and cognitive activity. By incorporating self-assessment techniques and real-time, low-cost EEG signals, we offer a more detailed understanding of user affective experiences (valence, arousal, immersion, interest, visual and auditory) and the cognitive mechanisms behind their behavior. We establish benchmarks for rating prediction by the recommendation algorithm, showing significant improvement with the inclusion of EEG signals. Furthermore, we demonstrate the potential of this dataset in gaining insights into the affective experience and cognitive activity behind user behaviors in recommender systems. This work presents a novel perspective for enhancing short video recommendation by leveraging the rich information contained in EEG signals and multidimensional affective engagement scores, paving the way for future research in short video recommendation systems.

Short video, EEG signal, Recommendation system.
copyright: acmcopyright; journalyear: 2023; doi: XXXXXXX.XXXXXXX; conference: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval; July 14-18, 2024; Washington D.C., USA; price: 15.00; isbn: 978-1-4503-XXXX-X/18/06; ccs: Information systems → Users and interactive retrieval

1. Introduction

In recent years, short videos have emerged as a popular medium for entertainment and communication across various social media platforms, attracting millions of users worldwide. These videos typically span from a few seconds to several minutes and encompass a broad spectrum of content. Short video platforms generally gather, process, and analyze user behavior data and video information. To enhance the recommendation quality and retain users, various recommendation strategies are employed, including interest-based recommendations (Gou et al., 2011; Ye et al., 2011), popularity-based recommendations (Bressan et al., 2016; Yang, 2016), and personalized recommendations (Zhang and Liu, 2021). The choice of a recommendation strategy can significantly impact users’ affective engagement while browsing short videos.

Existing short video recommendation systems mainly focus on behavioral metrics, such as likes, dwell time, view percentage, etc., to improve recommendation performance (Jannach et al., 2018; Shani and Gunawardana, 2011; Sun et al., 2023). These behavior data are usually collected from user logs and applied as implicit feedback signals to infer user preferences. Although these observed data usually contain abundant information, only considering existing information is not enough to gain a comprehensive understanding of users (Lu et al., 2019; Baeza-Yates, 2018). There still exist challenges in capturing user preference from behavioral data. Firstly, behavioral data, such as likes and comments, is usually sparse. Secondly, the presence of noise, resulting from accidental interactions or personal habits, can affect the reliability of the data.

To understand users’ cognitive activities in depth, we record EEG (electroencephalogram) signals during short video browsing. EEG is a neuroelectrical signal that contains rich spatial, temporal, and frequency-band information about human experience. It can be used to study the underlying neural mechanisms and reflects information about user cognition, emotion, and attention (Teplan et al., 2002; Li et al., 2022b; Moshfeghi et al., 2016; Ye et al., 2022a). Because it provides data with high temporal resolution, EEG has proven useful in the Information Retrieval (IR) domain (Ye et al., 2022b; Davis III et al., 2021). At the same time, recent EEG recording devices offer high portability and low operating costs (Hu et al., 2019; Rashid et al., 2020), which are necessary for real-world application scenarios. The high temporal resolution of EEG data also enables it to address the real-time demands of short video recommendation scenarios.

To further understand the relationship between user behavior and EEG signals, it is essential to incorporate user affective experiences into the annotation of short videos. These affective experiences are from different dimensions. Emotion elicited by short videos plays a significant role in the browsing experience, which is commonly modeled via two dimensions: valence and arousal (Thayer, 1990). Both the degree to which short videos align with user interest and the level of immersion experienced by users while browsing short videos influence user behavior and perception. Besides, short videos serve as a combined visual and auditory medium, so understanding the impact of visual and auditory features on users’ perceptions can be helpful. Accordingly, we collect six Multidimensional Affective Engagement Scores (MAES), which are valence, arousal, immersion, interest, visual and auditory, and extract the visual and auditory features of the videos.

By employing self-assessment techniques, we obtain a more detailed and multidimensional perspective of user experience. Furthermore, real-time, low-cost EEG signals can be utilized to gain insights into users’ cognitive activity.

Therefore, we build EEG-SVRec (dataset and code are available at https://anonymous.4open.science/r/Z-SV-CFB1), an EEG dataset with user Multidimensional Affective Engagement labels in Short Video Recommendation. We conducted a user study in which participants continuously viewed short videos over several sessions. After each session, the participants rated the MAES for each video. We recruited 30 participants and collected 3,657 interactions, each with temporal EEG signals recorded during viewing as well as user behavior and multidimensional labels. In total, we collected three types of data: user behavior logs, EEG signals, and self-assessments of the six MAES.

Subsequently, we present the statistics of the dataset and show the rich information it contains. We also discuss possible applications of the dataset. We first show its impact on user understanding in short video recommender systems, along with some preliminary findings. We then establish benchmarks for rating prediction using EEG signals and prevalent recommendation algorithms. Experiments show significant performance improvement with the inclusion of EEG signals, demonstrating the importance of introducing brain signals to recommender systems.

These are our main contributions:

  • We proposed the first dataset that contains EEG signals in a real scenario of watching short video streaming. On the basis of user behavior, we provided multidimensional affective engagement scores (MAES), which are valence, arousal, immersion, interest, visual and auditory, as explicit feedback.

  • We establish benchmarks for rating prediction by the recommendation algorithm. Comparative experiments show significant performance improvement with the inclusion of EEG signals, demonstrating the importance of introducing brain signals to recommender systems.

  • We offer a perspective for understanding the affective experience and cognitive activity behind user behaviors in recommender systems.

The remainder of this paper is organized as follows. We review related datasets in Section 2. We then introduce our dataset and its collection procedure in Section 3. Section 4 presents the composition and statistical analysis of our dataset. Next, we conduct experiments to show potential applications in Section 5. Finally, Section 6 and Section 7 discuss and conclude our work.

2. Related Datasets

In this section, we review datasets for short video recommendation and EEG datasets for affective computing, and compare our dataset with them (Table 1).

Table 1. Comparison of EEG-SVRec with other datasets in the video/music recommendation and video affective computing domains. U&I denotes user and item IDs. Peri. Bio denotes peripheral biosignals (such as heartbeat, eye tracking, ECG).
Domain Datasets Item/Stimulus U&I Impression Ratings Emotion Peri. Bio EEG
Recommendation (open domain) Movielens Movie
Toffee Short Video
KuaiRand Short Video
MMSSL Short Video
Tenrec News, Short Video
Last.fm Music
MUMR Music
Affective computing (closed domain) DEAP 1min music videos
SEED 4min movie clips
AMIGOS short and long movies
Recommendation (open domain) EEG-SVRec (ours) Short Video

2.1. Dataset in Short Video Recommendation

Short videos, a new type of online streaming media, have attracted increasing attention and have been one of the most popular internet applications in recent years. Thus, research on short video recommendations has gained traction, and some related datasets have been released.

Datasets for short video recommendation are usually collected from online platforms and contain user IDs, item IDs, and interaction behavior. KuaiRand (Gao et al., 2022), an unbiased sequential recommendation dataset, contains millions of intervened interactions on randomly exposed videos. Tenrec (Yuan et al., 2022) is a large-scale, multipurpose real-world dataset in which each item is either a news article or a video. MicroLens (Ni et al., 2023) consists of one billion user-item interactions with raw modality information about videos. MMSSL (Wei et al., 2023) is collected from the TikTok platform and logs the short videos viewed by users; its multi-modal characteristics are the visual, acoustic, and title textual features of videos. Some researchers conduct experiments on the Micro-Video dataset to validate their model (Liu et al., 2021). They construct a dataset by randomly sampling 100K users and their watched micro-videos over a period of two days. Other researchers crawled micro-videos from Jan 2017 to Jun 2018 from Toffee, a large-scale Chinese micro-video sharing platform (Wei et al., 2019). Although its items are movies rather than short videos, the Movielens dataset (Harper and Konstan, 2015) contains user ratings (ranging from 1 to 5) at a large scale and has had a substantial impact on education, research, and industry.

In contrast to the above, the music dataset Last.fm-1k (https://www.last.fm/) captures the complete listening habits of nearly 1,000 users. MUMR (Li et al., 2022a) is a music recommendation dataset that collects context from low-cost smart bracelets. He et al. (2023) consider immersion in online short videos with psychological labels, video features, and EEG signals. In contrast to these works, we provide a dataset with multidimensional affective engagement labels, enabling a deeper understanding of users.

Since our data is collected through a user study, it contains detailed video and audio features, behavior logs, user multidimensional affective engagement scores, and EEG and ECG signals.

2.2. EEG Dataset in Affective Computing

EEG (electroencephalogram) is popular in neuroscience and psychology because it is a non-invasive technique for measuring the electrical activity of the brain. Using physiological signals to help understand people’s affect and cognition has become widespread in affective computing, as it strikes a good balance between mechanistic exploration and real-world practical application. By analyzing EEG signals, researchers can identify patterns that are associated with different emotional states. Researchers have collected EEG and peripheral physiological signals using music, images, and videos as stimuli, with affect annotated by the participants.

MIIR (Stober et al., 2015) records EEG signals from 10 participants while they listen to and imagine (by tapping the beat) 12 short music fragments. The participants then rate their tapping ability and familiarity. Images can also serve as stimuli: a neuromarketing dataset contains EEG signals from 14 electrodes for 25 participants, together with their likes/dislikes on e-commerce products over 14 categories with 3 images each (Yadava et al., 2017).

Video stimuli include both visual and auditory aspects, making the information more diverse and rich. DEAP (Koelstra et al., 2011) is a dataset of 32 participants whose EEG and peripheral physiological signals were recorded as each watched 40 one-minute excerpts of music videos. The SEED database (Zheng and Lu, 2015) contains EEG data of 15 subjects, collected via 62 EEG electrodes while they watched 15 Chinese film clips covering three types of emotions, i.e., negative, positive, and neutral. Moreover, AMIGOS (Miranda-Correa et al., 2018) collected EEG, ECG, and GSR from 40 participants watching 16 short videos and 4 long videos. Participants annotated their emotions while watching these videos with self-assessments of valence, arousal, control, familiarity, liking, and basic emotions.

Datasets play a very important role in EEG affective computing. New methods and models have been proposed based on existing datasets to facilitate evaluation. However, the videos shown to participants are pre-selected to elicit different emotions and are the same for all participants (closed domain). Participants cannot actively influence the video playback, e.g., by sliding down at any time to switch to the next video. In contrast, our experiments take place in a real online short video browsing scenario, where videos are drawn from the millions of videos on the platform and are presented to participants through personalized recommendation algorithms or in non-personalized or randomized ways (open domain). During browsing, participants actively engage in behaviors such as swiping and liking videos.

These existing research efforts show the application potential of EEG in various fields. In the context of short video recommendation, however, there is still no dataset for studying the correlation between physiological signals and affective engagement in real short video scenarios. What we add on top of these works is a user study in which participants browse short videos in a real scenario, from which we collect their behavior, multidimensional affective engagement labels, and EEG and ECG signals.

3. Dataset Construction

This section covers ethics and privacy, participants, video stimuli material, apparatus, and the experimental procedure (browsing stage and labeling stage).

3.1. Ethics and Privacy

Our user study has undergone review and obtained approval from the institutional ethics committee, xxx University (approval number: xxx; the protocol ID is hidden for double-blind review). This study has undergone a rigorous ethical review process to ensure the protection of the participants’ rights. In compliance with established ethical guidelines, we have taken multiple measures to protect the participants’ privacy, including anonymizing the collected data and obtaining informed consent from all participants before the study. Furthermore, participants were fully informed about the study’s objectives, procedures, and potential outcomes. The EEG data collection method employed in this research is non-invasive and poses no harm to the participants. This approach ensures that the study adheres to ethical standards while maintaining the integrity of the research findings. As for the items in the dataset, we only provide anonymized video ids, encoded video tags, and extracted video characteristics (shown in Section 4.3).

Figure 1. EEG and ECG data acquisition setup: (a) A participant wears an EEG cap while watching short videos in a laboratory setting (Image display has been approved). (b) International 10-20 electrode placement standard for EEG.

3.2. Participants

Figure 2. The overall procedure of the lab study for data collection.

We recruited 30 college students aged between 18 and 30 (M=22.17, SD=2.20) for our study. The participant group consisted of 16 males and 14 females, majoring in various fields such as computer science, law, medicine, and sociology. All participants were familiar with at least one short video platform and used it at least once a day. To protect participants’ privacy, we provided each participant with a new account on the short video platform. Each participant was required to take part in two experimental phases: a 10-hour preference collection phase and a 3-hour lab study phase (including preparation and rest time), as shown in Figure 1(a). Upon completion of the experiments, each participant received approximately 60 dollars in research compensation.

3.3. Video Stimuli Material

Participants browse short videos on a popular video platform, and all items come from that platform. The platform has two settings: personalized and non-personalized. Since both are affected by the platform’s strategy, we also present randomized videos. Thus, we categorized the short video stimuli presented to the participants into three video pools: personalized, non-personalized, and randomized.

The personalized video pool mainly consists of videos that the short video platform’s algorithm selects for each participant, based on the preference information collected during the 10-hour preference acquisition phase. The non-personalized video pool, with personalization turned off, disregards user interaction history and distributes videos that may be based on their current popularity ranking. It is worth mentioning that the videos in the personalized and randomized pools have a duration of 30-60 seconds, while this 30-60 second restriction was removed for the non-personalized pool because of the platform’s distribution mechanism. The randomized video pool is sampled from the platform’s large video collection, filtered by different popularity levels. We first divided the large video pool into three levels based on view counts, and then randomly selected 100 videos from each level. After that, to ensure the category richness and healthiness of the selected short videos, we filtered 25 videos in each group.
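The popularity-stratified sampling described above can be sketched as follows. This is a minimal illustration only: the `(video_id, view_count)` input format, the function name, and the use of equal-sized tertiles as the three levels are assumptions, since the paper does not state the level boundaries. The subsequent manual filtering step is not modeled.

```python
import random


def sample_randomized_pool(videos, per_level=100, seed=0):
    """Split videos into three popularity levels by view count and draw
    `per_level` videos at random from each level.

    `videos` is a hypothetical list of (video_id, view_count) pairs.
    Equal-sized tertiles are an assumption made for illustration.
    """
    rng = random.Random(seed)
    ranked = sorted(videos, key=lambda v: v[1])  # ascending view count
    n = len(ranked)
    levels = [ranked[: n // 3], ranked[n // 3 : 2 * n // 3], ranked[2 * n // 3 :]]
    return [rng.sample(level, min(per_level, len(level))) for level in levels]


# Toy catalogue of 900 videos with synthetic view counts (not real data).
catalogue = [(f"v{i}", i * 10) for i in range(900)]
low, mid, high = sample_randomized_pool(catalogue)
```

Sampling within each level, rather than from the whole catalogue, keeps less popular videos represented in the randomized pool.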

The selection of videos from these three pools results in different session compositions. Four distinct session modes were established: personalized mode, randomized mode, mixed mode, and non-personalized mode. The personalized and randomized modes each consist of 20-30 videos from their respective video pools, with a duration of 30-60 seconds each. In the mixed mode, an assortment of 20-30 videos is presented, with personalized and randomized videos in a 1:1 ratio. The order of videos within a sequence is random, ensuring a well-distributed and varied exposure for the study participants. The non-personalized mode involves extracting a certain number of videos from the non-personalized video pool.
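As an illustration of the mixed mode, the sketch below draws equal numbers of videos from two pools and shuffles their order. The function name, the pool contents, and the source tags are hypothetical and are not taken from the released code.

```python
import random


def build_mixed_session(personalized, randomized, n_videos=20, seed=0):
    """Return a shuffled session with a 1:1 ratio of personalized and
    randomized videos. Both inputs are hypothetical lists of video ids."""
    if n_videos % 2:
        raise ValueError("n_videos must be even to keep a 1:1 ratio")
    rng = random.Random(seed)
    half = n_videos // 2
    session = [(vid, "personalized") for vid in rng.sample(personalized, half)]
    session += [(vid, "randomized") for vid in rng.sample(randomized, half)]
    rng.shuffle(session)  # random presentation order
    return session


# Toy pools of 30 video ids each (placeholders, not real ids).
personalized_pool = [f"p{i}" for i in range(30)]
randomized_pool = [f"r{i}" for i in range(30)]
session = build_mixed_session(personalized_pool, randomized_pool, n_videos=20)
```

Keeping the source tag next to each video id makes it possible to tell personalized from randomized videos later, within a mixed session.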

3.4. Apparatus

We used a smartphone with a 6.67-inch screen and a 120 Hz refresh rate, connected to a local area network (LAN) over Wi-Fi to ensure network stability. Participants were allowed to adjust the screen brightness and device volume to a comfortable level before the experiment. They could also adjust the seat position and the angle of the smartphone to a suitable position. During the browsing stage, participants were required to minimize body and head movements to ensure the high quality of the collected physiological signals (Figure 1(a)). A Scan NuAmps Express system (Compumedics Ltd., VIC, Australia) along with a 64-channel Quik-Cap (Compumedics NeuroScan) was used to record the participants’ EEG data, with electrodes placed as in Figure 1(b) (Homan et al., 1987). Some electrode points were also used to eliminate head movement and other artifacts. The impedance of the EEG channels was calibrated to be under 10 kΩ in the preparation step, and the sampling rate was set at 1,000 Hz.

3.5. Experimental Procedure

Each participant underwent a 10-hour preference information collection phase within one week, followed by a laboratory experiment phase that included browsing and labelling stages. In the laboratory experiment, participants viewed 4 to 5 sessions of short videos, with each session comprising a 15-minute browsing stage and a roughly 10-minute labelling stage. After completing the video labelling for each session, participants were given a 5-minute rest before proceeding to the next session’s browsing stage.

During the browsing stage, participants watched sessions of short video sequences distributed from different video pools, with each session comprising 20-30 short videos. Throughout the browsing process, participants could interact with the videos primarily by liking them and swiping them away. If participants enjoyed the video they were currently watching, they could click the like button at any time during playback. Additionally, if participants did not wish to continue watching the video, they could swipe away at any time. Note that a video is replayed when it finishes without being swiped away. Electroencephalogram (EEG) and electrocardiogram (ECG) physiological signals were continuously collected.

After a participant completed browsing the short video sequence of a session, we conducted a video-level multidimensional affective engagement self-assessment labelling stage. Participants were given a brief recap of each video in chronological order based on their browsing history. Subsequently, they rated each short video on a 5-point Likert scale across six multidimensional affective engagement indicators. The labelling instructions were given to the participants. The six dimensions are valence, arousal, immersion, interest, visual, and auditory. Valence represents the positive and negative aspects of emotions, while Arousal indicates the intensity of emotions. Immersion denotes the degree of the participant’s involvement while watching the video, and Interest indicates the extent to which the video aligns with the participant’s personal interests. Visual and Auditory scores describe the presentation quality of visual elements (e.g., scenery, graphics) and auditory elements (e.g., voices, music).

Our experiment collected MAES through questionnaires, gathering ratings for the videos of each session after its completion. Participants viewed the first few seconds of each video until they could adequately recall it, and then rated it across the six dimensions. In post-experiment interviews, participants reported that the number of videos per session did not cause memory difficulties, so they could generally recall the browsing history and complete labelling after watching the initial seconds of each video. Thus, the labelling stage generated a score for each of the six MAES for every short video.

Ultimately, we obtained three types of video-level data: browsing behavior logs, EEG and ECG signals, and multidimensional affective engagement self-assessment labelling.

4. Dataset Description

In this EEG dataset, we ultimately collected 3,657 interactions from 30 users involving 2,636 items (short videos). Because different participants watched the same short videos in randomized mode, multiple interactions can be associated with the same item. Each interaction (U-I pair) corresponds to an EEG and ECG segment. Additionally, each interaction is associated with a behavioral log and a self-assessment of MAES. To further describe the dataset, we introduce it from four aspects: EEG signals, behavioral data, self-assessment data, and characteristics of short videos.

Table 2. Statistics of the dataset. Each interaction has corresponding MAES and EEG signals.
#User #Item #Interaction #EEG datasize
EEG-SVRec 30 2,636 3,657 62GB

4.1. EEG statistics and preprocessing

EEG data are collected for all 3,657 interactions. For each interaction, the EEG data has shape $(Ch, fs \cdot T)$, where $fs$ is the sampling rate (1,000 Hz), $T$ denotes the recording duration of the interaction, and $Ch$ is the number of electrode channels (62 in total). We preprocess the EEG data and extract features as follows:

The raw EEG data undergoes a series of preprocessing steps to eliminate noise and artifacts and enhance signal quality. The pipeline comprises four stages. First, baseline correction removes any constant offsets or drifts in the EEG signals, ensuring that the baseline amplitude is zero. Second, re-referencing uses the average of the M1 and M2 mastoid electrodes as the new reference, minimizing potential bias and improving the signal-to-noise ratio. Third, a 0.5 Hz to 50 Hz band-pass filter removes low-frequency drifts (<0.5 Hz) and high-frequency noise (>50 Hz), as well as 50 Hz powerline interference. Last, artifact removal eliminates abnormal-amplitude signals and artifacts induced by eye blinks or head movements.
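The first three stages can be sketched with NumPy as below. This is an illustrative stand-in rather than the authors’ pipeline: the paper does not specify the filter design, so a simple FFT mask is used here as the band-pass filter. The function name and the channel labels passed in `ch_names` are assumptions, and artifact removal is omitted.

```python
import numpy as np


def preprocess(eeg, ch_names, fs=1000.0, band=(0.5, 50.0)):
    """Baseline-correct, re-reference, and band-pass an EEG array.

    eeg: array of shape (channels, samples).
    ch_names: list of channel labels that includes "M1" and "M2" (assumed names).
    """
    eeg = np.asarray(eeg, dtype=float)
    # 1) Baseline correction: remove each channel's constant offset.
    eeg = eeg - eeg.mean(axis=1, keepdims=True)
    # 2) Re-reference to the average of the two mastoid channels.
    ref = eeg[[ch_names.index("M1"), ch_names.index("M2")]].mean(axis=0)
    eeg = eeg - ref
    # 3) Band-pass 0.5-50 Hz by zeroing FFT bins outside the band
    #    (a brick-wall mask chosen for brevity, not the authors' filter).
    spectrum = np.fft.rfft(eeg, axis=1)
    freqs = np.fft.rfftfreq(eeg.shape[1], d=1.0 / fs)
    spectrum[:, (freqs < band[0]) | (freqs > band[1])] = 0.0
    return np.fft.irfft(spectrum, n=eeg.shape[1], axis=1)


# Synthetic check: a 10 Hz component survives, a 100 Hz component and the
# constant offset are removed.
t = np.arange(2000) / 1000.0
channel = 5.0 + np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 100 * t)
raw = np.vstack([channel, np.zeros_like(t), np.zeros_like(t)])
clean = preprocess(raw, ["Fz", "M1", "M2"])
```

In practice a dedicated EEG toolbox with proper filter design and artifact rejection would be preferable to this mask, which can introduce ringing at segment edges.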

After the preprocessing steps, we extract features from the cleaned EEG signals. In this study, we focus on differential entropy (DE) as a feature, which has been shown to be useful in characterizing the complexity and information content of EEG signals (Duan et al., 2013). First, we estimate the power spectral density (denoted as P(f)) using Welch’s method (Welch, 1967) with a sliding window and a sampling frequency of 1,000 Hz. The window length, in seconds, is two divided by the lower bound of the frequency band. Second, we normalize the power spectral density within each band and calculate DE using the following formula:

(1) $DE = -\int P(f)\log(P(f))\,df$

The frequency bands are delta (0.5-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), beta (13-30 Hz), and gamma (25-50 Hz). Finally, for each second of EEG signals, we extract a DE of each electrode and each frequency band.
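A minimal NumPy sketch of this DE computation follows. It rests on several assumptions that the text does not settle: Hann windows with 50% overlap for the Welch estimate, normalization so that the band-limited PSD integrates to one, and a Riemann sum in place of the integral in Equation (1). All function names are hypothetical.

```python
import numpy as np

# Band edges in Hz, as listed in the text.
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (25, 50)}


def welch_psd(x, fs, nperseg):
    """Welch estimate: average Hann-windowed periodograms (50% overlap)."""
    if len(x) < nperseg:
        raise ValueError("signal is shorter than one window")
    window = np.hanning(nperseg)
    scale = fs * np.sum(window ** 2)
    step = max(nperseg // 2, 1)
    starts = range(0, len(x) - nperseg + 1, step)
    periodograms = [np.abs(np.fft.rfft(window * x[s:s + nperseg])) ** 2 / scale
                    for s in starts]
    return np.fft.rfftfreq(nperseg, d=1.0 / fs), np.mean(periodograms, axis=0)


def band_de(x, fs, low, high):
    """Differential entropy of the normalized PSD within [low, high] Hz."""
    nperseg = int(round(2.0 / low * fs))  # window length (s) = 2 / lower band edge
    freqs, psd = welch_psd(np.asarray(x, dtype=float), fs, nperseg)
    df = freqs[1] - freqs[0]
    in_band = (freqs >= low) & (freqs <= high)
    p = psd[in_band] / (psd[in_band].sum() * df)  # now sum(p) * df == 1
    p = p[p > 0]  # skip empty bins so that log() is defined
    return float(-np.sum(p * np.log(p)) * df)


def de_features(x, fs=1000.0):
    """DE of one channel segment for every band in BANDS."""
    return {name: band_de(x, fs, lo, hi) for name, (lo, hi) in BANDS.items()}


# Example: 8 s of synthetic noise standing in for one EEG channel.
signal = np.random.default_rng(0).standard_normal(8000)
features = de_features(signal)
```

With a window of two divided by the lower band edge, the delta band needs at least 4 s of signal, so the sliding window for that band necessarily extends beyond a single second.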

4.2. User Behavior log and self-assessment of MAES

Figure 3. (a) Proportion of likes for short videos: overall and across three session modes (personalized, randomized, and mixed). (b) View percentage distribution across different session modes (View percentage is the viewing duration divided by the video duration. 1.0 represents viewing the video once.)

After integrating the log and label files and matching them to the EEG via timestamps, we obtained each subject’s interaction behavior (liking and viewing duration) and MAES for their video viewing. For each interaction, the UNIX timestamps of browsing are aligned with the start and end times of the corresponding segment of the physiological signals. The video sequence and session mode are also important. Thus, we provide the position of each video in the interaction sequence and the session mode (Randomized, Personalized, and Mixed). For the Mixed mode, we further distinguish the personalized videos from the randomized ones.
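The timestamp alignment and the view percentage can be illustrated as follows. The function and argument names are hypothetical and do not reflect the field names in the released log files. The only values taken from the paper are the 1,000 Hz sampling rate and the definition of view percentage as viewing duration divided by video duration.

```python
FS = 1000  # EEG sampling rate in Hz, as stated in the paper


def segment_bounds(eeg_start_ts, view_start_ts, view_end_ts, fs=FS):
    """Convert the UNIX timestamps (seconds) of one interaction into
    start/end sample indices within the EEG recording."""
    start = round((view_start_ts - eeg_start_ts) * fs)
    end = round((view_end_ts - eeg_start_ts) * fs)
    if start < 0 or end < start:
        raise ValueError("interaction lies outside the recording")
    return start, end


def view_percentage(view_start_ts, view_end_ts, video_duration_s):
    """Viewing duration divided by video duration; 1.0 means one full play."""
    return (view_end_ts - view_start_ts) / video_duration_s


# Toy interaction: viewing starts 12.5 s into the recording and lasts 30 s
# on a 40 s video.
bounds = segment_bounds(1700000000.0, 1700000012.5, 1700000042.5)
pct = view_percentage(1700000012.5, 1700000042.5, 40.0)
```

Slicing the `(Ch, fs·T)` array of the full recording with these bounds yields the EEG segment for that interaction.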

In Figure 3 (a), we present the proportion of likes for short videos overall and across three distinct session modes: personalized, randomized, and mixed (a combination of personalized and randomized). Notably, the like rates in the personalized mode (35.9%) and the mixed mode (35.4%) are similar, whereas the like rate in the randomized mode (21.4%) is noticeably lower. Like the like rate, the view percentage in Figure 3 (b) is higher overall in the personalized and mixed modes than in the randomized mode. It is surprising that the mixed mode performs similarly to the personalized mode. Likes and view percentages are presumably influenced by contexts in the session. Studying user experience through behavior may therefore offer insights for recommender systems.

Figure 4. The distribution for six MAES (valence, arousal, immersion, interest, visual and auditory).

Figure 4 shows that the distributions of the six MAES differ noticeably. Valence and arousal, commonly used as two-dimensional indices in emotion recognition, both exhibit a distinct distribution that peaks at 3. Immersion, interest, visual and auditory are distributed more uniformly than the former two, indicating a more effective differentiation in representing video content.

4.3. Characteristic of Short Videos

We meticulously extracted comprehensive video features, encompassing both visual and auditory aspects, to further investigate components related to audio and visual ratings.

For video featurization, we sampled one frame per second and computed an array of features. Specifically, we converted each frame to grayscale and computed the mean (representing brightness) and standard deviation (representing contrast). Additionally, we assessed hue, saturation, value (in terms of HSV), Laplace variation, and color cast for each frame. Regarding audio, we first extracted audio signals from the short videos at their native sampling rate. We then employed openSMILE to compute features from the ComParE2016 acoustic feature set (Schuller et al., 2016), maintaining the same sampling rate. Subsequently, we used the Audio Spectrogram Transformer (Gong et al., 2021), trained on AudioSet, to classify audio events at a sampling rate of 16,000 Hz, as required by the classifier. If the event was classified as music, we employed Librosa to detect beats and determine the tempo.
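The per-frame brightness, contrast, and HSV statistics can be sketched with the Python standard library as below. This is only an illustration under stated assumptions: the grayscale conversion uses ITU-R BT.601 luma weights, hue is averaged arithmetically even though it is a circular quantity, and a frame is represented as a flat list of RGB tuples. Laplace variation, color cast, and all audio features are not covered.

```python
import colorsys
import statistics


def frame_features(pixels):
    """Compute simple visual statistics for one frame.

    pixels: iterable of (r, g, b) tuples with components in 0..255.
    """
    pixels = list(pixels)
    # Grayscale via ITU-R BT.601 luma weights (an assumption; the paper does
    # not say which conversion was used).
    gray = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    return {
        "brightness": statistics.fmean(gray),   # mean gray level
        "contrast": statistics.pstdev(gray),    # std of gray levels
        "hue": statistics.fmean(h for h, _, _ in hsv),
        "saturation": statistics.fmean(s for _, s, _ in hsv),
        "value": statistics.fmean(v for _, _, v in hsv),
    }


# Toy two-pixel "frame": one black and one white pixel.
features = frame_features([(0, 0, 0), (255, 255, 255)])
```

Video-level features could then be obtained by aggregating these per-frame values over the frames sampled at one per second.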

5. Example applications

5.1. Impact on User Understanding in Recommendation

Figure 5. Heatmap presents the correlations of behavior (liking, and view percentage) and MAES (valence, arousal, immersion, interest, visual, and auditory).

5.1.1. Analysis of MAES and Browsing Behavior

Figure 6. The mean correlations (over all participants) of the MAES (emotion of valence and arousal, immersion, interest, and ratings of visual and auditory, each ranging from 1 to 5) and behaviors (like and view percentage) with DE in the broad frequency bands of theta (4-8 Hz), alpha (8-12 Hz), beta (12-30 Hz), and gamma (30-45 Hz). White circles mark significant correlations (p < 0.05).

Figure 5 presents the correlations between behavioral and MAES attributes. Liking has the strongest correlation with Interest (0.56), followed by Immersion (0.53) and Valence (0.51). This suggests that users' preferences are more closely related to their interest in the content and the degree of immersion they experience while viewing the video than to the valence or arousal induced by the content. Similarly, View Percentage correlates most strongly with Interest (0.52) and Immersion (0.50), indicating that the proportion of a video that users watch is likewise influenced by their interest in the content and the level of immersion they experience. These findings highlight the importance of considering users' interests and immersion levels when designing recommendation algorithms to improve user engagement and browsing experience. We expect researchers to uncover further findings from these data.
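A behavior-by-MAES correlation table of the kind shown in Figure 5 could be computed as in the sketch below. The text does not name the correlation coefficient, so Pearson's r is an assumption here, and the data are synthetic placeholders rather than values from EEG-SVRec; the function name `behavior_maes_correlation` is hypothetical.

```python
import numpy as np

BEHAVIORS = ["like", "view_percentage"]
MAES = ["valence", "arousal", "immersion", "interest", "visual", "auditory"]


def behavior_maes_correlation(behaviors, maes):
    """Pearson r between every behavior column and every MAES column.

    behaviors: (n_interactions, 2); maes: (n_interactions, 6).
    Returns an array of shape (2, 6).
    """
    b = behaviors - behaviors.mean(axis=0)
    m = maes - maes.mean(axis=0)
    cov = b.T @ m
    norm = np.outer(np.sqrt((b ** 2).sum(axis=0)), np.sqrt((m ** 2).sum(axis=0)))
    return cov / norm


# Synthetic stand-in data: 1-5 ratings, a binary like, and a view percentage in [0, 1].
rng = np.random.default_rng(0)
maes = rng.integers(1, 6, size=(200, 6)).astype(float)
like = (maes[:, 3] + rng.normal(0.0, 1.0, 200) > 3).astype(float)
view = np.clip(maes[:, 2] / 5 + rng.normal(0.0, 0.2, 200), 0.0, 1.0)
corr = behavior_maes_correlation(np.column_stack([like, view]), maes)

for name, row in zip(BEHAVIORS, corr):
    print(name, dict(zip(MAES, np.round(row, 2))))
```

Because ratings are ordinal, a rank-based coefficient such as Spearman's would be an equally reasonable choice; the sketch only fixes the shape of the computation.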

5.1.2. The Relation of EEG with MAES and Behaviors

Figure 6 displays topographical maps of the correlations between EEG signals and the six MAES as well as the two behaviors. These maps reveal distinct correlation patterns for each MAES and behavior. Some unique findings also emerge, such as the consistently strong gamma-band correlations at electrodes over the frontal lobe across all six MAES.

Gamma waves are known to play a critical role in numerous brain functions and cognitive processes, including attention, memory, perception, and consciousness (Li and Lu, 2009; Meador et al., 2002). The activation of gamma waves in the frontal lobe suggests the involvement of this region in the associated cognitive processes. As a key area of the brain, the frontal lobe is closely linked to higher cognitive functions, such as decision-making, planning, problem-solving, working memory, and attention control (Fuster, 2002). The observed activation of gamma waves in the frontal lobe may be indicative of the engagement of these higher cognitive functions during the tasks.
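The per-electrode, per-band correlations summarized in Figure 6 can be sketched as follows. This is an illustrative reconstruction under stated assumptions: Pearson's r is assumed, correlations are computed within each participant and then averaged, the data are random placeholders, and the significance test behind the white circles is not reproduced. The function names are hypothetical.

```python
import numpy as np


def de_label_correlation(de, labels):
    """Pearson r between DE features and one label, per channel and band.

    de: (n_trials, n_channels, n_bands); labels: (n_trials,).
    Returns an array of shape (n_channels, n_bands).
    """
    x = de - de.mean(axis=0)
    y = labels - labels.mean()
    num = np.tensordot(y, x, axes=(0, 0))
    den = np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())
    return num / den


def mean_correlation(per_participant):
    """Average the correlation maps over participants.

    per_participant: list of (de, labels) pairs, one per participant.
    """
    maps = [de_label_correlation(de, labels) for de, labels in per_participant]
    return np.mean(maps, axis=0)


# Placeholder data: 3 participants, 40 trials, 62 channels, 4 bands
# (theta, alpha, beta, gamma as in Figure 6).
rng = np.random.default_rng(0)
data = [
    (rng.normal(size=(40, 62, 4)), rng.integers(1, 6, size=40).astype(float))
    for _ in range(3)
]
topo = mean_correlation(data)  # one value per electrode and band, ready for a topographic plot
```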

5.2. Recommendation in Terms of Various User Feedback Signals

Table 3. The recommendation performance (in terms of AUC) when leveraging liking, immersion, interest, valence, arousal, visual preference, and auditory preference as user feedback, respectively. A two-sided t-test is conducted, with significance at p-value < 0.05. Bold shows the higher result of the two settings.
Model     Feature  Like    Immersion  Interest  Valence  Arousal  VisualPref  AudioPref
FM        id       0.7152  0.6776     0.6950    0.6348   0.6917   0.6685      0.6419
FM        id+EEG   0.7312  0.6857     0.6933    0.6492   0.6929   0.6690      0.6675
DeepFM    id       0.7331  0.6869     0.7005    0.6379   0.6930   0.6691      0.6600
DeepFM    id+EEG   0.7368  0.6927     0.7010    0.6586   0.7077   0.6711      0.6608
AFM       id       0.7188  0.6774     0.6935    0.6406   0.6962   0.6736      0.6251
AFM       id+EEG   0.7236  0.6955     0.6910    0.6583   0.6898   0.6688      0.6578
WideDeep  id       0.7324  0.7033     0.7027    0.6651   0.7066   0.6718      0.6735
WideDeep  id+EEG   0.7387  0.7056     0.7121    0.6660   0.7094   0.6978      0.6767
DCN-V2    id       0.6937  0.6190     0.6698    0.5855   0.6340   0.6443      0.6585
DCN-V2    id+EEG   0.6924  0.6582     0.6802    0.6249   0.6715   0.6585      0.6440

The proposed EEG-SVRec is also suitable for personalized recommendation tasks. Beyond the traditional practice of taking liking as the user feedback signal, the various user feedback signals provided in the dataset can be leveraged as the ground truth. We conduct experiments on the item recommendation task while leveraging liking, immersion, interest, valence, arousal, visual preference, and auditory preference as user feedback signals, respectively. As an example, we provide benchmarks for item recommendation with and without EEG information. We use the popular recommendation toolkit RecBole (Zhao et al., 2021) for the different algorithms. RecBole supports only point-wise evaluation, not ranking-based evaluation, for context-aware recommendation models, so we report AUC scores rather than NDCG.
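For readers less familiar with point-wise evaluation, the AUC reported in Table 3 can be computed from binary feedback labels and predicted scores as in the sketch below. It uses the rank-sum (Mann-Whitney) formulation with average ranks for ties. This is a generic illustration, not RecBole's implementation, and how each 1-5 rating is binarized into a feedback label is not specified here, so the labels in the example are hypothetical.

```python
import numpy as np


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum formulation.

    labels: binary feedback (1 = positive); scores: predicted scores.
    Tied scores receive their average rank.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size)
    i = 0
    while i < scores.size:
        j = i
        # Extend j to the end of the run of tied scores starting at i.
        while j + 1 < scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average 1-based rank
        i = j + 1

    rank_sum_pos = ranks[labels].sum()
    return float((rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))


# Hypothetical labels and scores for four interactions.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random scoring, which gives a reference point for the values of roughly 0.59 to 0.74 in Table 3.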

The dataset is split into training, validation, and test sets in a 7:1:2 ratio. For the EEG data, we use the 310-dimensional (62 channels × 5 frequency bands) DE features (described in Section 4.1) corresponding to each interaction and project them into an embedding through a fully connected layer. We tune hyperparameters and report the best result for each setting (id and id+EEG).
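The split and the EEG projection described above can be sketched in a few lines of NumPy. Several details are assumptions rather than facts from the paper: the split is taken to be a random shuffle over interactions, the embedding size of 64 is arbitrary, the weights are untrained placeholders, and `split_indices` and `EEGProjection` are hypothetical names. In the benchmark itself the layer would be trained jointly with the recommendation model.

```python
import numpy as np


def split_indices(n, ratios=(0.7, 0.1, 0.2), seed=0):
    """Randomly split n interactions into train/validation/test index arrays."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]


class EEGProjection:
    """A single fully connected layer mapping flattened DE features to an embedding."""

    def __init__(self, in_dim=310, emb_dim=64, seed=0):
        rng = np.random.default_rng(seed)
        # Untrained placeholder weights; emb_dim=64 is an assumed size.
        self.W = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, emb_dim))
        self.b = np.zeros(emb_dim)

    def __call__(self, de):
        # Accepts (batch, 62, 5) or already flattened (batch, 310) input.
        flat = de.reshape(de.shape[0], -1)
        return flat @ self.W + self.b


train_idx, val_idx, test_idx = split_indices(3657)  # 3,657 interactions in the dataset
de_batch = np.random.default_rng(1).normal(size=(8, 62, 5))
eeg_emb = EEGProjection()(de_batch)  # shape (8, 64), used alongside the id embeddings
```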

Table 3 shows that, in most instances, models incorporating EEG signals achieve superior results, suggesting the general potential of EEG signals in recommendation tasks. It is worth noting that the benchmark experiments introduce EEG information only in a simple way, directly embedding the EEG signals as features, and that this already improves recommendation performance. This indicates that EEG contains additional valuable information. Thus, leveraging EEG signals presumably helps recommender systems better understand user multidimensional affective engagement and behaviors, thereby providing better personalized recommendations.

EEG reflects the cognitive activity of viewing short videos, which can be used as auxiliary information to enhance representations. Thus, a natural idea is to enhance user and item embeddings with their corresponding EEG signals. The idea of enriching embeddings with auxiliary information is widely used in existing recommendation models, such as review-based (Chen et al., 2018; Sun et al., 2020), social-based (Wu et al., 2019), knowledge graph-based (Chen et al., 2019), and visual-based (He and McAuley, 2016) models. However, EEG directly reflects the user's brain activities, which can bring a more in-depth understanding of users than the above auxiliary information. This opens a novel avenue to enrich the representation of items and further help recommender systems understand users through incognizable, subjective, and direct feedback carrying cognitive information. We leave the comparison of EEG data with other information, as well as more sophisticated recommendation models, to future work.

6. Discussions and Limitations

6.1. Possible Research Directions

In this section, we discuss the potential applications of the EEG-SVRec dataset in various aspects of short video recommendation systems and beyond.

(1) Human-centric Evaluation Metrics: The dataset offers a more human-centric perspective on evaluation metrics, going beyond traditional measures such as dwell time and likes. It enables researchers to assess recommender systems based on their ability to enhance users’ overall experience, considering multidimensional aspects of user engagement, rather than merely maximizing utility metrics.

(2) Uncovering the Relationship Between User Behavior and Cognitive Activity: Utilizing the dataset to study user behavior and cognitive activities during the recommendation process can reveal insights into how brain activity can inform adjustments in recommendations. This knowledge potentially helps reduce information echo chambers and enhance content diversity, leading to a more balanced and varied user experience.

(3) EEG-guided Recommendation Algorithms: The EEG-SVRec dataset opens up opportunities to explore the development of EEG-guided recommendation algorithms that incorporate EEG signals for a deeper understanding of user preferences and behavior. By leveraging a smaller labeled EEG dataset alongside a larger unlabeled dataset, algorithms can potentially learn more accurate and personalized recommendations by generalizing the knowledge gained from EEG signals across a broader user base. Furthermore, EEG reflects the cognitive activity of viewing short videos which can be used as auxiliary information to enhance representation.

(4) Accessibility for Users with Disabilities in Short Video Streaming: The EEG-SVRec dataset has the potential to facilitate the development of more inclusive recommendation systems tailored for individuals with disabilities. By analyzing the unique cognitive and emotional experiences of these users through EEG data, algorithms can be adapted to better cater to their needs and preferences, ultimately improving their experience with short video recommendations.

In summary, the EEG-SVRec dataset presents an array of potential applications that can contribute to the development of more effective, personalized, and inclusive recommendation algorithms. By focusing on a more human-centric approach and leveraging the rich information provided by EEG signals, researchers and practitioners can drive innovation in the field of recommender systems and enhance user experiences across various contexts.

6.2. Limitations

In this study, we present the EEG-SVRec dataset. Despite its potential value for the recommender systems community, there are some limitations that should be considered:

(1) Sample Size: The dataset was constructed from 30 participants recruited at a university, which may not fully capture the diversity of users on social media platforms. Although the sample size might seem limited, the high cost of EEG data collection makes larger samples difficult to gather. Many published EEG datasets involve a similar number of university participants (Koelstra et al., 2011; Savran et al., 2006; Zheng and Lu, 2015).

(2) Generalizability: Applying EEG in large-scale real-world scenarios could be challenging because of the required equipment and expertise. Moreover, the personalized and randomized videos last 30-60s, which may differ from general contexts; the reasons for choosing this duration are given in Section 3.3. Investigating the temporal dynamics of user behavior and emotions in various recommendation settings would therefore be a valuable direction for future research.

(3) Algorithmic Bias: The EEG-SVRec dataset might contain biases introduced by the platform's underlying recommendation algorithms, which could impact the generalizability of the findings. To mitigate this, we provide the interactions with randomized videos as unbiased data. It remains essential for future research to identify and address any potential biases present in the dataset.

Despite these limitations, the EEG-SVRec dataset provides a valuable resource for exploring user behavior and emotions in short video recommendations and can inspire further research in this area.

7. Conclusion and Future Work

This paper introduces EEG-SVRec, a novel dataset including EEG and ECG signals, multidimensional affective engagement annotations, and user behavior data for short video recommendation. This dataset bridges a critical gap by providing insights into users' intrinsic experience and behavior in real-world short video scenarios. Our key contributions include proposing the first EEG dataset in the short video streaming scenario, collecting multidimensional affective engagement scores, and providing both implicit and explicit user feedback. We carried out a rigorous experimental process with 30 participants and obtained a dataset that is highly versatile and applicable to various research problems. We establish benchmarks for rating prediction by combining EEG signals with prevalent recommendation algorithms. Experimental results demonstrate the usefulness of EEG signals in recommendation scenarios. It is worth noting that our current application of EEG signals is preliminary, leaving room for future improvements.

For future work, it is expected that more sophisticated models, such as DGCNN (Song et al., 2018), could be employed to utilize electrode position information from the EEG signals and further improve recommendation performance on the EEG-SVRec dataset. By leveraging more advanced techniques, deeper insights into the role that EEG signals play in short video recommendation systems could be uncovered. Furthermore, the application of EEG and ECG signals could be expanded to a broader range of research areas, such as developing more affective-centric evaluation metrics and applications for individuals with disabilities. Lastly, the dataset holds significant societal value in further exploring the occurrence and changes in user emotions and cognitive behavior within short video recommendation scenarios. We anticipate that our work will inspire further exploration and innovation in the field of recommendation and encourage researchers to delve into these potential applications.

References

  • Baeza-Yates (2018) Ricardo Baeza-Yates. 2018. Bias on the web. Commun. ACM 61, 6 (2018), 54–61.
  • Bressan et al. (2016) Marco Bressan, Stefano Leucci, Alessandro Panconesi, Prabhakar Raghavan, and Erisa Terolli. 2016. The limits of popularity-based recommendations, and the role of social ties. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 745–754.
  • Chen et al. (2018) Chong Chen, Min Zhang, Yiqun Liu, and Shaoping Ma. 2018. Neural attentional rating regression with review-level explanations. In Proceedings of the 2018 world wide web conference. 1583–1592.
  • Chen et al. (2019) Zhongxia Chen, Xiting Wang, Xing Xie, Tong Wu, Guoqing Bu, Yining Wang, and Enhong Chen. 2019. Co-attentive multi-task learning for explainable recommendation. In IJCAI. 2137–2143.
  • Davis III et al. (2021) Keith M Davis III, Michiel Spapé, and Tuukka Ruotsalo. 2021. Collaborative filtering with preferences inferred from brain signals. In Proceedings of the Web Conference 2021. 602–611.
  • Duan et al. (2013) Ruo-Nan Duan, Jia-Yi Zhu, and Bao-Liang Lu. 2013. Differential entropy feature for EEG-based emotion classification. In 2013 6th International IEEE/EMBS Conference on Neural Engineering (NER). IEEE, 81–84.
  • Fuster (2002) Joaquín M Fuster. 2002. Frontal lobe and cognitive development. Journal of neurocytology 31, 3-5 (2002), 373–385.
  • Gao et al. (2022) Chongming Gao, Shijun Li, Yuan Zhang, Jiawei Chen, Biao Li, Wenqiang Lei, Peng Jiang, and Xiangnan He. 2022. KuaiRand: An Unbiased Sequential Recommendation Dataset with Randomly Exposed Videos. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 3953–3957.
  • Gong et al. (2021) Yuan Gong, Yu-An Chung, and James Glass. 2021. Ast: Audio spectrogram transformer. arXiv preprint arXiv:2104.01778 (2021).
  • Gou et al. (2011) Liang Gou, Fang You, Jun Guo, Luqi Wu, and Xiaolong Zhang. 2011. Sfviz: interest-based friends exploration and recommendation in social networks. In Proceedings of the 2011 Visual Information Communication-International Symposium. 1–10.
  • Harper and Konstan (2015) F Maxwell Harper and Joseph A Konstan. 2015. The movielens datasets: History and context. Acm transactions on interactive intelligent systems (tiis) 5, 4 (2015), 1–19.
  • He and McAuley (2016) Ruining He and Julian McAuley. 2016. VBPR: visual bayesian personalized ranking from implicit feedback. In Proceedings of the AAAI conference on artificial intelligence, Vol. 30.
  • He et al. (2023) Zhiyu He, Shaorun Zhang, Peijie Sun, Jiayu Li, Xiaohui Xie, Min Zhang, and Yiqun Liu. 2023. Understanding User Immersion in Online Short Video Interaction. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 731–740.
  • Homan et al. (1987) Richard W Homan, John Herman, and Phillip Purdy. 1987. Cerebral location of international 10–20 system electrode placement. Electroencephalography and clinical neurophysiology 66, 4 (1987), 376–382.
  • Hu et al. (2019) Xin Hu, Jingjing Chen, Fei Wang, and Dan Zhang. 2019. Ten challenges for EEG-based affective computing. Brain Science Advances 5, 1 (2019), 1–20.
  • Jannach et al. (2018) Dietmar Jannach, Lukas Lerche, and Markus Zanker. 2018. Recommending based on implicit feedback. In Social Information Access: Systems and Technologies. Springer, 510–569.
  • Koelstra et al. (2011) Sander Koelstra, Christian Muhl, Mohammad Soleymani, Jong-Seok Lee, Ashkan Yazdani, Touradj Ebrahimi, Thierry Pun, Anton Nijholt, and Ioannis Patras. 2011. Deap: A database for emotion analysis; using physiological signals. IEEE transactions on affective computing 3, 1 (2011), 18–31.
  • Li et al. (2022a) Jiayu Li, Zhiyu He, Yumeng Cui, Chenyang Wang, Chong Chen, Chun Yu, Min Zhang, Yiqun Liu, and Shaoping Ma. 2022a. Towards Ubiquitous Personalized Music Recommendation with Smart Bracelets. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 3 (2022), 1–34.
  • Li and Lu (2009) Mu Li and Bao-Liang Lu. 2009. Emotion classification based on gamma-band EEG. In 2009 Annual International Conference of the IEEE Engineering in medicine and biology society. IEEE, 1223–1226.
  • Li et al. (2022b) Xiang Li, Yazhou Zhang, Prayag Tiwari, Dawei Song, Bin Hu, Meihong Yang, Zhigang Zhao, Neeraj Kumar, and Pekka Marttinen. 2022b. EEG based emotion recognition: A tutorial and review. Comput. Surveys 55, 4 (2022), 1–57.
  • Liu et al. (2021) Yiyu Liu, Qian Liu, Yu Tian, Changping Wang, Yanan Niu, Yang Song, and Chenliang Li. 2021. Concept-aware denoising graph neural network for micro-video recommendation. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 1099–1108.
  • Lu et al. (2019) Hongyu Lu, Min Zhang, Weizhi Ma, Yunqiu Shao, Yiqun Liu, and Shaoping Ma. 2019. Quality effects on user preferences and behaviors in mobile news streaming. In The World Wide Web Conference. 1187–1197.
  • Meador et al. (2002) Kimford J Meador, Patty G Ray, Javier R Echauz, David W Loring, and George J Vachtsevanos. 2002. Gamma coherence and conscious perception. Neurology 59, 6 (2002), 847–854.
  • Miranda-Correa et al. (2018) Juan Abdon Miranda-Correa, Mojtaba Khomami Abadi, Nicu Sebe, and Ioannis Patras. 2018. Amigos: A dataset for affect, personality and mood research on individuals and groups. IEEE Transactions on Affective Computing 12, 2 (2018), 479–493.
  • Moshfeghi et al. (2016) Yashar Moshfeghi, Peter Triantafillou, and Frank E Pollick. 2016. Understanding information need: An fMRI study. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. 335–344.
  • Ni et al. (2023) Yongxin Ni, Yu Cheng, Xiangyan Liu, Junchen Fu, Youhua Li, Xiangnan He, Yongfeng Zhang, and Fajie Yuan. 2023. A content-driven micro-video recommendation dataset at scale. arXiv preprint arXiv:2309.15379 (2023).
  • Rashid et al. (2020) Mamunur Rashid, Norizam Sulaiman, Anwar PP Abdul Majeed, Rabiu Muazu Musa, Bifta Sama Bari, Sabira Khatun, et al. 2020. Current status, challenges, and possible solutions of EEG-based brain-computer interface: a comprehensive review. Frontiers in neurorobotics (2020), 25.
  • Savran et al. (2006) Arman Savran, Koray Ciftci, Guillaume Chanel, Javier Mota, Luong Hong Viet, Bülent Sankur, Lale Akarun, Alice Caplier, and Michele Rombaut. 2006. Emotion detection in the loop from brain signals and facial images. In Proceedings of the eNTERFACE 2006 Workshop. Citeseer.
  • Schuller et al. (2016) Björn Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, and Keelan Evanini. 2016. The interspeech 2016 computational paralinguistics challenge: Deception, sincerity & native language. In 17TH Annual Conference of the International Speech Communication Association (Interspeech 2016), Vols 1-5, Vol. 8. ISCA, 2001–2005.
  • Shani and Gunawardana (2011) Guy Shani and Asela Gunawardana. 2011. Evaluating recommendation systems. Recommender systems handbook (2011), 257–297.
  • Song et al. (2018) Tengfei Song, Wenming Zheng, Peng Song, and Zhen Cui. 2018. EEG emotion recognition using dynamical graph convolutional neural networks. IEEE Transactions on Affective Computing 11, 3 (2018), 532–541.
  • Stober et al. (2015) Sebastian Stober, Avital Sternin, Adrian M Owen, and Jessica A Grahn. 2015. Towards Music Imagery Information Retrieval: Introducing the OpenMIIR Dataset of EEG Recordings from Music Perception and Imagination. In ISMIR. 763–769.
  • Sun et al. (2023) Peijie Sun, Le Wu, Kun Zhang, Xiangzhi Chen, and Meng Wang. 2023. Neighborhood-Enhanced Supervised Contrastive Learning for Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering (2023).
  • Sun et al. (2020) Peijie Sun, Le Wu, Kun Zhang, Yanjie Fu, Richang Hong, and Meng Wang. 2020. Dual learning for explainable recommendation: Towards unifying user preference prediction and review generation. In Proceedings of The Web Conference 2020. 837–847.
  • Teplan et al. (2002) Michal Teplan et al. 2002. Fundamentals of EEG measurement. Measurement science review 2, 2 (2002), 1–11.
  • Thayer (1990) Robert E Thayer. 1990. The biopsychology of mood and arousal. Oxford University Press.
  • Wei et al. (2023) Wei Wei, Chao Huang, Lianghao Xia, and Chuxu Zhang. 2023. Multi-Modal Self-Supervised Learning for Recommendation. arXiv preprint arXiv:2302.10632 (2023).
  • Wei et al. (2019) Yinwei Wei, Xiang Wang, Liqiang Nie, Xiangnan He, Richang Hong, and Tat-Seng Chua. 2019. MMGCN: Multi-modal graph convolution network for personalized recommendation of micro-video. In Proceedings of the 27th ACM international conference on multimedia. 1437–1445.
  • Welch (1967) Peter Welch. 1967. The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Transactions on audio and electroacoustics 15, 2 (1967), 70–73.
  • Wu et al. (2019) Le Wu, Peijie Sun, Yanjie Fu, Richang Hong, Xiting Wang, and Meng Wang. 2019. A neural influence diffusion model for social recommendation. In Proceedings of the 42nd international ACM SIGIR conference on research and development in information retrieval. 235–244.
  • Yadava et al. (2017) Mahendra Yadava, Pradeep Kumar, Rajkumar Saini, Partha Pratim Roy, and Debi Prosad Dogra. 2017. Analysis of EEG signals and its application to neuromarketing. Multimedia Tools and Applications 76 (2017), 19087–19111.
  • Yang (2016) JungAe Yang. 2016. Effects of popularity-based news recommendations (“most-viewed”) on users’ exposure to online news. Media Psychology 19, 2 (2016), 243–271.
  • Ye et al. (2011) Mao Ye, Peifeng Yin, Wang-Chien Lee, and Dik-Lun Lee. 2011. Exploiting geographical influence for collaborative point-of-interest recommendation. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval. 325–334.
  • Ye et al. (2022a) Ziyi Ye, ** Ma. 2022a. Towards a Better Understanding of Human Reading Comprehension with Brain Signals. In Proceedings of the ACM Web Conference 2022. 380–391.
  • Ye et al. (2022b) Ziyi Ye, ** Ma. 2022b. Why Don’t You Click: Understanding Non-Click Results in Web Search with Brain Signals. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 633–645.
  • Yuan et al. (2022) Guanghu Yuan, Fajie Yuan, Yudong Li, Beibei Kong, Shujie Li, Lei Chen, Min Yang, Chenyun Yu, Bo Hu, Zang Li, et al. 2022. Tenrec: A Large-scale Multipurpose Benchmark Dataset for Recommender Systems. arXiv preprint arXiv:2210.10629 (2022).
  • Zhang and Liu (2021) Min Zhang and Yiqun Liu. 2021. A commentary of TikTok recommendation algorithms in MIT Technology Review 2021. Fundamental Research 1, 6 (2021), 846–847.
  • Zhao et al. (2021) Wayne Xin Zhao, Shanlei Mu, Yupeng Hou, Zihan Lin, Yushuo Chen, Xingyu Pan, Kaiyuan Li, Yujie Lu, Hui Wang, Changxin Tian, et al. 2021. RecBole: Towards a unified, comprehensive and efficient framework for recommendation algorithms. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 4653–4664.
  • Zheng and Lu (2015) Wei-Long Zheng and Bao-Liang Lu. 2015. Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks. IEEE Transactions on autonomous mental development 7, 3 (2015), 162–175.