Deep Sound Field Reconstruction in Real Rooms: Introducing the ISOBEL Sound Field Dataset

Authors: Miklas Strøm Kristoffersen, Martin Bo Møller, Pablo Martínez-Nuevo, Jan Østergaard

Abstract: Knowledge of loudspeaker responses is useful in a number of applications, where a sound system is located inside a room that alters the listening experience depending on position within the room. Acquisition of sound fields for sound sources located in reverberant rooms can be achieved through labor-intensive measurements of impulse response functions covering the room, or alternatively by means of reconstruction methods which can potentially require significantly fewer measurements. This paper extends evaluations of sound field reconstruction at low frequencies by introducing a dataset with measurements from four real rooms. The ISOBEL Sound Field dataset is publicly available, and aims to bridge the gap between synthetic and real-world sound fields in rectangular rooms. Moreover, the paper advances on a recent deep learning-based method for sound field reconstruction using a very low number of microphones, and proposes an approach for modeling both magnitude and phase response in a U-Net-like neural network architecture. The complex-valued sound field reconstruction demonstrates that the estimated room transfer functions are of high enough accuracy to allow for personalized sound zones with contrast ratios comparable to ideal room transfer functions using 15 microphones below 150 Hz.

Submitted 12 February, 2021; originally announced February 2021.

arXiv:2002.01554

Relaxed N-Pairs Loss for Context-Aware Recommendations of Television Content

Authors: Miklas S. Kristoffersen, Sven E. Shepstone, Zheng-Hua Tan

Abstract: This paper studies context-aware recommendations in the television domain by proposing a deep learning-based method for learning joint context-content embeddings (JCCE). The method builds on recent developments within recommendations using latent representations and deep metric learning, in order to effectively represent contextual settings of viewing situations as well as available content in a shared latent space. This embedding space is used for exploring relevant content in various viewing settings by applying an N-pairs loss objective as well as a relaxed variant proposed in this paper. Experiments confirm the recommendation ability of JCCE, achieving improvements when compared to state-of-the-art methods. Further experiments display useful structures in the learned embeddings that can be used for gaining valuable knowledge of underlying variables in the relationship between contextual settings and content properties.

Submitted 12 February, 2023; v1 submitted 4 February, 2020; originally announced February 2020.

arXiv:1909.06076

Deep Joint Embeddings of Context and Content for Recommendation

Authors: Miklas S. Kristoffersen, Jacob L. Wieland, Sven E. Shepstone, Zheng-Hua Tan, Vinoba Vinayagamoorthy

Abstract: This paper proposes a deep learning-based method for learning joint context-content embeddings (JCCE) with a view to context-aware recommendations, and demonstrates its application in the television domain. JCCE builds on recent progress within latent representations for recommendation and deep metric learning. The model effectively groups viewing situations and associated consumed content, based on supervision from 2.7 million viewing events. Experiments confirm the recommendation ability of JCCE, achieving improvements when compared to state-of-the-art methods. Furthermore, the approach shows meaningful structures in the learned representations that can be used to gain valuable insights into underlying factors in the relationship between contextual settings and content properties.

Submitted 12 November, 2019; v1 submitted 13 September, 2019; originally announced September 2019.

Comments: Accepted for CARS 2.0 - Context-Aware Recommender Systems Workshop @ RecSys'19

arXiv:1812.04949

Subjective Annotations for Vision-Based Attention Level Estimation

Authors: Andrea Coifman, Péter Rohoska, Miklas S. Kristoffersen, Sven E. Shepstone, Zheng-Hua Tan

Abstract: Attention level estimation systems have a high potential in many use cases, such as human-robot interaction, driver modeling and smart home systems, since being able to measure a person's attention level opens the possibility of natural interaction between humans and computers. The topic of estimating a human's visual focus of attention has been actively addressed recently in the field of HCI. However, most of these previous works do not consider attention as a subjective, cognitive attentive state. New research within the field also faces the problem of the lack of annotated datasets regarding attention level in a certain context. The novelty of our work is two-fold: first, we introduce a new annotation framework that tackles the subjective nature of attention level and use it to annotate more than 100,000 images with three attention levels; second, we introduce a novel method to estimate attention levels, relying purely on extracted geometric features from RGB and depth images, and evaluate it with a deep learning fusion framework. The system achieves an overall accuracy of 80.02%. Our framework and attention level annotations are made publicly available.

Submitted 24 January, 2019; v1 submitted 12 December, 2018; originally announced December 2018.

Comments: 14th International Conference on Computer Vision Theory and Applications

arXiv:1812.01712

Multiview Based 3D Scene Understanding On Partial Point Sets

Authors: Ye Zhu, Sven Ewan Shepstone, Pablo Martínez-Nuevo, Miklas Strøm Kristoffersen, Fabien Moutarde, Zhuang Fu

Abstract: Deep learning within the context of point clouds has gained much research interest in recent years, mostly due to the promising results that have been achieved on a number of challenging benchmarks, such as 3D shape recognition and scene semantic segmentation. In many realistic settings, however, snapshots of the environment are often taken from a single view, which only contains a partial set of the scene due to the field of view restriction of commodity cameras. 3D scene semantic understanding on partial point clouds is considered a challenging task. In this work, we propose a processing approach for 3D point cloud data based on a multiview representation of the existing 360° point clouds. By fusing the original 360° point clouds and their corresponding 3D multiview representations as input data, a neural network is able to recognize partial point sets while improving the general performance on complete point sets, resulting in an overall increase of 31.9% and 4.3% in segmentation accuracy for partial and complete scene semantic understanding, respectively. This method can also be applied in a wider 3D recognition context such as 3D part segmentation.

Submitted 30 November, 2018; originally announced December 2018.

Comments: This paper has been submitted to IEEE Transactions on Neural Networks and Learning Systems

arXiv:1808.00337

doi: 10.1109/TMM.2019.2944214

The Importance of Context When Recommending TV Content: Dataset and Algorithms

Authors: Miklas S. Kristoffersen, Sven E. Shepstone, Zheng-Hua Tan

Abstract: Home entertainment systems feature in a variety of usage scenarios with one or more simultaneous users, for whom the complexity of choosing media to consume has increased rapidly over the last decade. Users' decision processes are complex and highly influenced by contextual settings, but data supporting the development and evaluation of context-aware recommender systems are scarce. In this paper we present a dataset of self-reported TV consumption enriched with contextual information of viewing situations. We show how choice of genre associates with, among others, the number of present users and users' attention levels. Furthermore, we evaluate the performance of predicting chosen genres given different configurations of contextual information, and compare the results to contextless predictions. The results suggest that including contextual features in the prediction causes notable improvements, and both temporal and social context show significant contributions.

Submitted 30 September, 2019; v1 submitted 30 July, 2018; originally announced August 2018.

Showing 1–6 of 6 results for author: Kristoffersen, M S