-
Optimal Quality and Efficiency in Adaptive Live Streaming with JND-Aware Low latency Encoding
Authors:
Vignesh V Menon,
Jingwen Zhu,
Prajit T Rajendran,
Samira Afzal,
Klaus Schoeffmann,
Patrick Le Callet,
Christian Timmerer
Abstract:
In HTTP adaptive live streaming applications, video segments are encoded at a fixed set of bitrate-resolution pairs known as a bitrate ladder. Live encoders use the fastest available encoding configuration, referred to as a preset, to ensure the minimum possible latency in video encoding. However, an optimized preset and an optimized number of CPU threads for each encoding instance may result in (i) increased quality and (ii) efficient CPU utilization while encoding. For low-latency live encoders, the encoding speed is expected to be greater than or equal to the video framerate. To this end, this paper introduces a Just Noticeable Difference (JND)-Aware Low latency Encoding Scheme (JALE), which uses random forest-based models to jointly determine the optimized encoder preset and thread count for each representation, based on video complexity features, the target encoding speed, the total number of available CPU threads, and the target encoder. Experimental results show that, on average, JALE yields a quality improvement of 1.32 dB PSNR and 5.38 VMAF points at the same bitrate, compared to the fastest preset encoding of the HTTP Live Streaming (HLS) bitrate ladder using the x265 open-source HEVC encoder with eight CPU threads used for each representation. These enhancements are achieved while maintaining the desired encoding speed. Furthermore, on average, JALE results in an overall storage reduction of 72.70%, a 63.83% reduction in the total number of CPU threads used, and a 37.87% reduction in the overall encoding time, considering a JND of six VMAF points.
Submitted 27 January, 2024;
originally announced January 2024.
-
A Survey on Energy Consumption and Environmental Impact of Video Streaming
Authors:
Samira Afzal,
Narges Mehran,
Zoha Azimi Ourimi,
Farzad Tashtarian,
Hadi Amirpour,
Radu Prodan,
Christian Timmerer
Abstract:
Climate change challenges require a notable decrease in worldwide greenhouse gas (GHG) emissions across technology sectors. Digital technologies, especially video streaming, which accounts for most Internet traffic, are no exception. Video streaming demand increases with remote working, multimedia communication services (e.g., WhatsApp, Skype), video streaming content (e.g., YouTube, Netflix), video resolution (4K/8K, 50 fps/60 fps), and multi-view video, making energy consumption and environmental footprint critical. This survey contributes to a better understanding of sustainable and efficient video streaming technologies by providing insights into the state-of-the-art and potential future directions for researchers, developers, engineers, service providers, hosting platforms, and consumers. We widen this survey's focus on content provisioning and content consumption based on the observation that continuously active network equipment underneath video streaming consumes substantial energy independent of the transmitted data type. We propose a taxonomy of factors that affect the energy consumption in video streaming, such as encoding schemes, resource requirements, storage, content retrieval, decoding, and display. We identify notable weaknesses in video streaming that require further research for improved energy efficiency: (1) fixed bitrate ladders in HTTP live streaming; (2) inefficient hardware utilization of existing video players; (3) lack of a comprehensive open energy measurement dataset covering various device types and coding parameters for reproducible research.
Submitted 18 January, 2024;
originally announced January 2024.
-
Content-Adaptive Variable Framerate Encoding Scheme for Green Live Streaming
Authors:
Vignesh V Menon,
Samira Afzal,
Prajit T Rajendran,
Klaus Schoeffmann,
Radu Prodan,
Christian Timmerer
Abstract:
Adaptive live video streaming applications use a fixed predefined configuration for the bitrate ladder with constant framerate and encoding presets in a session. However, selecting optimized framerates and presets for every bitrate ladder representation can enhance perceptual quality, improve computational resource allocation, and thus improve streaming energy efficiency. In particular, low framerates for low-bitrate representations reduce compression artifacts and decrease encoding energy consumption. In addition, an optimized preset may lead to improved compression efficiency. To this end, this paper proposes a Content-adaptive Variable Framerate (CVFR) encoding scheme, which offers two modes of operation: ecological (ECO) and high-quality (HQ). CVFR-ECO optimizes for the highest encoding energy savings by predicting the optimized framerate for each representation in the bitrate ladder. CVFR-HQ takes it further by predicting each representation's optimized framerate-encoding preset pair using low-complexity discrete cosine transform energy-based spatial and temporal features for compression efficiency and sustainable storage. We demonstrate the advantage of CVFR using the x264 open-source video encoder. The results show that CVFR-ECO yields an average PSNR and VMAF increase of 0.02 dB and 2.50 points, respectively, for the same bitrate, compared to the fastest preset highest framerate encoding. CVFR-ECO also yields an average encoding and storage energy consumption reduction of 34.54% and 76.24%, respectively, considering a just noticeable difference (JND) of six VMAF points. In comparison, CVFR-HQ yields an average increase in PSNR and VMAF of 2.43 dB and 10.14 points, respectively, for the same bitrate. Finally, CVFR-HQ yields an average reduction in storage energy consumption of 83.18%, considering a JND of six VMAF points.
Submitted 14 November, 2023;
originally announced November 2023.
-
Energy-Efficient Multi-Codec Bitrate-Ladder Estimation for Adaptive Video Streaming
Authors:
Vignesh V Menon,
Reza Farahani,
Prajit T Rajendran,
Samira Afzal,
Klaus Schoeffmann,
Christian Timmerer
Abstract:
With the emergence of multiple modern video codecs, streaming service providers are forced to encode, store, and transmit bitrate ladders of multiple codecs separately, consequently suffering from additional energy costs for encoding, storage, and transmission. To tackle this issue, we introduce an online energy-efficient Multi-Codec Bitrate ladder Estimation scheme (MCBE) for adaptive video streaming applications. In MCBE, quality representations within the bitrate ladder of new-generation codecs (e.g., High Efficiency Video Coding (HEVC), Alliance for Open Media Video 1 (AV1)) that lie below the predicted rate-distortion curve of the Advanced Video Coding (AVC) codec are removed. Moreover, perceptual redundancy between representations of the bitrate ladders of the considered codecs is also minimized based on a Just Noticeable Difference (JND) threshold. Therefore, random forest-based models predict the VMAF score of bitrate ladder representations of each codec. In a live streaming session where all clients support the decoding of AVC, HEVC, and AV1, MCBE achieves impressive results, reducing cumulative encoding energy by 56.45%, storage energy usage by 94.99%, and transmission energy usage by 77.61% (considering a JND of six VMAF points). These energy reductions are in comparison to a baseline bitrate ladder encoding based on current industry practice.
Submitted 14 October, 2023;
originally announced October 2023.
-
A Comprehensive Survey on Affective Computing; Challenges, Trends, Applications, and Future Directions
Authors:
Sitara Afzal,
Haseeb Ali Khan,
Imran Ullah Khan,
Md. Jalil Piran,
Jong Weon Lee
Abstract:
As the name suggests, affective computing aims to recognize human emotions, sentiments, and feelings. There is a wide range of fields that study affective computing, including languages, sociology, psychology, computer science, and physiology. However, no research has ever been done to determine how machine learning (ML) and mixed reality (XR) interact together. This paper discusses the significance of affective computing, as well as its ideas, conceptions, methods, and outcomes. By using approaches of ML and XR, we survey and discuss recent methodologies in affective computing. We survey the state-of-the-art approaches along with current affective data resources. Further, we discuss various applications where affective computing has a significant impact, which will aid future scholars in gaining a better understanding of its significance and practical relevance.
Submitted 8 May, 2023;
originally announced May 2023.
-
QVoice: Arabic Speech Pronunciation Learning Application
Authors:
Yassine El Kheir,
Fouad Khnaisser,
Shammur Absar Chowdhury,
Hamdy Mubarak,
Shazia Afzal,
Ahmed Ali
Abstract:
This paper introduces QVoice, a novel Arabic pronunciation learning application powered by an end-to-end mispronunciation detection and feedback generator module. The application is designed to support non-native Arabic speakers in enhancing their pronunciation skills, while also helping native speakers mitigate any potential influence from regional dialects on their Modern Standard Arabic (MSA) pronunciation. QVoice employs various learning cues to aid learners in comprehending meaning and drawing connections with their existing knowledge of the English language, and offers detailed feedback for pronunciation correction, along with contextual examples showcasing word usage. The learning cues featured in QVoice encompass a wide range of meaningful information, such as visualizations of phrases/words and their translations, as well as phonetic transcriptions and transliterations. QVoice provides pronunciation feedback at the character level and assesses performance at the word level.
Submitted 9 May, 2023;
originally announced May 2023.
-
AutoDOViz: Human-Centered Automation for Decision Optimization
Authors:
Daniel Karl I. Weidele,
Shazia Afzal,
Abel N. Valente,
Cole Makuch,
Owen Cornec,
Long Vu,
Dharmashankar Subramanian,
Werner Geyer,
Rahul Nair,
Inge Vejsbjerg,
Radu Marinescu,
Paulito Palmes,
Elizabeth M. Daly,
Loraine Franke,
Daniel Haehn
Abstract:
We present AutoDOViz, an interactive user interface for automated decision optimization (AutoDO) using reinforcement learning (RL). Decision optimization (DO) has classically been practiced by dedicated DO researchers, where experts need to spend long periods of time fine-tuning a solution through trial and error. AutoML pipeline search has sought to make it easier for a data scientist to find the best machine learning pipeline by leveraging automation to search and tune the solution. More recently, these advances have been applied to the domain of AutoDO, with a similar goal of finding the best reinforcement learning pipeline through algorithm selection and parameter tuning. However, decision optimization requires a significantly more complex problem specification than an ML problem. AutoDOViz seeks to lower the barrier of entry for data scientists in problem specification for reinforcement learning problems, leverage the benefits of AutoDO algorithms for RL pipeline search, and finally, create visualizations and policy insights in order to facilitate the typically interactive communication of problem formulations and solution proposals between DO experts and domain experts. In this paper, we report our findings from semi-structured expert interviews with DO practitioners as well as business consultants, leading to design requirements for human-centered automation for DO with RL. We evaluate a system implementation with data scientists and find that they are significantly more open to engaging in DO after using our proposed solution. AutoDOViz further increases trust in RL agent models and makes the automated training and evaluation process more comprehensible. As shown for other automation in ML tasks, we also conclude that automation of RL for DO can benefit from the user, and vice versa, when the interface promotes human-in-the-loop.
Submitted 19 February, 2023;
originally announced February 2023.
-
SpeechBlender: Speech Augmentation Framework for Mispronunciation Data Generation
Authors:
Yassine El Kheir,
Shammur Absar Chowdhury,
Ahmed Ali,
Hamdy Mubarak,
Shazia Afzal
Abstract:
The lack of labeled second language (L2) speech data is a major challenge in designing mispronunciation detection models. We introduce SpeechBlender, a fine-grained data augmentation pipeline for generating mispronunciation errors to overcome such data scarcity. SpeechBlender utilizes a variety of masks to target different regions of phonetic units, and uses mixing factors to linearly interpolate raw speech signals while augmenting pronunciation. The masks facilitate smooth blending of the signals, generating more effective samples than the 'Cut/Paste' method. Our proposed technique achieves state-of-the-art results on Speechocean762 for ASR-dependent mispronunciation detection models at the phoneme level, with a 2.0% gain in Pearson Correlation Coefficient (PCC) compared to the previous state-of-the-art [1]. Additionally, we demonstrate a 5.0% improvement at the phoneme level compared to our baseline. We also observed a 4.6% increase in F1-score on the Arabic AraVoiceL2 test set.
Submitted 12 July, 2023; v1 submitted 2 November, 2022;
originally announced November 2022.
-
Towards Battery-Free Machine Learning and Inference in Underwater Environments
Authors:
Yuchen Zhao,
Sayed Saad Afzal,
Waleed Akbar,
Osvy Rodriguez,
Fan Mo,
David Boyle,
Fadel Adib,
Hamed Haddadi
Abstract:
This paper is motivated by a simple question: Can we design and build battery-free devices capable of machine learning and inference in underwater environments? An affirmative answer to this question would have significant implications for a new generation of underwater sensing and monitoring applications for environmental monitoring, scientific exploration, and climate/weather prediction.
To answer this question, we explore the feasibility of bridging advances from the past decade in two fields: battery-free networking and low-power machine learning. Our exploration demonstrates that it is indeed possible to enable battery-free inference in underwater environments. We designed a device that can harvest energy from underwater sound, power up an ultra-low-power microcontroller and on-board sensor, perform local inference on sensed measurements using a lightweight Deep Neural Network, and communicate the inference result via backscatter to a receiver. We tested our prototype in an emulated marine bioacoustics application, demonstrating the potential to recognize underwater animal sounds without batteries. Through this exploration, we highlight the challenges and opportunities for making underwater battery-free inference and machine learning ubiquitous.
Submitted 16 February, 2022;
originally announced February 2022.
-
Data Quality Toolkit: Automatic assessment of data quality and remediation for machine learning datasets
Authors:
Nitin Gupta,
Hima Patel,
Shazia Afzal,
Naveen Panwar,
Ruhi Sharma Mittal,
Shanmukha Guttula,
Abhinav Jain,
Lokesh Nagalapatti,
Sameep Mehta,
Sandeep Hans,
Pranay Lohia,
Aniya Aggarwal,
Diptikalyan Saha
Abstract:
The quality of training data has a huge impact on the efficiency, accuracy, and complexity of machine learning tasks. Various tools and techniques are available that assess data quality with respect to general cleaning and profiling checks. However, these techniques are not applicable to detecting data issues in the context of machine learning tasks, such as noisy labels and the existence of overlapping classes. We attempt to re-look at the data quality issues in the context of building a machine learning pipeline and build a tool that can detect, explain, and remediate issues in the data, and systematically and automatically capture all the changes applied to the data. We introduce the Data Quality Toolkit for machine learning as a library of some key quality metrics and relevant remediation techniques to analyze and enhance the readiness of structured training datasets for machine learning projects. The toolkit can reduce the turn-around times of data preparation pipelines and streamline the data quality assessment process. Our toolkit is publicly available via the IBM API Hub [1] platform; any developer can assess data quality using IBM's Data Quality for AI APIs [2]. Detailed tutorials are also available on IBM Learning Path [3].
Submitted 5 September, 2021; v1 submitted 12 August, 2021;
originally announced August 2021.
-
A low-overhead approach for self-sovereign identity in IoT
Authors:
Geovane Fedrecheski,
Laisa C. P. Costa,
Samira Afzal,
Jan M. Rabaey,
Roseli D. Lopes,
Marcelo K. Zuffo
Abstract:
We present a low-overhead mechanism for self-sovereign identification and communication of IoT agents in constrained networks. Our main contribution is to enable native use of Decentralized Identifiers (DIDs) and DID-based secure communication on constrained networks, whereas previous works either did not consider the issue or relied on proxy-based architectures. We propose a new extension to DIDs along with a more concise serialization method for DID metadata. Moreover, in order to reduce the security overhead on transmitted messages, we adopted a binary message envelope. We implemented these proposals within the context of Swarm Computing, an approach for decentralized IoT. Results showed that our proposal reduces the size of identity metadata by almost four times and the security overhead by up to five times. We observed that both techniques are required to enable operation on constrained networks.
Submitted 21 July, 2021;
originally announced July 2021.
-
A Visual Analytics Based Decision Making Environment for COVID-19 Modeling and Visualization
Authors:
Shehzad Afzal,
Sohaib Ghani,
Hank C. Jenkins-Smith,
David S. Ebert,
Markus Hadwiger,
Ibrahim Hoteit
Abstract:
Public health officials dealing with pandemics like COVID-19 have to evaluate and prepare response plans. This planning phase requires not only examining the spatiotemporal dynamics and impact of the pandemic using simulation models, but also planning for and ensuring the availability of resources under different spread scenarios. To this end, we have developed a visual analytics environment that enables public health officials to model, simulate, and explore the spread of COVID-19 by supplying county-level information such as population, demographics, and hospital beds. This environment enables users to explore spatiotemporal model simulation data relevant to COVID-19 through a geospatial map with linked statistical views, apply different decision measures at different points in time, and understand their potential impact. Users can drill down to county-level details such as the number of sicknesses, deaths, needs for hospitalization, and variations in these statistics over time. We demonstrate the usefulness of this environment through a use case study and also provide feedback from domain experts. We also provide details about future extensions and potential applications of this work.
Submitted 22 October, 2020;
originally announced October 2020.
-
Data Readiness Report
Authors:
Shazia Afzal,
Rajmohan C,
Manish Kesarwani,
Sameep Mehta,
Hima Patel
Abstract:
Data exploration and quality analysis is an important yet tedious process in the AI pipeline. Current practices of data cleaning and data readiness assessment for machine learning tasks are mostly conducted in an arbitrary manner, which limits their reuse and results in loss of productivity. We introduce the concept of a Data Readiness Report as accompanying documentation for a dataset that allows data consumers to get detailed insights into the quality of input data. Data characteristics and challenges on various quality dimensions are identified and documented, keeping in mind the principles of transparency and explainability. The Data Readiness Report also serves as a record of all data assessment operations, including applied transformations. This provides a detailed lineage for the purpose of data governance and management. In effect, the report captures and documents the actions taken by various personas in a data readiness and assessment workflow. Over time, this becomes a repository of best practices and can potentially drive a recommendation system for building automated data readiness workflows along the lines of AutoML [8]. We anticipate that, together with Datasheets [9], the Dataset Nutrition Label [11], FactSheets [1], and Model Cards [15], the Data Readiness Report makes significant progress towards Data and AI lifecycle documentation.
Submitted 15 October, 2020; v1 submitted 14 October, 2020;
originally announced October 2020.
-
Route Packing: Geospatially-Accurate Visualization of Route Networks
Authors:
Jieqiong Zhao,
Morteza Karimzadeh,
Hanye Xu,
Abish Malik,
Shehzad Afzal,
Guizhen Wang,
Niklas Elmqvist,
David S. Ebert
Abstract:
We present route packing, a novel (geo)visualization technique for displaying several routes simultaneously on a geographic map while preserving the geospatial layout, identity, directionality, and volume of individual routes. The technique collects variable-width route lines side by side while minimizing crossings, encodes them with categorical colors, and decorates them with glyphs to show their directions. Furthermore, nodes representing sources and sinks use glyphs to indicate whether routes stop at the node or merely pass through it. We conducted a crowd-sourced user study investigating route tracing performance with road networks visualized using our route packing technique. Our findings highlight the visual parameters under which the technique yields optimal performance.
Submitted 23 September, 2019;
originally announced September 2019.
-
A Holistic Survey of Wireless Multipath Video Streaming
Authors:
Samira Afzal,
Vanessa Testoni,
Christian Esteve Rothenberg,
Prakash Kolan,
Imed Bouazizi
Abstract:
Most of today's mobile devices are equipped with multiple network interfaces, and one of the main bandwidth-hungry applications that would benefit from multipath communications is wireless video streaming. However, most of the current transport protocols do not match the requirements of video streaming applications or are not designed to address relevant issues, such as delay constraints, network heterogeneity, and head-of-line blocking. This survey provides a holistic literature review of multipath wireless video streaming, shedding light on the different alternatives from an end-to-end layered stack perspective, unveiling the trade-offs of each approach, and presenting a suitable taxonomy to classify the state-of-the-art. Finally, we discuss open issues and avenues for future work.
Submitted 21 September, 2021; v1 submitted 14 June, 2019;
originally announced June 2019.