Search | arXiv e-print repository

Encoder-Quantization-Motion-based Video Quality Metrics

Authors: Yixu Chen, Zaixi Shang, Hai Wei, Yongjun Wu, Sriram Sethuraman

Abstract: In an adaptive bitrate streaming application, the efficiency of video compression and the encoded video quality depend on both the video codec and the quality metric used to perform encoding optimization. The development of such a quality metric needs large-scale subjective datasets. In this work we merge several datasets into one to support the creation of a metric tailored for video compression and scaling. We propose a set of HEVC lightweight features to boost performance of the metrics. Our metrics can be computed from a tightly coupled encoding process with 4% compute overhead or from the decoding process in real-time. The proposed method can achieve better correlation than VMAF and P.1204.3. It can extrapolate to different dynamic ranges, and is suitable for real-time video quality metrics delivery in the bitstream. The performance is verified by in-distribution and cross-dataset tests. This work paves the way for adaptive client-side heuristics, real-time segment optimization, dynamic bitrate capping, and quality-dependent post-processing neural network switching, etc.

Submitted 9 April, 2024; originally announced April 2024.

Comments: Accepted at Picture Coding Symposium 2024

arXiv:2311.14316 [pdf, other]

Windformer: Bi-Directional Long-Distance Spatio-Temporal Network For Wind Speed Prediction

Authors: Xuewei Li, Zewen Shang, Zhiqiang Liu, Jian Yu, Wei Xiong, Mei Yu

Abstract: Wind speed prediction is critical to the management of wind power generation. Due to the large range of wind speed fluctuations and wake effect, there may also be strong correlations between long-distance wind turbines. This difficult-to-extract feature has become a bottleneck for improving accuracy. History and future time information includes the trend of airflow changes; whether this dynamic information can be utilized will also affect the prediction effect. In response to the above problems, this paper proposes Windformer. First, Windformer divides the wind turbine cluster into multiple non-overlapping windows and calculates correlations inside the windows, then shifts the windows partially to provide connectivity between windows, and finally fuses multi-channel features based on detailed and global information. To dynamically model the change process of wind speed, this paper extracts time series in both history and future directions simultaneously. Compared with other current advanced methods, the Mean Square Error (MSE) of Windformer is reduced by 0.5% to 15% on two datasets from NERL.

Submitted 24 November, 2023; originally announced November 2023.

arXiv:2310.12014 [pdf, other]

Enhancing Spoofing Speech Detection Using Rhythm Information

Authors: Jingze Lu, Yuxiang Zhang, Wenchao Wang, Zengqiang Shang, Pengyuan Zhang

Abstract: Current spoofing speech detection systems need more convincing evidence. In this paper, the flaws of rhythm information inherent in the TTS-generated speech are analyzed to increase the reliability of detection systems. TTS models take text as input and utilize acoustic models to predict rhythm information, which introduces artifacts in the rhythm information. By filtering out vocal tract response, the remaining glottal flow with rhythm information retains detection ability for TTS-generated speech. Based on these analyses, a rhythm perturbation module is proposed to enhance the copy-synthesis data augmentation method. Fake utterances generated by the proposed method force the detecting model to pay attention to the artifacts in rhythm information and effectively improve the ability to detect TTS-generated speech of the anti-spoofing countermeasures.

Submitted 25 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

Comments: Five pages, two figures

arXiv:2309.08285 [pdf, other]

One-Class Knowledge Distillation for Spoofing Speech Detection

Authors: Jingze Lu, Yuxiang Zhang, Wenchao Wang, Zengqiang Shang, Pengyuan Zhang

Abstract: The detection of spoofing speech generated by unseen algorithms remains an unresolved challenge. One reason for the lack of generalization ability is that traditional detecting systems follow the binary classification paradigm, which inherently assumes the possession of prior knowledge of spoofing speech. One-class methods attempt to learn the distribution of bonafide speech and are inherently suited to the task where spoofing speech exhibits significant differences. However, training a one-class system using only bonafide speech is challenging. In this paper, we introduce a teacher-student framework to provide guidance for the training of a one-class model. The proposed one-class knowledge distillation method outperforms other state-of-the-art methods on the ASVspoof 21DF dataset and InTheWild dataset, which demonstrates its superior generalization ability.

Submitted 15 September, 2023; originally announced September 2023.

Comments: Submitted to ICASSP 2024

arXiv:2309.08279 [pdf, other]

Improving Short Utterance Anti-Spoofing with AASIST2

Authors: Yuxiang Zhang, Jingze Lu, Zengqiang Shang, Wenchao Wang, Pengyuan Zhang

Abstract: The wav2vec 2.0 and integrated spectro-temporal graph attention network (AASIST) based countermeasure achieves great performance in speech anti-spoofing. However, current spoof speech detection systems have fixed training and evaluation durations, while the performance degrades significantly during short utterance evaluation. To solve this problem, AASIST can be improved to AASIST2 by modifying the residual blocks to Res2Net blocks. The modified Res2Net blocks can extract multi-scale features and improve the detection performance for speech of different durations, thus improving the short utterance evaluation performance. On the other hand, adaptive large margin fine-tuning (ALMFT) has achieved performance improvement in short utterance speaker verification. Therefore, we apply Dynamic Chunk Size (DCS) and ALMFT training strategies in speech anti-spoofing to further improve the performance of short utterance evaluation. Experiments demonstrate that the proposed AASIST2 improves the performance of short utterance evaluation while maintaining the performance of regular evaluation on different datasets.

Submitted 3 January, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: 5 pages, 2 figures, accepted by ICASSP

arXiv:2308.13365 [pdf, ps, other]

Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder

Authors: Xuyuan Li, Zengqiang Shang, Peiyang Shi, Hua Hua, Ta Li, Pengyuan Zhang

Abstract: Neural networks have been able to generate high-quality single-sentence speech. However, it remains a challenge concerning audio-book speech synthesis due to the intra-paragraph correlation of semantic and acoustic features as well as variable styles. In this paper, we propose a highly expressive paragraph speech synthesis system with a multi-step variational autoencoder, called EP-MSTTS. EP-MSTTS is the first VITS-based paragraph speech synthesis model and models the variable style of paragraph speech at five levels: frame, phoneme, word, sentence, and paragraph. We also propose a series of improvements to enhance the performance of this hierarchical model. In addition, we directly train EP-MSTTS on speech sliced by paragraph rather than sentence. Experiment results on the single-speaker French audiobook corpus released at Blizzard Challenge 2023 show EP-MSTTS obtains better performance than baseline models.

Submitted 11 June, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

Comments: accepted at Interspeech 2024

arXiv:2304.13162 [pdf, other]

HDR or SDR? A Subjective and Objective Study of Scaled and Compressed Videos

Authors: Joshua P. Ebenezer, Zaixi Shang, Yixu Chen, Yongjun Wu, Hai Wei, Sriram Sethuraman, Alan C. Bovik

Abstract: We conducted a large-scale study of human perceptual quality judgments of High Dynamic Range (HDR) and Standard Dynamic Range (SDR) videos subjected to scaling and compression levels and viewed on three different display devices. HDR videos are able to present wider color gamuts, better contrasts, and brighter whites and darker blacks than SDR videos. While conventional expectations are that HDR quality is better than SDR quality, we have found subject preference of HDR versus SDR depends heavily on the display device, as well as on resolution scaling and bitrate. To study this question, we collected more than 23,000 quality ratings from 67 volunteers who watched 356 videos on OLED, QLED, and LCD televisions. Since it is of interest to be able to measure the quality of videos under these scenarios, e.g. to inform decisions regarding scaling, compression, and SDR vs HDR, we tested several well-known full-reference and no-reference video quality models on the new database. Towards advancing progress on this problem, we also developed a novel no-reference model called HDRPatchMAX, that uses both classical and bit-depth sensitive distortion statistics more accurately than existing metrics.

Submitted 25 April, 2023; originally announced April 2023.

arXiv:2304.13156 [pdf, other]

HDR-ChipQA: No-Reference Quality Assessment on High Dynamic Range Videos

Authors: Joshua P. Ebenezer, Zaixi Shang, Yongjun Wu, Hai Wei, Sriram Sethuraman, Alan C. Bovik

Abstract: We present a no-reference video quality model and algorithm that delivers standout performance for High Dynamic Range (HDR) videos, which we call HDR-ChipQA. HDR videos represent wider ranges of luminances, details, and colors than Standard Dynamic Range (SDR) videos. The growing adoption of HDR in massively scaled video networks has driven the need for video quality assessment (VQA) algorithms that better account for distortions on HDR content. In particular, standard VQA models may fail to capture conspicuous distortions at the extreme ends of the dynamic range, because the features that drive them may be dominated by distortions that pervade the mid-ranges of the signal. We introduce a new approach whereby a local expansive nonlinearity emphasizes distortions occurring at the higher and lower ends of the local luma range, allowing for the definition of additional quality-aware features that are computed along a separate path. These features are not HDR-specific, and also improve VQA on SDR video contents, albeit to a reduced degree. We show that this preprocessing step significantly boosts the power of distortion-sensitive natural video statistics (NVS) features when used to predict the quality of HDR content. In similar manner, we separately compute novel wide-gamut color features using the same nonlinear processing steps. We have found that our model significantly outperforms SDR VQA algorithms on the only publicly available, comprehensive HDR database, while also attaining state-of-the-art performance on SDR content.

Submitted 25 April, 2023; originally announced April 2023.

arXiv:2304.13092 [pdf, other]

doi 10.1109/LSP.2023.3268602

Making Video Quality Assessment Models Robust to Bit Depth

Authors: Joshua P. Ebenezer, Zaixi Shang, Yongjun Wu, Hai Wei, Sriram Sethuraman, Alan C. Bovik

Abstract: We introduce a novel feature set, which we call HDRMAX features, that when included into Video Quality Assessment (VQA) algorithms designed for Standard Dynamic Range (SDR) videos, sensitizes them to distortions of High Dynamic Range (HDR) videos that are inadequately accounted for by these algorithms. While these features are not specific to HDR, and also augment the quality prediction performances of VQA models on SDR content, they are especially effective on HDR. HDRMAX features modify powerful priors drawn from Natural Video Statistics (NVS) models by enhancing their measurability where they visually impact the brightest and darkest local portions of videos, thereby capturing distortions that are often poorly accounted for by existing VQA models. As a demonstration of the efficacy of our approach, we show that, while current state-of-the-art VQA models perform poorly on 10-bit HDR databases, their performances are greatly improved by the inclusion of HDRMAX features when tested on HDR and 10-bit distorted videos.

Submitted 25 April, 2023; originally announced April 2023.

Comments: Published in IEEE Signal Processing Letters 2023

arXiv:2302.13207 [pdf, other]

Stereo X-ray Tomography

Authors: Zhenduo Shang, Thomas Blumensath

Abstract: X-ray tomography is a powerful volumetric imaging technique, but detailed three-dimensional (3D) imaging requires the acquisition of a large number of individual X-ray images, which is time-consuming. For applications where spatial information needs to be collected quickly, for example, when studying dynamic processes, standard X-ray tomography is therefore not applicable. Inspired by stereo vision, in this paper, we develop X-ray imaging methods that work with two X-ray projection images. In this setting, without the use of additional strong prior information, we no longer have enough information to fully recover the 3D tomographic images. However, up to a point, we are nevertheless able to extract spatial locations of point and line features. From stereo vision, it is well known that, for a known imaging geometry, once the same point is identified in two images taken from different directions, then the point's location in 3D space is exactly specified. The challenge is the matching of points between images. As X-ray transmission images are fundamentally different from the surface reflection images used in standard computer vision, we here develop a different feature identification and matching approach. In fact, once point-like features are identified, if there are limited points in the image, then they can often be matched exactly. In fact, by utilising a third observation from an appropriate direction, matching becomes unique. Once matched, point locations in 3D space are easily computed using geometric considerations. Linear features, with clear end points, can be located using a similar approach.

Submitted 25 February, 2023; originally announced February 2023.

arXiv:2212.07020 [pdf, other]

Object Delineation in Satellite Images

Authors: Zhuocheng Shang, Ahmed Eldawy

Abstract: Machine learning is being widely applied to analyze satellite data with problems such as classification and feature detection. Unlike traditional image processing algorithms, geospatial applications need to convert the detected objects from a raster form to a geospatial vector form to further analyze them. This gem delivers a simple and light-weight algorithm for delineating the pixels that are marked by ML algorithms to extract geospatial objects from satellite images. The proposed algorithm is exact and users can further apply simplification and approximation based on the application needs.

Submitted 13 December, 2022; originally announced December 2022.

Comments: 7 Pages, 4 Figures, 1 Table, to be submitted to the 4th ACM SIGSPATIAL International Workshop on Spatial Gems (SpatialGems 2022)

MSC Class: H.4

arXiv:2209.10005 [pdf, other]

Subjective Assessment of High Dynamic Range Videos Under Different Ambient Conditions

Authors: Zaixi Shang, Joshua P. Ebenezer, Alan C. Bovik, Yongjun Wu, Hai Wei, Sriram Sethuraman

Abstract: High Dynamic Range (HDR) videos can represent a much greater range of brightness and color than Standard Dynamic Range (SDR) videos and are rapidly becoming an industry standard. HDR videos have more challenging capture, transmission, and display requirements than legacy SDR videos. With their greater bit depth, advanced electro-optical transfer functions, and wider color gamuts, comes the need for video quality algorithms that are specifically designed to predict the quality of HDR videos. Towards this end, we present the first publicly released large-scale subjective study of HDR videos. We study the effect of distortions such as compression and aliasing on the quality of HDR videos. We also study the effect of ambient illumination on perceptual quality of HDR videos by conducting the study in both a dark lab environment and a brighter living-room environment. A total of 66 subjects participated in the study and more than 20,000 opinion scores were collected, which makes this the largest in-lab study of HDR video quality ever. We anticipate that the dataset will be a valuable resource for researchers to develop better models of perceptual quality for HDR videos.

Submitted 20 September, 2022; originally announced September 2022.

arXiv:2203.07659 [pdf]

Breast Cancer Molecular Subtypes Prediction on Pathological Images with Discriminative Patch Selecting and Multi-Instance Learning

Authors: Hong Liu, Wen-Dong Xu, Zi-Hao Shang, Xiang-Dong Wang, Hai-Yan Zhou, Ke-Wen Ma, Huan Zhou, Jia-Lin Qi, Jia-Rui Jiang, Li-Lan Tan, Hui-Min Zeng, Hui-Juan Cai, Kuan-Song Wang, Yue-Liang Qian

Abstract: Molecular subtypes of breast cancer are important references to personalized clinical treatment. For cost and labor savings, only one of the patient's paraffin blocks is usually selected for subsequent immunohistochemistry (IHC) to obtain molecular subtypes. Inevitable sampling error is risky due to tumor heterogeneity and could result in a delay in treatment. Molecular subtype prediction from conventional H&E pathological whole slide images (WSI) using AI methods is useful and critical to assist pathologists in pre-screening the proper paraffin block for IHC. It's a challenging task since only WSI level labels of molecular subtypes can be obtained from IHC. Gigapixel WSIs are divided into a huge number of patches to be computationally feasible for deep learning. However, with coarse slide-level labels, patch-based methods may suffer from abundant noise patches, such as folds, overstained regions, or non-tumor tissues. A weakly supervised learning framework based on discriminative patch selecting and multi-instance learning was proposed for breast cancer molecular subtype prediction from H&E WSIs. Firstly, a co-teaching strategy was adopted to learn molecular subtype representations and filter out noise patches. Then, a balanced sampling strategy was used to handle the imbalance in subtypes in the dataset. In addition, a noise patch filtering algorithm that used local outlier factor based on cluster centers was proposed to further select discriminative patches. Finally, a loss function integrating patch with slide constraint information was used to finetune the MIL framework on the obtained discriminative patches and further improve the performance of molecular subtyping. The experimental results confirmed the effectiveness of the proposed method and our models outperformed even senior pathologists, with potential to assist pathologists to pre-screen paraffin blocks for IHC in clinic.

Submitted 15 March, 2022; originally announced March 2022.

arXiv:2109.08726 [pdf, other]

doi 10.1109/TIP.2021.3112055

ChipQA: No-Reference Video Quality Prediction via Space-Time Chips

Authors: Joshua P. Ebenezer, Zaixi Shang, Yongjun Wu, Hai Wei, Sriram Sethuraman, Alan C. Bovik

Abstract: We propose a new model for no-reference video quality assessment (VQA). Our approach uses a new idea of highly-localized space-time (ST) slices called Space-Time Chips (ST Chips). ST Chips are localized cuts of video data along directions that implicitly capture motion. We use perceptually-motivated bandpass and normalization models to first process the video data, and then select oriented ST Chips based on how closely they fit parametric models of natural video statistics. We show that the parameters that describe these statistics can be used to reliably predict the quality of videos, without the need for a reference video. The proposed method implicitly models ST video naturalness, and deviations from naturalness. We train and test our model on several large VQA databases, and show that our model achieves state-of-the-art performance at reduced cost, without requiring motion computation.

Submitted 17 September, 2021; originally announced September 2021.

Comments: To appear in IEEE Transactions on Image Processing in Sep 2021

arXiv:2106.08431 [pdf, ps, other]

Assessment of Subjective and Objective Quality of Live Streaming Sports Videos

Authors: Zaixi Shang, Joshua P. Ebenezer, Alan C. Bovik, Yongjun Wu, Hai Wei, Sriram Sethuraman

Abstract: Video live streaming is gaining prevalence among video streaming services, especially for the delivery of popular sporting events. Many objective Video Quality Assessment (VQA) models have been developed to predict the perceptual quality of videos. Appropriate databases that exemplify the distortions encountered in live streaming videos are important to designing and learning objective VQA models. Towards making progress in this direction, we built a video quality database specifically designed for live streaming VQA research. The new video database is called the Laboratory for Image and Video Engineering (LIVE) Live stream Database. The LIVE Livestream Database includes 315 videos of 45 contents impaired by 6 types of distortions. We also performed a subjective quality study using the new database, whereby more than 12,000 human opinions were gathered from 40 subjects. We demonstrate the usefulness of the new resource by performing a holistic evaluation of the performance of current state-of-the-art (SOTA) VQA models. The LIVE Livestream database is being made publicly available for these purposes at https://live.ece.utexas.edu/research/LIVE_APV_Study/apv_index.html.

Submitted 15 June, 2021; originally announced June 2021.

arXiv:2105.13282 [pdf, ps, other]

Detection of a rank-one signal with limited training data

Authors: Weijian Liu, Zhaojian Zhang, Jun Liu, Zheran Shang, Yong-Liang Wang

Abstract: In this paper, we reconsider the problem of detecting a matrix-valued rank-one signal in unknown Gaussian noise, which was previously addressed for the case of sufficient training data. We relax the above assumption to the case of limited training data. We re-derive the corresponding generalized likelihood ratio test (GLRT) and two-step GLRT (2S-GLRT) based on a certain unitary transformation on the test data. It is shown that the re-derived detectors can work with low sample support. Moreover, in sample-abundant environments the re-derived GLRT is the same as the previously proposed GLRT and the re-derived 2S-GLRT has better detection performance than the previously proposed 2S-GLRT. Numerical examples are provided to demonstrate the effectiveness of the re-derived detectors.

Submitted 13 April, 2021; originally announced May 2021.

Comments: This manuscript is accepted by Signal Processing

Report number: SIGPRO_108120

arXiv:2008.00031 [pdf, other]

No-Reference Video Quality Assessment Using Space-Time Chips

Authors: Joshua P. Ebenezer, Zaixi Shang, Yongjun Wu, Hai Wei, Alan C. Bovik

Abstract: We propose a new prototype model for no-reference video quality assessment (VQA) based on the natural statistics of space-time chips of videos. Space-time chips (ST-chips) are a new, quality-aware feature space which we define as space-time localized cuts of video data in directions that are determined by the local motion flow. We use parametrized distribution fits to the bandpass histograms of space-time chips to characterize quality, and show that the parameters from these models are affected by distortion and can hence be used to objectively predict the quality of videos. Our prototype method, which we call ChipQA-0, is agnostic to the types of distortion affecting the video, and is based on identifying and quantifying deviations from the expected statistics of natural, undistorted ST-chips in order to predict video quality. We train and test our resulting model on several large VQA databases and show that our model achieves high correlation against human judgments of video quality and is competitive with state-of-the-art models.

Submitted 23 August, 2020; v1 submitted 31 July, 2020; originally announced August 2020.

Comments: Accepted and to be published at IEEE MMSP (International Workshop on Multimedia Signal Processing) 2020

arXiv:1906.03723 [pdf]

doi 10.1016/j.infrared.2019.03.018

Bridge Deck Delamination Segmentation Based on Aerial Thermography Through Regularized Grayscale Morphological Reconstruction and Gradient Statistics

Authors: Chongsheng Cheng, Zhexiong Shang, Zhigang Shen

Abstract: Environmental and surface texture-induced temperature variation across the bridge deck is a major source of errors in delamination detection through thermography. This type of external noise poses a significant challenge for conventional quantitative methods such as global thresholding and k-means clustering. An iterative top-down approach is proposed for delamination segmentation based on grayscale morphological reconstruction. A weight-decay function was used to regularize the reconstruction for regional maxima extraction. The mean and coefficient of variation of temperature gradient estimated from delamination boundaries were used for discrimination. The proposed approach was tested on a lab experiment and an in-service bridge deck. The result showed the ability of the framework to handle the non-uniform background situation which often occurred in practice and thus eliminates the need for inferencing the background required by existing methods. The gradient statistics of the delamination boundary in the thermal image were indicated as the valid criterion to refine the segmentation under the proposed framework. Thus, improved performance was achieved compared to conventional methods. The parameter selection and the limitation of this approach were also discussed.

Submitted 9 June, 2019; originally announced June 2019.

Comments: arXiv admin note: text overlap with arXiv:1904.05723

Journal ref: Infrared Physics & Technology, Volume 98, May 2019, Pages 240-249

arXiv:1904.05723 [pdf]

Enhancing Bridge Deck Delamination Detection Based on Aerial Thermography Through Grayscale Morphologic Reconstruction: A Case Study

Authors: Chongsheng Cheng, Zhexiong Shang, Zhigang Shen

Abstract: Environmentally induced temperature variations across the bridge deck were one of the major factors that degraded the performance of delamination detection through thermography. The non-uniformly distributed thermal background violates the assumption of most conventional quantitative methods used in practice, such as global thresholding and k-means clustering. This study proposed a pre-processing method to estimate the thermal background through iterative grayscale morphologic reconstruction based on a pre-selected temperature contrast. After the estimation of the background, the thermal feature of delamination was kept in the residual image. A UAV-based nondestructive survey was carried out on an in-service bridge for a case study, and two delamination quantization methods (threshold-based and clustering-based) were applied to both raw and residual thermal images. Results were compared and evaluated based on the hammer sounding test on the same bridge. Detectability was noticeably improved, while direct implementation of post-processing on the raw image exhibited over- and under-estimation of delamination. The selection of the pre-defined temperature contrast and the stopping criterion of the iteration were discussed. The study concluded that the proposed method was useful for the case study; further evaluation and parameter tuning are expected to generalize the method and procedure.

Submitted 11 April, 2019; originally announced April 2019.

Comments: Accepted for presentation at the 98th Annual Meeting of the Transportation Research Board (TRB)

arXiv:1904.05509 [pdf]

CNN-Based Deep Architecture for Reinforced Concrete Delamination Segmentation Through Thermography

Authors: Chongsheng Cheng, Zhexiong Shang, Zhigang Shen

Abstract: Delamination assessment of the bridge deck plays a vital role in bridge health monitoring. Thermography, as one of the nondestructive technologies for delamination detection, has the advantage of efficient data acquisition. But there are challenges in the interpretation of data for accurate delamination shape profiling. Due to environmental variation and the irregular presence of delamination size and depth, conventional processing methods based on temperature contrast fall short in accurate segmentation of delamination. Inspired by the recent development of deep learning architectures for image segmentation, a Convolutional Neural Network (CNN) based framework was investigated for its applicability to delamination segmentation under variations in temperature contrast and shape diffusion. The models were developed based on the Dense Convolutional Network (DenseNet) and trained on thermal images collected for mimicked delamination in concrete slabs with different depths under an experimental setup. The results suggested satisfactory performance in accurately profiling the delamination shapes.

Submitted 10 April, 2019; originally announced April 2019.

Comments: Accepted for the 2019 ASCE International Conference on Computing in Civil Engineering

arXiv:1904.05271 [pdf]

Indoor Testing and Simulation Platform for Close-distance Visual Inspection of Complex Structures using Micro Quadrotor UAV

Authors: Zhexiong Shang, Zhigang Shen

Abstract: In recent years, using drones, also known as unmanned aerial vehicles (UAVs), in close-distance visual inspection has become an active area in many disciplines. However, many challenges still remain before we can achieve autonomous inspection, especially when inspecting complex structures. Complex civil structures, such as bridges, dams and wind turbines, are large-scale and geometrically complicated. They require sophisticated path planning algorithms to achieve close-distance inspection and, at the same time, avoid collisions. In practice, directly deploying the path planning result on such structures is error prone, costly, and full of hazards. In this paper, relying on a micro quadrotor UAV, the authors present an affordable experimental platform for testing drone-based path planning results. The platform allows users to conduct many path planning experiments at any time without worrying about expensive and time-consuming outdoor test flights. The platform is built on the Crazyflie bundle, which includes the Crazyflie 2.0 quadrotor, Crazyradio and the loco positioning system (LPS). With an onboard micro FPV camera, the visual data can be streamed live to the host computer during flight. Manual configuration and waypoint control functions are explicitly designed into this platform to increase its flexibility and performance in path following and debugging. To evaluate the practicability of the proposed test platform, two existing drone-based path planning algorithms are tested. The results show that even though a certain level of error exists, the quality of the visual data and the accuracy of path following are high enough for simulating most practical inspection applications.

Submitted 10 April, 2019; originally announced April 2019.

Comments: 6 pages, 6 figures, accepted in ICCCBE 2018

Showing 1–21 of 21 results for author: Shang, Z