
Showing 1–21 of 21 results for author: Shang, Z

Searching in archive eess.
  1. arXiv:2404.06620

    eess.IV

    Encoder-Quantization-Motion-based Video Quality Metrics

    Authors: Yixu Chen, Zaixi Shang, Hai Wei, Yongjun Wu, Sriram Sethuraman

    Abstract: In an adaptive bitrate streaming application, the efficiency of video compression and the encoded video quality depend on both the video codec and the quality metric used to perform encoding optimization. The development of such a quality metric needs large-scale subjective datasets. In this work we merge several datasets into one to support the creation of a metric tailored for video compression a…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Accepted at Picture Coding Symposium 2024

  2. arXiv:2311.14316

    eess.SP cs.AI

    Windformer: Bi-Directional Long-Distance Spatio-Temporal Network For Wind Speed Prediction

    Authors: Xuewei Li, Zewen Shang, Zhiqiang Liu, Jian Yu, Wei Xiong, Mei Yu

    Abstract: Wind speed prediction is critical to the management of wind power generation. Due to the large range of wind speed fluctuations and wake effect, there may also be strong correlations between long-distance wind turbines. This difficult-to-extract feature has become a bottleneck for improving accuracy. History and future time information includes the trend of airflow changes, whether this dynamic in…

    Submitted 24 November, 2023; originally announced November 2023.

  3. arXiv:2310.12014

    eess.AS

    Enhancing Spoofing Speech Detection Using Rhythm Information

    Authors: **gze Lu, Yuxiang Zhang, Wenchao Wang, Zengqiang Shang, Pengyuan Zhang

    Abstract: Current spoofing speech detection systems need more convincing evidence. In this paper, the flaws of rhythm information inherent in the TTS-generated speech are analyzed to increase the reliability of detection systems. TTS models take text as input and utilize acoustic models to predict rhythm information, which introduces artifacts in the rhythm information. By filtering out vocal tract response…

    Submitted 25 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: Five pages, two figures

  4. arXiv:2309.08285

    eess.AS cs.SD

    One-Class Knowledge Distillation for Spoofing Speech Detection

    Authors: **gze Lu, Yuxiang Zhang, Wenchao Wang, Zengqiang Shang, Pengyuan Zhang

    Abstract: The detection of spoofing speech generated by unseen algorithms remains an unresolved challenge. One reason for the lack of generalization ability is that traditional detection systems follow the binary classification paradigm, which inherently assumes the possession of prior knowledge of spoofing speech. One-class methods attempt to learn the distribution of bonafide speech and are inherently suited t…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  5. arXiv:2309.08279

    eess.AS cs.SD

    Improving Short Utterance Anti-Spoofing with AASIST2

    Authors: Yuxiang Zhang, **gze Lu, Zengqiang Shang, Wenchao Wang, Pengyuan Zhang

    Abstract: The wav2vec 2.0 and integrated spectro-temporal graph attention network (AASIST) based countermeasure achieves great performance in speech anti-spoofing. However, current spoof speech detection systems have fixed training and evaluation durations, and their performance degrades significantly during short utterance evaluation. To solve this problem, AASIST can be improved to AASIST2 by modifying th…

    Submitted 3 January, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: 5 pages, 2 figures, accepted by ICASSP

  6. arXiv:2308.13365

    cs.SD eess.AS

    Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder

    Authors: Xuyuan Li, Zengqiang Shang, Peiyang Shi, Hua Hua, Ta Li, Pengyuan Zhang

    Abstract: Neural networks have been able to generate high-quality single-sentence speech. However, audiobook speech synthesis remains a challenge due to the intra-paragraph correlation of semantic and acoustic features as well as variable styles. In this paper, we propose a highly expressive paragraph speech synthesis system with a multi-step variational autoencoder, called EP-MSTTS. EP-MSTTS…

    Submitted 11 June, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

    Comments: accepted at Interspeech 2024

  7. arXiv:2304.13162

    eess.IV cs.CV cs.MM

    HDR or SDR? A Subjective and Objective Study of Scaled and Compressed Videos

    Authors: Joshua P. Ebenezer, Zaixi Shang, Yixu Chen, Yongjun Wu, Hai Wei, Sriram Sethuraman, Alan C. Bovik

    Abstract: We conducted a large-scale study of human perceptual quality judgments of High Dynamic Range (HDR) and Standard Dynamic Range (SDR) videos subjected to different levels of scaling and compression and viewed on three different display devices. HDR videos are able to present wider color gamuts, better contrasts, and brighter whites and darker blacks than SDR videos. While conventional expectations are that HDR q…

    Submitted 25 April, 2023; originally announced April 2023.

  8. arXiv:2304.13156

    eess.IV cs.CV

    HDR-ChipQA: No-Reference Quality Assessment on High Dynamic Range Videos

    Authors: Joshua P. Ebenezer, Zaixi Shang, Yongjun Wu, Hai Wei, Sriram Sethuraman, Alan C. Bovik

    Abstract: We present a no-reference video quality model and algorithm that delivers standout performance for High Dynamic Range (HDR) videos, which we call HDR-ChipQA. HDR videos represent wider ranges of luminances, details, and colors than Standard Dynamic Range (SDR) videos. The growing adoption of HDR in massively scaled video networks has driven the need for video quality assessment (VQA) algorithms th…

    Submitted 25 April, 2023; originally announced April 2023.

  9. Making Video Quality Assessment Models Robust to Bit Depth

    Authors: Joshua P. Ebenezer, Zaixi Shang, Yongjun Wu, Hai Wei, Sriram Sethuraman, Alan C. Bovik

    Abstract: We introduce a novel feature set, which we call HDRMAX features, that, when included in Video Quality Assessment (VQA) algorithms designed for Standard Dynamic Range (SDR) videos, sensitizes them to distortions of High Dynamic Range (HDR) videos that are inadequately accounted for by these algorithms. While these features are not specific to HDR, and also augment the quality prediction performan…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: Published in IEEE Signal Processing Letters 2023

  10. arXiv:2302.13207

    eess.IV cs.CV

    Stereo X-ray Tomography

    Authors: Zhenduo Shang, Thomas Blumensath

    Abstract: X-ray tomography is a powerful volumetric imaging technique, but detailed three-dimensional (3D) imaging requires the acquisition of a large number of individual X-ray images, which is time-consuming. For applications where spatial information needs to be collected quickly, for example, when studying dynamic processes, standard X-ray tomography is therefore not applicable. Inspired by stereo visio…

    Submitted 25 February, 2023; originally announced February 2023.

  11. arXiv:2212.07020

    cs.CV eess.IV

    Object Delineation in Satellite Images

    Authors: Zhuocheng Shang, Ahmed Eldawy

    Abstract: Machine learning is being widely applied to analyze satellite data with problems such as classification and feature detection. Unlike traditional image processing algorithms, geospatial applications need to convert the detected objects from a raster form to a geospatial vector form to further analyze them. This gem delivers a simple and light-weight algorithm for delineating the pixels that are mark…

    Submitted 13 December, 2022; originally announced December 2022.

    Comments: 7 Pages, 4 Figures, 1 Table, to be submitted to the 4th ACM SIGSPATIAL International Workshop on Spatial Gems (SpatialGems 2022)

    MSC Class: H.4

  12. arXiv:2209.10005

    eess.IV cs.CV

    Subjective Assessment of High Dynamic Range Videos Under Different Ambient Conditions

    Authors: Zaixi Shang, Joshua P. Ebenezer, Alan C. Bovik, Yongjun Wu, Hai Wei, Sriram Sethuraman

    Abstract: High Dynamic Range (HDR) videos can represent a much greater range of brightness and color than Standard Dynamic Range (SDR) videos and are rapidly becoming an industry standard. HDR videos have more challenging capture, transmission, and display requirements than legacy SDR videos. With their greater bit depth, advanced electro-optical transfer functions, and wider color gamuts, comes the need fo…

    Submitted 20 September, 2022; originally announced September 2022.

  13. arXiv:2203.07659

    eess.IV cs.CV

    Breast Cancer Molecular Subtypes Prediction on Pathological Images with Discriminative Patch Selecting and Multi-Instance Learning

    Authors: Hong Liu, Wen-Dong Xu, Zi-Hao Shang, Xiang-Dong Wang, Hai-Yan Zhou, Ke-Wen Ma, Huan Zhou, Jia-Lin Qi, Jia-Rui Jiang, Li-Lan Tan, Hui-Min Zeng, Hui-Juan Cai, Kuan-Song Wang, Yue-Liang Qian

    Abstract: Molecular subtypes of breast cancer are important references to personalized clinical treatment. For cost and labor savings, only one of the patient's paraffin blocks is usually selected for subsequent immunohistochemistry (IHC) to obtain molecular subtypes. Inevitable sampling error is risky due to tumor heterogeneity and could result in a delay in treatment. Molecular subtype prediction from con…

    Submitted 15 March, 2022; originally announced March 2022.

  14. ChipQA: No-Reference Video Quality Prediction via Space-Time Chips

    Authors: Joshua P. Ebenezer, Zaixi Shang, Yongjun Wu, Hai Wei, Sriram Sethuraman, Alan C. Bovik

    Abstract: We propose a new model for no-reference video quality assessment (VQA). Our approach uses a new idea of highly-localized space-time (ST) slices called Space-Time Chips (ST Chips). ST Chips are localized cuts of video data along directions that implicitly capture motion. We use perceptually-motivated bandpass and normalization models to first process the video data, and then select oriente…

    Submitted 17 September, 2021; originally announced September 2021.

    Comments: To appear in IEEE Transactions on Image Processing in Sep 2021

  15. arXiv:2106.08431

    eess.IV

    Assessment of Subjective and Objective Quality of Live Streaming Sports Videos

    Authors: Zaixi Shang, Joshua P. Ebenezer, Alan C. Bovik, Yongjun Wu, Hai Wei, Sriram Sethuraman

    Abstract: Video live streaming is gaining prevalence among video streaming services, especially for the delivery of popular sporting events. Many objective Video Quality Assessment (VQA) models have been developed to predict the perceptual quality of videos. Appropriate databases that exemplify the distortions encountered in live streaming videos are important to designing and learning objective VQA models.…

    Submitted 15 June, 2021; originally announced June 2021.

  16. arXiv:2105.13282

    eess.SP cs.IT stat.AP

    Detection of a rank-one signal with limited training data

    Authors: Weijian Liu, Zhaojian Zhang, Jun Liu, Zheran Shang, Yong-Liang Wang

    Abstract: In this paper, we reconsider the problem of detecting a matrix-valued rank-one signal in unknown Gaussian noise, which was previously addressed for the case of sufficient training data. We relax the above assumption to the case of limited training data. We re-derive the corresponding generalized likelihood ratio test (GLRT) and two-step GLRT (2S-GLRT) based on certain unitary transformation on th…

    Submitted 13 April, 2021; originally announced May 2021.

    Comments: This manuscript is accepted by Signal Processing

    Report number: SIGPRO_108120

  17. arXiv:2008.00031

    eess.IV

    No-Reference Video Quality Assessment Using Space-Time Chips

    Authors: Joshua P. Ebenezer, Zaixi Shang, Yongjun Wu, Hai Wei, Alan C. Bovik

    Abstract: We propose a new prototype model for no-reference video quality assessment (VQA) based on the natural statistics of space-time chips of videos. Space-time chips (ST-chips) are a new, quality-aware feature space which we define as space-time localized cuts of video data in directions that are determined by the local motion flow. We use parametrized distribution fits to the bandpass histograms of sp…

    Submitted 23 August, 2020; v1 submitted 31 July, 2020; originally announced August 2020.

    Comments: Accepted and to be published at IEEE MMSP (International Workshop on Multimedia Signal Processing) 2020

  18. Bridge Deck Delamination Segmentation Based on Aerial Thermography Through Regularized Grayscale Morphological Reconstruction and Gradient Statistics

    Authors: Chongsheng Cheng, Zhexiong Shang, Zhigang Shen

    Abstract: Environmental and surface texture-induced temperature variation across the bridge deck is a major source of errors in delamination detection through thermography. This type of external noise poses a significant challenge for conventional quantitative methods such as global thresholding and k-means clustering. An iterative top-down approach is proposed for delamination segmentation based on graysca…

    Submitted 9 June, 2019; originally announced June 2019.

    Comments: arXiv admin note: text overlap with arXiv:1904.05723

    Journal ref: Infrared Physics & Technology, Volume 98, May 2019, Pages 240-249

  19. arXiv:1904.05723

    eess.IV

    Enhancing Bridge Deck Delamination Detection Based on Aerial Thermography Through Grayscale Morphologic Reconstruction: A Case Study

    Authors: Chongsheng Cheng, Zhexiong Shang, Zhigang Shen

    Abstract: Environmentally induced temperature variations across the bridge deck were one of the major factors that degraded the performance of delamination detection through thermography. The non-uniformly distributed thermal background yields the assumption of most conventional quantitative methods used in practice such as global thresholding and k-means clustering. This study proposed a pre-processing metho…

    Submitted 11 April, 2019; originally announced April 2019.

    Comments: Accepted for presentation at the 98th Annual Meeting of the Transportation Research Board (TRB)

  20. arXiv:1904.05509

    eess.IV cs.CV

    CNN-Based Deep Architecture for Reinforced Concrete Delamination Segmentation Through Thermography

    Authors: Chongsheng Cheng, Zhexiong Shang, Zhigang Shen

    Abstract: Delamination assessment of the bridge deck plays a vital role in bridge health monitoring. Thermography, as one of the nondestructive technologies for delamination detection, has the advantage of efficient data acquisition. However, interpreting the data for accurate delamination shape profiling remains challenging. Due to the environmental variation and the irregular presence of delamination s…

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: Accepted for the 2019 ASCE International Conference on Computing in Civil Engineering

  21. arXiv:1904.05271

    cs.RO eess.SY

    Indoor Testing and Simulation Platform for Close-distance Visual Inspection of Complex Structures using Micro Quadrotor UAV

    Authors: Zhexiong Shang, Zhigang Shen

    Abstract: In recent years, using drones, also known as unmanned aerial vehicles (UAVs), for close-distance visual inspection has become an active area in many disciplines. However, many challenges still remain before we can achieve autonomous inspection, especially when inspecting complex structures. The complex civil structures, such as bridges, dams and wind turbines, are large-scale and geometrical complicat…

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: 6 pages, 6 figures, accepted in ICCCBE 2018