Showing 1–13 of 13 results for author: Jiang, G

Searching in archive eess.
  1. arXiv:2406.07662  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG q-bio.NC

    Progress Towards Decoding Visual Imagery via fNIRS

    Authors: Michel Adamic, Wellington Avelino, Anna Brandenberger, Bryan Chiang, Hunter Davis, Stephen Fay, Andrew Gregory, Aayush Gupta, Raphael Hotter, Grace Jiang, Fiona Leng, Stephen Polcyn, Thomas Ribeiro, Paul Scotti, Michelle Wang, Marley Xiong, Jonathan Xu

    Abstract: We demonstrate the possibility of reconstructing images from fNIRS brain activity and start building a prototype to match the required specs. By training an image reconstruction model on downsampled fMRI data, we discovered that cm-scale spatial resolution is sufficient for image generation. We obtained 71% retrieval accuracy with 1-cm resolution, compared to 93% on the full-resolution fMRI, and 2…

    Submitted 22 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  2. arXiv:2308.09493  [pdf, other]

    eess.AS cs.SD

    Generative Machine Listener

    Authors: Guanxin Jiang, Lars Villemoes, Arijit Biswas

    Abstract: We show how a neural network can be trained on individual intrusive listening test scores to predict a distribution of scores for each pair of reference and coded input stereo or binaural signals. We nickname this method the Generative Machine Listener (GML), as it is capable of generating an arbitrary amount of simulated listening test data. Compared to a baseline system using regression over mea…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted to 155th Audio Engineering Society (AES) Convention, New York, NY, USA, October 2023

  3. arXiv:2209.11666  [pdf, other]

    eess.AS eess.SP

    Stereo InSE-NET: Stereo Audio Quality Predictor Transfer Learned from Mono InSE-NET

    Authors: Arijit Biswas, Guanxin Jiang

    Abstract: Automatic coded audio quality predictors are typically designed for evaluating single channels without considering any spatial aspects. With InSE-NET [1], we demonstrated mimicking a state-of-the-art coded audio quality metric (ViSQOL-v3 [2]) with deep neural networks (DNN) and subsequently improving it - completely with programmatically generated data. In this study, we take steps towards buildin…

    Submitted 23 September, 2022; originally announced September 2022.

    Comments: Accepted to 153rd Audio Engineering Society (AES), New York, NY, USA, October 2022

  4. arXiv:2205.00434  [pdf, other]

    cs.CV eess.IV

    Reinforced Swin-Convs Transformer for Underwater Image Enhancement

    Authors: Tingdi Ren, Haiyong Xu, Gangyi Jiang, Mei Yu, Ting Luo

    Abstract: Underwater Image Enhancement (UIE) technology aims to tackle the challenge of restoring the degraded underwater images due to light absorption and scattering. To address problems, a novel U-Net based Reinforced Swin-Convs Transformer for the Underwater Image Enhancement method (URSCT-UIE) is proposed. Specifically, with the deficiency of U-Net based on pure convolutions, we embedded the Swin Trans…

    Submitted 1 May, 2022; originally announced May 2022.

    Comments: Submitted to NeurIPS 2022

  5. arXiv:2204.04059  [pdf, other]

    eess.IV cs.CV cs.MM

    Deep Learning-Based Intra Mode Derivation for Versatile Video Coding

    Authors: Linwei Zhu, Yun Zhang, Na Li, Gangyi Jiang, Sam Kwong

    Abstract: In intra coding, Rate Distortion Optimization (RDO) is performed to achieve the optimal intra mode from a pre-defined candidate list. The optimal intra mode is also required to be encoded and transmitted to the decoder side besides the residual signal, where lots of coding bits are consumed. To further improve the performance of intra coding in Versatile Video Coding (VVC), an intelligent intra mo…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: 19 pages, 7 figures, submitted to ACM TOMM

  6. arXiv:2203.02794  [pdf]

    cs.LG cs.CV eess.IV q-bio.GN

    Machine Learning Applications in Lung Cancer Diagnosis, Treatment and Prognosis

    Authors: Yawei Li, Xin Wu, Ping Yang, Guoqian Jiang, Yuan Luo

    Abstract: The recent development of imaging and sequencing technologies enables systematic advances in the clinical study of lung cancer. Meanwhile, the human mind is limited in effectively handling and fully utilizing the accumulation of such enormous amounts of data. Machine learning-based approaches play a critical role in integrating and analyzing these large and complex datasets, which have extensively…

    Submitted 25 March, 2022; v1 submitted 5 March, 2022; originally announced March 2022.

  7. arXiv:2112.12284  [pdf, other]

    cs.MM eess.IV

    A Survey on Perceptually Optimized Video Coding

    Authors: Yun Zhang, Linwei Zhu, Gangyi Jiang, Sam Kwong, C. -C. Jay Kuo

    Abstract: To provide users with more realistic visual experiences, videos are developing in the trends of Ultra High Definition (UHD), High Frame Rate (HFR), High Dynamic Range (HDR), Wide Color Gamut (WCG) and high clarity. However, the data amount of videos increases exponentially, which requires high efficiency video compression for storage and network transmission. Perceptually optimized video coding a…

    Submitted 15 November, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

    Comments: 36 pages, 12 figures, 6 tables, accepted by ACM Computing Surveys

  8. arXiv:2111.10099  [pdf, other]

    eess.IV

    Varifocal Multiview Images: Capturing and Visual Tasks

    Authors: Kejun Wu, Qiong Liu, Guoan Li, Gangyi Jiang, You Yang

    Abstract: Multiview images have flexible field of view (FoV) but inflexible depth of field (DoF). To overcome the limitation of multiview images on visual tasks, in this paper, we present varifocal multiview (VFMV) images with flexible DoF. VFMV images are captured by focusing a scene on distinct depths by varying focal planes, and each view only focused on one single plane. Therefore, VFMV images contain mo…

    Submitted 19 November, 2021; originally announced November 2021.

  9. arXiv:2108.13087  [pdf, other]

    eess.AS

    InSE-NET: A Perceptually Coded Audio Quality Model based on CNN

    Authors: Guanxin Jiang, Arijit Biswas, Christian Bergler, Andreas Maier

    Abstract: Automatic coded audio quality assessment is an important task whose progress is hampered by the scarcity of human annotations, poor generalization to unseen codecs, bitrates, content-types, and a lack of flexibility of existing approaches. One of the typical human-perception-related metrics, ViSQOL v3 (ViV3), has been proven to provide a high correlation to the quality scores rated by humans. In t…

    Submitted 30 August, 2021; originally announced August 2021.

    Comments: Accepted to 151st Audio Engineering Society (AES), Las Vegas, NV, USA, October 2021

  10. arXiv:2011.13090  [pdf, other]

    eess.AS

    Multi-QuartzNet: Multi-Resolution Convolution for Speech Recognition with Multi-Layer Feature Fusion

    Authors: Jian Luo, Jianzong Wang, Ning Cheng, Guilin Jiang, Jing Xiao

    Abstract: In this paper, we propose an end-to-end speech recognition network based on Nvidia's previous QuartzNet model. We try to promote the model performance, and design three components: (1) Multi-Resolution Convolution Module, replaces the original 1D time-channel separable convolution with multi-stream convolutions. Each stream has a unique dilated stride on convolutional operations. (2) Channel-Wise…

    Submitted 25 November, 2020; originally announced November 2020.

    Comments: will be presented in SLT 2021

  11. arXiv:2011.11315  [pdf, other]

    eess.AS cs.SD

    End-to-end Silent Speech Recognition with Acoustic Sensing

    Authors: Jian Luo, Jianzong Wang, Ning Cheng, Guilin Jiang, Jing Xiao

    Abstract: Silent speech interfaces (SSI) have been an exciting area of recent interest. In this paper, we present a non-invasive silent speech interface that uses inaudible acoustic signals to capture people's lip movements when they speak. We exploit the speaker and microphone of the smartphone to emit signals and listen to their reflections, respectively. The extracted phase features of these reflections a…

    Submitted 23 November, 2020; originally announced November 2020.

    Comments: will be presented in SLT 2021

  12. arXiv:2007.03851  [pdf, other]

    cs.CV cs.LG eess.IV

    SiENet: Siamese Expansion Network for Image Extrapolation

    Authors: Xiaofeng Zhang, Feng Chen, Cailing Wang, Songsong Wu, Ming Tao, Guoping Jiang

    Abstract: Different from image inpainting, image outpainting has relatively less context in the image center to capture and more content at the image border to predict. Therefore, classical encoder-decoder pipeline of existing methods may not predict the outstretched unknown content perfectly. In this paper, a novel two-stage siamese adversarial model for image extrapolation, named Siamese Expansion Network (…

    Submitted 7 July, 2020; originally announced July 2020.

  13. arXiv:2003.07139  [pdf, other]

    cs.CV eess.IV

    Discriminative Feature and Dictionary Learning with Part-aware Model for Vehicle Re-identification

    Authors: Huibing Wang, Jinjia Peng, Guangqi Jiang, Fengqiang Xu, Xianping Fu

    Abstract: With the development of smart cities, urban surveillance video analysis will play a further significant role in intelligent transportation systems. Identifying the same target vehicle in large datasets from non-overlapping cameras should be highlighted, which has grown into a hot topic in promoting intelligent transportation systems. However, vehicle re-identification (re-ID) technology is a chall…

    Submitted 16 March, 2020; originally announced March 2020.