Showing 1–12 of 12 results for author: Shan, S

Searching in archive eess.
  1. arXiv:2406.12300  [pdf]

    eess.IV cs.CV q-bio.NC

    IR2QSM: Quantitative Susceptibility Mapping via Deep Neural Networks with Iterative Reverse Concatenations and Recurrent Modules

    Authors: Min Li, Chen Chen, Zhuang Xiong, Ying Liu, Pengfei Rong, Shanshan Shan, Feng Liu, Hongfu Sun, Yang Gao

    Abstract: Quantitative susceptibility mapping (QSM) is an MRI phase-based post-processing technique to extract the distribution of tissue susceptibilities, demonstrating significant potential in studying neurological diseases. However, the ill-conditioned nature of dipole inversion makes QSM reconstruction from the tissue field prone to noise and artifacts. In this work, we propose a novel deep learning-bas…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 9 figures

  2. arXiv:2404.09436  [pdf]

    physics.med-ph eess.IV

    Image Reconstruction with B0 Inhomogeneity using an Interpretable Deep Unrolled Network on an Open-bore MRI-Linac

    Authors: Shanshan Shan, Yang Gao, David E. J. Waddington, Hongli Chen, Brendan Whelan, Paul Z. Y. Liu, Yaohui Wang, Chunyi Liu, Hongping Gan, Mingyuan Gao, Feng Liu

    Abstract: MRI-Linac systems require fast image reconstruction with high geometric fidelity to localize and track tumours for radiotherapy treatments. However, B0 field inhomogeneity distortions and slow MR acquisition potentially limit the quality of the image guidance and tumour treatments. In this study, we develop an interpretable unrolled network, referred to as RebinNet, to reconstruct distortion-free…

    Submitted 14 April, 2024; originally announced April 2024.

  3. arXiv:2311.14275  [pdf, other]

    cs.CV cs.SD eess.AS

    Cooperative Dual Attention for Audio-Visual Speech Enhancement with Facial Cues

    Authors: Feixiang Wang, Shuang Yang, Shiguang Shan, Xilin Chen

    Abstract: In this work, we focus on leveraging facial cues beyond the lip region for robust Audio-Visual Speech Enhancement (AVSE). The facial region, encompassing the lip region, reflects additional speech-related attributes such as gender, skin color, nationality, etc., which contribute to the effectiveness of AVSE. However, static and dynamic speech-unrelated attributes also exist, causing appearance cha…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: Accepted to BMVC 2023; 15 pages, 2 figures

  4. Plug-and-Play Latent Feature Editing for Orientation-Adaptive Quantitative Susceptibility Mapping Neural Networks

    Authors: Yang Gao, Zhuang Xiong, Shanshan Shan, Yin Liu, Pengfei Rong, Min Li, Alan H Wilman, G. Bruce Pike, Feng Liu, Hongfu Sun

    Abstract: Quantitative susceptibility mapping (QSM) is a post-processing technique for deriving tissue magnetic susceptibility distribution from MRI phase measurements. Deep learning (DL) algorithms hold great potential for solving the ill-posed QSM reconstruction problem. However, a significant challenge facing current DL-QSM approaches is their limited adaptability to magnetic dipole field orientation var…

    Submitted 26 March, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: 13 pages, 9 figures

  5. arXiv:2310.05058  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS

    Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading

    Authors: Songtao Luo, Shuang Yang, Shiguang Shan, Xilin Chen

    Abstract: In this paper, we propose a novel method for speaker adaptation in lip reading, motivated by two observations. Firstly, a speaker's own characteristics can always be portrayed well by his/her few facial images or even a single image with shallow networks, while the fine-grained dynamic features associated with speech content expressed by the talking face always need deep sequential networks to rep…

    Submitted 30 April, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted to BMVC 2023; 20 pages

  6. arXiv:2308.06382  [pdf, other]

    cs.SD cs.LG eess.AS

    Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion

    Authors: Siyuan Shan, Yang Li, Amartya Banerjee, Junier B. Oliva

    Abstract: Voice conversion (VC) aims at altering a person's voice to make it sound similar to the voice of another person while preserving linguistic content. Existing methods suffer from a dilemma between content intelligibility and speaker similarity; i.e., methods with higher intelligibility usually have a lower speaker similarity, while methods with higher speaker similarity usually require plenty of ta…

    Submitted 30 December, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: AAAI 2024 Demo, Codes: https://phonemehallucinator.github.io/

  7. arXiv:2206.10861  [pdf, other]

    cs.CV cs.SD eess.AS

    UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022

    Authors: Yuanhang Zhang, Susan Liang, Shuang Yang, Shiguang Shan

    Abstract: This report presents a brief description of our winning solution to the AVA Active Speaker Detection (ASD) task at ActivityNet Challenge 2022. Our underlying model UniCon+ continues to build on our previous work, the Unified Context Network (UniCon) and Extended UniCon which are designed for robust scene-level ASD. We augment the architecture with a simple GRU-based module that allows information…

    Submitted 22 June, 2022; originally announced June 2022.

    Comments: 5 pages, 3 figures; technical report for AVA Challenge (see https://research.google.com/ava/challenge.html) at the International Challenge on Activity Recognition (ActivityNet), CVPR 2022

  8. arXiv:2205.10993  [pdf]

    physics.med-ph eess.IV

    Distortion-Corrected Image Reconstruction with Deep Learning on an MRI-Linac

    Authors: Shanshan Shan, Yang Gao, Paul Z. Y. Liu, Brendan Whelan, Hongfu Sun, Bin Dong, Feng Liu, David E. J. Waddington

    Abstract: Magnetic resonance imaging (MRI) is increasingly utilized for image-guided radiotherapy due to its outstanding soft-tissue contrast and lack of ionizing radiation. However, geometric distortions caused by gradient nonlinearity (GNL) limit anatomical accuracy, potentially compromising the quality of tumour treatments. In addition, slow MR acquisition and reconstruction limit the potential for real-…

    Submitted 20 March, 2023; v1 submitted 22 May, 2022; originally announced May 2022.

  9. arXiv:2111.10003  [pdf, other]

    cs.SD cs.LG eess.AS

    Differentiable Wavetable Synthesis

    Authors: Siyuan Shan, Lamtharn Hantrakul, Jitong Chen, Matt Avent, David Trevelyan

    Abstract: Differentiable Wavetable Synthesis (DWTS) is a technique for neural audio synthesis which learns a dictionary of one-period waveforms i.e. wavetables, through end-to-end training. We achieve high-fidelity audio synthesis with as little as 10 to 20 wavetables and demonstrate how a data-driven dictionary of waveforms opens up unprecedented one-shot learning paradigms on short audio clips. Notably, w…

    Submitted 13 February, 2022; v1 submitted 18 November, 2021; originally announced November 2021.

    Comments: Accepted by ICASSP 2022, Demo: https://lamtharnhantrakul.github.io/diffwts.github.io/

  10. arXiv:2108.02607  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS eess.IV

    UniCon: Unified Context Network for Robust Active Speaker Detection

    Authors: Yuanhang Zhang, Susan Liang, Shuang Yang, Xiao Liu, Zhongqin Wu, Shiguang Shan, Xilin Chen

    Abstract: We introduce a new efficient framework, the Unified Context Network (UniCon), for robust active speaker detection (ASD). Traditional methods for ASD usually operate on each candidate's pre-cropped face track separately and do not sufficiently consider the relationships among the candidates. This potentially limits performance, especially in challenging scenarios with low-resolution faces, multiple…

    Submitted 5 August, 2021; originally announced August 2021.

    Comments: 10 pages, 6 figures; to appear at ACM Multimedia 2021

  11. arXiv:2003.03983  [pdf, other]

    cs.CV cs.LG eess.IV

    Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading

    Authors: Mingshuang Luo, Shuang Yang, Shiguang Shan, Xilin Chen

    Abstract: Lip-reading aims to infer the speech content from the lip movement sequence and can be seen as a typical sequence-to-sequence (seq2seq) problem which translates the input image sequence of lip movements to the text sequence of the speech content. However, the traditional learning process of seq2seq models always suffers from two problems: the exposure bias resulted from the strategy of "teacher-fo…

    Submitted 9 March, 2020; originally announced March 2020.

    Comments: 8 pages, Accepted in the 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020)

  12. arXiv:2002.02957  [pdf, other]

    cs.CV cs.SD eess.AS eess.IV

    $M^3$T: Multi-Modal Continuous Valence-Arousal Estimation in the Wild

    Authors: Yuan-Hang Zhang, Rulin Huang, Jiabei Zeng, Shiguang Shan, Xilin Chen

    Abstract: This report describes a multi-modal multi-task ($M^3$T) approach underlying our submission to the valence-arousal estimation track of the Affective Behavior Analysis in-the-wild (ABAW) Challenge, held in conjunction with the IEEE International Conference on Automatic Face and Gesture Recognition (FG) 2020. In the proposed $M^3$T framework, we fuse both visual features from videos and acoustic feat…

    Submitted 7 February, 2020; originally announced February 2020.

    Comments: 6 pages, technical report; submission to ABAW Challenge at FG 2020