Showing 1–46 of 46 results for author: Tian, X

Searching in archive eess.
  1. arXiv:2406.14264  [pdf, other]

    eess.IV cs.CV

    Zero-Shot Image Denoising for High-Resolution Electron Microscopy

    Authors: Xuanyu Tian, Zhuoya Dong, Xiyue Lin, Yue Gao, Hongjiang Wei, Yanhang Ma, Jingyi Yu, Yuyao Zhang

    Abstract: High-resolution electron microscopy (HREM) imaging technique is a powerful tool for directly visualizing a broad range of materials in real-space. However, it faces challenges in denoising due to ultra-low signal-to-noise ratio (SNR) and scarce data availability. In this work, we propose Noise2SR, a zero-shot self-supervised learning (ZS-SSL) denoising framework for HREM. Within our framework, we…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 12 figures

  2. arXiv:2406.13340  [pdf, other]

    cs.CL cs.SD eess.AS

    SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

    Authors: Junyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu

    Abstract: Speech encompasses a wealth of information, including but not limited to content, paralinguistic, and environmental information. This comprehensive nature of speech significantly impacts communication and is crucial for human-computer interaction. Chat-Oriented Large Language Models (LLMs), known for their general-purpose assistance capabilities, have evolved to handle multi-modal inputs, includin…

    Submitted 19 June, 2024; originally announced June 2024.

  3. arXiv:2406.08782  [pdf, other]

    eess.IV cs.CV

    Hybrid Spatial-spectral Neural Network for Hyperspectral Image Denoising

    Authors: Hao Liang, Chengjie, Kun Li, Xin Tian

    Abstract: Hyperspectral image (HSI) denoising is an essential procedure for HSI applications. Unfortunately, the existing Transformer-based methods mainly focus on non-local modeling, neglecting the importance of locality in image denoising. Moreover, deep learning methods employ complex spectral learning mechanisms, thus introducing large computation costs. To address these problems, we propose a hybrid…

    Submitted 12 June, 2024; originally announced June 2024.

  4. arXiv:2404.17890  [pdf, other]

    eess.IV cs.AI cs.CV

    DPER: Diffusion Prior Driven Neural Representation for Limited Angle and Sparse View CT Reconstruction

    Authors: Chenhe Du, Xiyue Lin, Qing Wu, Xuanyu Tian, Ying Su, Zhe Luo, Hongjiang Wei, S. Kevin Zhou, Jingyi Yu, Yuyao Zhang

    Abstract: Limited-angle and sparse-view computed tomography (LACT and SVCT) are crucial for expanding the scope of X-ray CT applications. However, they face challenges due to incomplete data acquisition, resulting in diverse artifacts in the reconstructed CT images. Emerging implicit neural representation (INR) techniques, such as NeRF, NeAT, and NeRP, have shown promise in under-determined CT imaging recon…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 15 pages, 10 figures

    ACM Class: I.2.10; I.4.5

  5. arXiv:2403.11405  [pdf, other]

    eess.SP

    A Deep Learning Method for Beat-Level Risk Analysis and Interpretation of Atrial Fibrillation Patients during Sinus Rhythm

    Authors: Jun Lei, Yuxi Zhou, Xue Tian, Qinghao Zhao, Qi Zhang, Shijia Geng, Qingbo Wu, Shenda Hong

    Abstract: Atrial Fibrillation (AF) is a common cardiac arrhythmia. Many AF patients experience complications such as stroke and other cardiovascular issues. Early detection of AF is crucial. Existing algorithms can only distinguish "AF rhythm in AF patients" from "sinus rhythm in normal individuals". However, AF patients do not always exhibit AF rhythm, posing a challenge for diagnosis when the AF rhyt…

    Submitted 17 March, 2024; originally announced March 2024.

  6. arXiv:2401.12264  [pdf, other]

    eess.AS cs.MM cs.SD eess.IV

    CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing

    Authors: Xianghu Yue, Xiaohai Tian, Lu Lu, Malu Zhang, Zhizheng Wu, Haizhou Li

    Abstract: There has been a long-standing quest for a unified audio-visual-text model to enable various multimodal understanding tasks, which mimics the listening, seeing and reading process of human beings. Humans tend to represent knowledge using two separate systems: one for representing verbal (textual) information and one for representing non-verbal (visual and auditory) information. These two systems…

    Submitted 21 February, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

  7. arXiv:2311.10902  [pdf, other]

    eess.IV cs.CV

    OCT2Confocal: 3D CycleGAN based Translation of Retinal OCT Images to Confocal Microscopy

    Authors: Xin Tian, Nantheera Anantrasirichai, Lindsay Nicholson, Alin Achim

    Abstract: Optical coherence tomography (OCT) and confocal microscopy are pivotal in retinal imaging, each presenting unique benefits and limitations. In-vivo OCT offers rapid, non-invasive imaging but can be hampered by clarity issues and motion artifacts. Ex-vivo confocal microscopy provides high-resolution, cellular detailed color images but is invasive and poses ethical concerns and potential tissue dama…

    Submitted 16 February, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

    Comments: 4 pages, 5 figures

  8. arXiv:2310.09625  [pdf, other]

    eess.IV cs.CV

    JSMoCo: Joint Coil Sensitivity and Motion Correction in Parallel MRI with a Self-Calibrating Score-Based Diffusion Model

    Authors: Lixuan Chen, Xuanyu Tian, Jiangjie Wu, Ruimin Feng, Guoyan Lao, Yuyao Zhang, Hongjiang Wei

    Abstract: Magnetic Resonance Imaging (MRI) stands as a powerful modality in clinical diagnosis. However, it is known that MRI faces challenges such as long acquisition time and vulnerability to motion-induced artifacts. Despite the success of many existing motion correction algorithms, there has been limited research focused on correcting motion artifacts on the estimated coil sensitivity maps for fast MRI…

    Submitted 14 October, 2023; originally announced October 2023.

    Comments: 10 pages, 8 figures, journal

  9. arXiv:2309.06409  [pdf, other]

    eess.SP

    Design and Implementation of DC-to-5 MHz Wide-Bandwidth High-Power High-Fidelity Converter

    Authors: Jinshui Zhang, Boshuo Wang, Xiaoyang Tian, Angel Peterchev, Stefan Goetz

    Abstract: Advances in power electronics have made it possible to achieve high power levels, e.g., reaching GW in grids, or alternatively high output bandwidths, e.g., beyond MHz in communication. Achieving both simultaneously, however, remains challenging. Various applications, ranging from efficient multichannel wireless power transfer to cutting-edge medical and neuroscience applications, are demanding bo…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: 8 pages, 11 figures

  10. arXiv:2309.05208  [pdf, other]

    eess.SP

    Quaternion MLP Neural Networks Based on the Maximum Correntropy Criterion

    Authors: Gang Wang, Xinyu Tian, Zuxuan Zhang

    Abstract: We propose a gradient ascent algorithm for quaternion multilayer perceptron (MLP) networks based on the cost function of the maximum correntropy criterion (MCC). In the algorithm, we use the split quaternion activation function based on the generalized Hamilton-real quaternion gradient. By introducing a new quaternion operator, we first rewrite the early quaternion single layer perceptron algorith…

    Submitted 13 September, 2023; v1 submitted 10 September, 2023; originally announced September 2023.

  11. arXiv:2305.11438  [pdf, other]

    cs.CL eess.AS

    Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring

    Authors: Kaiqi Fu, Shaojun Gao, Shuju Shi, Xiaohai Tian, Wei Li, Zejun Ma

    Abstract: Speech fluency/disfluency can be evaluated by analyzing a range of phonetic and prosodic features. Deep neural networks are commonly trained to map fluency-related features into the human scores. However, the effectiveness of deep learning-based models is constrained by the limited amount of labeled training samples. To address this, we introduce a self-supervised learning (SSL) approach that take…

    Submitted 19 May, 2023; originally announced May 2023.

  12. arXiv:2302.14751  [pdf]

    eess.SP physics.optics

    High speed free-space optical communication using standard fiber communication component without optical amplification

    Authors: Yao Zhang, Hua-Ying Liu, Xiaoyi Liu, Peng Xu, Xiang Dong, Pengfei Fan, Xiaohui Tian, Hua Yu, Dong Pan, Zhijun Yin, Guilu Long, Shi-Ning Zhu, Zhenda Xie

    Abstract: Free-space optical communication (FSO) can achieve fast, secure and license-free communication without need for physical cables, making it a cost-effective, energy-efficient and flexible solution when the fiber connection is unavailable. To establish FSO connection on-demand, it is essential to build portable FSO devices with compact structure and light weight. Here, we develop a miniaturized FSO…

    Submitted 16 April, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: 7 pages, 5 figures

  13. arXiv:2302.10444  [pdf, other]

    eess.AS cs.SD

    Leveraging phone-level linguistic-acoustic similarity for utterance-level pronunciation scoring

    Authors: Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee

    Abstract: Recent studies on pronunciation scoring have explored the effect of introducing phone embeddings as reference pronunciation, but mostly in an implicit manner, i.e., addition or concatenation of reference phone embedding and actual pronunciation of the target phone as the phone-level pronunciation quality representation. In this paper, we propose to use linguistic-acoustic similarity to explicitly…

    Submitted 13 March, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP 2023

  14. arXiv:2302.09928  [pdf, other]

    eess.AS

    An ASR-free Fluency Scoring Approach with Self-Supervised Learning

    Authors: Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee

    Abstract: A typical fluency scoring system generally relies on an automatic speech recognition (ASR) system to obtain time stamps in input speech for either the subsequent calculation of fluency-related features or directly modeling speech fluency with an end-to-end approach. This paper describes a novel ASR-free approach for automatic fluency assessment using self-supervised learning (SSL). Specifically, w…

    Submitted 13 March, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP 2023

  15. TTS-Guided Training for Accent Conversion Without Parallel Data

    Authors: Yi Zhou, Zhizheng Wu, Mingyang Zhang, Xiaohai Tian, Haizhou Li

    Abstract: Accent Conversion (AC) seeks to change the accent of speech from one (source) to another (target) while preserving the speech content and speaker identity. However, many AC approaches rely on source-target parallel speech data. We propose a novel accent conversion framework without the need of parallel data. Specifically, a text-to-speech (TTS) system is first pretrained with target-accented speec…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: 5 pages, 4 figures, submitted to signal processing letter

  16. arXiv:2209.06411  [pdf, other]

    eess.IV cs.CV cs.LG

    Noise2SR: Learning to Denoise from Super-Resolved Single Noisy Fluorescence Image

    Authors: Xuanyu Tian, Qing Wu, Hongjiang Wei, Yuyao Zhang

    Abstract: Fluorescence microscopy is a key driver to promote discoveries of biomedical research. However, with the limitation of microscope hardware and characteristics of the observed samples, the fluorescence microscopy images are susceptible to noise. Recently, a few self-supervised deep learning (DL) denoising methods have been proposed. However, the training efficiency and denoising performance of exis…

    Submitted 14 September, 2022; originally announced September 2022.

    Comments: 12 pages, 6 figures

    Journal ref: MICCAI 2022

  17. arXiv:2205.15528  [pdf, other]

    eess.SP

    Enabling NLoS LEO Satellite Communications with Reconfigurable Intelligent Surfaces

    Authors: Xiaowen Tian, Nuria Gonzalez-Prelcic, Takayuki Shimizu

    Abstract: Low Earth Orbit (LEO) satellite communications (SatCom) are considered a promising solution to provide uninterrupted services in cellular networks. Line-of-sight (LoS) links between the LEO satellites and the ground users are, however, easily blocked in urban scenarios. In this paper, we propose to enable LEO SatCom in non-line-of-sight (NLoS) channels, as those corresponding to links to users in…

    Submitted 31 May, 2022; originally announced May 2022.

    Comments: 6 pages, 6 figures, submitted to Globecom 2022

  18. arXiv:2205.15520  [pdf, other]

    eess.SP

    Optimizing the Deployment of Reconfigurable Intelligent Surfaces in MmWave Vehicular Systems

    Authors: Xiaowen Tian, Nuria Gonzalez-Prelcic, Robert W. Heath Jr

    Abstract: Millimeter wave (MmWave) systems are vulnerable to blockages, which cause signal drop and link outage. One solution is to deploy reconfigurable intelligent surfaces (RISs) to add a strong non-line-of-sight path from the transmitter to receiver. To achieve the best performance, the location of the deployed RIS should be optimized for a given site, considering the distribution of potential users and…

    Submitted 30 May, 2022; originally announced May 2022.

    Comments: 6 pages, 5 figures, submitted to Globecom 2022

  19. arXiv:2204.01708   

    eess.IV cs.CV

    MRI-based Multi-task Decoupling Learning for Alzheimer's Disease Detection and MMSE Score Prediction: A Multi-site Validation

    Authors: Xu Tian, Jin Liu, Hulin Kuang, Yu Sheng, Jianxin Wang, The Alzheimer's Disease Neuroimaging Initiative

    Abstract: Accurately detecting Alzheimer's disease (AD) and predicting mini-mental state examination (MMSE) score are important tasks in elderly health by magnetic resonance imaging (MRI). Most of the previous methods on these two tasks are based on single-task learning and rarely consider the correlation between them. Since the MMSE score, which is an important basis for AD diagnosis, can also reflect the…

    Submitted 7 July, 2023; v1 submitted 2 April, 2022; originally announced April 2022.

    Comments: There are some misstatements in the related work section of the paper. In the methods section, there are also errors in the description of some modules

  20. arXiv:2203.01826  [pdf, other]

    eess.AS cs.LG

    Improving Non-native Word-level Pronunciation Scoring with Phone-level Mixup Data Augmentation and Multi-source Information

    Authors: Kaiqi Fu, Shaojun Gao, Kai Wang, Wei Li, Xiaohai Tian, Zejun Ma

    Abstract: Deep learning-based pronunciation scoring models highly rely on the availability of the annotated non-native data, which is costly and has scalability issues. To deal with the data scarcity problem, data augmentation is commonly used for model pretraining. In this paper, we propose a phone-level mixup, a simple yet effective data augmentation method, to improve the performance of word-level pronun…

    Submitted 1 March, 2022; originally announced March 2022.

    Comments: 5 pages, 2 figures. This paper is submitted to INTERSPEECH 2022

  21. Optimal Transport-based Graph Matching for 3D retinal OCT image registration

    Authors: Xin Tian, Nantheera Anantrasirichai, Lindsay Nicholson, Alin Achim

    Abstract: Registration of longitudinal optical coherence tomography (OCT) images assists disease monitoring and is essential in image fusion applications. Mouse retinal OCT images are often collected for longitudinal study of eye disease models such as uveitis, but their quality is often poor compared with human imaging. This paper presents a novel but efficient framework involving an optimal transport base…

    Submitted 28 February, 2022; originally announced March 2022.

  22. arXiv:2108.13141  [pdf, other]

    eess.IV cs.CV cs.MM

    Robust Privacy-Preserving Motion Detection and Object Tracking in Encrypted Streaming Video

    Authors: Xianhao Tian, Peijia Zheng, Jiwu Huang

    Abstract: Video privacy leakage is becoming an increasingly severe public problem, especially in cloud-based video surveillance systems. It leads to the new need for secure cloud-based video applications, where the video is encrypted for privacy protection. Despite some methods that have been proposed for encrypted video moving object detection and tracking, none has robust performance against complex and d…

    Submitted 30 August, 2021; originally announced August 2021.

  23. A Multi-Channel Ratio-of-Ratios Method for Noncontact Hand Video Based SpO$_2$ Monitoring Using Smartphone Cameras

    Authors: Xin Tian, Chau-Wai Wong, Sushant M. Ranadive, Min Wu

    Abstract: Blood oxygen saturation (SpO$_2$) is an important indicator for pulmonary and respiratory functionalities. Clinical findings on COVID-19 show that many patients had dangerously low blood oxygen levels not long before conditions worsened. It is therefore recommended, especially for the vulnerable population, to regularly monitor the blood oxygen level for precaution. Recent works have investigated…

    Submitted 18 July, 2021; originally announced July 2021.

  24. arXiv:2107.05087  [pdf, other]

    cs.LG cs.CV eess.IV

    Remote Blood Oxygen Estimation From Videos Using Neural Networks

    Authors: Joshua Mathew, Xin Tian, Min Wu, Chau-Wai Wong

    Abstract: Blood oxygen saturation (SpO$_2$) is an essential indicator of respiratory functionality and is receiving increasing attention during the COVID-19 pandemic. Clinical findings show that it is possible for COVID-19 patients to have significantly low SpO$_2$ before any obvious symptoms. The prevalence of cameras has motivated researchers to investigate methods for monitoring SpO$_2$ using videos. Mos…

    Submitted 5 May, 2022; v1 submitted 11 July, 2021; originally announced July 2021.

  25. arXiv:2105.08511  [pdf, ps, other]

    cs.LG cs.CR eess.IV

    Privacy-Preserving Constrained Domain Generalization via Gradient Alignment

    Authors: Chris Xing Tian, Haoliang Li, Yufei Wang, Shiqi Wang

    Abstract: Deep neural networks (DNN) have demonstrated unprecedented success for medical imaging applications. However, due to the issue of limited dataset availability and the strict legal and ethical requirements for patient privacy protection, the broad applications of medical imaging classification driven by DNN with large-scale training data have been largely hindered. For example, when training the DN…

    Submitted 18 September, 2023; v1 submitted 14 May, 2021; originally announced May 2021.

  26. arXiv:2104.02602  [pdf, other]

    eess.IV cs.CV

    Pathological Image Segmentation with Noisy Labels

    Authors: Li Xiao, Yinhao Li, Luxi Qv, Xinxia Tian, Yijie Peng, S. Kevin Zhou

    Abstract: Segmentation of pathological images is essential for accurate disease diagnosis. The quality of manual labels plays a critical role in segmentation accuracy; yet, in practice, the labels between pathologists could be inconsistent, thus confusing the training process. In this work, we propose a novel label re-weighting framework to account for the reliability of different experts' labels on each pi…

    Submitted 19 March, 2021; originally announced April 2021.

  27. arXiv:2104.01818  [pdf, other]

    eess.AS

    The Multi-speaker Multi-style Voice Cloning Challenge 2021

    Authors: Qicong Xie, Xiaohai Tian, Guanghou Liu, Kun Song, Lei Xie, Zhiyong Wu, Hai Li, Song Shi, Haizhou Li, Fen Hong, Hui Bu, Xin Xu

    Abstract: The Multi-speaker Multi-style Voice Cloning Challenge (M2VoC) aims to provide a common sizable dataset as well as a fair testbed for the benchmarking of the popular voice cloning task. Specifically, we formulate the challenge to adapt an average TTS model to the stylistic target voice with limited data from target speaker, evaluated by speaker identity and style similarity. The challenge consists…

    Submitted 5 April, 2021; originally announced April 2021.

    Comments: has been accepted to ICASSP 2021

  28. arXiv:2103.15683  [pdf, other]

    eess.IV cs.CV

    Omniscient Video Super-Resolution

    Authors: Peng Yi, Zhongyuan Wang, Kui Jiang, Junjun Jiang, Tao Lu, Xin Tian, Jiayi Ma

    Abstract: Most recent video super-resolution (SR) methods either adopt an iterative manner to deal with low-resolution (LR) frames from a temporally sliding window, or leverage the previously estimated SR output to help reconstruct the current frame recurrently. A few studies try to combine these two structures to form a hybrid framework but have failed to give full play to it. In this paper, we propose an…

    Submitted 29 March, 2021; originally announced March 2021.

  29. arXiv:2102.11414  [pdf, other]

    eess.SP

    Fast Beam Tracking for Reconfigurable Intelligent Surface Assisted Mobile mmWave Networks

    Authors: Xiaowen Tian, Zhi Sun

    Abstract: Millimeter wave (mmWave) communications are vulnerable to blockages and node mobility due to the highly directional signal beams. The emerging Reconfigurable Intelligent Surfaces (RISs) technique can effectively mitigate the blockage problem by exploring the non-line-of-sight (NLOS) path, where the beam switching is realized by digitally configuring the phases of RIS elements. To date, most effort…

    Submitted 22 February, 2021; originally announced February 2021.

    Comments: 11 pages, 11 figures. This work has been submitted to the Elsevier Computer Networks for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  30. Cross-domain Joint Dictionary Learning for ECG Inference from PPG

    Authors: Xin Tian, Qiang Zhu, Yuenan Li, Min Wu

    Abstract: The inverse problem of inferring electrocardiogram (ECG) from photoplethysmogram (PPG) is an emerging research direction that combines the easy measurability of PPG and the rich clinical knowledge of ECG for long-term continuous cardiac monitoring. The prior art for reconstruction using a universal basis has limited fidelity for uncommon ECG waveform shapes due to the lack of rich representative p…

    Submitted 6 January, 2021; originally announced January 2021.

  31. Inferring ECG from PPG for Continuous Cardiac Monitoring Using Lightweight Neural Network

    Authors: Yuenan Li, Xin Tian, Qiang Zhu, Min Wu

    Abstract: This paper presents a computational solution that enables continuous cardiac monitoring through cross-modality inference of electrocardiogram (ECG). While some smartwatches now allow users to obtain a 30-second ECG test by tapping a built-in bio-sensor, these short-term ECG tests often miss intermittent and asymptomatic abnormalities of cardiac functions. It is also infeasible to expect persistent…

    Submitted 10 May, 2024; v1 submitted 9 December, 2020; originally announced December 2020.

    ACM Class: J.3; I.2.6

  32. arXiv:2012.00337  [pdf, other]

    cs.SD cs.HC eess.AS

    NHSS: A Speech and Singing Parallel Database

    Authors: Bidisha Sharma, Xiaoxue Gao, Karthika Vijayan, Xiaohai Tian, Haizhou Li

    Abstract: We present a database of parallel recordings of speech and singing, collected and released by the Human Language Technology (HLT) laboratory at the National University of Singapore (NUS), that is called NUS-HLT Speak-Sing (NHSS) database. We release this database to the public to support research activities, that include, but not limited to comparative studies of acoustic attributes of speech and…

    Submitted 5 August, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

    Comments: Accepted to Speech Communication

  33. arXiv:2011.08548  [pdf, other]

    cs.SD eess.AS

    Optimizing voice conversion network with cycle consistency loss of speaker identity

    Authors: Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li

    Abstract: We propose a novel training scheme to optimize voice conversion network with a speaker identity loss function. The training scheme not only minimizes frame-level spectral loss, but also speaker identity loss. We introduce a cycle consistency loss that constrains the converted speech to maintain the same speaker identity as reference speech at utterance level. While the proposed training scheme is…

    Submitted 17 November, 2020; originally announced November 2020.

  34. arXiv:2011.05122  [pdf]

    eess.IV physics.optics

    Scannerless non-line-of-sight three dimensional imaging with a 32x32 SPAD array

    Authors: Chenfei Jin, Meng Tang, Legeng Jia, Xiaorui Tian, Jie Yang, Kai Qiao, Siqi Zhang

    Abstract: We develop a scannerless non-line-of-sight three dimensional imaging system based on a commercial 32x32 SPAD camera combined with a 70 ps pulsed laser. In our experiment, 1024 time histograms can be achieved synchronously in 3s with an average time resolution of about 165 ps. The result with filtered back projection shows a discernable reconstruction while the result using virtual wave field demon…

    Submitted 10 November, 2020; originally announced November 2020.

    Comments: 10 pages, 8 figures

  35. arXiv:2009.03554  [pdf, other]

    eess.AS cs.SD

    Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions

    Authors: Rohan Kumar Das, Tomi Kinnunen, Wen-Chin Huang, Zhenhua Ling, Junichi Yamagishi, Yi Zhao, Xiaohai Tian, Tomoki Toda

    Abstract: The Voice Conversion Challenge 2020 is the third edition under its flagship that promotes intra-lingual semiparallel and cross-lingual voice conversion (VC). While the primary evaluation of the challenge submissions was done through crowd-sourced listening tests, we also performed an objective assessment of the submitted systems. The aim of the objective assessment is to provide complementary perf…

    Submitted 8 September, 2020; originally announced September 2020.

    Comments: Submitted to ISCA Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020

  36. arXiv:2008.12527  [pdf, other]

    eess.AS cs.SD

    Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion

    Authors: Yi Zhao, Wen-Chin Huang, Xiaohai Tian, Junichi Yamagishi, Rohan Kumar Das, Tomi Kinnunen, Zhenhua Ling, Tomoki Toda

    Abstract: The voice conversion challenge is a bi-annual scientific event held to compare and understand different voice conversion (VC) systems built on a common dataset. In 2020, we organized the third edition of the challenge and constructed and distributed a new database for two tasks, intra-lingual semi-parallel and cross-lingual VC. After a two-month challenge period, we received 33 submissions, includ…

    Submitted 28 August, 2020; originally announced August 2020.

    Comments: Submitted to ISCA Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020

  37. arXiv:2008.02519  [pdf]

    eess.AS cs.SD

    Spectral-change enhancement with prior SNR for the hearing impaired

    Authors: Xiang Li, Xin Tian, Henry Luo, Jinyu Qian, Xihong Wu, Dingsheng Luo, Jing Chen

    Abstract: A previous signal processing algorithm that aimed to enhance spectral changes (SCE) over time showed benefit for hearing-impaired (HI) listeners to recognize speech in background noise. In this work, the previous SCE was manipulated to perform on target-dominant segments, rather than treating all frames equally. Instantaneous signal-to-noise ratios (SNRs) were calculated to determine whether the s…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: Accepted by 23rd International Congress on Acoustics (ICA 2019), see http://pub.dega-akustik.de/ICA2019/data/articles/000051.pdf

  38. arXiv:2004.08849  [pdf, other]

    eess.AS cs.CR

    The Attacker's Perspective on Automatic Speaker Verification: An Overview

    Authors: Rohan Kumar Das, Xiaohai Tian, Tomi Kinnunen, Haizhou Li

    Abstract: Security of automatic speaker verification (ASV) systems is compromised by various spoofing attacks. While many types of non-proactive attacks (and their defenses) have been studied in the past, attacker's perspective on ASV, represents a far less explored direction. It can potentially help to identify the weakest parts of ASV systems and be used to develop attacker-aware systems. We present an ov…

    Submitted 19 April, 2020; originally announced April 2020.

    Comments: 5 pages, 1 figure, Submitted to Interspeech 2020

  39. arXiv:1912.01447  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    Transform-Invariant Convolutional Neural Networks for Image Classification and Search

    Authors: Xu Shen, Xinmei Tian, Anfeng He, Shaoyan Sun, Dacheng Tao

    Abstract: Convolutional neural networks (CNNs) have achieved state-of-the-art results on many visual recognition tasks. However, current CNN models still exhibit a poor ability to be invariant to spatial transformations of images. Intuitively, with sufficient layers and parameters, hierarchical combinations of convolution (matrix multiplication and non-linear activation) and pooling operations should be abl…

    Submitted 28 November, 2019; originally announced December 2019.

    Comments: Accepted by ACM Multimedia. arXiv admin note: text overlap with arXiv:1911.12682

  40. arXiv:1911.12682  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    Patch Reordering: a Novel Way to Achieve Rotation and Translation Invariance in Convolutional Neural Networks

    Authors: Xu Shen, Xinmei Tian, Shaoyan Sun, Dacheng Tao

    Abstract: Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance on many visual recognition tasks. However, the combination of convolution and pooling operations only shows invariance to small local location changes in meaningful objects in input. Sometimes, such networks are trained using data augmentation to encode this invariance into the parameters, which restricts the capac…

    Submitted 28 November, 2019; originally announced November 2019.

    Comments: Accepted AAAI17

  41. arXiv:1911.03461  [pdf, other]

    eess.IV cs.CV

    AIM 2019 Challenge on Image Demoireing: Methods and Results

    Authors: Shanxin Yuan, Radu Timofte, Gregory Slabaugh, Ales Leonardis, Bolun Zheng, Xin Ye, Xiang Tian, Yaowu Chen, Xi Cheng, Zhenyong Fu, Jian Yang, Ming Hong, Wenying Lin, Wenjin Yang, Yanyun Qu, Hong-Kyu Shin, Joon-Yeon Kim, Sung-Jea Ko, Hang Dong, Yu Guo, Jie Wang, Xuan Ding, Zongyan Han, Sourya Dipta Das, Kuldeep Purohit, et al. (3 additional authors not shown)

    Abstract: This paper reviews the first-ever image demoireing challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ICCV 2019. This paper describes the challenge, and focuses on the proposed solutions and their results. Demoireing is a difficult task of removing moire patterns from an image to reveal an underlying clean image. A new dataset, called LCDMoire wa…

    Submitted 8 November, 2019; originally announced November 2019.

    Comments: arXiv admin note: text overlap with arXiv:1911.02498

  42. arXiv:1910.00496  [pdf, other]

    eess.AS

    A Modularized Neural Network with Language-Specific Output Layers for Cross-lingual Voice Conversion

    Authors: Yi Zhou, Xiaohai Tian, Emre Yılmaz, Rohan Kumar Das, Haizhou Li

    Abstract: This paper presents a cross-lingual voice conversion framework that adopts a modularized neural network. The modularized neural network has a common input structure that is shared for both languages, and two separate output modules, one for each language. The idea is motivated by the fact that phonetic systems of languages are similar because humans share a common vocal production system, but acou…

    Submitted 1 October, 2019; originally announced October 2019.

    Comments: Accepted for publication at IEEE ASRU Workshop 2019

  43. arXiv:1909.07655  [pdf, other]

    eess.AS

    Black-box Attacks on Automatic Speaker Verification using Feedback-controlled Voice Conversion

    Authors: Xiaohai Tian, Rohan Kumar Das, Haizhou Li

    Abstract: Automatic speaker verification (ASV) systems in practice are greatly vulnerable to spoofing attacks. The latest voice conversion technologies are able to produce perceptually natural sounding speech that mimics any target speakers. However, the perceptual closeness to a speaker's identity may not be enough to deceive an ASV system. In this work, we propose a framework that uses the output scores o…

    Submitted 29 October, 2019; v1 submitted 17 September, 2019; originally announced September 2019.

    Comments: 6 pages, 3 figures, This paper is submitted to ICASSP 2020

  44. ECG Reconstruction via PPG: A Pilot Study

    Authors: Qiang Zhu, Xin Tian, Chau-Wai Wong, Min Wu

    Abstract: In this paper, the relation between electrocardiogram (ECG) and photoplethysmogram (PPG) signals is studied, and the waveform of ECG is inferred via the PPG signals. In order to address this inverse problem, a transform is proposed to map the discrete cosine transform (DCT) coefficients of each PPG cycle to those of the corresponding ECG cycle. The resulting DCT coefficients of the ECG cycle are i…

    Submitted 23 April, 2019; originally announced April 2019.

  45. arXiv:1902.03705  [pdf, other]

    eess.AS cs.SD

    A Vocoder-free WaveNet Voice Conversion with Non-Parallel Data

    Authors: Xiaohai Tian, Eng Siong Chng, Haizhou Li

    Abstract: In a typical voice conversion system, vocoder is commonly used for speech-to-features analysis and features-to-speech synthesis. However, vocoder can be a source of speech quality degradation. This paper presents a vocoder-free voice conversion approach using WaveNet for non-parallel training data. Instead of dealing with the intermediate features, the proposed approach utilizes the WaveNet to map…

    Submitted 17 September, 2019; v1 submitted 10 February, 2019; originally announced February 2019.

    Comments: 5 pages, 4 figures, This paper is submitted to INTERSPEECH 2019

  46. An adaptive software defined radio design based on a standard space telecommunication radio system API

    Authors: Wenhao Xiong, Xin Tian, Genshe Chen, Khanh Pham, Erik Blasch

    Abstract: Software defined radio (SDR) has become a popular tool for the implementation and testing for communications performance. The advantage of the SDR approach includes: a re-configurable design, adaptive response to changing conditions, efficient development, and highly versatile implementation. In order to understand the benefits of SDR, the space telecommunication radio system (STRS) was proposed b…

    Submitted 25 November, 2017; originally announced November 2017.