Showing 1–50 of 52 results for author: Qian, X

Searching in archive eess.
  1. arXiv:2406.16058  [pdf, other]

    eess.AS

    Text-Queried Target Sound Event Localization

    Authors: Jinzheng Zhao, Xinyuan Qian, Yong Xu, Haohe Liu, Yin Cao, Davide Berghi, Wenwu Wang

    Abstract: Sound event localization and detection (SELD) aims to determine the appearance of sound classes, together with their Direction of Arrival (DOA). However, current SELD systems can only predict the activities of specific classes, for example, 13 classes in DCASE challenges. In this paper, we propose text-queried target sound event localization (SEL), a new paradigm that allows the user to input the…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted by EUSIPCO 2024

  2. arXiv:2406.11401  [pdf, other]

    eess.AS

    An Exploration of Length Generalization in Transformer-Based Speech Enhancement

    Authors: Qiquan Zhang, Hongxu Zhu, Xinyuan Qian, Eliathamby Ambikairajah, Haizhou Li

    Abstract: The use of Transformer architectures has facilitated remarkable progress in speech enhancement. Training Transformers using substantially long speech utterances is often infeasible as self-attention suffers from quadratic complexity. It is a critical and unexplored challenge for a Transformer-based speech enhancement model to learn from short speech utterances and generalize to longer ones. In thi…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  3. arXiv:2405.12609  [pdf, other]

    eess.AS cs.SD

    Mamba in Speech: Towards an Alternative to Self-Attention

    Authors: Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps

    Abstract: Transformer and its derivatives have achieved success in diverse tasks across computer vision, natural language processing, and speech processing. To reduce the complexity of computations within the multi-head self-attention mechanism in Transformer, Selective State Space Models (i.e., Mamba) were proposed as an alternative. Mamba exhibited its effectiveness in natural language processing and comp…

    Submitted 30 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  4. arXiv:2405.01104  [pdf, other]

    cs.IT eess.SP

    Multi-user ISAC through Stacked Intelligent Metasurfaces: New Algorithms and Experiments

    Authors: Ziqing Wang, Hongzheng Liu, Jianan Zhang, Rujing Xiong, Kai Wan, Xuewen Qian, Marco Di Renzo, Robert Caiming Qiu

    Abstract: This paper investigates a Stacked Intelligent Metasurfaces (SIM)-assisted Integrated Sensing and Communications (ISAC) system. An extended target model is considered, where the BS aims to estimate the complete target response matrix relative to the SIM. Under the constraints of minimum Signal-to-Interference-plus-Noise Ratio (SINR) for the communication users (CUs) and maximum transmit power, we j…

    Submitted 2 May, 2024; originally announced May 2024.

  5. arXiv:2404.18501  [pdf, other]

    eess.AS cs.SD

    Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention

    Authors: Ruijie Tao, Xinyuan Qian, Yidi Jiang, Junjie Li, Jiadong Wang, Haizhou Li

    Abstract: Audio-visual target speaker extraction (AV-TSE) aims to extract the specific person's speech from the audio mixture given auxiliary visual cues. Previous methods usually search for the target voice through speech-lip synchronization. However, this strategy mainly focuses on the existence of target speech, while ignoring the variations of the noise characteristics. That may result in extracting noi…

    Submitted 8 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  6. arXiv:2404.13153  [pdf, other]

    eess.IV cs.CV

    Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring

    Authors: Chengxu Liu, Xuan Wang, Xiangyu Xu, Ruhao Tian, Shuai Li, Xueming Qian, Ming-Hsuan Yang

    Abstract: Eliminating image blur produced by various kinds of motion has been a challenging problem. Dominant approaches rely heavily on model capacity to remove blurring by reconstructing residual from blurry observation in feature space. These practices not only prevent the capture of spatially variable motion in the real world but also ignore the tailored handling of various motions in image space. In th…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  7. arXiv:2404.00861  [pdf, other]

    eess.AS eess.IV

    Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training

    Authors: Ruijie Tao, Xinyuan Qian, Rohan Kumar Das, Xiaoxue Gao, Jiadong Wang, Haizhou Li

    Abstract: Audio-visual active speaker detection (AV-ASD) aims to identify which visible face is speaking in a scene with one or more persons. Most existing AV-ASD methods prioritize capturing speech-lip correspondence. However, there is a noticeable gap in addressing the challenges from real-world AV-ASD scenarios. Due to the presence of low-quality noisy videos in such cases, AV-ASD systems without a selec…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 10 pages

  8. arXiv:2403.10012  [pdf, other]

    cs.CV cs.RO eess.IV physics.optics

    Real-World Computational Aberration Correction via Quantized Domain-Mixing Representation

    Authors: Qi Jiang, Zhonghua Yi, Shaohua Gao, Yao Gao, Xiaolong Qian, Hao Shi, Lei Sun, Zhijie Xu, Kailun Yang, Kaiwei Wang

    Abstract: Relying on paired synthetic data, existing learning-based Computational Aberration Correction (CAC) methods are confronted with the intricate and multifaceted synthetic-to-real domain gap, which leads to suboptimal performance in real-world applications. In this paper, in contrast to improving the simulation pipeline, we deliver a novel insight into real-world CAC from the perspective of Unsupervi…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Codes and datasets will be made publicly available at https://github.com/zju-jiangqi/QDMR

  9. arXiv:2311.12070  [pdf, other]

    eess.IV cs.CV

    FDDM: Unsupervised Medical Image Translation with a Frequency-Decoupled Diffusion Model

    Authors: Yunxiang Li, Hua-Chieh Shao, Xiaoxue Qian, You Zhang

    Abstract: Diffusion models have demonstrated significant potential in producing high-quality images in medical image translation to aid disease diagnosis, localization, and treatment. Nevertheless, current diffusion models have limited success in achieving faithful image translations that can accurately preserve the anatomical structures of medical images, especially for unpaired datasets. The preservation…

    Submitted 26 June, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

  10. arXiv:2310.14778  [pdf, other]

    cs.MM cs.SD eess.AS

    Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions

    Authors: Jinzheng Zhao, Yong Xu, Xinyuan Qian, Davide Berghi, Peipei Wu, Meng Cui, Jianyuan Sun, Philip J. B. Jackson, Wenwu Wang

    Abstract: Audio-visual speaker tracking has drawn increasing attention over the past few years due to its academic values and wide application. Audio and visual modalities can provide complementary information for localization and tracking. With audio and visual information, the Bayesian-based filter can solve the problem of data association, audio-visual fusion and track management. In this paper, we condu…

    Submitted 17 December, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

  11. arXiv:2310.10497  [pdf, other]

    cs.SD cs.AI eess.AS

    LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism

    Authors: Yu Chen, Xinyuan Qian, Zexu Pan, Kainan Chen, Haizhou Li

    Abstract: The prevailing noise-resistant and reverberation-resistant localization algorithms primarily emphasize separating and providing directional output for each speaker in multi-speaker scenarios, without association with the identity of speakers. In this paper, we present a target speaker localization algorithm with a selective hearing mechanism. Given a reference speech of the target speaker, we firs…

    Submitted 17 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Submitted to ICASSP 2024

  12. arXiv:2309.16308  [pdf, other]

    cs.MM cs.SD eess.AS

    Audio Visual Speaker Localization from EgoCentric Views

    Authors: Jinzheng Zhao, Yong Xu, Xinyuan Qian, Wenwu Wang

    Abstract: The use of audio and visual modality for speaker localization has been well studied in the literature by exploiting their complementary characteristics. However, most previous works employ the setting of static sensors mounted at fixed positions. Unlike them, in this work, we explore the ego-centric setting, where the heterogeneous sensors are embodied and could be moving with a human to facilitat…

    Submitted 28 September, 2023; originally announced September 2023.

  13. arXiv:2306.09480  [pdf, ps, other]

    cs.IT eess.SP

    Optimization of RIS-Aided MIMO -- A Mutually Coupled Loaded Wire Dipole Model

    Authors: H. El Hassani, X. Qian, S. Jeong, N. S. Perović, M. Di Renzo, P. Mursia, V. Sciancalepore, X. Costa-Pérez

    Abstract: We consider a reconfigurable intelligent surface (RIS) assisted multiple-input multiple-output (MIMO) system in the presence of scattering objects. The MIMO transmitter and receiver, the RIS, and the scattering objects are modeled as mutually coupled thin wires connected to load impedances. We introduce a novel numerical algorithm for optimizing the tunable loads connected to the RIS, which does n…

    Submitted 18 September, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

  14. arXiv:2306.04915  [pdf, ps, other]

    cs.IT eess.SP

    Sensing-based Beamforming Design for Joint Performance Enhancement of RIS-Aided ISAC Systems

    Authors: Xiaowei Qian, Xiaoling Hu, Chenxi Liu, Mugen Peng, Caijun Zhong

    Abstract: Reconfigurable intelligent surface (RIS) has shown its great potential in facilitating device-based integrated sensing and communication (ISAC), where sensing and communication tasks are mostly conducted on different time-frequency resources. While the more challenging scenarios of simultaneous sensing and communication (SSC) have so far drawn little attention. In this paper, we propose a novel RI…

    Submitted 7 June, 2023; originally announced June 2023.

  15. arXiv:2305.16342  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition

    Authors: Zhi-Hao Lai, Tian-Hao Zhang, Qi Liu, Xinyuan Qian, Li-Fang Wei, Song-Lu Chen, Feng Chen, Xu-Cheng Yin

    Abstract: The local and global features are both essential for automatic speech recognition (ASR). Many recent methods have verified that simply combining local and global features can further promote ASR performance. However, these methods pay less attention to the interaction of local and global features, and their series architectures are rigid to reflect local and global relationships. To address these…

    Submitted 29 May, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023

  16. arXiv:2305.14049  [pdf, other]

    cs.CL cs.SD eess.AS

    Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding

    Authors: Tian-Hao Zhang, Hai-Bo Qin, Zhi-Hao Lai, Song-Lu Chen, Qi Liu, Feng Chen, Xinyuan Qian, Xu-Cheng Yin

    Abstract: Attention-based encoder-decoder (AED) models have shown impressive performance in ASR. However, most existing AED methods neglect to simultaneously leverage both acoustic and semantic features in decoder, which is crucial for generating more accurate and informative semantic states. In this paper, we propose an Acoustic and Semantic Cooperative Decoder (ASCD) for ASR. In particular, unlike vanilla…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023

  17. arXiv:2305.08541  [pdf, other]

    cs.SD eess.AS

    Ripple sparse self-attention for monaural speech enhancement

    Authors: Qiquan Zhang, Hongxu Zhu, Qi Song, Xinyuan Qian, Zhaoheng Ni, Haizhou Li

    Abstract: The use of Transformer represents a recent success in speech enhancement. However, as its core component, self-attention suffers from quadratic complexity, which is computationally prohibited for long speech recordings. Moreover, it allows each time frame to attend to all time frames, neglecting the strong local correlations of speech signals. This study presents a simple yet effective sparse self…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: 5 pages, ICASSP 2023 published

  18. arXiv:2303.17480  [pdf, other]

    cs.CV cs.AI eess.IV

    Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert

    Authors: Jiadong Wang, Xinyuan Qian, Malu Zhang, Robby T. Tan, Haizhou Li

    Abstract: Talking face generation, also known as speech-to-lip generation, reconstructs facial motions concerning lips given coherent speech input. The previous studies revealed the importance of lip-speech synchronization and visual quality. Despite much progress, they hardly focus on the content of lip movements i.e., the visual intelligibility of the spoken words, which is an important aspect of generati…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: accepted by CVPR 2023

  19. arXiv:2303.03093  [pdf, other]

    eess.SP

    A Miniaturised Camera-based Multi-Modal Tactile Sensor

    Authors: Kaspar Althoefer, Yonggen Ling, Wanlin Li, Xinyuan Qian, Wang Wei Lee, Peng Qi

    Abstract: In conjunction with huge recent progress in camera and computer vision technology, camera-based sensors have increasingly shown considerable promise in relation to tactile sensing. In comparison to competing technologies (be they resistive, capacitive or magnetic based), they offer super-high-resolution, while suffering from fewer wiring problems. The human tactile system is composed of various ty…

    Submitted 6 March, 2023; originally announced March 2023.

  20. arXiv:2302.01972  [pdf, other]

    cs.CR eess.SY math.DS math.OC physics.soc-ph

    DCA: Delayed Charging Attack on the Electric Shared Mobility System

    Authors: Shuocheng Guo, Hanlin Chen, Mizanur Rahman, Xinwu Qian

    Abstract: An efficient operation of the electric shared mobility system (ESMS) relies heavily on seamless interconnections among shared electric vehicles (SEV), electric vehicle supply equipment (EVSE), and the grid. Nevertheless, this interconnectivity also makes the ESMS vulnerable to cyberattacks that may cause short-term breakdowns or long-term degradation of the ESMS. This study focuses on one such att…

    Submitted 13 June, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

    Journal ref: IEEE Transactions on Intelligent Transportation Systems, 2023

  21. arXiv:2301.07968  [pdf, other]

    cs.IT eess.SP

    On the Degrees of Freedom of RIS-Aided Holographic MIMO Systems

    Authors: Juan Carlos Ruiz-Sicilia, Xuewen Qian, Marco Di Renzo, Vincenzo Sciancalepore, Merouane Debbah, Xavier Costa-Perez

    Abstract: In this paper, we study surface-based communication systems based on different levels of channel state information for system optimization. We analyze the system performance in terms of rate and degrees of freedom (DoF). We show that the deployment of a reconfigurable intelligent surface (RIS) results in increasing the number of DoF, by extending the near-field region. Over Rician fading channels,…

    Submitted 27 January, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

  22. arXiv:2212.00661  [pdf, other]

    quant-ph eess.SY

    Hybrid Gate-Pulse Model for Variational Quantum Algorithms

    Authors: Zhiding Liang, Zhixin Song, Jinglei Cheng, Zichang He, Ji Liu, Hanrui Wang, Ruiyang Qin, Yiru Wang, Song Han, Xuehai Qian, Yiyu Shi

    Abstract: Current quantum programs are mostly synthesized and compiled on the gate-level, where quantum circuits are composed of quantum gates. The gate-level workflow, however, introduces significant redundancy when quantum gates are eventually transformed into control signals and applied on quantum devices. For superconducting quantum computers, the control signals are microwave pulses. Therefore, pulse-l…

    Submitted 1 December, 2022; originally announced December 2022.

    Comments: 8 pages, 6 figures

  23. DynImp: Dynamic Imputation for Wearable Sensing Data Through Sensory and Temporal Relatedness

    Authors: Zepeng Huo, Taowei Ji, Yifei Liang, Shuai Huang, Zhangyang Wang, Xiaoning Qian, Bobak Mortazavi

    Abstract: In wearable sensing applications, data is inevitable to be irregularly sampled or partially missing, which pose challenges for any downstream application. An unique aspect of wearable data is that it is time-series data and each channel can be correlated to another one, such as x, y, z axis of accelerometer. We argue that traditional methods have rarely made use of both times-series dynamics of th…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: 5 pages, 2 figures, accepted in ICASSP'2022

  24. arXiv:2209.01768  [pdf, other]

    cs.MM cs.SD eess.AS

    Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception

    Authors: Jiadong Wang, Xinyuan Qian, Haizhou Li

    Abstract: Audio and visual signals complement each other in human speech perception, so do they in speech recognition. The visual hint is less evident than the acoustic hint, but more robust in a complex acoustic environment, as far as speech perception is concerned. It remains a challenge how we effectively exploit the interaction between audio and visual signals for automatic speech recognition. There hav…

    Submitted 5 September, 2022; originally announced September 2022.

  25. arXiv:2209.01749  [pdf, other]

    eess.IV cs.CV

    4D LUT: Learnable Context-Aware 4D Lookup Table for Image Enhancement

    Authors: Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian

    Abstract: Image enhancement aims at improving the aesthetic visual quality of photos by retouching the color and tone, and is an essential technology for professional digital photography. Recent years deep learning-based image enhancement algorithms have achieved promising performance and attracted increasing popularity. However, typical efforts attempt to construct a uniform enhancer for all pixels' color…

    Submitted 5 September, 2022; originally announced September 2022.

  26. arXiv:2206.12273  [pdf, other]

    eess.AS cs.LG

    Iterative Sound Source Localization for Unknown Number of Sources

    Authors: Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang

    Abstract: Sound source localization aims to seek the direction of arrival (DOA) of all sound sources from the observed multi-channel audio. For the practical problem of unknown number of sources, existing localization algorithms attempt to predict a likelihood-based coding (i.e., spatial spectrum) and employ a pre-determined threshold to detect the source number and corresponding DOA value. However, these t…

    Submitted 24 June, 2022; originally announced June 2022.

    Comments: Accepted by Interspeech 2022

  27. arXiv:2204.10513  [pdf]

    eess.IV cs.CV

    MIPR: Automatic Annotation of Medical Images with Pixel Rearrangement

    Authors: Pingping Dai, Haiming Zhu, Shuang Ge, Ruihan Zhang, Xiang Qian, Xi Li, Kehong Yuan

    Abstract: Most of the state-of-the-art semantic segmentation reported in recent years is based on fully supervised deep learning in the medical domain. However, the high-quality annotated datasets require intense labor and domain knowledge, consuming enormous time and cost. Previous works that adopt semi-supervised and unsupervised learning are proposed to address the lack of annotated data through assist…

    Submitted 22 April, 2022; originally announced April 2022.

  28. arXiv:2204.04216  [pdf, other]

    eess.IV cs.CV

    Learning Trajectory-Aware Transformer for Video Super-Resolution

    Authors: Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian

    Abstract: Video super-resolution (VSR) aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts. Although some progress has been made, there are grand challenges to effectively utilize temporal dependency in entire video sequences. Existing approaches usually align and aggregate video frames from limited adjacent frames (e.g., 5 or 7 frames), which prevents these…

    Submitted 20 April, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: CVPR 2022 Oral

  29. arXiv:2203.16840  [pdf, other]

    eess.AS cs.CV cs.SD

    Speaker Extraction with Co-Speech Gestures Cue

    Authors: Zexu Pan, Xinyuan Qian, Haizhou Li

    Abstract: Speaker extraction seeks to extract the clean speech of a target speaker from a multi-talker mixture speech. There have been studies to use a pre-recorded speech sample or face image of the target speaker as the speaker cue. In human communication, co-speech gestures that are naturally timed with speech also contribute to speech perception. In this work, we explore the use of co-speech gestures se…

    Submitted 10 May, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted by IEEE Signal Processing Letters

  30. arXiv:2110.02265  [pdf, other]

    stat.ME eess.SP

    Adaptive Group Testing with Mismatched Models

    Authors: Mingzhou Fan, Byung-Jun Yoon, Francis J. Alexander, Edward R. Dougherty, Xiaoning Qian

    Abstract: Accurate detection of infected individuals is one of the critical steps in stopping any pandemic. When the underlying infection rate of the disease is low, testing people in groups, instead of testing each individual in the population, can be more efficient. In this work, we consider noisy adaptive group testing design with specific test sensitivity and specificity that select the optimal group gi…

    Submitted 5 October, 2021; originally announced October 2021.

    Comments: full length version for ICASSP

  31. IEEE BigData 2021 Cup: Soft Sensing at Scale

    Authors: Sergei Petrov, Chao Zhang, Jaswanth Yella, Yu Huang, Xiaoye Qian, Sthitie Bom

    Abstract: IEEE BigData 2021 Cup: Soft Sensing at Scale is a data mining competition organized by Seagate Technology, in association with the IEEE BigData 2021 conference. The scope of this challenge is to tackle the task of classifying soft sensing data with machine learning techniques. In this paper we go into the details of the challenge and describe the data set provided to participants. We define the me…

    Submitted 7 September, 2021; originally announced September 2021.

    Comments: 4 pages, 4 figures, for IEEE Big Data Cup challenge 2021

    MSC Class: 68T01

  32. arXiv:2108.02539  [pdf, other]

    cs.SD cs.DB eess.AS

    SLoClas: A Database for Joint Sound Localization and Classification

    Authors: Xinyuan Qian, Bidisha Sharma, Amine El Abridi, Haizhou Li

    Abstract: In this work, we present the development of a new database, namely Sound Localization and Classification (SLoClas) corpus, for studying and analyzing sound localization and classification. The corpus contains a total of 23.27 hours of data recorded using a 4-channel microphone array. 10 classes of sounds are played over a loudspeaker at 1.5 meters distance from the array by varying the Direction-o…

    Submitted 5 August, 2021; originally announced August 2021.

    Comments: Submitted to O-COCOSDA 2021

  33. arXiv:2107.06592  [pdf, other]

    eess.AS cs.SD eess.IV

    Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection

    Authors: Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li

    Abstract: Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers. The successful ASD depends on accurate interpretation of short-term and long-term audio and visual information, as well as audio-visual interaction. Unlike the prior work where systems make decision instantaneously using short-term features, we propose a novel framework, named TalkNet, that ma…

    Submitted 25 July, 2021; v1 submitted 14 July, 2021; originally announced July 2021.

    Comments: ACM Multimedia 2021

  34. arXiv:2105.06107  [pdf, other]

    cs.SD cs.RO eess.AS

    Multi-target DoA Estimation with an Audio-visual Fusion Mechanism

    Authors: Xinyuan Qian, Maulik Madhavi, Zexu Pan, Jiadong Wang, Haizhou Li

    Abstract: Most of the prior studies in the spatial direction of arrival (DoA) domain focus on a single modality. However, humans use auditory and visual senses to detect the presence of sound sources. With this motivation, we propose to use neural networks with audio and visual signals for multi-speaker localization. The use of heterogeneous sensors can provide complementary information to overcome uni-modal challenges, such…

    Submitted 13 May, 2021; originally announced May 2021.

    Comments: ICASSP 2021 accepted

  35. arXiv:2103.04235  [pdf]

    eess.IV cs.CV

    Graph-based Pyramid Global Context Reasoning with a Saliency-aware Projection for COVID-19 Lung Infections Segmentation

    Authors: Huimin Huang, Ming Cai, Lanfen Lin, Jing Zheng, Xiongwei Mao, Xiaohan Qian, Zhiyi Peng, Jianying Zhou, Yutaro Iwamoto, Xian-Hua Han, Yen-Wei Chen, Ruofeng Tong

    Abstract: Coronavirus Disease 2019 (COVID-19) has rapidly spread in 2020, emerging a mass of studies for lung infection segmentation from CT images. Though many methods have been proposed for this issue, it is a challenging task because of infections of various size appearing in different lobe zones. To tackle these issues, we propose a Graph-based Pyramid Global Context Reasoning (Graph-PGCR) module, which…

    Submitted 6 March, 2021; originally announced March 2021.

  36. arXiv:2010.11630  [pdf, other]

    astro-ph.IM astro-ph.GA eess.IV

    DeepGalaxy: Deducing the Properties of Galaxy Mergers from Images Using Deep Neural Networks

    Authors: Maxwell X. Cai, Jeroen Bédorf, Vikram A. Saletore, Valeriu Codreanu, Damian Podareanu, Adel Chaibi, Penny X. Qian

    Abstract: Galaxy mergers, the dynamical process during which two galaxies collide, are among the most spectacular phenomena in the Universe. During this process, the two colliding galaxies are tidally disrupted, producing significant visual features that evolve as a function of time. These visual features contain valuable clues for deducing the physical properties of the galaxy mergers. In this work, we pro…

    Submitted 22 October, 2020; originally announced October 2020.

    Comments: 7 pages, 7 figures. Accepted for publication at the 2020 IEEE/ACM Fifth Workshop on Deep Learning on Supercomputers (DLS)

  37. arXiv:2010.04653  [pdf, other]

    math.OC eess.SY stat.ML

    Quantifying the multi-objective cost of uncertainty

    Authors: Byung-Jun Yoon, Xiaoning Qian, Edward R. Dougherty

    Abstract: Various real-world applications involve modeling complex systems with immense uncertainty and optimizing multiple objectives based on the uncertain model. Quantifying the impact of the model uncertainty on the given operational objectives is critical for designing optimal experiments that can most effectively reduce the uncertainty that affect the objectives pertinent to the application at hand. I…

    Submitted 30 April, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

  38. arXiv:2010.03201  [pdf, other]

    eess.IV cs.CV cs.LG

    M3Lung-Sys: A Deep Learning System for Multi-Class Lung Pneumonia Screening from CT Imaging

    Authors: Xuelin Qian, Huazhu Fu, Weiya Shi, Tao Chen, Yanwei Fu, Fei Shan, Xiangyang Xue

    Abstract: To counter the outbreak of COVID-19, the accurate diagnosis of suspected cases plays a crucial role in timely quarantine, medical treatment, and preventing the spread of the pandemic. Considering the limited training cases and resources (e.g, time and budget), we propose a Multi-task Multi-slice Deep Learning System (M3Lung-Sys) for multi-class lung pneumonia screening from CT imaging, which only…

    Submitted 7 October, 2020; originally announced October 2020.

    Comments: IEEE Journal of Biomedical and Health Informatics (JBHI), 2020

  39. arXiv:2009.03184  [pdf]

    eess.IV cs.CV cs.LG

    A New Screening Method for COVID-19 based on Ocular Feature Recognition by Machine Learning Tools

    Authors: Yanwei Fu, Feng Li, Wenxuan Wang, Haicheng Tang, Xuelin Qian, Mengwei Gu, Xiangyang Xue

    Abstract: The Coronavirus disease 2019 (COVID-19) has affected several million people. With the outbreak of the epidemic, many researchers are devoting themselves to the COVID-19 screening system. The standard practices for rapid risk screening of COVID-19 are the CT imaging or RT-PCR (real-time polymerase chain reaction). However, these methods demand professional efforts of the acquisition of CT images an…

    Submitted 3 September, 2020; originally announced September 2020.

    Comments: technical report

  40. arXiv:2008.08278  [pdf, other]

    eess.IV cs.CV

    DONet: Dual Objective Networks for Skin Lesion Segmentation

    Authors: Yaxiong Wang, Yunchao Wei, Xueming Qian, Li Zhu, Yi Yang

    Abstract: Skin lesion segmentation is a crucial step in the computer-aided diagnosis of dermoscopic images. In the last few years, deep learning based semantic segmentation methods have significantly advanced the skin lesion segmentation results. However, the current performance is still unsatisfactory due to some challenging factors such as large variety of lesion scale and ambiguous difference between les…

    Submitted 19 August, 2020; originally announced August 2020.

    Comments: 10 pages

  41. arXiv:1909.08029  [pdf, other]

    cs.DC cs.LG eess.SY

    Heterogeneity-Aware Asynchronous Decentralized Training

    Authors: Qinyi Luo, Jiaao He, Youwei Zhuo, Xuehai Qian

    Abstract: Distributed deep learning training usually adopts All-Reduce as the synchronization mechanism for data parallel algorithms due to its high performance in homogeneous environment. However, its performance is bounded by the slowest worker among all workers, and is significantly slower in heterogeneous situations. AD-PSGD, a newly proposed synchronization method which provides numerically fast conver…

    Submitted 17 September, 2019; originally announced September 2019.

  42. arXiv:1908.08747  [pdf, other]

    eess.SP cs.IT

    Reconfigurable Intelligent Surfaces vs. Relaying: Differences, Similarities, and Performance Comparison

    Authors: M. Di Renzo, K. Ntontin, J. Song, F. H. Danufane, X. Qian, F. Lazarakis, J. de Rosny, D. -T. Phan-Huy, O. Simeone, R. Zhang, M. Debbah, G. Lerosey, M. Fink, S. Tretyakov, S. Shamai

    Abstract: Reconfigurable intelligent surfaces (RISs) have the potential of realizing the emerging concept of smart radio environments by leveraging the unique properties of meta-surfaces. In this article, we discuss the potential applications of RISs in wireless networks that operate at high-frequency bands, e.g., millimeter wave (30-100 GHz) and sub-millimeter wave (greater than 100 GHz) frequencies. When…

    Submitted 21 February, 2020; v1 submitted 23 August, 2019; originally announced August 2019.

    Comments: Submitted for journal publication (revised version)

  43. arXiv:1907.09077  [pdf, other]

    cs.NE cs.ET cs.LG eess.SP

    A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron Superconducting Technology

    Authors: Ruizhe Cai, Ao Ren, Olivia Chen, Ning Liu, Caiwen Ding, Xuehai Qian, Jie Han, Wenhui Luo, Nobuyuki Yoshikawa, Yanzhi Wang

    Abstract: The Adiabatic Quantum-Flux-Parametron (AQFP) superconducting technology has been recently developed, which achieves the highest energy efficiency among superconducting logic families, potentially huge gain compared with state-of-the-art CMOS. In 2016, the successful fabrication and testing of AQFP-based circuits with the scale of 83,000 JJs have demonstrated the scalability and potential of implem…

    Submitted 21 July, 2019; originally announced July 2019.

  44. arXiv:1902.06085  [pdf]

    cs.CV eess.IV

    DC-AL GAN: Pseudoprogression and True Tumor Progression of Glioblastoma Multiform Image Classification Based on DCGAN and AlexNet

    Authors: Meiyu Li, Hailiang Tang, Michael D. Chan, Xiaobo Zhou, Xiaohua Qian

    Abstract: Pseudoprogression (PsP) occurs in 20-30% of patients with glioblastoma multiforme (GBM) after receiving the standard treatment. In the course of post-treatment magnetic resonance imaging (MRI), PsP exhibits similarities in shape and intensity to the true tumor progression (TTP) of GBM. So, these similarities pose challenges on the differentiation of these types of progression and hence the selecti…

    Submitted 18 May, 2019; v1 submitted 16 February, 2019; originally announced February 2019.

  45. arXiv:1902.01064  [pdf, other]

    cs.DC cs.LG eess.SY

    Hop: Heterogeneity-Aware Decentralized Training

    Authors: Qinyi Luo, **kun Lin, Youwei Zhuo, Xuehai Qian

    Abstract: Recent work has shown that decentralized algorithms can deliver superior performance over centralized ones in the context of machine learning. The two approaches, with the main difference residing in their distinct communication patterns, are both susceptible to performance degradation in heterogeneous environments. Although vigorous efforts have been devoted to supporting centralized algorithms a…

    Submitted 7 February, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

  46. arXiv:1901.08983  [pdf, other]

    cs.SD eess.AS

    LOCATA challenge: speaker localization with a planar array

    Authors: Xinyuan Qian, Andrea Cavallaro, Alessio Brutti, Maurizio Omologo

    Abstract: This document describes our submission to the 2018 LOCalization And TrAcking (LOCATA) challenge (Tasks 1, 3, 5). We estimate the 3D position of a speaker using the Global Coherence Field (GCF) computed from multiple microphone pairs of a DICIT planar array. One of the main challenges when using such an array with omnidirectional microphones is the front-back ambiguity, which is particularly eviden…

    Submitted 25 January, 2019; originally announced January 2019.

    Comments: In Proceedings of the LOCATA Challenge Workshop - a satellite event of IWAENC 2018 (arXiv:1811.08482)

    Report number: LOCATAchallenge/2018/05

  47. arXiv:1812.07106  [pdf, other]

    cs.CV cs.LG eess.SP

    E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs

    Authors: Zhe Li, Caiwen Ding, Siyue Wang, Wujie Wen, Youwei Zhuo, Chang Liu, Qinru Qiu, Wenyao Xu, Xue Lin, Xuehai Qian, Yanzhi Wang

    Abstract: Recurrent Neural Networks (RNNs) are becoming increasingly important for time series-related applications which require efficient and real-time implementations. The two major types are Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. It is a challenging task to have real-time, efficient, and accurate hardware RNN implementations because of the high sensitivity to imprecision…

    Submitted 12 December, 2018; originally announced December 2018.

    Comments: In The 25th International Symposium on High-Performance Computer Architecture (HPCA 2019)

  48. arXiv:1808.01672  [pdf, other]

    cs.IT eess.SP

    Model-Aided Wireless Artificial Intelligence: Embedding Expert Knowledge in Deep Neural Networks Towards Wireless Systems Optimization

    Authors: Alessio Zappone, Marco Di Renzo, Mérouane Debbah, Thanh Tu Lam, Xuewen Qian

    Abstract: Deep learning based on artificial neural networks is a powerful machine learning method that, in the last few years, has been successfully used to realize tasks, e.g., image classification, speech recognition, translation of languages, etc., that are usually simple to execute by human beings but extremely difficult to perform by machines. This is one of the reasons why deep learning is considered…

    Submitted 15 June, 2019; v1 submitted 5 August, 2018; originally announced August 2018.

    Comments: Accepted for publication in the IEEE Vehicular Technology Magazine

  49. arXiv:1805.01143  [pdf, ps, other]

    eess.SP stat.AP

    Experimental Design via Generalized Mean Objective Cost of Uncertainty

    Authors: Shahin Boluki, Xiaoning Qian, Edward R. Dougherty

    Abstract: The mean objective cost of uncertainty (MOCU) quantifies the performance cost of using an operator that is optimal across an uncertainty class of systems as opposed to using an operator that is optimal for a particular system. MOCU-based experimental design selects an experiment to maximally reduce MOCU, thereby gaining the greatest reduction of uncertainty impacting the operational objective. The…

    Submitted 3 May, 2018; originally announced May 2018.

  50. arXiv:1407.5813   

    cs.RO eess.SY

    Priority-based coordination of autonomous and legacy vehicles at intersection

    Authors: Xiangjun Qian, Jean Gregoire, Fabien Moutarde, Arnaud De La Fortelle

    Abstract: Recently, researchers have proposed various autonomous intersection management techniques that enable autonomous vehicles to cross the intersection without traffic lights or stop signs. In particular, a priority-based coordination system with provable collision-free and deadlock-free features has been presented. In this paper, we extend the priority-based approach to support legacy vehicles withou…

    Submitted 26 September, 2014; v1 submitted 22 July, 2014; originally announced July 2014.

    Comments: put in other preprint server