Search | arXiv e-print repository

arXiv:2406.08374 [pdf, other]

2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction

Authors: Tianqi Chen, Jun Hou, Yinchi Zhou, Huidong Xie, Xiongchao Chen, Qiong Liu, Xueqi Guo, Menghua Xia, James S. Duncan, Chi Liu, Bo Zhou

Abstract: Positron Emission Tomography (PET) is an important clinical imaging tool but inevitably introduces radiation hazards to patients and healthcare providers. Reducing the tracer injection dose and eliminating the CT acquisition for attenuation correction can reduce the overall radiation dose, but often results in PET with high noise and bias. Thus, it is desirable to develop 3D methods to translate t… ▽ More Positron Emission Tomography (PET) is an important clinical imaging tool but inevitably introduces radiation hazards to patients and healthcare providers. Reducing the tracer injection dose and eliminating the CT acquisition for attenuation correction can reduce the overall radiation dose, but often results in PET with high noise and bias. Thus, it is desirable to develop 3D methods to translate the non-attenuation-corrected low-dose PET (NAC-LDPET) into attenuation-corrected standard-dose PET (AC-SDPET). Recently, diffusion models have emerged as a new state-of-the-art deep learning method for image-to-image translation, better than traditional CNN-based methods. However, due to the high computation cost and memory burden, it is largely limited to 2D applications. To address these challenges, we developed a novel 2.5D Multi-view Averaging Diffusion Model (MADM) for 3D image-to-image translation with application on NAC-LDPET to AC-SDPET translation. Specifically, MADM employs separate diffusion models for axial, coronal, and sagittal views, whose outputs are averaged in each sampling step to ensure the 3D generation quality from multiple views. To accelerate the 3D sampling process, we also proposed a strategy to use the CNN-based 3D generation as a prior for the diffusion model. Our experimental results on human patient studies suggested that MADM can generate high-quality 3D translation images, outperforming previous CNN-based and Diffusion-based baseline methods. △ Less

Submitted 15 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 15 pages, 7 figures

arXiv:2406.06329 [pdf, other]

A Parameter-efficient Language Extension Framework for Multilingual ASR

Authors: Wei Liu, **gyong Hou, Dong Yang, Muyong Cao, Tan Lee

Abstract: Covering all languages with a multilingual speech recognition model (MASR) is very difficult. Performing language extension on top of an existing MASR is a desirable choice. In this study, the MASR continual learning problem is probabilistically decomposed into language identity prediction (LP) and cross-lingual adaptation (XLA) sub-problems. Based on this, we propose an architecture-based framewo… ▽ More Covering all languages with a multilingual speech recognition model (MASR) is very difficult. Performing language extension on top of an existing MASR is a desirable choice. In this study, the MASR continual learning problem is probabilistically decomposed into language identity prediction (LP) and cross-lingual adaptation (XLA) sub-problems. Based on this, we propose an architecture-based framework for language extension that can fundamentally solve catastrophic forgetting, debudded as PELE. PELE is designed to be parameter-efficient, incrementally incorporating an add-on module to adapt to a new language. Specifically, different parameter-efficient fine-tuning (PEFT) modules and their variants are explored as potential candidates to perform XLA. Experiments are carried out on 5 new languages with a wide range of low-resourced data sizes. The best-performing PEFT candidate can achieve satisfactory performance across all languages and demonstrates superiority in three of five languages over the continual joint learning setting. Notably, PEFT methods focusing on weight parameters or input features are revealed to be limited in performance, showing significantly inferior extension capabilities compared to inserting a lightweight module in between layers such as an Adapter. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2405.12223 [pdf, other]

Cascaded Multi-path Shortcut Diffusion Model for Medical Image Translation

Authors: Yinchi Zhou, Tianqi Chen, Jun Hou, Huidong Xie, Nicha C. Dvornek, S. Kevin Zhou, David L. Wilson, James S. Duncan, Chi Liu, Bo Zhou

Abstract: Image-to-image translation is a vital component in medical imaging processing, with many uses in a wide range of imaging modalities and clinical scenarios. Previous methods include Generative Adversarial Networks (GANs) and Diffusion Models (DMs), which offer realism but suffer from instability and lack uncertainty estimation. Even though both GAN and DM methods have individually exhibited their c… ▽ More Image-to-image translation is a vital component in medical imaging processing, with many uses in a wide range of imaging modalities and clinical scenarios. Previous methods include Generative Adversarial Networks (GANs) and Diffusion Models (DMs), which offer realism but suffer from instability and lack uncertainty estimation. Even though both GAN and DM methods have individually exhibited their capability in medical image translation tasks, the potential of combining a GAN and DM to further improve translation performance and to enable uncertainty estimation remains largely unexplored. In this work, we address these challenges by proposing a Cascade Multi-path Shortcut Diffusion Model (CMDM) for high-quality medical image translation and uncertainty estimation. To reduce the required number of iterations and ensure robust performance, our method first obtains a conditional GAN-generated prior image that will be used for the efficient reverse translation with a DM in the subsequent step. Additionally, a multi-path shortcut diffusion strategy is employed to refine translation results and estimate uncertainty. A cascaded pipeline further enhances translation quality, incorporating residual averaging between cascades. We collected three different medical image datasets with two sub-tasks for each dataset to test the generalizability of our approach. Our experimental results found that CMDM can produce high-quality translations comparable to state-of-the-art methods while providing reasonable uncertainty estimations that correlate well with the translation error. △ Less

Submitted 5 April, 2024; originally announced May 2024.

Comments: 15 pages, 5 figures

arXiv:2404.12804 [pdf, other]

Linearly-evolved Transformer for Pan-sharpening

Authors: Junming Hou, Zihan Cao, Naishan Zheng, Xuan Li, Xiaoyu Chen, Xinyang Liu, Xiaofeng Cong, Man Zhou, Danfeng Hong

Abstract: Vision transformer family has dominated the satellite pan-sharpening field driven by the global-wise spatial information modeling mechanism from the core self-attention ingredient. The standard modeling rules within these promising pan-sharpening methods are to roughly stack the transformer variants in a cascaded manner. Despite the remarkable advancement, their success may be at the huge cost of… ▽ More Vision transformer family has dominated the satellite pan-sharpening field driven by the global-wise spatial information modeling mechanism from the core self-attention ingredient. The standard modeling rules within these promising pan-sharpening methods are to roughly stack the transformer variants in a cascaded manner. Despite the remarkable advancement, their success may be at the huge cost of model parameters and FLOPs, thus preventing its application over low-resource satellites.To address this challenge between favorable performance and expensive computation, we tailor an efficient linearly-evolved transformer variant and employ it to construct a lightweight pan-sharpening framework. In detail, we deepen into the popular cascaded transformer modeling with cutting-edge methods and develop the alternative 1-order linearly-evolved transformer variant with the 1-dimensional linear convolution chain to achieve the same function. In this way, our proposed method is capable of benefiting the cascaded modeling rule while achieving favorable performance in the efficient manner. Extensive experiments over multiple satellite datasets suggest that our proposed method achieves competitive performance against other state-of-the-art with fewer computational resources. Further, the consistently favorable performance has been verified over the hyper-spectral image fusion task. Our main focus is to provide an alternative global modeling framework with an efficient structure. The code will be publicly available. △ Less

Submitted 19 April, 2024; originally announced April 2024.

Comments: 10 pages

arXiv:2403.11953 [pdf, other]

Advancing COVID-19 Detection in 3D CT Scans

Authors: Qingqiu Li, Runtian Yuan, Junlin Hou, Jilan Xu, Yuejie Zhang, Rui Feng, Hao Chen

Abstract: To make a more accurate diagnosis of COVID-19, we propose a straightforward yet effective model. Firstly, we analyse the characteristics of 3D CT scans and remove the non-lung parts, facilitating the model to focus on lesion-related areas and reducing computational cost. We use ResNeSt50 as the strong feature extractor, initializing it with pretrained weights which have COVID-19-specific prior kno… ▽ More To make a more accurate diagnosis of COVID-19, we propose a straightforward yet effective model. Firstly, we analyse the characteristics of 3D CT scans and remove the non-lung parts, facilitating the model to focus on lesion-related areas and reducing computational cost. We use ResNeSt50 as the strong feature extractor, initializing it with pretrained weights which have COVID-19-specific prior knowledge. Our model achieves a Macro F1 Score of 0.94 on the validation set of the 4th COV19D Competition Challenge $\mathrm{I}$, surpassing the baseline by 16%. This indicates its effectiveness in distinguishing between COVID-19 and non-COVID-19 cases, making it a robust method for COVID-19 detection. △ Less

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.11498 [pdf, other]

Domain Adaptation Using Pseudo Labels for COVID-19 Detection

Authors: Runtian Yuan, Qingqiu Li, Junlin Hou, Jilan Xu, Yuejie Zhang, Rui Feng, Hao Chen

Abstract: In response to the need for rapid and accurate COVID-19 diagnosis during the global pandemic, we present a two-stage framework that leverages pseudo labels for domain adaptation to enhance the detection of COVID-19 from CT scans. By utilizing annotated data from one domain and non-annotated data from another, the model overcomes the challenge of data scarcity and variability, common in emergent he… ▽ More In response to the need for rapid and accurate COVID-19 diagnosis during the global pandemic, we present a two-stage framework that leverages pseudo labels for domain adaptation to enhance the detection of COVID-19 from CT scans. By utilizing annotated data from one domain and non-annotated data from another, the model overcomes the challenge of data scarcity and variability, common in emergent health crises. The innovative approach of generating pseudo labels enables the model to iteratively refine its learning process, thereby improving its accuracy and adaptability across different hospitals and medical centres. Experimental results on COV19-CT-DB database showcase the model's potential to achieve high diagnostic precision, significantly contributing to efficient patient management and alleviating the strain on healthcare systems. Our method achieves 0.92 Macro F1 Score on the validation set of Covid-19 domain adaptation challenge. △ Less

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2402.19020 [pdf, other]

Unsupervised Learning of High-resolution Light Field Imaging via Beam Splitter-based Hybrid Lenses

Authors: Jianxin Lei, Chengcai Xu, Langqing Shi, Junhui Hou, ** Zhou

Abstract: In this paper, we design a beam splitter-based hybrid light field imaging prototype to record 4D light field image and high-resolution 2D image simultaneously, and make a hybrid light field dataset. The 2D image could be considered as the high-resolution ground truth corresponding to the low-resolution central sub-aperture image of 4D light field image. Subsequently, we propose an unsupervised lea… ▽ More In this paper, we design a beam splitter-based hybrid light field imaging prototype to record 4D light field image and high-resolution 2D image simultaneously, and make a hybrid light field dataset. The 2D image could be considered as the high-resolution ground truth corresponding to the low-resolution central sub-aperture image of 4D light field image. Subsequently, we propose an unsupervised learning-based super-resolution framework with the hybrid light field dataset, which adaptively settles the light field spatial super-resolution problem with a complex degradation model. Specifically, we design two loss functions based on pre-trained models that enable the super-resolution network to learn the detailed features and light field parallax structure with only one ground truth. Extensive experiments demonstrate the same superiority of our approach with supervised learning-based state-of-the-art ones. To our knowledge, it is the first end-to-end unsupervised learning-based spatial super-resolution approach in light field imaging research, whose input is available from our beam splitter-based hybrid light field system. The hardware and software together may help promote the application of light field super-resolution to a great extent. △ Less

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.16757 [pdf, other]

Towards Environmental Preference Based Speech Enhancement For Individualised Multi-Modal Hearing Aids

Authors: Jasper Kirton-Wingate, Shafique Ahmed, Adeel Hussain, Mandar Gogate, Kia Dashtipour, Jen-Cheng Hou, Tassadaq Hussain, Yu Tsao, Amir Hussain

Abstract: Since the advent of Deep Learning (DL), Speech Enhancement (SE) models have performed well under a variety of noise conditions. However, such systems may still introduce sonic artefacts, sound unnatural, and restrict the ability for a user to hear ambient sound which may be of importance. Hearing Aid (HA) users may wish to customise their SE systems to suit their personal preferences and day-to-da… ▽ More Since the advent of Deep Learning (DL), Speech Enhancement (SE) models have performed well under a variety of noise conditions. However, such systems may still introduce sonic artefacts, sound unnatural, and restrict the ability for a user to hear ambient sound which may be of importance. Hearing Aid (HA) users may wish to customise their SE systems to suit their personal preferences and day-to-day lifestyle. In this paper, we introduce a preference learning based SE (PLSE) model for future multi-modal HAs that can contextually exploit audio information to improve listening comfort, based upon the preferences of the user. The proposed system estimates the Signal-to-noise ratio (SNR) as a basic objective speech quality measure which quantifies the relative amount of background noise present in speech, and directly correlates to the intelligibility of the signal. Additionally, to provide contextual information we predict the acoustic scene in which the user is situated. These tasks are achieved via a multi-task DL model, which surpasses the performance of inferring the acoustic scene or SNR separately, by jointly leveraging a shared encoded feature space. These environmental inferences are exploited in a preference elicitation framework, which linearly learns a set of predictive functions to determine the target SNR of an AV (Audio-Visual) SE system. By greatly reducing noise in challenging listening conditions, and by novelly scaling the output of the SE model, we are able to provide HA users with contextually individualised SE. Preliminary results suggest an improvement over the non-individualised baseline model in some participants. △ Less

Submitted 26 February, 2024; originally announced February 2024.

Comments: This has been submitted to the Trends in Hearing journal

arXiv:2401.14285 [pdf, other]

POUR-Net: A Population-Prior-Aided Over-Under-Representation Network for Low-Count PET Attenuation Map Generation

Authors: Bo Zhou, Jun Hou, Tianqi Chen, Yinchi Zhou, Xiongchao Chen, Huidong Xie, Qiong Liu, Xueqi Guo, Yu-Jung Tsai, Vladimir Y. Panin, Takuya Toyonaga, James S. Duncan, Chi Liu

Abstract: Low-dose PET offers a valuable means of minimizing radiation exposure in PET imaging. However, the prevalent practice of employing additional CT scans for generating attenuation maps (u-map) for PET attenuation correction significantly elevates radiation doses. To address this concern and further mitigate radiation exposure in low-dose PET exams, we propose POUR-Net - an innovative population-prio… ▽ More Low-dose PET offers a valuable means of minimizing radiation exposure in PET imaging. However, the prevalent practice of employing additional CT scans for generating attenuation maps (u-map) for PET attenuation correction significantly elevates radiation doses. To address this concern and further mitigate radiation exposure in low-dose PET exams, we propose POUR-Net - an innovative population-prior-aided over-under-representation network that aims for high-quality attenuation map generation from low-dose PET. First, POUR-Net incorporates an over-under-representation network (OUR-Net) to facilitate efficient feature extraction, encompassing both low-resolution abstracted and fine-detail features, for assisting deep generation on the full-resolution level. Second, complementing OUR-Net, a population prior generation machine (PPGM) utilizing a comprehensive CT-derived u-map dataset, provides additional prior information to aid OUR-Net generation. The integration of OUR-Net and PPGM within a cascade framework enables iterative refinement of $μ$-map generation, resulting in the production of high-quality $μ$-maps. Experimental results underscore the effectiveness of POUR-Net, showing it as a promising solution for accurate CT-free low-count PET attenuation correction, which also surpasses the performance of previous baseline methods. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Comments: 10 pages, 5 figures

arXiv:2401.03689 [pdf, other]

LUPET: Incorporating Hierarchical Information Path into Multilingual ASR

Authors: Wei Liu, **gyong Hou, Dong Yang, Muyong Cao, Tan Lee

Abstract: Toward high-performance multilingual automatic speech recognition (ASR), various types of linguistic information and model design have demonstrated their effectiveness independently. They include language identity (LID), phoneme information, language-specific processing modules, and cross-lingual self-supervised speech representation. It is expected that leveraging their benefits synergistically i… ▽ More Toward high-performance multilingual automatic speech recognition (ASR), various types of linguistic information and model design have demonstrated their effectiveness independently. They include language identity (LID), phoneme information, language-specific processing modules, and cross-lingual self-supervised speech representation. It is expected that leveraging their benefits synergistically in a unified solution would further improve the overall system performance. This paper presents a novel design of a hierarchical information path, named LUPET, which sequentially encodes, from the shallow layers to deep layers, multiple aspects of linguistic and acoustic information at diverse granularity scales. The path starts from LID prediction, followed by acoustic unit discovery, phoneme sharing, and finally token recognition routed by a mixture-of-expert. ASR experiments are carried out on 10 languages in the Common Voice corpus. The results demonstrate the superior performance of LUPET as compared to the baseline systems. Most importantly, LUPET effectively mitigates the issue of performance compromise of high-resource languages with low-resource ones in the multilingual setting. △ Less

Submitted 10 June, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

Comments: Accepted by Interspeech 2024

arXiv:2401.02046 [pdf, other]

CTC Blank Triggered Dynamic Layer-Skip** for Efficient CTC-based Speech Recognition

Authors: Junfeng Hou, Peiyao Wang, **cheng Zhang, Meng Yang, Minwei Feng, **gcheng Yin

Abstract: Deploying end-to-end speech recognition models with limited computing resources remains challenging, despite their impressive performance. Given the gradual increase in model size and the wide range of model applications, selectively executing model components for different inputs to improve the inference efficiency is of great interest. In this paper, we propose a dynamic layer-skip** method th… ▽ More Deploying end-to-end speech recognition models with limited computing resources remains challenging, despite their impressive performance. Given the gradual increase in model size and the wide range of model applications, selectively executing model components for different inputs to improve the inference efficiency is of great interest. In this paper, we propose a dynamic layer-skip** method that leverages the CTC blank output from intermediate layers to trigger the skip** of the last few encoder layers for frames with high blank probabilities. Furthermore, we factorize the CTC output distribution and perform knowledge distillation on intermediate layers to reduce computation and improve recognition accuracy. Experimental results show that by utilizing the CTC blank, the encoder layer depth can be adjusted dynamically, resulting in 29% acceleration of the CTC model inference with minor performance degradation. △ Less

Submitted 3 January, 2024; originally announced January 2024.

Comments: accepted by ASRU 2023

arXiv:2310.15194 [pdf]

How do the resting EEG preprocessing states affect the outcomes of postprocessing?

Authors: Shiang Hu, Jie Ruan, Juan Hou, Pedro Antonio Valdes-Sosa, Zhao Lv

Abstract: Plenty of artifact removal tools and pipelines have been developed to correct the EEG recordings and discover the values below the waveforms. Without visual inspection from the experts, it is susceptible to derive improper preprocessing states, like the insufficient preprocessed EEG (IPE), and the excessive preprocessed EEG (EPE). However, little is known about the impacts of IPE or EPE on the pos… ▽ More Plenty of artifact removal tools and pipelines have been developed to correct the EEG recordings and discover the values below the waveforms. Without visual inspection from the experts, it is susceptible to derive improper preprocessing states, like the insufficient preprocessed EEG (IPE), and the excessive preprocessed EEG (EPE). However, little is known about the impacts of IPE or EPE on the postprocessing in the frequency, spatial and temporal domains, particularly as to the spectra and the functional connectivity (FC) analysis. Here, the clean EEG (CE) was synthesized as the ground truth based on the New-York head model and the multivariate autoregressive model. Later, the IPE and the EPE were simulated by injecting the Gaussian noise and losing the brain activities, respectively. Then, the impacts on postprocessing were quantified by the deviation caused by the IPE or EPE from the CE as to the 4 temporal statistics, the multichannel power, the cross spectra, the dispersion of source imaging, and the properties of scalp EEG network. Lastly, the association analysis was performed between the PaLOSi metric and the varying trends of postprocessing with the evolution of preprocessing states. This study shed light on how the postprocessing outcomes are affected by the preprocessing states and PaLOSi may be a potential effective quality metric. △ Less

Submitted 12 December, 2023; v1 submitted 22 October, 2023; originally announced October 2023.

arXiv:2310.05021 [pdf, other]

Toward Intelligent Emergency Control for Large-scale Power Systems: Convergence of Learning, Physics, Computing and Control

Authors: Qiuhua Huang, Renke Huang, Tianzhixi Yin, Sohom Datta, Xueqing Sun, Jason Hou, Jie Tan, Wenhao Yu, Yuan Liu, Xinya Li, Bruce Palmer, Ang Li, Xinda Ke, Marianna Vaiman, Song Wang, Yousu Chen

Abstract: This paper has delved into the pressing need for intelligent emergency control in large-scale power systems, which are experiencing significant transformations and are operating closer to their limits with more uncertainties. Learning-based control methods are promising and have shown effectiveness for intelligent power system control. However, when they are applied to large-scale power systems, t… ▽ More This paper has delved into the pressing need for intelligent emergency control in large-scale power systems, which are experiencing significant transformations and are operating closer to their limits with more uncertainties. Learning-based control methods are promising and have shown effectiveness for intelligent power system control. However, when they are applied to large-scale power systems, there are multifaceted challenges such as scalability, adaptiveness, and security posed by the complex power system landscape, which demand comprehensive solutions. The paper first proposes and instantiates a convergence framework for integrating power systems physics, machine learning, advanced computing, and grid control to realize intelligent grid control at a large scale. Our developed methods and platform based on the convergence framework have been applied to a large (more than 3000 buses) Texas power system, and tested with 56000 scenarios. Our work achieved a 26% reduction in load shedding on average and outperformed existing rule-based control in 99.7% of the test scenarios. The results demonstrated the potential of the proposed convergence framework and DRL-based intelligent control for the future grid. △ Less

Submitted 8 October, 2023; originally announced October 2023.

Comments: submitted to PSCC 2024

arXiv:2309.11059 [pdf, other]

Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement

Authors: Shafique Ahmed, Chia-Wei Chen, Wenze Ren, Chin-Jou Li, Ernie Chu, Jun-Cheng Chen, Amir Hussain, Hsin-Min Wang, Yu Tsao, Jen-Cheng Hou

Abstract: Recent studies have increasingly acknowledged the advantages of incorporating visual data into speech enhancement (SE) systems. In this paper, we introduce a novel audio-visual SE approach, termed DCUC-Net (deep complex U-Net with conformer network). The proposed DCUC-Net leverages complex domain features and a stack of conformer blocks. The encoder and decoder of DCUC-Net are designed using a com… ▽ More Recent studies have increasingly acknowledged the advantages of incorporating visual data into speech enhancement (SE) systems. In this paper, we introduce a novel audio-visual SE approach, termed DCUC-Net (deep complex U-Net with conformer network). The proposed DCUC-Net leverages complex domain features and a stack of conformer blocks. The encoder and decoder of DCUC-Net are designed using a complex U-Net-based framework. The audio and visual signals are processed using a complex encoder and a ResNet-18 model, respectively. These processed signals are then fused using the conformer blocks and transformed into enhanced speech waveforms via a complex decoder. The conformer blocks consist of a combination of self-attention mechanisms and convolutional operations, enabling DCUC-Net to effectively capture both global and local audio-visual dependencies. Our experimental results demonstrate the effectiveness of DCUC-Net, as it outperforms the baseline model from the COG-MHEAR AVSE Challenge 2023 by a notable margin of 0.14 in terms of PESQ. Additionally, the proposed DCUC-Net performs comparably to a state-of-the-art model and outperforms all other compared models on the Taiwan Mandarin speech with video (TMSV) dataset. △ Less

Submitted 8 October, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

arXiv:2308.05404 [pdf, other]

Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network

Authors: Xianqiang Lyu, Junhui Hou

Abstract: This paper presents a novel and interpretable end-to-end learning framework, called the deep compensation unfolding network (DCUNet), for restoring light field (LF) images captured under low-light conditions. DCUNet is designed with a multi-stage architecture that mimics the optimization process of solving an inverse imaging problem in a data-driven fashion. The framework uses the intermediate enh… ▽ More This paper presents a novel and interpretable end-to-end learning framework, called the deep compensation unfolding network (DCUNet), for restoring light field (LF) images captured under low-light conditions. DCUNet is designed with a multi-stage architecture that mimics the optimization process of solving an inverse imaging problem in a data-driven fashion. The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result. Additionally, DCUNet includes a content-associated deep compensation module at each optimization stage to suppress noise and illumination map estimation errors. To properly mine and leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module that comprehensively exploits redundant information in LF images. The experimental results on both simulated and real datasets demonstrate the superiority of our DCUNet over state-of-the-art methods, both qualitatively and quantitatively. Moreover, DCUNet preserves the essential geometric structure of enhanced LF images much better. The code will be publicly available at https://github.com/lyuxianqiang/LFLL-DCU. △ Less

Submitted 26 June, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

arXiv:2307.07748 [pdf, other]

Audio-Visual Speech Enhancement Using Self-supervised Learning to Improve Speech Intelligibility in Cochlear Implant Simulations

Authors: Richard Lee Lai, Jen-Cheng Hou, Mandar Gogate, Kia Dashtipour, Amir Hussain, Yu Tsao

Abstract: Individuals with hearing impairments face challenges in their ability to comprehend speech, particularly in noisy environments. The aim of this study is to explore the effectiveness of audio-visual speech enhancement (AVSE) in enhancing the intelligibility of vocoded speech in cochlear implant (CI) simulations. Notably, the study focuses on a challenged scenario where there is limited availability… ▽ More Individuals with hearing impairments face challenges in their ability to comprehend speech, particularly in noisy environments. The aim of this study is to explore the effectiveness of audio-visual speech enhancement (AVSE) in enhancing the intelligibility of vocoded speech in cochlear implant (CI) simulations. Notably, the study focuses on a challenged scenario where there is limited availability of training data for the AVSE task. To address this problem, we propose a novel deep neural network framework termed Self-Supervised Learning-based AVSE (SSL-AVSE). The proposed SSL-AVSE combines visual cues, such as lip and mouth movements, from the target speakers with corresponding audio signals. The contextually combined audio and visual data are then fed into a Transformer-based SSL AV-HuBERT model to extract features, which are further processed using a BLSTM-based SE model. The results demonstrate several key findings. Firstly, SSL-AVSE successfully overcomes the issue of limited data by leveraging the AV-HuBERT model. Secondly, by fine-tuning the AV-HuBERT model parameters for the target SE task, significant performance improvements are achieved. Specifically, there is a notable enhancement in PESQ (Perceptual Evaluation of Speech Quality) from 1.43 to 1.67 and in STOI (Short-Time Objective Intelligibility) from 0.70 to 0.74. Furthermore, the performance of the SSL-AVSE was evaluated using CI vocoded speech to assess the intelligibility for CI users. Comparative experimental outcomes reveal that in the presence of dynamic noises encountered during human conversations, SSL-AVSE exhibits a substantial improvement. The NCM (Normal Correlation Matrix) values indicate an increase of 26.5% to 87.2% compared to the noisy baseline. △ Less

Submitted 15 July, 2023; originally announced July 2023.

arXiv:2306.08836 [pdf, other]

doi 10.1007/s11263-023-01974-9

Probabilistic-based Feature Embedding of 4-D Light Fields for Compressive Imaging and Denoising

Authors: Xianqiang Lyu, Junhui Hou

Abstract: The high-dimensional nature of the 4-D light field (LF) poses great challenges in achieving efficient and effective feature embedding, that severely impacts the performance of downstream tasks. To tackle this crucial issue, in contrast to existing methods with empirically-designed architectures, we propose a probabilistic-based feature embedding (PFE), which learns a feature embedding architecture… ▽ More The high-dimensional nature of the 4-D light field (LF) poses great challenges in achieving efficient and effective feature embedding, that severely impacts the performance of downstream tasks. To tackle this crucial issue, in contrast to existing methods with empirically-designed architectures, we propose a probabilistic-based feature embedding (PFE), which learns a feature embedding architecture by assembling various low-dimensional convolution patterns in a probability space for fully capturing spatial-angular information. Building upon the proposed PFE, we then leverage the intrinsic linear imaging model of the coded aperture camera to construct a cycle-consistent 4-D LF reconstruction network from coded measurements. Moreover, we incorporate PFE into an iterative optimization framework for 4-D LF denoising. Our extensive experiments demonstrate the significant superiority of our methods on both real-world and synthetic 4-D LF images, both quantitatively and qualitatively, when compared with state-of-the-art methods. The source code will be publicly available at https://github.com/lyuxianqiang/LFCA-CR-NET. △ Less

Submitted 10 January, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

Journal ref: International Journal of Computer Vision (2023)

arXiv:2304.04774 [pdf, other]

DDRF: Denoising Diffusion Model for Remote Sensing Image Fusion

Authors: ZiHan Cao, ShiQi Cao, Xiao Wu, JunMing Hou, Ran Ran, Liang-Jian Deng

Abstract: Denosing diffusion model, as a generative model, has received a lot of attention in the field of image generation recently, thanks to its powerful generation capability. However, diffusion models have not yet received sufficient research in the field of image fusion. In this article, we introduce diffusion model to the image fusion field, treating the image fusion task as image-to-image translatio… ▽ More Denosing diffusion model, as a generative model, has received a lot of attention in the field of image generation recently, thanks to its powerful generation capability. However, diffusion models have not yet received sufficient research in the field of image fusion. In this article, we introduce diffusion model to the image fusion field, treating the image fusion task as image-to-image translation and designing two different conditional injection modulation modules (i.e., style transfer modulation and wavelet modulation) to inject coarse-grained style information and fine-grained high-frequency and low-frequency information into the diffusion UNet, thereby generating fused images. In addition, we also discussed the residual learning and the selection of training objectives of the diffusion model in the image fusion task. Extensive experimental results based on quantitative and qualitative assessments compared with benchmarks demonstrates state-of-the-art results and good generalization performance in image fusion tasks. Finally, it is hoped that our method can inspire other works and gain insight into this field to better apply the diffusion model to image fusion tasks. Code shall be released for better reproducibility. △ Less

Submitted 10 April, 2023; originally announced April 2023.

arXiv:2304.02389 [pdf, other]

DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images

Authors: Bo Qian, Hao Chen, Xiangning Wang, Haoxuan Che, Gitaek Kwon, Jaeyoung Kim, Sung** Choi, Seoyoung Shin, Felix Krause, Markus Unterdechler, Junlin Hou, Rui Feng, Yihao Li, Mostafa El Habib Daho, Qiang Wu, ** Zhang, Xiaokang Yang, Yiyu Cai, Wei** Jia, Huating Li, Bin Sheng

Abstract: Computer-assisted automatic analysis of diabetic retinopathy (DR) is of great importance in reducing the risks of vision loss and even blindness. Ultra-wide optical coherence tomography angiography (UW-OCTA) is a non-invasive and safe imaging modality in DR diagnosis system, but there is a lack of publicly available benchmarks for model development and evaluation. To promote further research and s… ▽ More Computer-assisted automatic analysis of diabetic retinopathy (DR) is of great importance in reducing the risks of vision loss and even blindness. Ultra-wide optical coherence tomography angiography (UW-OCTA) is a non-invasive and safe imaging modality in DR diagnosis system, but there is a lack of publicly available benchmarks for model development and evaluation. To promote further research and scientific benchmarking for diabetic retinopathy analysis using UW-OCTA images, we organized a challenge named "DRAC - Diabetic Retinopathy Analysis Challenge" in conjunction with the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). The challenge consists of three tasks: segmentation of DR lesions, image quality assessment and DR grading. The scientific community responded positively to the challenge, with 11, 12, and 13 teams from geographically diverse institutes submitting different solutions in these three tasks, respectively. This paper presents a summary and analysis of the top-performing solutions and results for each task of the challenge. The obtained results from top algorithms indicate the importance of data augmentation, model architecture and ensemble of networks in improving the performance of deep learning models. These findings have the potential to enable new developments in diabetic retinopathy analysis. The challenge remains open for post-challenge registrations and submissions for benchmarking future methodology developments. △ Less

Submitted 5 April, 2023; originally announced April 2023.

arXiv:2304.00570 [pdf, other]

FedFTN: Personalized Federated Learning with Deep Feature Transformation Network for Multi-institutional Low-count PET Denoising

Authors: Bo Zhou, Huidong Xie, Qiong Liu, Xiongchao Chen, Xueqi Guo, Zhicheng Feng, Jun Hou, S. Kevin Zhou, Biao Li, Axel Rominger, Kuangyu Shi, James S. Duncan, Chi Liu

Abstract: Low-count PET is an efficient way to reduce radiation exposure and acquisition time, but the reconstructed images often suffer from low signal-to-noise ratio (SNR), thus affecting diagnosis and other downstream tasks. Recent advances in deep learning have shown great potential in improving low-count PET image quality, but acquiring a large, centralized, and diverse dataset from multiple institutio… ▽ More Low-count PET is an efficient way to reduce radiation exposure and acquisition time, but the reconstructed images often suffer from low signal-to-noise ratio (SNR), thus affecting diagnosis and other downstream tasks. Recent advances in deep learning have shown great potential in improving low-count PET image quality, but acquiring a large, centralized, and diverse dataset from multiple institutions for training a robust model is difficult due to privacy and security concerns of patient data. Moreover, low-count PET data at different institutions may have different data distribution, thus requiring personalized models. While previous federated learning (FL) algorithms enable multi-institution collaborative training without the need of aggregating local data, addressing the large domain shift in the application of multi-institutional low-count PET denoising remains a challenge and is still highly under-explored. In this work, we propose FedFTN, a personalized federated learning strategy that addresses these challenges. FedFTN uses a local deep feature transformation network (FTN) to modulate the feature outputs of a globally shared denoising network, enabling personalized low-count PET denoising for each institution. During the federated learning process, only the denoising network's weights are communicated and aggregated, while the FTN remains at the local institutions for feature transformation. We evaluated our method using a large-scale dataset of multi-institutional low-count PET imaging data from three medical centers located across three continents, and showed that FedFTN provides high-quality low-count PET images, outperforming previous baseline FL reconstruction methods across all low-count levels at all three institutions. △ Less

Submitted 6 October, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

Comments: 13 pages, 6 figures, Accepted at Medical Image Analysis Journal (MedIA)

arXiv:2303.13764 [pdf, other]

GQE-Net: A Graph-based Quality Enhancement Network for Point Cloud Color Attribute

Authors: **rui Xing, Hui Yuan, Raouf Hamzaoui, Hao Liu, Junhui Hou

Abstract: In recent years, point clouds have become increasingly popular for representing three-dimensional (3D) visual objects and scenes. To efficiently store and transmit point clouds, compression methods have been developed, but they often result in a degradation of quality. To reduce color distortion in point clouds, we propose a graph-based quality enhancement network (GQE-Net) that uses geometry info… ▽ More In recent years, point clouds have become increasingly popular for representing three-dimensional (3D) visual objects and scenes. To efficiently store and transmit point clouds, compression methods have been developed, but they often result in a degradation of quality. To reduce color distortion in point clouds, we propose a graph-based quality enhancement network (GQE-Net) that uses geometry information as an auxiliary input and graph convolution blocks to extract local features efficiently. Specifically, we use a parallel-serial graph attention module with a multi-head graph attention mechanism to focus on important points or features and help them fuse together. Additionally, we design a feature refinement module that takes into account the normals and geometry distance between points. To work within the limitations of GPU memory capacity, the distorted point cloud is divided into overlap-allowed 3D patches, which are sent to GQE-Net for quality enhancement. To account for differences in data distribution among different color components, three models are trained for the three color components. Experimental results show that our method achieves state-of-the-art performance. For example, when implementing GQE-Net on a recent test model of the geometry-based point cloud compression (G-PCC) standard, 0.43 dB, 0.25 dB, and 0.36 dB Bjontegaard delta (BD)-peak-signal-to-noise ratio (PSNR), corresponding to 14.0%, 9.3%, and 14.5% BD-rate savings can be achieved on dense point clouds for the Y, Cb, and Cr components, respectively. The source code of our method is available at https://github.com/xjr998/GQE-Net. △ Less

Submitted 6 November, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: Accepted by IEEE TIP (DOI: 10.1109/TIP.2023.3330086)

arXiv:2303.07623 [pdf, other]

Uncertainty-weighted Multi-tasking for $T_{1ρ}$ and T$_2$ Map** in the Liver with Self-supervised Learning

Authors: Chaoxing Huang, Yurui Qian, Jian Hou, Baiyan Jiang, Queenie Chan, Vincent WS Wong, Winnie CW Chu, Weitian Chen

Abstract: Multi-parametric map** of MRI relaxations in liver has the potential of revealing pathological information of the liver. A self-supervised learning based multi-parametric map** method is proposed to map T$T_{1ρ}$ and T$_2$ simultaneously, by utilising the relaxation constraint in the learning process. Data noise of different map** tasks is utilised to make the model uncertainty-aware, which… ▽ More Multi-parametric map** of MRI relaxations in liver has the potential of revealing pathological information of the liver. A self-supervised learning based multi-parametric map** method is proposed to map T$T_{1ρ}$ and T$_2$ simultaneously, by utilising the relaxation constraint in the learning process. Data noise of different map** tasks is utilised to make the model uncertainty-aware, which adaptively weight different map** tasks during learning. The method was examined on a dataset of 51 patients with non-alcoholic fatter liver disease. Results showed that the proposed method can produce comparable parametric maps to the traditional multi-contrast pixel wise fitting method, with a reduced number of images and less computation time. The uncertainty weighting also improves the model performance. It has the potential of accelerating MRI quantitative imaging. △ Less

Submitted 14 March, 2023; originally announced March 2023.

arXiv:2302.07597 [pdf, other]

Preventive-Corrective Cyber-Defense: Attack-Induced Region Minimization and Cybersecurity Margin Maximization

Authors: Jiazuo Hou, Fei Teng, Wenqian Yin, Yue Song, Yunhe Hou

Abstract: False data injection (FDI) cyber-attacks on power systems can be prevented by strategically selecting and protecting a sufficiently large measurement subset, which, however, requires adequate cyber-defense resources for measurement protection. With any given cyber-defense resource, this paper proposes a preventive-corrective cyber-defense strategy, which minimizes the FDI attack-induced region in… ▽ More False data injection (FDI) cyber-attacks on power systems can be prevented by strategically selecting and protecting a sufficiently large measurement subset, which, however, requires adequate cyber-defense resources for measurement protection. With any given cyber-defense resource, this paper proposes a preventive-corrective cyber-defense strategy, which minimizes the FDI attack-induced region in a preventive manner, followed by maximizing the cybersecurity margin in a corrective manner. First, this paper proposes a preventive cyber-defense strategy that minimizes the volume of the FDI attack-induced region via preventive allocation of any given measurement protection resource. Particularly, a sufficient condition for constructing the FDI unattackable lines is proposed, indicating that the FDI cyber-attack could be locally rather than globally prevented. Then, given a non-empty FDI attack-induced region, this paper proposes a corrective cyber-defense strategy that maximizes the cybersecurity margin, leading to a trade-off between the safest-but-expensive operation point (i.e., Euclidean Chebyshev center) and the cheapest-but-dangerous operation point. Simulation results on a modified IEEE 14 bus system verify the effectiveness and cost-effectiveness of the proposed preventive-corrective cyber-defense strategy. △ Less

Submitted 13 November, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

arXiv:2301.06132 [pdf, other]

Deep Diversity-Enhanced Feature Representation of Hyperspectral Images

Authors: **hui Hou, Zhiyu Zhu, Junhui Hou, Hui Liu, Huanqiang Zeng, Deyu Meng

Abstract: In this paper, we study the problem of efficiently and effectively embedding the high-dimensional spatio-spectral information of hyperspectral (HS) images, guided by feature diversity. Specifically, based on the theoretical formulation that feature diversity is correlated with the rank of the unfolded kernel matrix, we rectify 3D convolution by modifying its topology to enhance the rank upper-boun… ▽ More In this paper, we study the problem of efficiently and effectively embedding the high-dimensional spatio-spectral information of hyperspectral (HS) images, guided by feature diversity. Specifically, based on the theoretical formulation that feature diversity is correlated with the rank of the unfolded kernel matrix, we rectify 3D convolution by modifying its topology to enhance the rank upper-bound. This modification yields a rank-enhanced spatial-spectral symmetrical convolution set (ReS$^3$-ConvSet), which not only learns diverse and powerful feature representations but also saves network parameters. Additionally, we also propose a novel diversity-aware regularization (DA-Reg) term that directly acts on the feature maps to maximize independence among elements. To demonstrate the superiority of the proposed ReS$^3$-ConvSet and DA-Reg, we apply them to various HS image processing and analysis tasks, including denoising, spatial super-resolution, and classification. Extensive experiments show that the proposed approaches outperform state-of-the-art methods both quantitatively and qualitatively to a significant extent. The code is publicly available at https://github.com/**nh/ReSSS-ConvSet. △ Less

Submitted 9 May, 2024; v1 submitted 15 January, 2023; originally announced January 2023.

Comments: 17 pages, 12 figures. Accepted in TPAMI 2024. arXiv admin note: substantial text overlap with arXiv:2207.04266

arXiv:2211.14559 [pdf, other]

Boosting COVID-19 Severity Detection with Infection-aware Contrastive Mixup Classification

Authors: Junlin Hou, Jilan Xu, Nan Zhang, Yuejie Zhang, Xiaobo Zhang, Rui Feng

Abstract: This paper presents our solution for the 2nd COVID-19 Severity Detection Competition. This task aims to distinguish the Mild, Moderate, Severe, and Critical grades in COVID-19 chest CT images. In our approach, we devise a novel infection-aware 3D Contrastive Mixup Classification network for severity grading. Specifcally, we train two segmentation networks to first extract the lung region and then… ▽ More This paper presents our solution for the 2nd COVID-19 Severity Detection Competition. This task aims to distinguish the Mild, Moderate, Severe, and Critical grades in COVID-19 chest CT images. In our approach, we devise a novel infection-aware 3D Contrastive Mixup Classification network for severity grading. Specifcally, we train two segmentation networks to first extract the lung region and then the inner lesion region. The lesion segmentation mask serves as complementary information for the original CT slices. To relieve the issue of imbalanced data distribution, we further improve the advanced Contrastive Mixup Classification network by weighted cross-entropy loss. On the COVID-19 severity detection leaderboard, our approach won the first place with a Macro F1 Score of 51.76%. It significantly outperforms the baseline method by over 11.46%. △ Less

Submitted 1 December, 2022; v1 submitted 26 November, 2022; originally announced November 2022.

Comments: ECCV AIMIA Workshop 2022

arXiv:2211.14557 [pdf, other]

CMC v2: Towards More Accurate COVID-19 Detection with Discriminative Video Priors

Authors: Junlin Hou, Jilan Xu, Nan Zhang, Yi Wang, Yuejie Zhang, Xiaobo Zhang, Rui Feng

Abstract: This paper presents our solution for the 2nd COVID-19 Competition, occurring in the framework of the AIMIA Workshop at the European Conference on Computer Vision (ECCV 2022). In our approach, we employ the winning solution last year which uses a strong 3D Contrastive Mixup Classifcation network (CMC v1) as the baseline method, composed of contrastive representation learning and mixup classificatio… ▽ More This paper presents our solution for the 2nd COVID-19 Competition, occurring in the framework of the AIMIA Workshop at the European Conference on Computer Vision (ECCV 2022). In our approach, we employ the winning solution last year which uses a strong 3D Contrastive Mixup Classifcation network (CMC v1) as the baseline method, composed of contrastive representation learning and mixup classification. In this paper, we propose CMC v2 by introducing natural video priors to COVID-19 diagnosis. Specifcally, we adapt a pre-trained (on video dataset) video transformer backbone to COVID-19 detection. Moreover, advanced training strategies, including hybrid mixup and cutmix, slicelevel augmentation, and small resolution training are also utilized to boost the robustness and the generalization ability of the model. Among 14 participating teams, CMC v2 ranked 1st in the 2nd COVID-19 Competition with an average Macro F1 Score of 89.11%. △ Less

Submitted 26 November, 2022; originally announced November 2022.

Comments: ECCV AIMIA Workshop 2022

arXiv:2211.05910 [pdf, other]

Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, **gang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, **woo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li , et al. (71 additional authors not shown)

Abstract: Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose… ▽ More Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose the participants to design an efficient quantized image super-resolution solution that can demonstrate a real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do a high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 60 FPS rate when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper. △ Less

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256

arXiv:2211.04894 [pdf, other]

Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives

Authors: Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, **gwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

Abstract: The rapid increase in user-generated-content (UGC) videos calls for the development of effective video quality assessment (VQA) algorithms. However, the objective of the UGC-VQA problem is still ambiguous and can be viewed from two perspectives: the technical perspective, measuring the perception of distortions; and the aesthetic perspective, which relates to preference and recommendation on conte… ▽ More The rapid increase in user-generated-content (UGC) videos calls for the development of effective video quality assessment (VQA) algorithms. However, the objective of the UGC-VQA problem is still ambiguous and can be viewed from two perspectives: the technical perspective, measuring the perception of distortions; and the aesthetic perspective, which relates to preference and recommendation on contents. To understand how these two perspectives affect overall subjective opinions in UGC-VQA, we conduct a large-scale subjective study to collect human quality opinions on overall quality of videos as well as perceptions from aesthetic and technical perspectives. The collected Disentangled Video Quality Database (DIVIDE-3k) confirms that human quality opinions on UGC videos are universally and inevitably affected by both aesthetic and technical perspectives. In light of this, we propose the Disentangled Objective Video Quality Evaluator (DOVER) to learn the quality of UGC videos based on the two perspectives. The DOVER proves state-of-the-art performance in UGC-VQA under very high efficiency. With perspective opinions in DIVIDE-3k, we further propose DOVER++, the first approach to provide reliable clear-cut quality evaluations from a single aesthetic or technical perspective. Code at https://github.com/VQAssessment/DOVER. △ Less

Submitted 7 March, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

arXiv:2211.04098 [pdf, other]

Abstraction-Based Verification of Approximate Pre-Opacity for Control Systems

Authors: Junyao Hou, Siyuan Liu, Xiang Yin, Majid Zamani

Abstract: In this paper, we consider the problem of verifying pre-opacity for discrete-time control systems. Pre-opacity is an important information-flow security property that secures the intention of a system to execute some secret behaviors in the future. Existing works on pre-opacity only consider non-metric discrete systems, where it is assumed that intruders can distinguish different output behaviors… ▽ More In this paper, we consider the problem of verifying pre-opacity for discrete-time control systems. Pre-opacity is an important information-flow security property that secures the intention of a system to execute some secret behaviors in the future. Existing works on pre-opacity only consider non-metric discrete systems, where it is assumed that intruders can distinguish different output behaviors precisely. However, for continuous-space control systems whose output sets are equipped with metrics (which is the case for most real-world applications), it is too restrictive to assume precise measurements from outside observers. In this paper, we first introduce a concept of approximate pre-opacity by capturing the security level of control systems with respect to the measurement precision of the intruder. Based on this new notion of pre-opacity, we propose a verification approach for continuous-space control systems by leveraging abstraction-based techniques. In particular, a new concept of approximate pre-opacity preserving simulation relation is introduced to characterize the distance between two systems in terms of preserving pre-opacity. This new system relation allows us to verify pre-opacity of complex continuous-space control systems using their finite abstractions. We also present a method to construct pre-opacity preserving finite abstractions for a class of discrete-time control systems under certain stability assumptions. △ Less

Submitted 8 November, 2022; originally announced November 2022.

Comments: Discrete Event Systems, Opacity, Formal Abstractions

arXiv:2211.02419 [pdf, other]

High-Resolution Boundary Detection for Medical Image Segmentation with Piece-Wise Two-Sample T-Test Augmented Loss

Authors: Yucong Lin, **hua Su, Yuhang Li, Yuhao Wei, Hanchao Yan, Saining Zhang, Jiaan Luo, Danni Ai, Hong Song, **gfan Fan, Tianyu Fu, Deqiang Xiao, Feifei Wang, Jue Hou, Jian Yang

Abstract: Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on the suitable design of loss functions. Popular loss functions, including the cross-entropy and dice losses, often fall short of boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We develope… ▽ More Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on the suitable design of loss functions. Popular loss functions, including the cross-entropy and dice losses, often fall short of boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We developed a novel loss function that is tailored to reflect the boundary information to enhance the boundary detection. As the contrast between segmentation and background regions along the classification boundary naturally induces heterogeneity over the pixels, we propose the piece-wise two-sample t-test augmented (PTA) loss that is infused with the statistical test for such heterogeneity. We demonstrate the improved boundary detection power of the PTA loss compared to benchmark losses without a t-test component. △ Less

Submitted 4 November, 2022; originally announced November 2022.

arXiv:2211.01601 [pdf]

A Fast Solution Method for Large-scale Unit Commitment Based on Lagrangian Relaxation and Dynamic Programming

Authors: Jiangwei Hou, Qiaozhu Zhai, Yuzhou Zhou, Xiaohong Guan

Abstract: The unit commitment problem (UC) is crucial for the operation and market mechanism of power systems. With the development of modern electricity, the scale of power systems is expanding, and solving the UC problem is also becoming more and more difficult. To this end, this paper proposes a new fast solution method based on Lagrangian relaxation and dynamic program-ming. Firstly, the UC solution is… ▽ More The unit commitment problem (UC) is crucial for the operation and market mechanism of power systems. With the development of modern electricity, the scale of power systems is expanding, and solving the UC problem is also becoming more and more difficult. To this end, this paper proposes a new fast solution method based on Lagrangian relaxation and dynamic program-ming. Firstly, the UC solution is estimated to be an initial trial UC solution by a fast method based on Lagrangian relaxation. This initial trial UC solution fully considers the system-wide con-straints. Secondly, a dynamic programming module is introduced to adjust the trial UC solution to make it satisfy the unit-wise constraints. Thirdly, a method for constructing a feasible UC solution is proposed based on the adjusted trial UC solution. Specifically, a feasibility-testing model and an updating strategy for the trial UC solution are established in this part. Numerical tests are implemented on IEEE 24-bus, IEEE 118-bus, Polish 2383-bus, and French 6468-bus systems, which verify the effec-tiveness and efficiency of the proposed method. △ Less

Submitted 3 November, 2022; originally announced November 2022.

Comments: 10 pages, journal paper, transactions

arXiv:2210.17456 [pdf, other]

Audio-Visual Speech Enhancement and Separation by Utilizing Multi-Modal Self-Supervised Embeddings

Authors: I-Chun Chern, Kuo-Hsuan Hung, Yi-Ting Chen, Tassadaq Hussain, Mandar Gogate, Amir Hussain, Yu Tsao, Jen-Cheng Hou

Abstract: AV-HuBERT, a multi-modal self-supervised learning model, has been shown to be effective for categorical problems such as automatic speech recognition and lip-reading. This suggests that useful audio-visual speech representations can be obtained via utilizing multi-modal self-supervised embeddings. Nevertheless, it is unclear if such representations can be generalized to solve real-world multi-moda… ▽ More AV-HuBERT, a multi-modal self-supervised learning model, has been shown to be effective for categorical problems such as automatic speech recognition and lip-reading. This suggests that useful audio-visual speech representations can be obtained via utilizing multi-modal self-supervised embeddings. Nevertheless, it is unclear if such representations can be generalized to solve real-world multi-modal AV regression tasks, such as audio-visual speech enhancement (AVSE) and audio-visual speech separation (AVSS). In this study, we leveraged the pre-trained AV-HuBERT model followed by an SE module for AVSE and AVSS. Comparative experimental results demonstrate that our proposed model performs better than the state-of-the-art AVSE and traditional audio-only SE models. In summary, our results confirm the effectiveness of our proposed model for the AVSS task with proper fine-tuning strategies, demonstrating that multi-modal self-supervised embeddings obtained from AV-HuBERT can be generalized to audio-visual regression tasks. △ Less

Submitted 31 May, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

Comments: ICASSP AMHAT 2023

arXiv:2210.16743 [pdf, other]

WeKws: A production first small-footprint end-to-end Keyword Spotting Toolkit

Authors: Jie Wang, Menglong Xu, **gyong Hou, Binbin Zhang, Xiao-Lei Zhang, Lei Xie, Fu** Pan

Abstract: Keyword spotting (KWS) enables speech-based user interaction and gradually becomes an indispensable component of smart devices. Recently, end-to-end (E2E) methods have become the most popular approach for on-device KWS tasks. However, there is still a gap between the research and deployment of E2E KWS methods. In this paper, we introduce WeKws, a production-quality, easy-to-build, and convenient-t… ▽ More Keyword spotting (KWS) enables speech-based user interaction and gradually becomes an indispensable component of smart devices. Recently, end-to-end (E2E) methods have become the most popular approach for on-device KWS tasks. However, there is still a gap between the research and deployment of E2E KWS methods. In this paper, we introduce WeKws, a production-quality, easy-to-build, and convenient-to-be-applied E2E KWS toolkit. WeKws contains the implementations of several state-of-the-art backbone networks, making it achieve highly competitive results on three publicly available datasets. To make WeKws a pure E2E toolkit, we utilize a refined max-pooling loss to make the model learn the ending position of the keyword by itself, which significantly simplifies the training pipeline and makes WeKws very efficient to be applied in real-world scenarios. The toolkit is publicly available at https://github.com/wenet-e2e/wekws. △ Less

Submitted 30 October, 2022; originally announced October 2022.

arXiv:2210.07749

LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge

Authors: Yan Jia, Mi Hong, **gyu Hou, Kailong Ren, Sifan Ma, ** Wang, Fangzhen Peng, Yinglin Ji, Lin Yang, Junjie Wang

Abstract: This paper describes LeVoice automatic speech recognition systems to track2 of intelligent cockpit speech recognition challenge 2022. Track2 is a speech recognition task without limits on the scope of model size. Our main points include deep learning based speech enhancement, text-to-speech based speech generation, training data augmentation via various techniques and speech recognition model fusi… ▽ More This paper describes LeVoice automatic speech recognition systems to track2 of intelligent cockpit speech recognition challenge 2022. Track2 is a speech recognition task without limits on the scope of model size. Our main points include deep learning based speech enhancement, text-to-speech based speech generation, training data augmentation via various techniques and speech recognition model fusion. We compared and fused the hybrid architecture and two kinds of end-to-end architecture. For end-to-end modeling, we used models based on connectionist temporal classification/attention-based encoder-decoder architecture and recurrent neural network transducer/attention-based encoder-decoder architecture. The performance of these models is evaluated with an additional language model to improve word error rates. As a result, our system achieved 10.2\% character error rate on the challenge test set data and ranked third place among the submitted systems in the challenge. △ Less

Submitted 16 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: There are experimental errors

arXiv:2210.00515 [pdf, other]

Deep-OCTA: Ensemble Deep Learning Approaches for Diabetic Retinopathy Analysis on OCTA Images

Authors: Junlin Hou, Fan Xiao, Jilan Xu, Yuejie Zhang, Haidong Zou, Rui Feng

Abstract: The ultra-wide optical coherence tomography angiography (OCTA) has become an important imaging modality in diabetic retinopathy (DR) diagnosis. However, there are few researches focusing on automatic DR analysis using ultra-wide OCTA. In this paper, we present novel and practical deep-learning solutions based on ultra-wide OCTA for the Diabetic Retinopathy Analysis Challenge (DRAC). In the segment… ▽ More The ultra-wide optical coherence tomography angiography (OCTA) has become an important imaging modality in diabetic retinopathy (DR) diagnosis. However, there are few researches focusing on automatic DR analysis using ultra-wide OCTA. In this paper, we present novel and practical deep-learning solutions based on ultra-wide OCTA for the Diabetic Retinopathy Analysis Challenge (DRAC). In the segmentation of DR lesions task, we utilize UNet and UNet++ to segment three lesions with strong data augmentation and model ensemble. In the image quality assessment task, we create an ensemble of InceptionV3, SE-ResNeXt, and Vision Transformer models. Pre-training on the large dataset as well as the hybrid MixUp and CutMix strategy are both adopted to boost the generalization ability of our model. In the DR grading task, we build a Vision Transformer (ViT) and fnd that the ViT model pre-trained on color fundus images serves as a useful substrate for OCTA images. Our proposed methods ranked 4th, 3rd, and 5th on the three leaderboards of DRAC, respectively. The source code will be made available at https://github.com/FDU-VTS/DRAC. △ Less

Submitted 2 October, 2022; originally announced October 2022.

arXiv:2207.04266 [pdf, other]

Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image Denoising

Authors: **hui Hou, Zhiyu Zhu, Hui Liu, Junhui Hou

Abstract: This paper tackles the challenging problem of hyperspectral (HS) image denoising. Unlike existing deep learning-based methods usually adopting complicated network architectures or empirically stacking off-the-shelf modules to pursue performance improvement, we focus on the efficient and effective feature extraction manner for capturing the high-dimensional characteristics of HS images. To be speci… ▽ More This paper tackles the challenging problem of hyperspectral (HS) image denoising. Unlike existing deep learning-based methods usually adopting complicated network architectures or empirically stacking off-the-shelf modules to pursue performance improvement, we focus on the efficient and effective feature extraction manner for capturing the high-dimensional characteristics of HS images. To be specific, based on the theoretical analysis that increasing the rank of the matrix formed by the unfolded convolutional kernels can promote feature diversity, we propose rank-enhanced low-dimensional convolution set (Re-ConvSet), which separately performs 1-D convolution along the three dimensions of an HS image side-by-side, and then aggregates the resulting spatial-spectral embeddings via a learnable compression layer. Re-ConvSet not only learns the diverse spatial-spectral features of HS images, but also reduces the parameters and complexity of the network. We then incorporate Re-ConvSet into the widely-used U-Net architecture to construct an HS image denoising method. Surprisingly, we observe such a concise framework outperforms the most recent method to a large extent in terms of quantitative metrics, visual results, and efficiency. We believe our work may shed light on deep learning-based HS image processing and analysis. △ Less

Submitted 9 July, 2022; originally announced July 2022.

Comments: 10 pages, 8 figures

arXiv:2207.03105 [pdf]

doi 10.1088/1361-6560/ac9e3e

Uncertainty-Aware Self-supervised Neural Network for Liver $T_{1ρ}$ Map** with Relaxation Constraint

Authors: Chaoxing Huang, Yurui Qian, Simon Chun Ho Yu, Jian Hou, Baiyan Jiang, Queenie Chan, Vincent Wai-Sun Wong, Winnie Chiu-Wing Chu, Weitian Chen

Abstract: $T_{1ρ}$ map** is a promising quantitative MRI technique for the non-invasive assessment of tissue properties. Learning-based approaches can map $T_{1ρ}$ from a reduced number of $T_{1ρ}$ weighted images, but requires significant amounts of high quality training data. Moreover, existing methods do not provide the confidence level of the $T_{1ρ}… ▽ More $T_{1ρ}$ map** is a promising quantitative MRI technique for the non-invasive assessment of tissue properties. Learning-based approaches can map $T_{1ρ}$ from a reduced number of $T_{1ρ}$ weighted images, but requires significant amounts of high quality training data. Moreover, existing methods do not provide the confidence level of the $T_{1ρ}$ estimation. To address these problems, we proposed a self-supervised learning neural network that learns a $T_{1ρ}$ map** using the relaxation constraint in the learning process. Epistemic uncertainty and aleatoric uncertainty are modelled for the $T_{1ρ}$ quantification network to provide a Bayesian confidence estimation of the $T_{1ρ}$ map**. The uncertainty estimation can also regularize the model to prevent it from learning imperfect data. We conducted experiments on $T_{1ρ}$ data collected from 52 patients with non-alcoholic fatty liver disease. The results showed that our method outperformed the existing methods for $T_{1ρ}$ quantification of the liver using as few as two $T_{1ρ}$-weighted images. Our uncertainty estimation provided a feasible way of modelling the confidence of the self-supervised learning based $T_{1ρ}$ estimation, which is consistent with the reality in liver $T_{1ρ}$ imaging. △ Less

Submitted 25 October, 2022; v1 submitted 7 July, 2022; originally announced July 2022.

Comments: Provisionally accepted by Physics in Medicine and Biology

arXiv:2207.01909 [pdf, other]

StyleFlow For Content-Fixed Image to Image Translation

Authors: Weichen Fan, **ghuan Chen, Jiabin Ma, Jun Hou, Shuai Yi

Abstract: Image-to-image (I2I) translation is a challenging topic in computer vision. We divide this problem into three tasks: strongly constrained translation, normally constrained translation, and weakly constrained translation. The constraint here indicates the extent to which the content or semantic information in the original image is preserved. Although previous approaches have achieved good performan… ▽ More Image-to-image (I2I) translation is a challenging topic in computer vision. We divide this problem into three tasks: strongly constrained translation, normally constrained translation, and weakly constrained translation. The constraint here indicates the extent to which the content or semantic information in the original image is preserved. Although previous approaches have achieved good performance in weakly constrained tasks, they failed to fully preserve the content in both strongly and normally constrained tasks, including photo-realism synthesis, style transfer, and colorization, etc. To achieve content-preserving transfer in strongly constrained and normally constrained tasks, we propose StyleFlow, a new I2I translation model that consists of normalizing flows and a novel Style-Aware Normalization (SAN) module. With the invertible network structure, StyleFlow first projects input images into deep feature space in the forward pass, while the backward pass utilizes the SAN module to perform content-fixed feature transformation and then projects back to image space. Our model supports both image-guided translation and multi-modal synthesis. We evaluate our model in several I2I translation benchmarks, and the results show that the proposed model has advantages over previous methods in both strongly constrained and normally constrained tasks. △ Less

Submitted 5 July, 2022; originally announced July 2022.

arXiv:2207.01758 [pdf, other]

FDVTS's Solution for 2nd COV19D Competition on COVID-19 Detection and Severity Analysis

Authors: Junlin Hou, Jilan Xu, Rui Feng, Yuejie Zhang

Abstract: This paper presents our solution for the 2nd COVID-19 Competition, occurring in the framework of the AIMIA Workshop in the European Conference on Computer Vision (ECCV 2022). In our approach, we employ an effective 3D Contrastive Mixup Classification network for COVID-19 diagnosis on chest CT images, which is composed of contrastive representation learning and mixup classification. For the COVID-1… ▽ More This paper presents our solution for the 2nd COVID-19 Competition, occurring in the framework of the AIMIA Workshop in the European Conference on Computer Vision (ECCV 2022). In our approach, we employ an effective 3D Contrastive Mixup Classification network for COVID-19 diagnosis on chest CT images, which is composed of contrastive representation learning and mixup classification. For the COVID-19 detection challenge, our approach reaches 0.9245 macro F1 score on 484 validation CT scans, which significantly outperforms the baseline method by 16.5%. In the COVID-19 severity detection challenge, our approach achieves 0.7186 macro F1 score on 61 validation samples, which also surpasses the baseline by 8.86%. △ Less

Submitted 4 July, 2022; originally announced July 2022.

arXiv:2205.14887 [pdf, other]

doi 10.1109/TIP.2022.3201478

Deep Posterior Distribution-based Embedding for Hyperspectral Image Super-resolution

Authors: **hui Hou, Zhiyu Zhu, Junhui Hou, Huanqiang Zeng, **jian Wu, Jiantao Zhou

Abstract: In this paper, we investigate the problem of hyperspectral (HS) image spatial super-resolution via deep learning. Particularly, we focus on how to embed the high-dimensional spatial-spectral information of HS images efficiently and effectively. Specifically, in contrast to existing methods adopting empirically-designed network modules, we formulate HS embedding as an approximation of the posterior… ▽ More In this paper, we investigate the problem of hyperspectral (HS) image spatial super-resolution via deep learning. Particularly, we focus on how to embed the high-dimensional spatial-spectral information of HS images efficiently and effectively. Specifically, in contrast to existing methods adopting empirically-designed network modules, we formulate HS embedding as an approximation of the posterior distribution of a set of carefully-defined HS embedding events, including layer-wise spatial-spectral feature extraction and network-level feature aggregation. Then, we incorporate the proposed feature embedding scheme into a source-consistent super-resolution framework that is physically-interpretable, producing lightweight PDE-Net, in which high-resolution (HR) HS images are iteratively refined from the residuals between input low-resolution (LR) HS images and pseudo-LR-HS images degenerated from reconstructed HR-HS images via probability-inspired HS embedding. Extensive experiments over three common benchmark datasets demonstrate that PDE-Net achieves superior performance over state-of-the-art methods. Besides, the probabilistic characteristic of this kind of networks can provide the epistemic uncertainty of the network outputs, which may bring additional benefits when used for other HS image-based applications. The code will be publicly available at https://github.com/**nh/PDE-Net. △ Less

Submitted 23 August, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

Comments: Accepted by IEEE Transactions on Image Processing

arXiv:2203.00914 [pdf, other]

doi 10.1109/TIP.2022.3222918

PUFA-GAN: A Frequency-Aware Generative Adversarial Network for 3D Point Cloud Upsampling

Authors: Hao Liu, Hui Yuan, Junhui Hou, Raouf Hamzaoui, Wei Gao

Abstract: We propose a generative adversarial network for point cloud upsampling, which can not only make the upsampled points evenly distributed on the underlying surface but also efficiently generate clean high frequency regions. The generator of our network includes a dynamic graph hierarchical residual aggregation unit and a hierarchical residual aggregation unit for point feature extraction and upsampl… ▽ More We propose a generative adversarial network for point cloud upsampling, which can not only make the upsampled points evenly distributed on the underlying surface but also efficiently generate clean high frequency regions. The generator of our network includes a dynamic graph hierarchical residual aggregation unit and a hierarchical residual aggregation unit for point feature extraction and upsampling, respectively. The former extracts multiscale point-wise descriptive features, while the latter captures rich feature details with hierarchical residuals. To generate neat edges, our discriminator uses a graph filter to extract and retain high frequency points. The generated high resolution point cloud and corresponding high frequency points help the discriminator learn the global and high frequency properties of the point cloud. We also propose an identity distribution loss function to make sure that the upsampled points remain on the underlying surface of the input low resolution point cloud. To assess the regularity of the upsampled points in high frequency regions, we introduce two evaluation metrics. Objective and subjective results demonstrate that the visual quality of the upsampled point clouds generated by our method is better than that of the state-of-the-art methods. △ Less

Submitted 2 March, 2022; originally announced March 2022.

arXiv:2110.11998 [pdf, other]

Semi-Supervised Semantic Segmentation of Vessel Images using Leaking Perturbations

Authors: **yong Hou, Xuejie Ding, Jeremiah D. Deng

Abstract: Semantic segmentation based on deep learning methods can attain appealing accuracy provided large amounts of annotated samples. However, it remains a challenging task when only limited labelled data are available, which is especially common in medical imaging. In this paper, we propose to use Leaking GAN, a GAN-based semi-supervised architecture for retina vessel semantic segmentation. Our key ide… ▽ More Semantic segmentation based on deep learning methods can attain appealing accuracy provided large amounts of annotated samples. However, it remains a challenging task when only limited labelled data are available, which is especially common in medical imaging. In this paper, we propose to use Leaking GAN, a GAN-based semi-supervised architecture for retina vessel semantic segmentation. Our key idea is to pollute the discriminator by leaking information from the generator. This leads to more moderate generations that benefit the training of GAN. As a result, the unlabelled examples can be better utilized to boost the learning of the discriminator, which eventually leads to stronger classification performance. In addition, to overcome the variations in medical images, the mean-teacher mechanism is utilized as an auxiliary regularization of the discriminator. Further, we modify the focal loss to fit it as the consistency objective for mean-teacher regularizer. Extensive experiments demonstrate that the Leaking GAN framework achieves competitive performance compared to the state-of-the-art methods when evaluated on benchmark datasets including DRIVE, STARE and CHASE\_DB1, using as few as 8 labelled images in the semi-supervised setting. It also outperforms existing algorithms on cross-domain segmentation tasks. △ Less

Submitted 22 October, 2021; originally announced October 2021.

Comments: To appear in WACV'22

arXiv:2108.05547 [pdf, other]

Deep Amended Gradient Descent for Efficient Spectral Reconstruction from Single RGB Images

Authors: Zhiyu Zhu, Hui Liu, Junhui Hou, Sen Jia, Qingfu Zhang

Abstract: This paper investigates the problem of recovering hyperspectral (HS) images from single RGB images. To tackle such a severely ill-posed problem, we propose a physically-interpretable, compact, efficient, and end-to-end learning-based framework, namely AGD-Net. Precisely, by taking advantage of the imaging process, we first formulate the problem explicitly based on the classic gradient descent algo… ▽ More This paper investigates the problem of recovering hyperspectral (HS) images from single RGB images. To tackle such a severely ill-posed problem, we propose a physically-interpretable, compact, efficient, and end-to-end learning-based framework, namely AGD-Net. Precisely, by taking advantage of the imaging process, we first formulate the problem explicitly based on the classic gradient descent algorithm. Then, we design a lightweight neural network with a multi-stage architecture to mimic the formed amended gradient descent process, in which efficient convolution and novel spectral zero-mean normalization are proposed to effectively extract spatial-spectral features for regressing an initialization, a basic gradient, and an incremental gradient. Besides, based on the approximate low-rank property of HS images, we propose a novel rank loss to promote the similarity between the global structures of reconstructed and ground-truth HS images, which is optimized with our singular value weighting strategy during training. Moreover, AGD-Net, a single network after one-time training, is flexible to handle the reconstruction with various spectral response functions. Extensive experiments over three commonly-used benchmark datasets demonstrate that AGD-Net can improve the reconstruction quality by more than 1.0 dB on average while saving 67$\times$ parameters and 32$\times$ FLOPs, compared with state-of-the-art methods. The code will be publicly available at https://github.com/zbzhzhy/GD-Net. △ Less

Submitted 12 August, 2021; originally announced August 2021.

arXiv:2107.06211 [pdf, other]

doi 10.1109/TIP.2022.3160070

Attention-Guided Progressive Neural Texture Fusion for High Dynamic Range Image Restoration

Authors: Jie Chen, Zaifeng Yang, Tsz Nam Chan, Hui Li, Junhui Hou, Lap-Pui Chau

Abstract: High Dynamic Range (HDR) imaging via multi-exposure fusion is an important task for most modern imaging platforms. In spite of recent developments in both hardware and algorithm innovations, challenges remain over content association ambiguities caused by saturation, motion, and various artifacts introduced during multi-exposure fusion such as ghosting, noise, and blur. In this work, we propose an… ▽ More High Dynamic Range (HDR) imaging via multi-exposure fusion is an important task for most modern imaging platforms. In spite of recent developments in both hardware and algorithm innovations, challenges remain over content association ambiguities caused by saturation, motion, and various artifacts introduced during multi-exposure fusion such as ghosting, noise, and blur. In this work, we propose an Attention-guided Progressive Neural Texture Fusion (APNT-Fusion) HDR restoration model which aims to address these issues within one framework. An efficient two-stream structure is proposed which separately focuses on texture feature transfer over saturated regions and multi-exposure tonal and texture feature fusion. A neural feature transfer mechanism is proposed which establishes spatial correspondence between different exposures based on multi-scale VGG features in the masked saturated HDR domain for discriminative contextual clues over the ambiguous image areas. A progressive texture blending module is designed to blend the encoded two-stream features in a multi-scale and progressive manner. In addition, we introduce several novel attention mechanisms, i.e., the motion attention module detects and suppresses the content discrepancies among the reference images; the saturation attention module facilitates differentiating the misalignment caused by saturation from those caused by motion; and the scale attention module ensures texture blending consistency between different coder/decoder scales. We carry out comprehensive qualitative and quantitative evaluations and ablation studies, which validate that these novel modules work coherently under the same framework and outperform state-of-the-art methods. △ Less

Submitted 13 July, 2021; originally announced July 2021.

arXiv:2105.06295 [pdf, other]

Gait Characterization in Duchenne Muscular Dystrophy (DMD) Using a Single-Sensor Accelerometer: Classical Machine Learning and Deep Learning Approaches

Authors: Albara Ah Ramli, Xin Liu, Kelly Berndt, Erica Goude, Jiahui Hou, Lynea B. Kaethler, Rex Liu, Amanda Lopez, Alina Nicorici, Corey Owens, David Rodriguez, Jane Wang, Huanle Zhang, Daniel Aranki, Craig M. McDonald, Erik K. Henricson

Abstract: Differences in gait patterns of children with Duchenne muscular dystrophy (DMD) and typically-develo** (TD) peers are visible to the eye, but quantifications of those differences outside of the gait laboratory have been elusive. In this work, we measured vertical, mediolateral, and anteroposterior acceleration using a waist-worn iPhone accelerometer during ambulation across a typical range of ve… ▽ More Differences in gait patterns of children with Duchenne muscular dystrophy (DMD) and typically-develo** (TD) peers are visible to the eye, but quantifications of those differences outside of the gait laboratory have been elusive. In this work, we measured vertical, mediolateral, and anteroposterior acceleration using a waist-worn iPhone accelerometer during ambulation across a typical range of velocities. Fifteen TD and fifteen DMD children from 3-16 years of age underwent eight walking/running activities, including five 25 meters walk/run speed-calibration tests at a slow walk to running speeds (SC-L1 to SC-L5), a 6-minute walk test (6MWT), a 100 meters fast-walk/jog/run (100MRW), and a free walk (FW). For clinical anchoring purposes, participants completed a Northstar Ambulatory Assessment (NSAA). We extracted temporospatial gait clinical features (CFs) and applied multiple machine learning (ML) approaches to differentiate between DMD and TD children using extracted temporospatial gait CFs and raw data. Extracted temporospatial gait CFs showed reduced step length and a greater mediolateral component of total power (TP) consistent with shorter strides and Trendelenberg-like gait commonly observed in DMD. ML approaches using temporospatial gait CFs and raw data varied in effectiveness at differentiating between DMD and TD controls at different speeds, with an accuracy of up to 100%. We demonstrate that by using ML with accelerometer data from a consumer-grade smartphone, we can capture DMD-associated gait characteristics in toddlers to teens. △ Less

Submitted 10 July, 2023; v1 submitted 12 May, 2021; originally announced May 2021.

arXiv:2104.10764 [pdf, other]

HMM-Free Encoder Pre-Training for Streaming RNN Transducer

Authors: Lu Huang, **gyu Sun, Yufeng Tang, Junfeng Hou, **kun Chen, Jun Zhang, Zejun Ma

Abstract: This work describes an encoder pre-training procedure using frame-wise label to improve the training of streaming recurrent neural network transducer (RNN-T) model. Streaming RNN-T trained from scratch usually performs worse than non-streaming RNN-T. Although it is common to address this issue through pre-training components of RNN-T with other criteria or frame-wise alignment guidance, the alignm… ▽ More This work describes an encoder pre-training procedure using frame-wise label to improve the training of streaming recurrent neural network transducer (RNN-T) model. Streaming RNN-T trained from scratch usually performs worse than non-streaming RNN-T. Although it is common to address this issue through pre-training components of RNN-T with other criteria or frame-wise alignment guidance, the alignment is not easily available in end-to-end manner. In this work, frame-wise alignment, used to pre-train streaming RNN-T's encoder, is generated without using a HMM-based system. Therefore an all-neural framework equip** HMM-free encoder pre-training is constructed. This is achieved by expanding the spikes of CTC model to their left/right blank frames, and two expanding strategies are proposed. To our best knowledge, this is the first work to simulate HMM-based frame-wise label using CTC model for pre-training. Experiments conducted on LibriSpeech and MLS English tasks show the proposed pre-training procedure, compared with random initialization, reduces the WER by relatively 5%~11% and the emission latency by 60 ms. Besides, the method is lexicon-free, so it is friendly to new languages without manually designed lexicon. △ Less

Submitted 10 June, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

Comments: Accepted by Interspeech 2021

arXiv:2102.13552 [pdf, other]

The NPU System for the 2020 Personalized Voice Trigger Challenge

Authors: **gyong Hou, Li Zhang, Yihui Fu, Qing Wang, Zhanheng Yang, Qijie Shao, Lei Xie

Abstract: This paper describes the system developed by the NPU team for the 2020 personalized voice trigger challenge. Our submitted system consists of two independently trained subsystems: a small footprint keyword spotting (KWS) system and a speaker verification (SV) system. For the KWS system, a multi-scale dilated temporal convolutional (MDTC) network is proposed to detect wake-up word (WuW). For SV sys… ▽ More This paper describes the system developed by the NPU team for the 2020 personalized voice trigger challenge. Our submitted system consists of two independently trained subsystems: a small footprint keyword spotting (KWS) system and a speaker verification (SV) system. For the KWS system, a multi-scale dilated temporal convolutional (MDTC) network is proposed to detect wake-up word (WuW). For SV system, Write something here. The KWS predicts posterior probabilities of whether an audio utterance contains WuW and estimates the location of WuW at the same time. When the posterior probability ofWuW reaches a predefined threshold, the identity information of triggered segment is determined by the SV system. On evaluation dataset, our submitted system obtains detection costs of 0.081and 0.091 in close talking and far-field tasks, respectively. △ Less

Submitted 26 February, 2021; originally announced February 2021.

arXiv:2102.07085 [pdf, other]

Light Field Reconstruction via Deep Adaptive Fusion of Hybrid Lenses

Authors: **g **, Mantang Guo, Junhui Hou, Hui Liu, Hongkai Xiong

Abstract: This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses, including a high-resolution camera surrounded by multiple low-resolution cameras. The performance of existing methods is still limited, as they produce either blurry results on plain textured areas or distortions around depth discontinuous boundaries. To tackle this challenge, we propose a… ▽ More This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses, including a high-resolution camera surrounded by multiple low-resolution cameras. The performance of existing methods is still limited, as they produce either blurry results on plain textured areas or distortions around depth discontinuous boundaries. To tackle this challenge, we propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input from two complementary and parallel perspectives. Specifically, one module regresses a spatially consistent intermediate estimation by learning a deep multidimensional and cross-domain feature representation, while the other module warps another intermediate estimation, which maintains the high-frequency textures, by propagating the information of the high-resolution view. We finally leverage the advantages of the two intermediate estimations adaptively via the learned attention maps, leading to the final high-resolution LF image with satisfactory results on both plain textured areas and depth discontinuous boundaries. Besides, to promote the effectiveness of our method trained with simulated hybrid data on real hybrid data captured by a hybrid LF imaging system, we carefully design the network architecture and the training strategy. Extensive experiments on both real and simulated hybrid data demonstrate the significant superiority of our approach over state-of-the-art ones. To the best of our knowledge, this is the first end-to-end deep learning method for LF reconstruction from a real hybrid input. We believe our framework could potentially decrease the cost of high-resolution LF data acquisition and benefit LF data storage and transmission. △ Less

Submitted 17 June, 2023; v1 submitted 14 February, 2021; originally announced February 2021.

Comments: Accepted by IEEE TPAMI. arXiv admin note: text overlap with arXiv:1907.09640

arXiv:2012.04874 [pdf, other]

A Framework for Current-State Opacity under Dynamic Information Release Mechanism

Authors: Junyao Hou, Xiang Yin, Shaoyuan Li

Abstract: Opacity is an important information-flow security property that characterizes the plausible deniability of a dynamic system for its "secret" against eavesdrop** attacks. As an information-flow property, the underlying observation model is the key in the modeling and analysis of opacity. In this paper, we investigate the verification of current-state opacity for discrete-event systems under Orwel… ▽ More Opacity is an important information-flow security property that characterizes the plausible deniability of a dynamic system for its "secret" against eavesdrop** attacks. As an information-flow property, the underlying observation model is the key in the modeling and analysis of opacity. In this paper, we investigate the verification of current-state opacity for discrete-event systems under Orwellian-type observations, i.e., the system is allowed to re-interpret the observation of an event based on its future suffix. First, we propose a new Orwellian-type observation model called the dynamic information release mechanism (DIRM). In the DIRM, when to release previous "hold on" events is state-dependent. Then we propose a new definition of opacity based on the notion of history-equivalence rather than the standard projection-equivalence. This definition is more suitable for observations that are not prefix-closed. Finally, we show that by constructing a new structure called the DIRM-observer, current-state opacity can be effectively verified under the DIRM. Computational complexity analysis as well as illustrative examples for the proposed approach are also provided. Compared with the existing Orwellian-type observation model, the proposed framework is more general in the sense that the information-release-mechanism is state-dependent and the corresponding definition of opacity is more suitable for non-prefix-closed observations. △ Less

Submitted 17 January, 2022; v1 submitted 9 December, 2020; originally announced December 2020.

arXiv:2011.12688 [pdf, other]

doi 10.1109/TIP.2021.3096060

Reduced Reference Perceptual Quality Model and Application to Rate Control for 3D Point Cloud Compression

Authors: Qi Liu, Hui Yuan, Raouf Hamzaoui, Honglei Su, Junhui Hou, Huan Yang

Abstract: In rate-distortion optimization, the encoder settings are determined by maximizing a reconstruction quality measure subject to a constraint on the bit rate. One of the main challenges of this approach is to define a quality measure that can be computed with low computational cost and which correlates well with perceptual quality. While several quality measures that fulfil these two criteria have b… ▽ More In rate-distortion optimization, the encoder settings are determined by maximizing a reconstruction quality measure subject to a constraint on the bit rate. One of the main challenges of this approach is to define a quality measure that can be computed with low computational cost and which correlates well with perceptual quality. While several quality measures that fulfil these two criteria have been developed for images and video, no such one exists for 3D point clouds. We address this limitation for the video-based point cloud compression (V-PCC) standard by proposing a linear perceptual quality model whose variables are the V-PCC geometry and color quantization parameters and whose coefficients can easily be computed from two features extracted from the original 3D point cloud. Subjective quality tests with 400 compressed 3D point clouds show that the proposed model correlates well with the mean opinion score, outperforming state-of-the-art full reference objective measures in terms of Spearman rank-order and Pearsons linear correlation coefficient. Moreover, we show that for the same target bit rate, ratedistortion optimization based on the proposed model offers higher perceptual quality than rate-distortion optimization based on exhaustive search with a point-to-point objective quality metric. △ Less

Submitted 25 November, 2020; originally announced November 2020.

Comments: 14 figures and 7 tables, submitted to IEEE T IP

Showing 1–50 of 78 results for author: Hou, J