Search | arXiv e-print repository

A square cross-section FOV rotational CL (SC-CL) and its analytical reconstruction method

Authors: Xiang Zou, Wuliang Shi, Muge Du, Yuxiang Xing

Abstract: Rotational computed laminography (CL) has broad application potential in three-dimensional imaging of plate-like objects, as it only needs x-ray to pass through the tested object in the thickness direction during the imaging process. In this study, a square cross-section FOV rotational CL (SC-CL) was proposed. Then, the FDK-type analytical reconstruction algorithm applicable to the SC-CL was derived. On this basis, the proposed method was validated through numerical experiments.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2401.08469 [pdf, other]

Explanations of Classifiers Enhance Medical Image Segmentation via End-to-end Pre-training

Authors: Jiamin Chen, Xuhong Li, Yanwu Xu, Mengnan Du, Haoyi Xiong

Abstract: Medical image segmentation aims to identify and locate abnormal structures in medical images, such as chest radiographs, using deep neural networks. These networks require a large number of annotated images with fine-grained masks for the regions of interest, making pre-training strategies based on classification datasets essential for sample efficiency. Based on a large-scale medical image classification dataset, our work collects explanations from well-trained classifiers to generate pseudo labels of segmentation tasks. Specifically, we offer a case study on chest radiographs and train image classifiers on the CheXpert dataset to identify 14 pathological observations in radiology. We then use the Integrated Gradients (IG) method to distill and boost the explanations obtained from the classifiers, generating massive diagnosis-oriented localization labels (DoLL). These DoLL-annotated images are used for pre-training the model before fine-tuning it for downstream segmentation tasks, including COVID-19 infectious areas, lungs, heart, and clavicles. Our method outperforms other baselines, showcasing significant advantages in model performance and training efficiency across various segmentation settings.

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2401.01755 [pdf, other]

Incremental FastPitch: Chunk-based High Quality Text to Speech

Authors: Muyang Du, Chuan Liu, Junjie Lai

Abstract: Parallel text-to-speech models have been widely applied for real-time speech synthesis, and they offer more controllability and a much faster synthesis process compared with conventional auto-regressive models. Although parallel models have benefits in many aspects, they become naturally unfit for incremental synthesis due to their fully parallel architecture such as transformer. In this work, we propose Incremental FastPitch, a novel FastPitch variant capable of incrementally producing high-quality Mel chunks by improving the architecture with chunk-based FFT blocks, training with receptive-field constrained chunk attention masks, and inference with fixed size past model states. Experimental results show that our proposal can produce speech quality comparable to the parallel FastPitch, with a significantly lower latency that allows even lower response time for real-time speech applications.

Submitted 3 January, 2024; originally announced January 2024.

Comments: 5 pages, 4 figures, 1 table

arXiv:2312.00308 [pdf, other]

A knowledge-based data-driven (KBDD) framework for all-day identification of cloud types using satellite remote sensing

Authors: Longfeng Nie, Yuntian Chen, Mengge Du, Changqi Sun, Dongxiao Zhang

Abstract: Cloud types, as a type of meteorological data, are of particular significance for evaluating changes in rainfall, heatwaves, water resources, floods and droughts, food security and vegetation cover, as well as land use. In order to effectively utilize high-resolution geostationary observations, a knowledge-based data-driven (KBDD) framework for all-day identification of cloud types based on spectral information from Himawari-8/9 satellite sensors is designed. And a novel, simple and efficient network, named CldNet, is proposed. Compared with widely used semantic segmentation networks, including SegNet, PSPNet, DeepLabV3+, UNet, and ResUnet, our proposed model CldNet with an accuracy of 80.89±2.18% is state-of-the-art in identifying cloud types and has increased by 32%, 46%, 22%, 2%, and 39%, respectively. With the assistance of auxiliary information (e.g., satellite zenith/azimuth angle, solar zenith/azimuth angle), the accuracy of CldNet-W using visible and near-infrared bands and CldNet-O not using visible and near-infrared bands on the test dataset is 82.23±2.14% and 73.21±2.02%, respectively. Meanwhile, the total parameters of CldNet are only 0.46M, making it easy for edge deployment. More importantly, the trained CldNet without any fine-tuning can predict cloud types with higher spatial resolution using satellite spectral data with spatial resolution 0.02°×0.02°, which indicates that CldNet possesses a strong generalization ability. In aggregate, the KBDD framework using CldNet is a highly effective cloud-type identification system capable of providing a high-fidelity, all-day, spatiotemporal cloud-type database for many climate assessment fields.

Submitted 30 November, 2023; originally announced December 2023.

arXiv:2311.10349 [pdf, other]

Pseudo Label-Guided Data Fusion and Output Consistency for Semi-Supervised Medical Image Segmentation

Authors: Tao Wang, Yuanbin Chen, Xinlin Zhang, Yuanbo Zhou, Junlin Lan, Bizhe Bai, Tao Tan, Min Du, Qinquan Gao, Tong Tong

Abstract: Supervised learning algorithms based on Convolutional Neural Networks have become the benchmark for medical image segmentation tasks, but their effectiveness heavily relies on a large amount of labeled data. However, annotating medical image datasets is a laborious and time-consuming process. Inspired by semi-supervised algorithms that use both labeled and unlabeled data for training, we propose the PLGDF framework, which builds upon the mean teacher network for segmenting medical images with less annotation. We propose a novel pseudo-label utilization scheme, which combines labeled and unlabeled data to augment the dataset effectively. Additionally, we enforce the consistency between different scales in the decoder module of the segmentation network and propose a loss function suitable for evaluating the consistency. Moreover, we incorporate a sharpening operation on the predicted results, further enhancing the accuracy of the segmentation. Extensive experiments on three publicly available datasets demonstrate that the PLGDF framework can largely improve performance by incorporating the unlabeled data. Meanwhile, our framework yields superior performance compared to six state-of-the-art semi-supervised learning methods. The codes of this study are available at https://github.com/ortonwang/PLGDF.

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2308.08181 [pdf, ps, other]

ChinaTelecom System Description to VoxCeleb Speaker Recognition Challenge 2023

Authors: Mengjie Du, Xiang Fang, Jie Li

Abstract: This technical report describes the ChinaTelecom system for Track 1 (closed) of the VoxCeleb2023 Speaker Recognition Challenge (VoxSRC 2023). Our system consists of several ResNet variants trained only on VoxCeleb2, which were fused for better performance later. Score calibration was also applied for each variant and the fused system. The final submission achieved minDCF of 0.1066 and EER of 1.980%.

Submitted 16 August, 2023; originally announced August 2023.

Comments: System description of VoxSRC 2023

arXiv:2306.16918 [pdf, other]

PCDAL: A Perturbation Consistency-Driven Active Learning Approach for Medical Image Segmentation and Classification

Authors: Tao Wang, Xinlin Zhang, Yuanbo Zhou, Junlin Lan, Tao Tan, Min Du, Qinquan Gao, Tong Tong

Abstract: In recent years, deep learning has become a breakthrough technique in assisting medical image diagnosis. Supervised learning using convolutional neural networks (CNN) provides state-of-the-art performance and has served as a benchmark for various medical image segmentation and classification. However, supervised learning deeply relies on large-scale annotated data, which is expensive, time-consuming, and even impractical to acquire in medical imaging applications. Active Learning (AL) methods have been widely applied in natural image classification tasks to reduce annotation costs by selecting more valuable examples from the unlabeled data pool. However, their application in medical image segmentation tasks is limited, and there is currently no effective and universal AL-based method specifically designed for 3D medical image segmentation. To address this limitation, we propose an AL-based method that can be simultaneously applied to 2D medical image classification, segmentation, and 3D medical image segmentation tasks. We extensively validated our proposed active learning method on three publicly available and challenging medical image datasets, Kvasir Dataset, COVID-19 Infection Segmentation Dataset, and BraTS2019 Dataset. The experimental results demonstrate that our PCDAL can achieve significantly improved performance with fewer annotations in 2D classification and segmentation and 3D segmentation tasks. The codes of this study are available at https://github.com/ortonwang/PCDAL.

Submitted 29 June, 2023; originally announced June 2023.

arXiv:2302.12864 [pdf, other]

A Data-Driven Polynomial Chaos Expansion-Based Method for Microgrid Ramping Support Capability Assessment and Enhancement

Authors: Mohan Du, Xiaozhe Wang

Abstract: Microgrids (MGs) are regarded as effective solutions to provide ramping support to the main grid during heavy-load periods. Nevertheless, the uncertain renewable energy sources (RES) and electric vehicles (EVs) integrated into an MG may affect the ramping support capability (RSC) of an MG. To address the challenge, this paper develops a data-driven sparse polynomial chaos expansion (DDSPCE)-based method to accurately and efficiently evaluate the hour-by-hour RSC of an MG. The DDSPCE model is further exploited to identify the most influential random inputs, based on which a scheduling method of BESS is developed to enhance the RSC of an MG. Simulation results in the modified IEEE 33-bus MG show that the proposed method takes less than 3 minutes for evaluating and enhancing the hourly RSC.

Submitted 24 February, 2023; originally announced February 2023.

Comments: This paper is accepted and will appear in 2023 IEEE Power & Energy Society General Meeting (GM). 5 pages, 4 figures

arXiv:2301.06595 [pdf, other]

doi 10.1364/OE.485370

PtyLab.m/py/jl: a cross-platform, open-source inverse modeling toolbox for conventional and Fourier ptychography

Authors: Lars Loetgering, Mengqi Du, Dirk Boonzajer Flaes, Tomas Aidukas, Felix Wechsler, Daniel S. Penagos Molina, Max Rose, Antonios Pelekanidis, Wilhelm Eschen, Jürgen Hess, Thomas Wilhein, Rainer Heintzmann, Jan Rothhardt, Stefan Witte

Abstract: Conventional (CP) and Fourier (FP) ptychography have emerged as versatile quantitative phase imaging techniques. While the main application cases for each technique are different, namely lens-less short wavelength imaging for CP and lens-based visible light imaging for FP, both methods share a common algorithmic ground. CP and FP have in part independently evolved to include experimentally robust forward models and inversion techniques. This separation has resulted in a plethora of algorithmic extensions, some of which have not crossed the boundary from one modality to the other. Here, we present an open source, cross-platform software, called PtyLab, enabling both CP and FP data analysis in a unified framework. With this framework, we aim to facilitate and accelerate cross-pollination between the two techniques. Moreover, the availability in Matlab, Python, and Julia will set a low barrier to enter each field.

Submitted 16 January, 2023; originally announced January 2023.

arXiv:2211.13939 [pdf, other]

Efficient Incremental Text-to-Speech on GPUs

Authors: Muyang Du, Chuan Liu, Jiaxing Qi, Junjie Lai

Abstract: Incremental text-to-speech, also known as streaming TTS, has been increasingly applied to online speech applications that require ultra-low response latency to provide an optimal user experience. However, most of the existing speech synthesis pipelines deployed on GPU are still non-incremental, which uncovers limitations in high-concurrency scenarios, especially when the pipeline is built with end-to-end neural network models. To address this issue, we present a highly efficient approach to perform real-time incremental TTS on GPUs with Instant Request Pooling and Module-wise Dynamic Batching. Experimental results demonstrate that the proposed method is capable of producing high-quality speech with a first-chunk latency lower than 80ms under 100 QPS on a single NVIDIA A10 GPU and significantly outperforms the non-incremental twin in both concurrency and latency. Our work reveals the effectiveness of high-performance incremental TTS on GPUs.

Submitted 5 December, 2022; v1 submitted 25 November, 2022; originally announced November 2022.

Comments: 5 pages, 4 figures

arXiv:2205.14850 [pdf, other]

Play it by Ear: Learning Skills amidst Occlusion through Audio-Visual Imitation Learning

Authors: Maximilian Du, Olivia Y. Lee, Suraj Nair, Chelsea Finn

Abstract: Humans are capable of completing a range of challenging manipulation tasks that require reasoning jointly over modalities such as vision, touch, and sound. Moreover, many such tasks are partially-observed; for example, taking a notebook out of a backpack will lead to visual occlusion and require reasoning over the history of audio or tactile information. While robust tactile sensing can be costly to capture on robots, microphones near or on a robot's gripper are a cheap and easy way to acquire audio feedback of contact events, which can be a surprisingly valuable data source for perception in the absence of vision. Motivated by the potential for sound to mitigate visual occlusion, we aim to learn a set of challenging partially-observed manipulation tasks from visual and audio inputs. Our proposed system learns these tasks by combining offline imitation learning from a modest number of tele-operated demonstrations and online finetuning using human provided interventions. In a set of simulated tasks, we find that our system benefits from using audio, and that by using online interventions we are able to improve the success rate of offline imitation learning by ~20%. Finally, we find that our system can complete a set of challenging, partially-observed tasks on a Franka Emika Panda robot, like extracting keys from a bag, with a 70% success rate, 50% higher than a policy that does not use audio.

Submitted 30 May, 2022; originally announced May 2022.

Journal ref: Robotics Science and Systems (RSS) 2022

arXiv:2112.06226 [pdf, other]

Attention based Broadly Self-guided Network for Low light Image Enhancement

Authors: Zilong Chen, Yaling Liang, Minghui Du

Abstract: In recent years, deep convolutional neural networks have achieved impressive success in low-light image enhancement. Existing deep learning methods mostly improve feature extraction by stacking network structures and deepening the network, which increases the runtime cost on a single image. To reduce inference time while fully extracting local and global features, and inspired by SGN, we propose an Attention-based Broadly Self-guided Network (ABSGN) for real-world low-light image enhancement. Such a broad strategy is able to handle noise at different exposures. The proposed network is validated on many mainstream benchmarks. Additional experimental results show that the proposed network outperforms most state-of-the-art low-light image enhancement solutions.

Submitted 15 December, 2021; v1 submitted 12 December, 2021; originally announced December 2021.

Comments: 10 pages, 8 figures, 4 tables

arXiv:2111.12983 [pdf, other]

Investigation of domain gap problem in several deep-learning-based CT metal artefact reduction methods

Authors: Muge Du, Kaichao Liang, Yinong Liu, Yuxiang Xing

Abstract: Metal artefacts in CT images may disrupt image quality and interfere with diagnosis. Recently many deep-learning-based CT metal artefact reduction (MAR) methods have been proposed. Current deep MAR methods may be troubled with domain gap problem, where methods trained on simulated data cannot perform well on practical data. In this work, we experimentally investigate two image-domain supervised methods, two dual-domain supervised methods and two image-domain unsupervised methods on a dental dataset and a torso dataset, to explore whether domain gap problem exists or is overcome. We find that I-DL-MAR and DudoNet are effective for practical data of the torso dataset, indicating the domain gap problem is solved. However, none of the investigated methods perform satisfactorily on practical data of the dental dataset. Based on the experimental results, we further analyze the causes of domain gap problem for each method and dataset, which may be beneficial for improving existing methods or designing new ones. The findings suggest that the domain gap problem in deep MAR methods remains to be addressed.

Submitted 25 November, 2021; originally announced November 2021.

arXiv:2108.11558 [pdf, ps, other]

Targeted False Data Injection Attacks Against AC State Estimation Without Network Parameters

Authors: Mingqiu Du, Georgia Pierrou, Xiaozhe Wang, Marthe Kassouf

Abstract: State estimation is a data processing algorithm for converting redundant meter measurements and other information into an estimate of the state of a power system. Relying heavily on meter measurements, state estimation has proven to be vulnerable to cyber attacks. In this paper, a novel targeted false data injection attack (FDIA) model against AC state estimation is proposed. Leveraging on the intrinsic load dynamics in ambient conditions and important properties of the Ornstein-Uhlenbeck process, we, from the viewpoint of intruders, design an algorithm to extract power network parameters purely from PMU data, which are further used to construct the FDIA vector. Requiring no network parameters and relying only on limited phasor measurement unit (PMU) data, the proposed FDIA model can target specific states and launch large deviation attacks. Sufficient conditions for the proposed FDIA model are also developed. Various attack vectors and attacking regions are studied in the IEEE 39-bus system, showing that the proposed FDIA method can successfully bypass the bad data detection and launch targeted large deviation attacks with very high probabilities.

Submitted 25 August, 2021; originally announced August 2021.

arXiv:2107.00279 [pdf, other]

The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021

Authors: Dan Liu, Mengge Du, Xiaoxi Li, Yuchen Hu, Lirong Dai

Abstract: This paper describes USTC-NELSLIP's submissions to the IWSLT2021 Simultaneous Speech Translation task. We proposed a novel simultaneous translation model, Cross Attention Augmented Transducer (CAAT), which extends conventional RNN-T to sequence-to-sequence tasks without monotonic constraints, e.g., simultaneous translation. Experiments on speech-to-text (S2T) and text-to-text (T2T) simultaneous translation tasks show that CAAT achieves better quality-latency trade-offs compared to wait-k, one of the previous state-of-the-art approaches. Based on CAAT architecture and data augmentation, we build S2T and T2T simultaneous translation systems in this evaluation campaign. Compared to last year's optimal systems, our S2T simultaneous translation system improves by an average of 11.3 BLEU for all latency regimes, and our T2T simultaneous translation system improves by an average of 4.6 BLEU.

Submitted 9 July, 2021; v1 submitted 1 July, 2021; originally announced July 2021.

arXiv:2103.05114 [pdf, other]

Learning Invariant Representations across Domains and Tasks

Authors: Jindong Wang, Wenjie Feng, Chang Liu, Chaohui Yu, Mingxuan Du, Renjun Xu, Tao Qin, Tie-Yan Liu

Abstract: Because collecting massive COVID-19 image samples to train deep classification models is expensive and time-consuming, transfer learning is a promising approach that transfers knowledge from the abundant typical pneumonia datasets to COVID-19 image classification. However, negative transfer may deteriorate the performance due to the feature distribution divergence between two datasets and task semantic difference in diagnosing pneumonia and COVID-19 that rely on different characteristics. It is even more challenging when the target dataset has no labels available, i.e., unsupervised task transfer learning. In this paper, we propose a novel Task Adaptation Network (TAN) to solve this unsupervised task transfer problem. In addition to learning transferable features via domain-adversarial training, we propose a novel task semantic adaptor that uses the learning-to-learn strategy to adapt the task semantics. Experiments on three public COVID-19 datasets demonstrate that our proposed method achieves superior performance. Especially on COVID-DA dataset, TAN significantly increases the recall and F1 score by 5.0% and 7.8% compared to recent strong baselines. Moreover, we show that TAN also achieves superior performance on several public domain adaptation benchmarks.

Submitted 3 March, 2021; originally announced March 2021.

Comments: Technical report, 12 pages

arXiv:2102.11896 [pdf, ps, other]

Targeted False Data Injection Attack against DC State Estimation without Line Parameters

Authors: Mingqiu Du, Georgia Pierrou, Xiaozhe Wang

Abstract: A novel false data injection attack (FDIA) model against DC state estimation is proposed, which requires no network parameters and exploits only limited phasor measurement unit (PMU) data. The proposed FDIA model can target specific states and launch large deviation attacks using estimated line parameters. Sufficient conditions for the proposed method are also presented. Different attack vectors are studied in the IEEE 39-bus system, showing that the proposed FDIA method can successfully bypass the bad data detection (BDD) with high success rates of up to 95.3%.

Submitted 23 February, 2021; originally announced February 2021.

arXiv:2102.00869 [pdf, other]

doi 10.1107/S1600577521003507

Using a modified double deep image prior for crosstalk mitigation in multislice ptychography

Authors: Ming Du, Xiaojing Huang, Chris Jacobsen

Abstract: Multislice ptychography is a high-resolution microscopy technique used to image multiple separate axial planes using a single illumination direction. However, multislice ptychography reconstructions are often degraded by crosstalk, where some features on one plane erroneously contribute to the reconstructed image of another plane. Here, we demonstrate the use of a modified "double deep image prior" (DDIP) architecture in mitigating crosstalk artifacts in multislice ptychography. Utilizing the tendency of generative neural networks to produce natural images, a modified DDIP method yielded good results on experimental data. For one of the datasets, we show that using DDIP could remove the need of using additional experimental data, such as from x-ray fluorescence, to suppress the crosstalk. Our method may help x-ray multislice ptychography work for more general experimental scenarios.

Submitted 29 January, 2021; originally announced February 2021.

Comments: 10 pages, 5 figures

arXiv:2012.12686 [pdf, other]

doi 10.1364/OE.418296

Adorym: A multi-platform generic x-ray image reconstruction framework based on automatic differentiation

Authors: Ming Du, Saugat Kandel, Junjing Deng, Xiaojing Huang, Arnaud Demortiere, Tuan Tu Nguyen, Remi Tucoulou, Vincent De Andrade, Qiaoling Jin, Chris Jacobsen

Abstract: We describe and demonstrate an optimization-based x-ray image reconstruction framework called Adorym. Our framework provides a generic forward model, allowing one code framework to be used for a wide range of imaging methods ranging from near-field holography to fly-scan ptychographic tomography. By using automatic differentiation for optimization, Adorym has the flexibility to refine experimental parameters including probe positions, multiple hologram alignment, and object tilts. It is written with strong support for parallel processing, allowing large datasets to be processed on high-performance computing systems. We demonstrate its use on several experimental datasets to show improved image quality through parameter refinement.

Submitted 22 December, 2020; originally announced December 2020.

MSC Class: 78-04

arXiv:1912.03449 [pdf, other]

Fully Dense Neural Network for the Automatic Modulation Recognition

Authors: Miao Du, Qin Yu, Shaomin Fei, Chen Wang, Xiaofeng Gong, Ruisen Luo

Abstract: Nowadays, various convolutional neural network (CNN) structures are mainly used to extract features from radio data or spectrograms in automatic modulation recognition (AMR). Because these approaches rely on expert experience and spectrograms, they not only increase the difficulty of preprocessing but also consume a lot of memory. In order to directly use the in-phase and quadrature (IQ) data obtained by the receiver and to make feature extraction by the network more efficient, thereby improving the recognition rate of the modulation mode, this paper proposes a new network structure called Fully Dense Neural Network (FDNN). This network uses residual blocks to extract features, dense connections to reduce model size, and an attention mechanism for recalibration. Experiments on RML2016.10a show that this network has a higher recognition rate and lower model complexity. They also show that the FDNN model with dense connections can not only extract features effectively but also greatly reduce the number of model parameters, which contributes to the application of deep learning in intelligent radio systems.

Submitted 7 December, 2019; originally announced December 2019.

arXiv:1910.10330 [pdf, ps, other]

doi 10.1007/978-3-030-33843-5_15

Stain Style Transfer using Transitive Adversarial Networks

Authors: Shaojin Cai, Yuyang Xue, Qinquan Gao, Min Du, Gang Chen, Hejun Zhang, Tong Tong

Abstract: Digitized pathological diagnosis has been in increasing demand recently. It is well known that color information is critical to the automatic and visual analysis of pathological slides. However, color variations due to various factors not only have a negative impact on pathologists' diagnoses but also reduce the robustness of the algorithms. The factors that cause the color differences arise not only in the process of making the slices but also in the process of digitization. Different strategies have been proposed to alleviate the color variations. Most such techniques rely on collecting color statistics to perform color matching across images and are highly dependent on a reference template slide. Since the pathological slides between hospitals are usually unpaired, these methods do not yield good matching results. In this work, we propose a novel network that we refer to as Transitive Adversarial Networks (TAN) to transfer the color information among slides from different hospitals or centers. It is not necessary for an expert to pick a representative reference slide in the proposed TAN method. We compare the proposed method with the state-of-the-art methods quantitatively and qualitatively. Compared with the state-of-the-art methods, our method yields an improvement of 0.87 dB in terms of PSNR, demonstrating the effectiveness of the proposed TAN method in stain style transfer.

Submitted 22 October, 2019; originally announced October 2019.

Comments: MICCAI 2019 MLMIR Workshop, Oral Paper

arXiv:1909.05090 [pdf, other]

doi 10.1109/ICIP40778.2020.9191174

Learning Enhanced Resolution-wise features for Human Pose Estimation

Authors: Kun Zhang, Peng He, Ping Yao, Ge Chen, Rui Wu, Min Du, Huimin Li, Li Fu, Tianyao Zheng

Abstract: Recently, multi-resolution networks (such as Hourglass, CPN, HRNet, etc.) have achieved significant performance on pose estimation by combining feature maps of various resolutions. In this paper, we propose a Resolution-wise Attention Module (RAM) and Gradual Pyramid Refinement (GPR) to learn enhanced resolution-wise feature maps for precise pose estimation. Specifically, RAM learns a group of weights to represent the different importance of feature maps across resolutions, and the GPR gradually merges every two feature maps from low to high resolutions to regress the final human keypoint heatmaps. With the enhanced resolution-wise features learnt by the CNN, we obtain more accurate human keypoint locations. The efficacy of our proposed methods is demonstrated on the MS-COCO dataset, achieving state-of-the-art performance with an average precision of 77.7 on the COCO val2017 set and 77.0 on the test-dev2017 set without using an extra human keypoint training dataset.

Submitted 13 December, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

Comments: Published on ICIP 2020

arXiv:1908.06770 [pdf, other]

doi 10.1107/S1600576720005816

Near, far, wherever you are: simulations on the dose efficiency of holographic and ptychographic coherent imaging

Authors: Ming Du, Doga Gursoy, Chris Jacobsen

Abstract: Different studies in x-ray microscopy have arrived at conflicting conclusions about the dose efficiency of imaging modes involving the recording of intensity distributions in the near (Fresnel regime) or far (Fraunhofer regime) field downstream of a specimen. We present here a numerical study on the dose efficiency of near-field holography (NFH), near-field ptychography (NFP), and far-field ptychography (FFP), where ptychography involves multiple overlapping finite-sized illumination positions. Unlike what has been reported for coherent diffraction imaging (CDI), which involves recording a single far-field diffraction pattern, we find that all three methods offer similar image quality when using the same fluence on the specimen, with far-field ptychography offering slightly better spatial resolution and lower mean error. These results support the concept that (if the experiment and image reconstruction are done properly) the sample can be near, or far; wherever you are, photon fluence on the specimen sets one limit to spatial resolution.

Submitted 11 March, 2020; v1 submitted 16 August, 2019; originally announced August 2019.

Journal ref: Journal of Applied Crystallography. 53, 748-759 (2020)

arXiv:1907.04536 [pdf]

Multi-layer Attention Mechanism for Speech Keyword Recognition

Authors: Ruisen Luo, Tianran Sun, Chen Wang, Miao Du, Zuodong Tang, Kai Zhou, Xiaofeng Gong, Xiaomei Yang

Abstract: As an important part of speech recognition technology, automatic speech keyword recognition has been intensively studied in recent years. Such technology becomes especially pivotal in situations with limited infrastructure and computational resources, such as voice command recognition in vehicles and robot interaction. At present, the mainstream methods in automatic speech keyword recognition are based on long short-term memory (LSTM) networks with an attention mechanism. However, due to inevitable information losses in the LSTM layer during feature extraction, the calculated attention weights are biased. In this paper, a novel approach, namely the Multi-layer Attention Mechanism, is proposed to handle the problem of inaccurate attention weights. The key idea is that, in addition to the conventional attention mechanism, information from layers prior to feature extraction and the LSTM is introduced into the attention weight calculations. Therefore, the attention weights are more accurate because the overall model can have more precise and focused areas. We conduct a comprehensive comparison and analysis of the keyword spotting performance of a convolutional neural network, a bi-directional LSTM cyclic neural network, and a cyclic neural network with the proposed attention mechanism on the Google Speech Command V2 dataset. Experimental results are favorable for the proposed method and demonstrate its validity. The proposed multi-layer attention methods can be useful for other research related to object spotting.

Submitted 10 July, 2019; originally announced July 2019.

arXiv:1907.02244 [pdf, other]

Searching for Apparel Products from Images in the Wild

Authors: Son Tran, Ming Du, Sampath Chanda, R. Manmatha, Cj Taylor

Abstract: In this age of social media, people often look at what others are wearing. In particular, Instagram and Twitter influencers often provide images of themselves wearing different outfits and their followers are often inspired to buy similar clothes. We propose a system to automatically find the closest visually similar clothes in the online Catalog (street-to-shop searching). The problem is challenging since the original images are taken under different pose and lighting conditions. The system initially localizes high-level descriptive regions (top, bottom, wristwear, ...) using multiple CNN detectors such as YOLO and SSD that are trained specifically for the apparel domain. It then classifies these regions into more specific regions such as t-shirts, tunics, or dresses. Finally, a feature embedding learned using a multi-task function is recovered for every item and then compared with corresponding items in the online Catalog database and ranked according to distance. We validate our approach component-wise using benchmark datasets and end-to-end using human evaluation.

Submitted 7 April, 2022; v1 submitted 4 July, 2019; originally announced July 2019.

Comments: KDD2019, AI for Fashion Workshop

arXiv:1905.10433 [pdf, other]

doi 10.1126/sciadv.aay3700

Three dimensions, two microscopes, one code: automatic differentiation for x-ray nanotomography beyond the depth of focus limit

Authors: Ming Du, Youssef S. G. Nashed, Saugat Kandel, Doga Gursoy, Chris Jacobsen

Abstract: Conventional tomographic reconstruction algorithms assume that one has obtained pure projection images, involving no within-specimen diffraction effects nor multiple scattering. Advances in x-ray nanotomography are leading towards the violation of these assumptions, by combining the high penetration power of x-rays which enables thick specimens to be imaged, with improved spatial resolution which decreases the depth of focus of the imaging system. We describe a reconstruction method where multiple scattering and diffraction effects in thick samples are modeled by multislice propagation, and the 3D object function is retrieved through iterative optimization. We show that the same proposed method works for both full-field microscopy, and for coherent scanning techniques like ptychography. Our implementation utilizes the optimization toolbox and the automatic differentiation capability of the open-source deep learning package TensorFlow, which demonstrates a much more straightforward way to solve optimization problems in computational imaging, and endows our program with great flexibility and portability.

Submitted 24 May, 2019; originally announced May 2019.

Journal ref: Science Advances. 6, eaay3700 (2020)

arXiv:1805.09846 [pdf, other]

doi 10.1364/JOSAA.35.001871

X-ray tomography of extended objects: a comparison of data acquisition approaches

Authors: Ming Du, Rafael Vescovi, Kamel Fezzaa, Chris Jacobsen, Doga Gursoy

Abstract: The penetration power of x-rays allows one to image large objects. For example, centimeter-sized specimens can be imaged with micron-level resolution using synchrotron sources. In this case, however, the limited beam diameter and detector size preclude the acquisition of the full sample in a single take, necessitating strategies for combining data from multiple regions. Object stitching involves the combination of local tomography data from overlapping regions, while projection stitching involves the collection of projections at multiple offset positions from the rotation axis followed by data merging and reconstruction. We compare these two approaches in terms of radiation dose applied to the specimen, and reconstructed image quality. Object stitching involves an easier data alignment problem, and immediate viewing of subregions before the entire dataset has been acquired. Projection stitching is more dose-efficient, and avoids certain artifacts of local tomography; however, it also involves a more difficult data assembly and alignment procedure, in that it is more sensitive to accumulative registration error.

Submitted 11 July, 2018; v1 submitted 24 May, 2018; originally announced May 2018.

Comments: Under review

Journal ref: Journal of the Optical Society of America A. 35, 1871 (2018)

Showing 1–27 of 27 results for author: Du, M