-
A square cross-section FOV rotational CL (SC-CL) and its analytical reconstruction method
Authors:
Xiang Zou,
Wuliang Shi,
Muge Du,
Yuxiang Xing
Abstract:
Rotational computed laminography (CL) has broad application potential in three-dimensional imaging of plate-like objects, as it only needs x-rays to pass through the tested object in the thickness direction during the imaging process. In this study, a square cross-section FOV rotational CL (SC-CL) was proposed. Then, an FDK-type analytical reconstruction algorithm applicable to the SC-CL was derived. On this basis, the proposed method was validated through numerical experiments.
Submitted 18 June, 2024;
originally announced June 2024.
-
Explanations of Classifiers Enhance Medical Image Segmentation via End-to-end Pre-training
Authors:
Jiamin Chen,
Xuhong Li,
Yanwu Xu,
Mengnan Du,
Haoyi Xiong
Abstract:
Medical image segmentation aims to identify and locate abnormal structures in medical images, such as chest radiographs, using deep neural networks. These networks require a large number of annotated images with fine-grained masks for the regions of interest, making pre-training strategies based on classification datasets essential for sample efficiency. Based on a large-scale medical image classification dataset, our work collects explanations from well-trained classifiers to generate pseudo labels for segmentation tasks. Specifically, we offer a case study on chest radiographs and train image classifiers on the CheXpert dataset to identify 14 pathological observations in radiology. We then use the Integrated Gradients (IG) method to distill and boost the explanations obtained from the classifiers, generating massive diagnosis-oriented localization labels (DoLL). These DoLL-annotated images are used for pre-training the model before fine-tuning it for downstream segmentation tasks, including COVID-19 infectious areas, lungs, heart, and clavicles. Our method outperforms other baselines, showcasing significant advantages in model performance and training efficiency across various segmentation settings.
Submitted 16 January, 2024;
originally announced January 2024.
-
Incremental FastPitch: Chunk-based High Quality Text to Speech
Authors:
Muyang Du,
Chuan Liu,
Junjie Lai
Abstract:
Parallel text-to-speech models have been widely applied for real-time speech synthesis, and they offer more controllability and a much faster synthesis process compared with conventional auto-regressive models. Although parallel models have benefits in many aspects, they are naturally unfit for incremental synthesis due to their fully parallel architecture, such as the transformer. In this work, we propose Incremental FastPitch, a novel FastPitch variant capable of incrementally producing high-quality Mel chunks by improving the architecture with chunk-based FFT blocks, training with receptive-field constrained chunk attention masks, and inference with fixed-size past model states. Experimental results show that our proposal can produce speech quality comparable to the parallel FastPitch, with significantly lower latency that allows even lower response time for real-time speech applications.
Submitted 3 January, 2024;
originally announced January 2024.
-
A knowledge-based data-driven (KBDD) framework for all-day identification of cloud types using satellite remote sensing
Authors:
Longfeng Nie,
Yuntian Chen,
Mengge Du,
Changqi Sun,
Dongxiao Zhang
Abstract:
Cloud types, as a type of meteorological data, are of particular significance for evaluating changes in rainfall, heatwaves, water resources, floods and droughts, food security and vegetation cover, as well as land use. In order to effectively utilize high-resolution geostationary observations, a knowledge-based data-driven (KBDD) framework for all-day identification of cloud types based on spectral information from Himawari-8/9 satellite sensors is designed. In addition, a novel, simple, and efficient network named CldNet is proposed. Compared with widely used semantic segmentation networks, including SegNet, PSPNet, DeepLabV3+, UNet, and ResUnet, our proposed model CldNet achieves a state-of-the-art accuracy of 80.89±2.18% in identifying cloud types, an increase of 32%, 46%, 22%, 2%, and 39%, respectively. With the assistance of auxiliary information (e.g., satellite zenith/azimuth angle, solar zenith/azimuth angle), the accuracy on the test dataset of CldNet-W, which uses visible and near-infrared bands, and CldNet-O, which does not, is 82.23±2.14% and 73.21±2.02%, respectively. Meanwhile, CldNet has only 0.46M parameters in total, making it easy to deploy on edge devices. More importantly, the trained CldNet without any fine-tuning can predict cloud types at higher spatial resolution using satellite spectral data with a spatial resolution of 0.02°×0.02°, which indicates that CldNet possesses a strong generalization ability. In aggregate, the KBDD framework using CldNet is a highly effective cloud-type identification system capable of providing a high-fidelity, all-day, spatiotemporal cloud-type database for many climate assessment fields.
Submitted 30 November, 2023;
originally announced December 2023.
-
Pseudo Label-Guided Data Fusion and Output Consistency for Semi-Supervised Medical Image Segmentation
Authors:
Tao Wang,
Yuanbin Chen,
Xinlin Zhang,
Yuanbo Zhou,
Junlin Lan,
Bizhe Bai,
Tao Tan,
Min Du,
Qinquan Gao,
Tong Tong
Abstract:
Supervised learning algorithms based on Convolutional Neural Networks have become the benchmark for medical image segmentation tasks, but their effectiveness heavily relies on a large amount of labeled data. However, annotating medical image datasets is a laborious and time-consuming process. Inspired by semi-supervised algorithms that use both labeled and unlabeled data for training, we propose the PLGDF framework, which builds upon the mean teacher network for segmenting medical images with less annotation. We propose a novel pseudo-label utilization scheme, which combines labeled and unlabeled data to augment the dataset effectively. Additionally, we enforce the consistency between different scales in the decoder module of the segmentation network and propose a loss function suitable for evaluating the consistency. Moreover, we incorporate a sharpening operation on the predicted results, further enhancing the accuracy of the segmentation.
Extensive experiments on three publicly available datasets demonstrate that the PLGDF framework can largely improve performance by incorporating the unlabeled data. Meanwhile, our framework yields superior performance compared to six state-of-the-art semi-supervised learning methods. The codes of this study are available at https://github.com/ortonwang/PLGDF.
Submitted 17 November, 2023;
originally announced November 2023.
-
ChinaTelecom System Description to VoxCeleb Speaker Recognition Challenge 2023
Authors:
Mengjie Du,
Xiang Fang,
Jie Li
Abstract:
This technical report describes the ChinaTelecom system for Track 1 (closed) of the VoxCeleb2023 Speaker Recognition Challenge (VoxSRC 2023). Our system consists of several ResNet variants trained only on VoxCeleb2, which were later fused for better performance. Score calibration was also applied to each variant and to the fused system. The final submission achieved a minDCF of 0.1066 and an EER of 1.980%.
Submitted 16 August, 2023;
originally announced August 2023.
-
PCDAL: A Perturbation Consistency-Driven Active Learning Approach for Medical Image Segmentation and Classification
Authors:
Tao Wang,
Xinlin Zhang,
Yuanbo Zhou,
Junlin Lan,
Tao Tan,
Min Du,
Qinquan Gao,
Tong Tong
Abstract:
In recent years, deep learning has become a breakthrough technique in assisting medical image diagnosis. Supervised learning using convolutional neural networks (CNN) provides state-of-the-art performance and has served as a benchmark for various medical image segmentation and classification. However, supervised learning deeply relies on large-scale annotated data, which is expensive, time-consuming, and even impractical to acquire in medical imaging applications. Active Learning (AL) methods have been widely applied in natural image classification tasks to reduce annotation costs by selecting more valuable examples from the unlabeled data pool. However, their application in medical image segmentation tasks is limited, and there is currently no effective and universal AL-based method specifically designed for 3D medical image segmentation. To address this limitation, we propose an AL-based method that can be simultaneously applied to 2D medical image classification, segmentation, and 3D medical image segmentation tasks. We extensively validated our proposed active learning method on three publicly available and challenging medical image datasets, Kvasir Dataset, COVID-19 Infection Segmentation Dataset, and BraTS2019 Dataset. The experimental results demonstrate that our PCDAL can achieve significantly improved performance with fewer annotations in 2D classification and segmentation and 3D segmentation tasks. The codes of this study are available at https://github.com/ortonwang/PCDAL.
Submitted 29 June, 2023;
originally announced June 2023.
-
A Data-Driven Polynomial Chaos Expansion-Based Method for Microgrid Ramping Support Capability Assessment and Enhancement
Authors:
Mohan Du,
Xiaozhe Wang
Abstract:
Microgrids (MGs) are regarded as effective solutions to provide ramping support to the main grid during heavy-load periods. Nevertheless, the uncertain renewable energy sources (RES) and electric vehicles (EVs) integrated into an MG may affect the ramping support capability (RSC) of an MG. To address the challenge, this paper develops a data-driven sparse polynomial chaos expansion (DDSPCE)-based method to accurately and efficiently evaluate the hour-by-hour RSC of an MG. The DDSPCE model is further exploited to identify the most influential random inputs, based on which a scheduling method for the battery energy storage system (BESS) is developed to enhance the RSC of an MG. Simulation results in the modified IEEE 33-bus MG show that the proposed method takes less than 3 minutes to evaluate and enhance the hourly RSC.
Submitted 24 February, 2023;
originally announced February 2023.
-
PtyLab.m/py/jl: a cross-platform, open-source inverse modeling toolbox for conventional and Fourier ptychography
Authors:
Lars Loetgering,
Mengqi Du,
Dirk Boonzajer Flaes,
Tomas Aidukas,
Felix Wechsler,
Daniel S. Penagos Molina,
Max Rose,
Antonios Pelekanidis,
Wilhelm Eschen,
Jürgen Hess,
Thomas Wilhein,
Rainer Heintzmann,
Jan Rothhardt,
Stefan Witte
Abstract:
Conventional (CP) and Fourier (FP) ptychography have emerged as versatile quantitative phase imaging techniques. While the main application cases for each technique are different, namely lens-less short wavelength imaging for CP and lens-based visible light imaging for FP, both methods share a common algorithmic ground. CP and FP have in part independently evolved to include experimentally robust forward models and inversion techniques. This separation has resulted in a plethora of algorithmic extensions, some of which have not crossed the boundary from one modality to the other. Here, we present an open source, cross-platform software, called PtyLab, enabling both CP and FP data analysis in a unified framework. With this framework, we aim to facilitate and accelerate cross-pollination between the two techniques. Moreover, the availability in Matlab, Python, and Julia will set a low barrier to enter each field.
Submitted 16 January, 2023;
originally announced January 2023.
-
Efficient Incremental Text-to-Speech on GPUs
Authors:
Muyang Du,
Chuan Liu,
Jiaxing Qi,
Junjie Lai
Abstract:
Incremental text-to-speech, also known as streaming TTS, has been increasingly applied to online speech applications that require ultra-low response latency to provide an optimal user experience. However, most of the existing speech synthesis pipelines deployed on GPU are still non-incremental, which uncovers limitations in high-concurrency scenarios, especially when the pipeline is built with end-to-end neural network models. To address this issue, we present a highly efficient approach to perform real-time incremental TTS on GPUs with Instant Request Pooling and Module-wise Dynamic Batching. Experimental results demonstrate that the proposed method is capable of producing high-quality speech with a first-chunk latency lower than 80ms under 100 QPS on a single NVIDIA A10 GPU and significantly outperforms the non-incremental twin in both concurrency and latency. Our work reveals the effectiveness of high-performance incremental TTS on GPUs.
Submitted 5 December, 2022; v1 submitted 25 November, 2022;
originally announced November 2022.
-
Play it by Ear: Learning Skills amidst Occlusion through Audio-Visual Imitation Learning
Authors:
Maximilian Du,
Olivia Y. Lee,
Suraj Nair,
Chelsea Finn
Abstract:
Humans are capable of completing a range of challenging manipulation tasks that require reasoning jointly over modalities such as vision, touch, and sound. Moreover, many such tasks are partially-observed; for example, taking a notebook out of a backpack will lead to visual occlusion and require reasoning over the history of audio or tactile information. While robust tactile sensing can be costly to capture on robots, microphones near or on a robot's gripper are a cheap and easy way to acquire audio feedback of contact events, which can be a surprisingly valuable data source for perception in the absence of vision. Motivated by the potential for sound to mitigate visual occlusion, we aim to learn a set of challenging partially-observed manipulation tasks from visual and audio inputs. Our proposed system learns these tasks by combining offline imitation learning from a modest number of tele-operated demonstrations and online finetuning using human provided interventions. In a set of simulated tasks, we find that our system benefits from using audio, and that by using online interventions we are able to improve the success rate of offline imitation learning by ~20%. Finally, we find that our system can complete a set of challenging, partially-observed tasks on a Franka Emika Panda robot, like extracting keys from a bag, with a 70% success rate, 50% higher than a policy that does not use audio.
Submitted 30 May, 2022;
originally announced May 2022.
-
Attention based Broadly Self-guided Network for Low light Image Enhancement
Authors:
Zilong Chen,
Yaling Liang,
Minghui Du
Abstract:
In recent years, deep convolutional neural networks have achieved impressive success in low-light image enhancement. Existing deep learning methods mostly improve feature extraction by stacking network structures and deepening the network, which increases the runtime cost for a single image. To reduce inference time while fully extracting local and global features, and inspired by SGN, we propose an Attention-based Broadly Self-guided Network (ABSGN) for real-world low-light image enhancement. Such a broad strategy is able to handle the noise at different exposures. The proposed network is validated on many mainstream benchmarks. Additional experimental results show that the proposed network outperforms most state-of-the-art low-light image enhancement solutions.
Submitted 15 December, 2021; v1 submitted 12 December, 2021;
originally announced December 2021.
-
Investigation of domain gap problem in several deep-learning-based CT metal artefact reduction methods
Authors:
Muge Du,
Kaichao Liang,
Yinong Liu,
Yuxiang Xing
Abstract:
Metal artefacts in CT images may disrupt image quality and interfere with diagnosis. Recently many deep-learning-based CT metal artefact reduction (MAR) methods have been proposed. Current deep MAR methods may be troubled with domain gap problem, where methods trained on simulated data cannot perform well on practical data. In this work, we experimentally investigate two image-domain supervised methods, two dual-domain supervised methods and two image-domain unsupervised methods on a dental dataset and a torso dataset, to explore whether domain gap problem exists or is overcome. We find that I-DL-MAR and DudoNet are effective for practical data of the torso dataset, indicating the domain gap problem is solved. However, none of the investigated methods perform satisfactorily on practical data of the dental dataset. Based on the experimental results, we further analyze the causes of domain gap problem for each method and dataset, which may be beneficial for improving existing methods or designing new ones. The findings suggest that the domain gap problem in deep MAR methods remains to be addressed.
Submitted 25 November, 2021;
originally announced November 2021.
-
Targeted False Data Injection Attacks Against AC State Estimation Without Network Parameters
Authors:
Mingqiu Du,
Georgia Pierrou,
Xiaozhe Wang,
Marthe Kassouf
Abstract:
State estimation is a data processing algorithm for converting redundant meter measurements and other information into an estimate of the state of a power system. Relying heavily on meter measurements, state estimation has proven to be vulnerable to cyber attacks. In this paper, a novel targeted false data injection attack (FDIA) model against AC state estimation is proposed. Leveraging on the intrinsic load dynamics in ambient conditions and important properties of the Ornstein-Uhlenbeck process, we, from the viewpoint of intruders, design an algorithm to extract power network parameters purely from PMU data, which are further used to construct the FDIA vector. Requiring no network parameters and relying only on limited phasor measurement unit (PMU) data, the proposed FDIA model can target specific states and launch large deviation attacks. Sufficient conditions for the proposed FDIA model are also developed. Various attack vectors and attacking regions are studied in the IEEE 39-bus system, showing that the proposed FDIA method can successfully bypass the bad data detection and launch targeted large deviation attacks with very high probabilities.
Submitted 25 August, 2021;
originally announced August 2021.
-
The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021
Authors:
Dan Liu,
Mengge Du,
Xiaoxi Li,
Yuchen Hu,
Lirong Dai
Abstract:
This paper describes USTC-NELSLIP's submissions to the IWSLT2021 Simultaneous Speech Translation task. We proposed a novel simultaneous translation model, Cross Attention Augmented Transducer (CAAT), which extends conventional RNN-T to sequence-to-sequence tasks without monotonic constraints, e.g., simultaneous translation. Experiments on speech-to-text (S2T) and text-to-text (T2T) simultaneous translation tasks show that CAAT achieves better quality-latency trade-offs than wait-k, one of the previous state-of-the-art approaches. Based on the CAAT architecture and data augmentation, we build S2T and T2T simultaneous translation systems in this evaluation campaign. Compared to last year's optimal systems, our S2T simultaneous translation system improves by an average of 11.3 BLEU for all latency regimes, and our T2T simultaneous translation system improves by an average of 4.6 BLEU.
Submitted 9 July, 2021; v1 submitted 1 July, 2021;
originally announced July 2021.
-
Learning Invariant Representations across Domains and Tasks
Authors:
Jindong Wang,
Wenjie Feng,
Chang Liu,
Chaohui Yu,
Mingxuan Du,
Renjun Xu,
Tao Qin,
Tie-Yan Liu
Abstract:
Because collecting massive COVID-19 image samples to train deep classification models is expensive and time-consuming, transfer learning is a promising approach: it transfers knowledge from the abundant typical pneumonia datasets to COVID-19 image classification. However, negative transfer may degrade performance due to the feature distribution divergence between the two datasets and the task semantic difference between diagnosing pneumonia and COVID-19, which rely on different characteristics. It is even more challenging when the target dataset has no labels available, i.e., unsupervised task transfer learning. In this paper, we propose a novel Task Adaptation Network (TAN) to solve this unsupervised task transfer problem. In addition to learning transferable features via domain-adversarial training, we propose a novel task semantic adaptor that uses the learning-to-learn strategy to adapt the task semantics. Experiments on three public COVID-19 datasets demonstrate that our proposed method achieves superior performance. In particular, on the COVID-DA dataset, TAN significantly increases the recall and F1 score by 5.0% and 7.8% compared to recent strong baselines. Moreover, we show that TAN also achieves superior performance on several public domain adaptation benchmarks.
Submitted 3 March, 2021;
originally announced March 2021.
-
Targeted False Data Injection Attack against DC State Estimation without Line Parameters
Authors:
Mingqiu Du,
Georgia Pierrou,
Xiaozhe Wang
Abstract:
A novel false data injection attack (FDIA) model against DC state estimation is proposed, which requires no network parameters and exploits only limited phasor measurement unit (PMU) data. The proposed FDIA model can target specific states and launch large deviation attacks using estimated line parameters. Sufficient conditions for the proposed method are also presented. Different attack vectors are studied in the IEEE 39-bus system, showing that the proposed FDIA method can successfully bypass the bad data detection (BDD) with high success rates of up to 95.3%.
Submitted 23 February, 2021;
originally announced February 2021.
-
Using a modified double deep image prior for crosstalk mitigation in multislice ptychography
Authors:
Ming Du,
Xiaojing Huang,
Chris Jacobsen
Abstract:
Multislice ptychography is a high-resolution microscopy technique used to image multiple separate axial planes using a single illumination direction. However, multislice ptychography reconstructions are often degraded by crosstalk, where some features on one plane erroneously contribute to the reconstructed image of another plane. Here, we demonstrate the use of a modified "double deep image prior" (DDIP) architecture in mitigating crosstalk artifacts in multislice ptychography. Utilizing the tendency of generative neural networks to produce natural images, a modified DDIP method yielded good results on experimental data. For one of the datasets, we show that using DDIP could remove the need of using additional experimental data, such as from x-ray fluorescence, to suppress the crosstalk. Our method may help x-ray multislice ptychography work for more general experimental scenarios.
Submitted 29 January, 2021;
originally announced February 2021.
-
Adorym: A multi-platform generic x-ray image reconstruction framework based on automatic differentiation
Authors:
Ming Du,
Saugat Kandel,
Junjing Deng,
Xiaojing Huang,
Arnaud Demortiere,
Tuan Tu Nguyen,
Remi Tucoulou,
Vincent De Andrade,
Qiaoling Jin,
Chris Jacobsen
Abstract:
We describe and demonstrate an optimization-based x-ray image reconstruction framework called Adorym. Our framework provides a generic forward model, allowing one code framework to be used for a wide range of imaging methods, from near-field holography to fly-scan ptychographic tomography. By using automatic differentiation for optimization, Adorym has the flexibility to refine experimental parameters including probe positions, multiple hologram alignment, and object tilts. It is written with strong support for parallel processing, allowing large datasets to be processed on high-performance computing systems. We demonstrate its use on several experimental datasets to show improved image quality through parameter refinement.
Submitted 22 December, 2020;
originally announced December 2020.
-
Fully Dense Neural Network for the Automatic Modulation Recognition
Authors:
Miao Du,
Qin Yu,
Shaomin Fei,
Chen Wang,
Xiaofeng Gong,
Ruisen Luo
Abstract:
Nowadays, various convolutional neural network (CNN) structures are mainly used to extract features from radio data or spectrograms in automatic modulation recognition (AMR). Because they rely on expert experience and spectrograms, they not only increase the difficulty of preprocessing but also consume a lot of memory. In order to directly use the in-phase and quadrature (IQ) data obtained by the receiver and to enhance the efficiency of feature extraction so as to improve the recognition rate of the modulation mode, this paper proposes a new network structure called Fully Dense Neural Network (FDNN). This network uses residual blocks to extract features, dense connections to reduce model size, and an attention mechanism for recalibration. Experiments on RML2016.10a show that this network has a higher recognition rate and lower model complexity. They also show that the FDNN model with dense connections can not only extract features effectively but also greatly reduce the number of model parameters, which contributes to the application of deep learning to intelligent radio systems.
Submitted 7 December, 2019;
originally announced December 2019.
-
Stain Style Transfer using Transitive Adversarial Networks
Authors:
Shaojin Cai,
Yuyang Xue,
Qinquan Gao,
Min Du,
Gang Chen,
Hejun Zhang,
Tong Tong
Abstract:
Digitized pathological diagnosis has been in increasing demand recently. It is well known that color information is critical to the automatic and visual analysis of pathological slides. However, color variations due to various factors not only have a negative impact on pathologists' diagnoses but also reduce the robustness of the algorithms. The factors that cause the color differences arise not only in the process of making the slices but also in the process of digitization. Different strategies have been proposed to alleviate the color variations. Most such techniques rely on collecting color statistics to perform color matching across images and are highly dependent on a reference template slide. Since the pathological slides between hospitals are usually unpaired, these methods do not yield good matching results. In this work, we propose a novel network that we refer to as Transitive Adversarial Networks (TAN) to transfer the color information among slides from different hospitals or centers. It is not necessary for an expert to pick a representative reference slide in the proposed TAN method. We compare the proposed method with the state-of-the-art methods quantitatively and qualitatively. Compared with the state-of-the-art methods, our method yields an improvement of 0.87 dB in terms of PSNR, demonstrating the effectiveness of the proposed TAN method in stain style transfer.
Submitted 22 October, 2019;
originally announced October 2019.
-
Learning Enhanced Resolution-wise features for Human Pose Estimation
Authors:
Kun Zhang,
Peng He,
Ping Yao,
Ge Chen,
Rui Wu,
Min Du,
Huimin Li,
Li Fu,
Tianyao Zheng
Abstract:
Recently, multi-resolution networks (such as Hourglass, CPN, HRNet, etc.) have achieved significant performance on pose estimation by combining feature maps of various resolutions. In this paper, we propose a Resolution-wise Attention Module (RAM) and Gradual Pyramid Refinement (GPR) to learn enhanced resolution-wise feature maps for precise pose estimation. Specifically, RAM learns a group of weights to represent the different importance of feature maps across resolutions, and the GPR gradually merges every two feature maps from low to high resolutions to regress final human keypoint heatmaps. With the enhanced resolution-wise features learnt by the CNN, we obtain more accurate human keypoint locations. The efficacy of our proposed methods is demonstrated on the MS-COCO dataset, achieving state-of-the-art performance with an average precision of 77.7 on the COCO val2017 set and 77.0 on the test-dev2017 set without using an extra human keypoint training dataset.
Submitted 13 December, 2020; v1 submitted 11 September, 2019;
originally announced September 2019.
-
Near, far, wherever you are: simulations on the dose efficiency of holographic and ptychographic coherent imaging
Authors:
Ming Du,
Doga Gursoy,
Chris Jacobsen
Abstract:
Different studies in x-ray microscopy have arrived at conflicting conclusions about the dose efficiency of imaging modes involving the recording of intensity distributions in the near (Fresnel regime) or far (Fraunhofer regime) field downstream of a specimen. We present here a numerical study on the dose efficiency of near-field holography (NFH), near-field ptychography (NFP), and far-field ptychography (FFP), where ptychography involves multiple overlapping finite-sized illumination positions. Unlike what has been reported for coherent diffraction imaging (CDI), which involves recording a single far-field diffraction pattern, we find that all three methods offer similar image quality when using the same fluence on the specimen, with far-field ptychography offering slightly better spatial resolution and lower mean error. These results support the concept that (if the experiment and image reconstruction are done properly) the sample can be near, or far; wherever you are, photon fluence on the specimen sets one limit to spatial resolution.
Submitted 11 March, 2020; v1 submitted 16 August, 2019;
originally announced August 2019.
-
Multi-layer Attention Mechanism for Speech Keyword Recognition
Authors:
Ruisen Luo,
Tianran Sun,
Chen Wang,
Miao Du,
Zuodong Tang,
Kai Zhou,
Xiaofeng Gong,
Xiaomei Yang
Abstract:
As an important part of speech recognition technology, automatic speech keyword recognition has been intensively studied in recent years. Such technology becomes especially pivotal in situations with limited infrastructure and computational resources, such as voice command recognition in vehicles and robot interaction. At present, the mainstream methods in automatic speech keyword recognition are based on long short-term memory (LSTM) networks with an attention mechanism. However, due to inevitable information losses in the LSTM layer during feature extraction, the calculated attention weights are biased. In this paper, a novel approach, namely the Multi-layer Attention Mechanism, is proposed to handle the problem of inaccurate attention weights. The key idea is that, in addition to the conventional attention mechanism, information from layers prior to feature extraction and the LSTM is introduced into the attention weight calculations. Therefore, the attention weights are more accurate because the overall model can have more precise and focused areas. We conduct a comprehensive comparison and analysis of keyword spotting performance for a convolutional neural network, a bi-directional LSTM cyclic neural network, and a cyclic neural network with the proposed attention mechanism on the Google Speech Commands V2 dataset. Experimental results are favorable for the proposed method and demonstrate its validity. The proposed multi-layer attention methods can be useful for other research related to object spotting.
Submitted 10 July, 2019;
originally announced July 2019.
-
Searching for Apparel Products from Images in the Wild
Authors:
Son Tran,
Ming Du,
Sampath Chanda,
R. Manmatha,
Cj Taylor
Abstract:
In this age of social media, people often look at what others are wearing. In particular, Instagram and Twitter influencers often provide images of themselves wearing different outfits, and their followers are often inspired to buy similar clothes. We propose a system to automatically find the closest visually similar clothes in the online Catalog (street-to-shop searching). The problem is challenging since the original images are taken under different pose and lighting conditions. The system initially localizes high-level descriptive regions (top, bottom, wristwear, ...) using multiple CNN detectors such as YOLO and SSD that are trained specifically for the apparel domain. It then classifies these regions into more specific regions such as t-shirts, tunics, or dresses. Finally, a feature embedding learned using a multi-task function is recovered for every item and then compared with corresponding items in the online Catalog database and ranked according to distance. We validate our approach component-wise using benchmark datasets and end-to-end using human evaluation.
Submitted 7 April, 2022; v1 submitted 4 July, 2019;
originally announced July 2019.
-
Three dimensions, two microscopes, one code: automatic differentiation for x-ray nanotomography beyond the depth of focus limit
Authors:
Ming Du,
Youssef S. G. Nashed,
Saugat Kandel,
Doga Gursoy,
Chris Jacobsen
Abstract:
Conventional tomographic reconstruction algorithms assume that one has obtained pure projection images, involving neither within-specimen diffraction effects nor multiple scattering. Advances in x-ray nanotomography are leading towards the violation of these assumptions, by combining the high penetration power of x-rays, which enables thick specimens to be imaged, with improved spatial resolution, which decreases the depth of focus of the imaging system. We describe a reconstruction method where multiple scattering and diffraction effects in thick samples are modeled by multislice propagation, and the 3D object function is retrieved through iterative optimization. We show that the same proposed method works both for full-field microscopy and for coherent scanning techniques like ptychography. Our implementation utilizes the optimization toolbox and the automatic differentiation capability of the open-source deep learning package TensorFlow, which demonstrates a much more straightforward way to solve optimization problems in computational imaging and endows our program with great flexibility and portability.
Submitted 24 May, 2019;
originally announced May 2019.
-
X-ray tomography of extended objects: a comparison of data acquisition approaches
Authors:
Ming Du,
Rafael Vescovi,
Kamel Fezzaa,
Chris Jacobsen,
Doga Gursoy
Abstract:
The penetration power of x-rays allows one to image large objects. For example, centimeter-sized specimens can be imaged with micron-level resolution using synchrotron sources. In this case, however, the limited beam diameter and detector size preclude the acquisition of the full sample in a single take, necessitating strategies for combining data from multiple regions. Object stitching involves the combination of local tomography data from overlapping regions, while projection stitching involves the collection of projections at multiple offset positions from the rotation axis followed by data merging and reconstruction. We compare these two approaches in terms of radiation dose applied to the specimen, and reconstructed image quality. Object stitching involves an easier data alignment problem, and immediate viewing of subregions before the entire dataset has been acquired. Projection stitching is more dose-efficient, and avoids certain artifacts of local tomography; however, it also involves a more difficult data assembly and alignment procedure, in that it is more sensitive to accumulative registration error.
Submitted 11 July, 2018; v1 submitted 24 May, 2018;
originally announced May 2018.