Search | arXiv e-print repository

High-Linearity PAM-4 Silicon Micro-ring Transmitter Architecture with Electronic-Photonic Hybrid DAC

Authors: Zheng Li, Chengyang Lv, Min Tan

Abstract: This paper presents a high-linearity PAM-4 transmitter (TX) architecture, consisting of a three-segment micro-ring modulator (MRM) and a matched CMOS driver. This architecture can drive a high-linearity 4-level pulse amplitude modulation (PAM-4) signal, thereby extending the tunable operating wavelength range for achieving linear PAM-4 output. We use the three-segment MRM to increase design flexibility so that the linearity of the PAM-4 output can be optimized with another degree of freedom. Each phase shift region is directly driven by an independently amplitude-tunable Non-Return-to-Zero (NRZ) signal. The three-segment modulator can achieve an adjustable wavelength range of approximately 0.037 nm within the high-linearity PAM-4 output limit when the driving voltage varies from 1.5 V to 3 V, simultaneously achieving an adjustable insertion loss (IL) range of approximately 2 dB, roughly four times that of a two-segment MRM with a similar design. The driver circuit with adjustable driving voltage is co-designed to adjust the eye height to improve PAM-4 linearity. The proposed high-linearity PAM-4 silicon micro-ring architecture can be employed in optical transmitters to adjust the PAM-4 eye-opening size and maximize the PAM-4 output linearity, thus offering the potential for high-performance and low-power-overhead transmitters.

Submitted 14 April, 2024; originally announced April 2024.

Comments: 14 pages, 11 figures
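
The abstract above speaks of PAM-4 "linearity" without stating how it is quantified. As a rough illustration only, the sketch below computes a commonly used level-separation mismatch ratio from four measured output levels: 3 times the smallest gap between adjacent levels, divided by the full swing. Treating this ratio as the figure of merit is an assumption, not something the listing confirms, and the function name `pam4_rlm` and the sample levels are hypothetical.

```python
def pam4_rlm(levels):
    """Level-separation mismatch ratio for four PAM-4 levels.

    Returns 1.0 when the three eye openings are equal and drops
    toward 0 as the smallest opening closes. (Assumed metric; the
    abstract does not name the one the authors use.)
    """
    v0, v1, v2, v3 = sorted(levels)
    if v3 == v0:
        raise ValueError("levels must span a nonzero range")
    smallest_gap = min(v1 - v0, v2 - v1, v3 - v2)
    return 3.0 * smallest_gap / (v3 - v0)


# Hypothetical normalized optical power levels.
ideal = pam4_rlm([0.0, 1.0, 2.0, 3.0])       # evenly spaced levels
compressed = pam4_rlm([0.0, 0.8, 2.0, 3.0])  # lowest eye is squeezed
print(f"ideal = {ideal:.3f}, compressed = {compressed:.3f}")
```

Under this reading, tuning each segment's drive amplitude amounts to nudging the middle levels until the three gaps are as equal as possible.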

arXiv:2401.03173 [pdf, other]

doi 10.4108/eetcasa.v10i1.4681

UGGNet: Bridging U-Net and VGG for Advanced Breast Cancer Diagnosis

Authors: Tran Cao Minh, Nguyen Kim Quoc, Phan Cong Vinh, Dang Nhu Phu, Vuong Xuan Chi, Ha Minh Tan

Abstract: In the field of medical imaging, breast ultrasound has emerged as a crucial diagnostic tool for early detection of breast cancer. However, the accuracy of diagnosing the location of the affected area and the extent of the disease depends on the experience of the physician. In this paper, we propose a novel model called UGGNet, combining the power of the U-Net and VGG architectures to enhance the performance of breast ultrasound image analysis. The U-Net component of the model helps accurately segment the lesions, while the VGG component utilizes deep convolutional layers to extract features. The fusion of these two architectures in UGGNet aims to optimize both segmentation and feature representation, providing a comprehensive solution for accurate diagnosis in breast ultrasound images. Experimental results have demonstrated that the UGGNet model achieves a notable accuracy of 78.2% on the "Breast Ultrasound Images Dataset."

Submitted 6 January, 2024; originally announced January 2024.

Comments: Submitted to the journal "EAI Endorsed Transactions on Context-aware Systems and Applications", 2 images, 5 data tables

Journal ref: EAI Endorsed Transactions on Context-aware Systems and Applications, 10(1), 2024

arXiv:2310.02171 [pdf]

Deep learning-based image super-resolution of a novel end-expandable optical fiber probe for application in esophageal cancer diagnostics

Authors: Xiaohui Zhang, Mimi Tan, Mansour Nabil, Richa Shukla, Shaleen Vasavada, Sharmila Anandasabapathy, Mark A. Anastasio, Elena Petrova

Abstract: Significance: Endoscopic screening for esophageal cancer may enable early cancer diagnosis and treatment. While optical microendoscopic technology has shown promise in improving specificity, the limited field of view (<1 mm) significantly reduces the ability to survey large areas efficiently in esophageal cancer screening. Aim: To improve the efficiency of endoscopic screening, we proposed a novel end-expandable endoscopic optical fiber probe for larger field of visualization and employed a deep learning-based image super-resolution (DL-SR) method to overcome the issue of limited sampling capability. Approach: To demonstrate feasibility of the end-expandable optical fiber probe, DL-SR was applied on simulated low-resolution (LR) microendoscopic images to generate super-resolved (SR) ones. Varying the degradation model of image data acquisition, we identified the optimal parameters for optical fiber probe prototyping. The proposed screening method was validated with a human pathology reading study. Results: For various degradation parameters considered, the DL-SR method demonstrated different levels of improvement of traditional measures of image quality. The endoscopist interpretations of the SR images were comparable to those performed on the high-resolution ones. Conclusions: This work suggests avenues for development of DL-SR-enabled end-expandable optical fiber probes to improve high-yield esophageal cancer screening.

Submitted 3 October, 2023; originally announced October 2023.

arXiv:2309.07155 [pdf]

doi 10.1109/JLT.2023.3314526

Maximizing the performance for microcomb based microwave photonic transversal signal processors

Authors: Yang Sun, Jiayang Wu, Yang Li, Xingyuan Xu, Guanghui Ren, Mengxi Tan, Sai Tak Chu, Brent E. Little, Roberto Morandotti, Arnan Mitchell, David J. Moss

Abstract: Microwave photonic (MWP) transversal signal processors offer a compelling solution for realizing versatile high-speed information processing by combining the advantages of reconfigurable electrical digital signal processing and high-bandwidth photonic processing. With the capability of generating a number of discrete wavelengths from micro-scale resonators, optical microcombs are powerful multi-wavelength sources for implementing MWP transversal signal processors with significantly reduced size, power consumption, and complexity. By using microcomb-based MWP transversal signal processors, a diverse range of signal processing functions have been demonstrated recently. In this paper, we provide a detailed analysis of the processing inaccuracy that is induced by the imperfect response of experimental components. First, we investigate the errors arising from different sources including imperfections in the microcombs, the chirp of electro-optic modulators, chromatic dispersion of the dispersive module, shaping errors of the optical spectral shapers, and noise of the photodetector. Next, we provide a global picture quantifying the impact of different error sources on the overall system performance. Finally, we introduce feedback control to compensate the errors caused by experimental imperfections and achieve significantly improved accuracy. These results provide a guide for optimizing the accuracy of microcomb-based MWP transversal signal processors.

Submitted 10 September, 2023; originally announced September 2023.

Comments: 15 pages, 12 figures, 60 references

Journal ref: Journal of Lightwave Technology, Volume 41 (2023)
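
A transversal signal processor is, in signal-processing terms, a tapped delay line: the output is a weighted sum of progressively delayed copies of the input. The sketch below models that idea in discrete time and shows how a uniform error in the tap weights, standing in for the spectral-shaping errors the abstract mentions, carries through to the output. It is a minimal illustration under assumed values; the function names, the four equal taps and the 5% error are hypothetical, not taken from the paper.

```python
def transversal_filter(signal, tap_weights):
    """Tapped delay line: y[n] = sum_k w[k] * x[n - k], with x[n] = 0 for n < 0."""
    output = []
    for n in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(tap_weights):
            if n - k >= 0:
                acc += weight * signal[n - k]
        output.append(acc)
    return output


def with_tap_error(tap_weights, relative_error):
    """Scale every tap by (1 + relative_error) to mimic a shaping error."""
    return [w * (1.0 + relative_error) for w in tap_weights]


# Hypothetical 4-tap moving average applied to a unit step.
ideal_taps = [0.25, 0.25, 0.25, 0.25]
step = [1.0] * 8
ideal_out = transversal_filter(step, ideal_taps)
skewed_out = transversal_filter(step, with_tap_error(ideal_taps, 0.05))
print(ideal_out[-1], skewed_out[-1])
```

In a microcomb-based system each comb line would play the role of one tap, so per-line power errors map directly onto errors in `tap_weights`.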

arXiv:2307.10316 [pdf, other]

CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation

Authors: Lizhao Liu, Zhuangwei Zhuang, Shangxin Huang, Xunlong Xiao, Tianhang Xiang, Cen Chen, Jingdong Wang, Mingkui Tan

Abstract: We study the task of weakly-supervised point cloud semantic segmentation with sparse annotations (e.g., fewer than 0.1% of points are labeled), aiming to reduce the expensive cost of dense annotations. Unfortunately, with extremely sparse annotated points, it is very difficult to extract both contextual and object information for scene understanding such as semantic segmentation. Motivated by masked modeling (e.g., MAE) in image and video representation learning, we seek to harness the power of masked modeling to learn contextual information from sparsely-annotated points. However, directly applying MAE to 3D point clouds with sparse annotations may fail to work. First, it is nontrivial to effectively mask out the informative visual context from 3D point clouds. Second, how to fully exploit the sparse annotations for context modeling remains an open question. In this paper, we propose a simple yet effective Contextual Point Cloud Modeling (CPCM) method that consists of two parts: a region-wise masking (RegionMask) strategy and a contextual masked training (CMT) method. Specifically, RegionMask masks the point cloud continuously in geometric space to construct a meaningful masked prediction task for subsequent context learning. CMT disentangles the learning of supervised segmentation and unsupervised masked context prediction for effectively learning from the very limited labeled points and the massive unlabeled points, respectively. Extensive experiments on the widely-tested ScanNet V2 and S3DIS benchmarks demonstrate the superiority of CPCM over the state-of-the-art.

Submitted 19 July, 2023; originally announced July 2023.

Comments: Accepted by ICCV 2023
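
The abstract above describes masking a point cloud "continuously in geometric space" rather than dropping isolated points. One simple way to get spatially contiguous masks is to bucket points into a coarse grid and mask whole cells at random. The sketch below does exactly that; it is a simplified stand-in written for illustration and should not be read as the paper's RegionMask procedure. The function name `region_mask`, the cubic grid and all parameter values are assumptions.

```python
import random


def region_mask(points, cell_size, mask_ratio, seed=0):
    """Return one boolean per point: True if its grid cell was masked.

    Points are grouped into cubic cells of side `cell_size`, then a
    `mask_ratio` fraction of the occupied cells is chosen at random,
    so neighbouring points are masked together.
    """
    cells = {}
    for index, (x, y, z) in enumerate(points):
        key = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        cells.setdefault(key, []).append(index)

    rng = random.Random(seed)
    keys = sorted(cells)
    num_masked = round(mask_ratio * len(keys))
    masked_keys = rng.sample(keys, num_masked)

    mask = [False] * len(points)
    for key in masked_keys:
        for index in cells[key]:
            mask[index] = True
    return mask


# Hypothetical toy cloud occupying four unit cells.
cloud = [
    (0.1, 0.1, 0.1), (0.2, 0.3, 0.1),  # cell (0, 0, 0)
    (1.5, 0.2, 0.1), (1.7, 0.4, 0.3),  # cell (1, 0, 0)
    (0.2, 1.5, 0.1),                   # cell (0, 1, 0)
    (1.6, 1.6, 0.2),                   # cell (1, 1, 0)
]
mask = region_mask(cloud, cell_size=1.0, mask_ratio=0.5)
print(mask)
```

Because whole cells are hidden, a model asked to predict the masked content has to rely on surrounding context instead of interpolating from an immediate neighbour.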

arXiv:2304.00257 [pdf]

RADIFUSION: A multi-radiomics deep learning based breast cancer risk prediction model using sequential mammographic images with image attention and bilateral asymmetry refinement

Authors: Hong Hui Yeoh, Andrea Liew, Raphaël Phan, Fredrik Strand, Kartini Rahmat, Tuong Linh Nguyen, John L. Hopper, Maxine Tan

Abstract: Breast cancer is a significant public health concern and early detection is critical for triaging high risk patients. Sequential screening mammograms can provide important spatiotemporal information about changes in breast tissue over time. In this study, we propose a deep learning architecture called RADIFUSION that utilizes sequential mammograms and incorporates a linear image attention mechanism, radiomic features, a new gating mechanism to combine different mammographic views, and bilateral asymmetry-based finetuning for breast cancer risk assessment. We evaluate our model on a screening dataset called the Cohort of Screen-Aged Women (CSAW). Based on results obtained on the independent testing set consisting of 1,749 women, our approach achieved superior performance compared to other state-of-the-art models with area under the receiver operating characteristic curves (AUCs) of 0.905, 0.872 and 0.866 in the three respective metrics of 1-year AUC, 2-year AUC and > 2-year AUC. Our study highlights the importance of incorporating various deep learning mechanisms, such as image attention, radiomic features, gating mechanism, and bilateral asymmetry-based fine-tuning, to improve the accuracy of breast cancer risk assessment. We also demonstrate that our model's performance was enhanced by leveraging spatiotemporal information from sequential mammograms. Our findings suggest that RADIFUSION can provide clinicians with a powerful tool for breast cancer risk assessment.

Submitted 2 June, 2023; v1 submitted 1 April, 2023; originally announced April 2023.

Comments: v2

arXiv:2210.02445 [pdf, other]

Localizing Anatomical Landmarks in Ocular Images using Zoom-In Attentive Networks

Authors: Xiaofeng Lei, Shaohua Li, Xinxing Xu, Huazhu Fu, Yong Liu, Yih-Chung Tham, Yangqin Feng, Mingrui Tan, Yanyu Xu, Jocelyn Hui Lin Goh, Rick Siow Mong Goh, Ching-Yu Cheng

Abstract: Localizing anatomical landmarks is an important task in medical image analysis. However, the landmarks to be localized often lack prominent visual features. Their locations are elusive and easily confused with the background, and thus precise localization highly depends on the context formed by their surrounding areas. In addition, the required precision is usually higher than in segmentation and object detection tasks. Therefore, localization has its unique challenges different from segmentation or detection. In this paper, we propose a zoom-in attentive network (ZIAN) for anatomical landmark localization in ocular images. First, a coarse-to-fine, or "zoom-in" strategy is utilized to learn the contextualized features in different scales. Then, an attentive fusion module is adopted to aggregate multi-scale features, which consists of 1) a co-attention network with a multiple regions-of-interest (ROIs) scheme that learns complementary features from the multiple ROIs, 2) an attention-based fusion module which integrates the multi-ROIs features and non-ROI features. We evaluated ZIAN on two open challenge tasks, i.e., the fovea localization in fundus images and scleral spur localization in AS-OCT images. Experiments show that ZIAN achieves promising performances and outperforms state-of-the-art localization methods. The source code and trained models of ZIAN are available at https://github.com/leixiaofeng-astar/OMIA9-ZIAN.

Submitted 22 December, 2022; v1 submitted 25 September, 2022; originally announced October 2022.

arXiv:2208.11184 [pdf, other]

AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results

Authors: Ren Yang, Radu Timofte, Xin Li, Qi Zhang, Lin Zhang, Fanglong Liu, Dongliang He, Fu li, He Zheng, Weihang Yuan, Pavel Ostyakov, Dmitry Vyal, Magauiya Zhussip, Xueyi Zou, Youliang Yan, Lei Li, Jingzhu Tang, Ming Chen, Shijie Zhao, Yu Zhu, Xiaoran Qin, Chenghua Li, Cong Leng, Jian Cheng, Claudio Rota, et al. (28 additional authors not shown)

Abstract: This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. This challenge includes two tracks. Track 1 aims at the super-resolution of compressed image, and Track 2 targets the super-resolution of compressed video. In Track 1, we use the popular dataset DIV2K as the training, validation and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, including the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, 12 teams and 2 teams submitted final results to Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state-of-the-art of super-resolution on compressed image and video. The proposed LDV 3.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge is at https://github.com/RenYang-home/AIM22_CompressSR.

Submitted 25 August, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: Camera-ready version

arXiv:2208.03016 [pdf, other]

Calibrate the inter-observer segmentation uncertainty via diagnosis-first principle

Authors: Junde Wu, Huihui Fang, Hoayi Xiong, Lixin Duan, Mingkui Tan, Weihua Yang, Huiying Liu, Yanwu Xu

Abstract: In medical images, many of the tissues/lesions may be ambiguous. That is why medical segmentation is typically annotated by a group of clinical experts to mitigate personal bias. However, this clinical routine also brings new challenges to the application of machine learning algorithms. Without a definite ground-truth, it is difficult to train and evaluate deep learning models. When the annotations are collected from different graders, a common choice is majority vote. However, such a strategy ignores the differences in grader expertise. In this paper, we consider the task of predicting the segmentation with calibrated inter-observer uncertainty. We note that in clinical practice, medical image segmentation is usually used to assist disease diagnosis. Inspired by this observation, we propose the diagnosis-first principle, which is to take disease diagnosis as the criterion to calibrate the inter-observer segmentation uncertainty. Following this idea, a framework named Diagnosis First segmentation Framework (DiFF) is proposed to estimate diagnosis-first segmentation from the raw images. Specifically, DiFF will first learn to fuse the multi-rater segmentation labels into a single ground-truth which could maximize the disease diagnosis performance. We dub the fused ground-truth the Diagnosis First Ground-truth (DF-GT). Then, we further propose the Take and Give Model to segment DF-GT from the raw image.
We verify the effectiveness of DiFF on three different medical segmentation tasks: OD/OC segmentation on fundus images, thyroid nodule segmentation on ultrasound images, and skin lesion segmentation on dermoscopic images. Experimental results show that the proposed DiFF is able to significantly facilitate the corresponding disease diagnosis, and outperforms previous state-of-the-art multi-rater learning methods.

Submitted 5 August, 2022; originally announced August 2022.

Comments: arXiv admin note: text overlap with arXiv:2202.06505
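
The abstract above names majority vote as the common baseline for fusing multi-rater annotations and criticizes it for treating all graders alike. The sketch below shows that baseline on flattened binary masks so the equal weighting is explicit. It illustrates only the baseline, not DiFF; the function name and the three sample annotations are hypothetical.

```python
def majority_vote(rater_masks):
    """Fuse binary masks pixel by pixel.

    A pixel is foreground (1) when strictly more than half of the
    raters marked it; every rater counts equally.
    """
    num_raters = len(rater_masks)
    fused = []
    for pixel_votes in zip(*rater_masks):
        fused.append(1 if 2 * sum(pixel_votes) > num_raters else 0)
    return fused


# Hypothetical annotations from three graders over five pixels.
raters = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
fused = majority_vote(raters)
print(fused)  # [1, 1, 0, 0, 1]
```

Note that the second grader's lone vote on the fourth pixel is simply discarded, whatever that grader's expertise, which is the limitation the abstract points to.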

arXiv:2204.00630 [pdf, other]

Extremely Low-light Image Enhancement with Scene Text Restoration

Authors: Pohao Hsu, Che-Tsung Lin, Chun Chet Ng, Jie-Long Kew, Mei Yih Tan, Shang-Hong Lai, Chee Seng Chan, Christopher Zach

Abstract: Deep learning-based methods have made impressive progress in enhancing extremely low-light images - the image quality of the reconstructed images has generally improved. However, we found that most of these methods could not sufficiently recover the image details, for instance, the texts in the scene. In this paper, a novel image enhancement framework is proposed to precisely restore the scene texts, as well as the overall quality of the image simultaneously under extremely low-light conditions. Mainly, we employed a self-regularised attention map, an edge map, and a novel text detection loss. In addition, leveraging synthetic low-light images is beneficial for image enhancement on the genuine ones in terms of text detection. The quantitative and qualitative experimental results have shown that the proposed model outperforms state-of-the-art methods in image restoration, text detection, and text spotting on See In the Dark and ICDAR15 datasets.

Submitted 1 April, 2022; originally announced April 2022.

arXiv:2111.12890 [pdf, other]

V2C: Visual Voice Cloning

Authors: Qi Chen, Yuanqing Li, Yuankai Qi, Jiaqiu Zhou, Mingkui Tan, Qi Wu

Abstract: Existing Voice Cloning (VC) tasks aim to convert a paragraph of text to speech with the desired voice specified by a reference audio. This has significantly boosted the development of artificial speech applications. However, there also exist many scenarios that cannot be well reflected by these VC tasks, such as movie dubbing, which requires the speech to carry emotions consistent with the movie plots. To fill this gap, in this work we propose a new task named Visual Voice Cloning (V2C), which seeks to convert a paragraph of text to speech with both the desired voice specified by a reference audio and the desired emotion specified by a reference video. To facilitate research in this field, we construct a dataset, V2C-Animation, and propose a strong baseline based on existing state-of-the-art (SoTA) VC techniques. Our dataset contains 10,217 animated movie clips covering a large variety of genres (e.g., Comedy, Fantasy) and emotions (e.g., happy, sad). We further design a set of evaluation metrics, named MCD-DTW-SL, which help evaluate the similarity between ground-truth speeches and the synthesised ones. Extensive experimental results show that even SoTA VC methods cannot generate satisfying speeches for our V2C task. We hope the proposed new task together with the constructed dataset and evaluation metric will facilitate the research in the field of voice cloning and the broader vision-and-language community.

Submitted 24 November, 2021; originally announced November 2021.

Comments: 15 pages, 14 figures

arXiv:2110.08812 [pdf]

Rheumatoid Arthritis: Automated Scoring of Radiographic Joint Damage

Authors: Yan Ming Tan, Raphael Quek Hao Chong, Carol Anne Hargreaves

Abstract: Rheumatoid arthritis is an autoimmune disease that causes joint damage due to inflammation in the soft tissue lining the joints known as the synovium. It is vital to identify joint damage as soon as possible to provide necessary treatment early and prevent further damage to the bone structures. Radiographs are often used to assess the extent of the joint damage. Currently, the scoring of joint damage from the radiograph takes expertise, effort, and time. Joint damage associated with rheumatoid arthritis is also not quantitated in clinical practice and subjective descriptors are used. In this work, we describe a pipeline of deep learning models to automatically identify and score rheumatoid arthritic joint damage from a radiographic image. Our automatic tool was shown to produce scores with extremely high balanced accuracy within a couple of minutes and utilizing this would remove the subjectivity of the scores between human reviewers.

Submitted 17 October, 2021; originally announced October 2021.

arXiv:2107.04099 [pdf]

CASPIANET++: A Multidimensional Channel-Spatial Asymmetric Attention Network with Noisy Student Curriculum Learning Paradigm for Brain Tumor Segmentation

Authors: Andrea Liew, Chun Cheng Lee, Boon Leong Lan, Maxine Tan

Abstract: Convolutional neural networks (CNNs) have been used quite successfully for semantic segmentation of brain tumors. However, current CNNs and attention mechanisms are stochastic in nature and neglect the morphological indicators used by radiologists to manually annotate regions of interest. In this paper, we introduce a channel and spatial wise asymmetric attention (CASPIAN) by leveraging the inherent structure of tumors to detect regions of saliency. To demonstrate the efficacy of our proposed layer, we integrate this into a well-established convolutional neural network (CNN) architecture to achieve higher Dice scores with fewer GPU resources. Also, we investigate the inclusion of auxiliary multiscale and multiplanar attention branches to increase the spatial context crucial in semantic segmentation tasks. The resulting architecture is the new CASPIANET++, which achieves Dice scores of 91.19% for whole tumor, 87.6% for tumor core and 81.03% for enhancing tumor. Furthermore, driven by the scarcity of brain tumor data, we investigate the Noisy Student method for segmentation tasks. Our new Noisy Student Curriculum Learning paradigm, which infuses noise incrementally to increase the complexity of the training images exposed to the network, further boosts the Dice score for the enhancing tumor region to 81.53%. Additional validation performed on the BraTS2020 data shows that the Noisy Student Curriculum Learning method works well without any additional training or finetuning.

Submitted 8 July, 2021; originally announced July 2021.
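
The results in the abstract above are reported as Dice scores. For readers unfamiliar with the metric, the sketch below computes the Dice coefficient between a predicted and a reference binary mask, given as flat lists of 0s and 1s. The sample masks are hypothetical, and returning 1.0 when both masks are empty is a convention chosen here, not something the listing specifies.

```python
def dice_score(predicted, reference):
    """Dice coefficient: 2 * |A intersect B| / (|A| + |B|) for binary masks."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same length")
    overlap = sum(p * r for p, r in zip(predicted, reference))
    total = sum(predicted) + sum(reference)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * overlap / total


# Hypothetical flattened masks: 3 predicted pixels, 3 true pixels, 2 shared.
predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, 1]
print(f"Dice = {dice_score(predicted, reference):.4f}")
```

A score of 1.0 means the two masks coincide exactly, so the percentages quoted in the abstract correspond to this value multiplied by 100.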

arXiv:2107.02660 [pdf, ps, other]

doi 10.1109/TIP.2023.3309408

HybrUR: A Hybrid Physical-Neural Solution for Unsupervised Underwater Image Restoration

Authors: Shuaizheng Yan, Xingyu Chen, Zhengxing Wu, Min Tan, Junzhi Yu

Abstract: Robust vision restoration of underwater images remains a challenge. Owing to the lack of well-matched underwater and in-air images, unsupervised methods based on the cyclic generative adversarial framework have been widely investigated in recent years. However, when using an end-to-end unsupervised approach with only unpaired image data, mode collapse could occur, and the color correction of the restored images is usually poor. In this paper, we propose a data- and physics-driven unsupervised architecture to perform underwater image restoration from unpaired underwater and in-air images. For effective color correction and quality enhancement, an underwater image degeneration model must be explicitly constructed based on the optically unambiguous physics law. Thus, we employ the Jaffe-McGlamery degeneration theory to design a generator and use neural networks to model the process of underwater visual degeneration. Furthermore, we impose physical constraints on the scene depth and degeneration factors for backscattering estimation to avoid the vanishing gradient problem during the training of the hybrid physical-neural model. Experimental results show that the proposed method can be used to perform high-quality restoration of unconstrained underwater images without supervision. On multiple benchmarks, the proposed method outperforms several state-of-the-art supervised and unsupervised approaches. We demonstrate that our method yields encouraging results in real-world applications.

Submitted 6 October, 2023; v1 submitted 6 July, 2021; originally announced July 2021.

Comments: 13 pages, 9 figures

Journal ref: IEEE Transactions on Image Processing, vol. 32, pp. 5004-5016, 2023
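
The abstract above builds its generator on a physical model of underwater image degeneration. A widely used simplification of that kind of model treats each color channel as an attenuated direct signal plus depth-dependent backscatter: I = J * t + B * (1 - t), with transmission t = exp(-beta * d). The sketch below applies this simplified form to one pixel. It is an illustration under assumptions, not the paper's generator; the function name, attenuation coefficients and backlight color are hypothetical.

```python
import math


def degrade_pixel(clean_rgb, depth_m, attenuation, backlight_rgb):
    """Simplified underwater degradation of one RGB pixel.

    Per channel: observed = clean * t + backlight * (1 - t),
    where t = exp(-attenuation * depth) is the transmission.
    """
    degraded = []
    for clean, beta, backlight in zip(clean_rgb, attenuation, backlight_rgb):
        transmission = math.exp(-beta * depth_m)
        degraded.append(clean * transmission + backlight * (1.0 - transmission))
    return tuple(degraded)


# Hypothetical values: red is attenuated fastest, backlight is blue-green.
clean = (0.9, 0.6, 0.3)
beta = (0.60, 0.10, 0.05)       # per-channel attenuation, 1/m (assumed)
backlight = (0.05, 0.35, 0.45)

for depth in (0.0, 2.0, 10.0):
    print(depth, tuple(round(c, 3) for c in degrade_pixel(clean, depth, beta, backlight)))
```

Restoration inverts this relationship, which requires estimating depth and backscatter per pixel, the quantities the abstract says are physically constrained during training.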

arXiv:2105.10407 [pdf]

doi 10.1117/12.2584011

Photonic single perceptron at Giga-OP/s speeds with Kerr microcombs for scalable optical neural networks

Authors: Mengxi Tan, Xingyuan Xu, David J. Moss

Abstract: Optical artificial neural networks (ONNs) have significant potential for ultra-high computing speed and energy efficiency. We report a novel approach to ONNs that uses integrated Kerr optical microcombs. This approach is programmable and scalable and is capable of reaching ultrahigh speeds. We demonstrate the basic building block of ONNs, a single neuron perceptron, by mapping synapses onto 49 wavelengths to achieve an operating speed of 11.9 × 10^9 operations per second, or GigaOPS, at 8 bits per operation, which equates to 95.2 gigabits/s (Gbps). We test the perceptron on handwritten digit recognition and cancer cell detection, achieving over 90% and 85% accuracy, respectively. By scaling the perceptron to a deep learning network using off-the-shelf telecom technology, we can achieve high throughput operation for matrix multiplication for real-time massive data processing.

Submitted 12 May, 2021; originally announced May 2021.

Comments: 14 pages, 8 figures, 107 references. arXiv admin note: substantial text overlap with arXiv:2101.12356

Journal ref: SPIE Paper 11690-21, PW21O-OE202-28, Smart Photonic and Optoelectronic Integrated Circuits XXIII, March (2021)
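
The abstract above converts an operating speed of 11.9 billion operations per second at 8 bits per operation into a bit rate of 95.2 Gbps. The short check below reproduces that conversion with integer arithmetic; the constant names are chosen here for illustration.

```python
# Figures quoted in the abstract.
OPERATIONS_PER_SECOND = 11_900_000_000  # 11.9 x 10^9
BITS_PER_OPERATION = 8

bits_per_second = OPERATIONS_PER_SECOND * BITS_PER_OPERATION
print(bits_per_second / 1e9, "Gbps")  # 95.2 Gbps
```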

arXiv:2103.03354 [pdf]

doi 10.23919/MWP48676.2020.9314476

Optical data transmission field trial @ 44Tb/s with a 49GHz Kerr soliton crystal microcomb

Authors: Mengxi Tan, Xingyuan Xu, David J. Moss

Abstract: We report world-record data transmission over standard optical fiber from a single optical source. We achieve a line rate of 44.2 Terabits per second (Tb/s) employing only the C-band at 1550 nm, resulting in a spectral efficiency of 10.4 bits/s/Hz. We use a new and powerful class of micro-comb called soliton crystals that exhibit robust operation and stable generation as well as a high intrinsic efficiency that, together with an extremely low spacing of 48.9 GHz, enables a very high coherent data modulation format of 64 QAM. We achieve error-free transmission across 75 km of standard optical fiber in the lab and over a field trial with a metropolitan optical fiber network. This work demonstrates the ability of optical micro-combs to exceed other approaches in performance for the most demanding practical optical communications applications.

Submitted 28 January, 2021; originally announced March 2021.

Comments: 5 pages, 5 figures, 81 references

Journal ref: IEEE Microwave Photonics Conference 2020 talk We2.3

arXiv:2102.05610 [pdf, other]

Searching for Fast Model Families on Datacenter Accelerators

Authors: Sheng Li, Mingxing Tan, Ruoming Pang, Andrew Li, Liqun Cheng, Quoc Le, Norman P. Jouppi

Abstract: Neural Architecture Search (NAS), together with model scaling, has shown remarkable progress in designing high accuracy and fast convolutional architecture families. However, as neither NAS nor model scaling considers sufficient hardware architecture details, they do not take full advantage of the emerging datacenter (DC) accelerators. In this paper, we search for fast and accurate CNN model families for efficient inference on DC accelerators. We first analyze DC accelerators and find that existing CNNs suffer from insufficient operational intensity, parallelism, and execution efficiency. These insights let us create a DC-accelerator-optimized search space, with space-to-depth, space-to-batch, hybrid fused convolution structures with vanilla and depthwise convolutions, and block-wise activation functions. On top of our DC accelerator optimized neural architecture search space, we further propose a latency-aware compound scaling (LACS), the first multi-objective compound scaling method optimizing both accuracy and latency. Our LACS discovers that network depth should grow much faster than image size and network width, which is quite different from previous compound scaling results. With the new search space and LACS, our search and scaling on datacenter accelerators results in a new model series named EfficientNet-X. EfficientNet-X is up to more than 2X faster than EfficientNet (a model series with state-of-the-art trade-off on FLOPs and accuracy) on TPUv3 and GPUv100, with comparable accuracy. EfficientNet-X is also up to 7X faster than recent RegNet and ResNeSt on TPUv3 and GPUv100.

Submitted 10 February, 2021; originally announced February 2021.

arXiv:2012.14117 [pdf]

doi 10.1007/s11548-021-02415-z

3D Axial-Attention for Lung Nodule Classification

Authors: Mundher Al-Shabi, Kelvin Shak, Maxine Tan

Abstract: Purpose: In recent years, Non-Local based methods have been successfully applied to lung nodule classification. However, these methods offer 2D attention or limited 3D attention to low-resolution feature maps. Moreover, they still depend on a convenient local filter such as convolution as full 3D attention is expensive to compute and requires a big dataset, which might not be available. Methods: We propose to use 3D Axial-Attention, which requires a fraction of the computing power of a regular Non-Local network (i.e., self-attention). Unlike a regular Non-Local network, the 3D Axial-Attention network applies the attention operation to each axis separately. Additionally, we solve the invariant position problem of the Non-Local network by proposing to add 3D positional encoding to shared embeddings. Results: We validated the proposed method on 442 benign nodules and 406 malignant nodules, extracted from the public LIDC-IDRI dataset by following a rigorous experimental setup using only nodules annotated by at least three radiologists. Our results show that the 3D Axial-Attention model achieves state-of-the-art performance on all evaluation metrics, including AUC and Accuracy. Conclusions: The proposed model provides full 3D attention, whereby every element (i.e., pixel) in the 3D volume space attends to every other element in the nodule effectively. Thus, the 3D Axial-Attention network can be used in all layers without the need for local filters. The experimental results show the importance of full 3D attention for classifying lung nodules.

Submitted 1 June, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

Journal ref: International Journal of Computer Assisted Radiology and Surgery, 1-6 (2021)

arXiv:2012.09472 [pdf]

A new semi-supervised self-training method for lung cancer prediction

Authors: Kelvin Shak, Mundher Al-Shabi, Andrea Liew, Boon Leong Lan, Wai Yee Chan, Kwan Hoong Ng, Maxine Tan

Abstract: Background and Objective: Early detection of lung cancer is crucial as it has a high mortality rate, with patients commonly presenting with the disease at stage 3 and above. There are only relatively few methods that simultaneously detect and classify nodules from computed tomography (CT) scans. Furthermore, very few studies have used semi-supervised learning for lung cancer prediction. This study presents a complete end-to-end scheme to detect and classify lung nodules using the state-of-the-art Self-training with Noisy Student method on a comprehensive CT lung screening dataset of around 4,000 CT scans. Methods: We used three datasets, namely LUNA16, LIDC and NLST, for this study. We first utilise a three-dimensional deep convolutional neural network model to detect lung nodules in the detection stage. The classification model known as Maxout Local-Global Network uses non-local networks to detect global features including shape features, residual blocks to detect local features including nodule texture, and a Maxout layer to detect nodule variations. We trained the first Self-training with Noisy Student model to predict lung cancer on the unlabelled NLST datasets. Then, we performed Mixup regularization to enhance our scheme and provide robustness to erroneous labels. Results and Conclusions: Our new Mixup Maxout Local-Global network achieves an AUC of 0.87 on 2,005 completely independent testing scans from the NLST dataset. Our new scheme significantly outperformed the next highest performing method at the 5% significance level using DeLong's test (p = 0.0001). This study presents a new complete end-to-end scheme to predict lung cancer using Self-training with Noisy Student combined with Mixup regularization. On a completely independent dataset of 2,005 scans, we achieved state-of-the-art performance even with more images as compared to other methods.

Submitted 17 December, 2020; originally announced December 2020.

Comments: 23 pages, 6 figures

arXiv:2010.15417 [pdf]

doi 10.1016/j.patcog.2021.108309

ProCAN: Progressive Growing Channel Attentive Non-Local Network for Lung Nodule Classification

Authors: Mundher Al-Shabi, Kelvin Shak, Maxine Tan

Abstract: Lung cancer classification in screening computed tomography (CT) scans is one of the most crucial tasks for early detection of this disease. Many lives can be saved if we are able to accurately classify malignant/cancerous lung nodules. Consequently, several deep learning based models have been proposed recently to classify lung nodules as malignant or benign. Nevertheless, the large variation in the size and heterogeneous appearance of the nodules makes this task an extremely challenging one. We propose a new Progressive Growing Channel Attentive Non-Local (ProCAN) network for lung nodule classification. The proposed method addresses this challenge from three different aspects. First, we enrich the Non-Local network by adding channel-wise attention capability to it. Second, we apply Curriculum Learning principles, whereby we first train our model on easy examples before hard ones. Third, as the classification task gets harder during the Curriculum learning, our model is progressively grown to increase its capability of handling the task at hand. We examined our proposed method on two different public datasets and compared its performance with state-of-the-art methods in the literature. The results show that the ProCAN model outperforms state-of-the-art methods and achieves an AUC of 98.05% and an accuracy of 95.28% on the LIDC-IDRI dataset. Moreover, we conducted extensive ablation studies to analyze the contribution and effects of each new component of our proposed method.

Submitted 17 September, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

Journal ref: Published in 2022, Pattern Recognition

arXiv:2008.01325 [pdf, other]

Minimizing Electricity Cost through Smart Lighting Control for Indoor Plant Factories

Authors: Clement Lork, Michael Cubillas, Benny Kai Kiat Ng, Chau Yuen, Matthew Tan

Abstract: Smart plant factories incorporate sensing technology, actuators and control algorithms to automate processes, reducing the cost of production while improving crop yield many times over that of traditional farms. This paper investigates the growth of lettuce (Lactuca sativa) in a smart farming setup when exposed to red and blue light-emitting diode (LED) horticulture lighting. An image segmentation method based on K-means clustering is used to identify the size of the plant at each stage of growth, and the growth of the plant is modelled in a feed-forward network. Finally, an optimization algorithm based on the plant growth model is proposed to find the optimal lighting schedule for growing lettuce with respect to dynamic electricity pricing. A genetic algorithm was used to find solutions to the optimization problem. When compared to a baseline in a simulation setting, the schedules proposed by the genetic algorithm can achieve between 40% and 52% savings in energy costs, and up to a 6% increase in leaf area.

Submitted 4 August, 2020; v1 submitted 4 August, 2020; originally announced August 2020.

Comments: IEEE IECON 2020

arXiv:2008.00942 [pdf, other]

Improving Generative Adversarial Networks with Local Coordinate Coding

Authors: Jiezhang Cao, Yong Guo, Qingyao Wu, Chunhua Shen, Junzhou Huang, Mingkui Tan

Abstract: Generative adversarial networks (GANs) have shown remarkable success in generating realistic data from some predefined prior distribution (e.g., Gaussian noises). However, such prior distribution is often independent of real data and thus may lose semantic information (e.g., geometric structure or content in images) of data. In practice, the semantic information might be represented by some latent distribution learned from data. However, such latent distribution may incur difficulties in data sampling for GANs. In this paper, rather than sampling from the predefined prior distribution, we propose an LCCGAN model with local coordinate coding (LCC) to improve the performance of generating data. First, we propose an LCC sampling method in LCCGAN to sample meaningful points from the latent manifold. With the LCC sampling method, we can exploit the local information on the latent manifold and thus produce new data with promising quality. Second, we propose an improved version, namely LCCGAN++, by introducing a higher-order term in the generator approximation. This term is able to achieve better approximation and thus further improve the performance. More critically, we derive the generalization bound for both LCCGAN and LCCGAN++ and prove that a low-dimensional input is sufficient to achieve good generalization performance. Extensive experiments on four benchmark datasets demonstrate the superiority of the proposed method over existing GANs.

Submitted 28 July, 2020; originally announced August 2020.

Comments: 20 pages, 5 figures

arXiv:2008.00820 [pdf, other]

doi 10.1109/TIP.2020.3009820

Generating Visually Aligned Sound from Videos

Authors: Peihao Chen, Yang Zhang, Mingkui Tan, Hongdong Xiao, Deng Huang, Chuang Gan

Abstract: We focus on the task of generating sound from natural videos, and the sound should be both temporally and content-wise aligned with visual signals. This task is extremely challenging because some sounds generated outside a camera cannot be inferred from video content. The model may be forced to learn an incorrect mapping between visual content and these irrelevant sounds. To address this challenge, we propose a framework named REGNET. In this framework, we first extract appearance and motion features from video frames to better distinguish the object that emits sound from complex background information. We then introduce an innovative audio forwarding regularizer that directly considers the real sound as input and outputs bottlenecked sound features. Using both visual and bottlenecked sound features for sound prediction during training provides stronger supervision for the sound prediction. The audio forwarding regularizer can control the irrelevant sound component and thus prevent the model from learning an incorrect mapping between video frames and sound emitted by the object that is out of the screen. During testing, the audio forwarding regularizer is removed to ensure that REGNET can produce purely aligned sound only from visual features. Extensive evaluations based on Amazon Mechanical Turk demonstrate that our method significantly improves both temporal and content-wise alignment. Remarkably, our generated sound can fool the human with a 68.12% success rate. Code and pre-trained models are publicly available at https://github.com/PeihaoChen/regnet

Submitted 14 July, 2020; originally announced August 2020.

Comments: Published in IEEE Transactions on Image Processing, 2020. Code, pre-trained models and demo video: https://github.com/PeihaoChen/regnet

arXiv:2008.00817 [pdf, other]

Retinal Image Segmentation with a Structure-Texture Demixing Network

Authors: Shihao Zhang, Huazhu Fu, Yanwu Xu, Yanxia Liu, Mingkui Tan

Abstract: Retinal image segmentation plays an important role in automatic disease diagnosis. This task is very challenging because the complex structure and texture information are mixed in a retinal image, and distinguishing the information is difficult. Existing methods handle texture and structure jointly, which may bias models toward recognizing textures and thus result in inferior segmentation performance. To address this, we propose a segmentation strategy that seeks to separate structure and texture components and significantly improve the performance. To this end, we design a structure-texture demixing network (STD-Net) that can process structures and textures differently and better. Extensive experiments on two retinal image segmentation tasks (i.e., blood vessel segmentation, optic disc and cup segmentation) demonstrate the effectiveness of the proposed method.

Submitted 15 July, 2020; originally announced August 2020.

Comments: Accepted to MICCAI 2020

arXiv:2007.07222 [pdf, other]

doi 10.1109/TIP.2020.3006377

Collaborative Unsupervised Domain Adaptation for Medical Image Diagnosis

Authors: Yifan Zhang, Ying Wei, Qingyao Wu, Peilin Zhao, Shuaicheng Niu, Junzhou Huang, Mingkui Tan

Abstract: Deep learning based medical image diagnosis has shown great potential in clinical medicine. However, it often suffers from two major difficulties in real-world applications: 1) only limited labels are available for model training, due to expensive annotation costs over medical images; 2) labeled images may contain considerable label noise (e.g., mislabeling) due to diagnostic difficulties of diseases. To address these, we seek to exploit rich labeled data from relevant domains to help the learning in the target task via Unsupervised Domain Adaptation (UDA). Unlike most UDA methods that rely on clean labeled data or assume samples are equally transferable, we innovatively propose a Collaborative Unsupervised Domain Adaptation algorithm, which conducts transferability-aware adaptation and conquers label noise in a collaborative way. We theoretically analyze the generalization performance of the proposed method, and also empirically evaluate it on both medical and general images. Promising experimental results demonstrate the superiority and generalization of the proposed method.

Submitted 5 July, 2020; originally announced July 2020.

Comments: IEEE Transactions on Image Processing

arXiv:2005.02869 [pdf]

doi 10.1109/JLT.2020.2997699

Photonic RF channelizer based on a 90 wavelength optical soliton crystal 49GHz Kerr microcomb

Authors: Xingyuan Xu, Mengxi Tan, Jiayang Wu, Andreas Boes, Thach G. Nguyen, Sai T. Chu, Brent E. Little, Roberto Morandotti, Arnan Mitchell, David J. Moss

Abstract: We report a broadband radio frequency (RF) channelizer with up to 92 channels using a coherent microcomb source. A soliton crystal microcomb, generated by a 49 GHz micro-ring resonator (MRR), is used as a multi-wavelength source. Due to its ultra-low comb spacing, up to 92 wavelengths are available in the C band, yielding a broad operation bandwidth. Another high-Q MRR is employed as a passive optical periodic filter to slice the RF spectrum with a high resolution of 121.4 MHz. We experimentally achieve an instantaneous RF operation bandwidth of 8.08 GHz and verify RF channelization up to 17.55 GHz via thermal tuning. Our approach is a significant step towards the monolithically integrated photonic RF receivers with reduced complexity, size, and unprecedented performance, which is important for wide RF applications ranging from broadband analog signal processing to digital-compatible signal detection.

Submitted 20 April, 2020; originally announced May 2020.

Comments: 7 pages, 4 figures, 59 references

Journal ref: Journal of Lightwave Technology Early Access vol. 38 (2020)

arXiv:2005.01577 [pdf, other]

COVID-DA: Deep Domain Adaptation from Typical Pneumonia to COVID-19

Authors: Yifan Zhang, Shuaicheng Niu, Zhen Qiu, Ying Wei, Peilin Zhao, Jianhua Yao, Junzhou Huang, Qingyao Wu, Mingkui Tan

Abstract: The outbreak of novel coronavirus disease 2019 (COVID-19) has already infected millions of people and is still rapidly spreading all over the globe. Most COVID-19 patients suffer from lung infection, so one important diagnostic method is to screen chest radiography images, e.g., X-Ray or CT images. However, such examinations are time-consuming and labor-intensive, leading to limited diagnostic efficiency. To solve this issue, AI-based technologies, such as deep learning, have been used recently as effective computer-aided means to improve diagnostic efficiency. However, one practical and critical difficulty is the limited availability of annotated COVID-19 data, due to the prohibitive annotation costs and urgent work of doctors to fight against the pandemic. This makes the learning of deep diagnosis models very challenging. To address this, motivated by the fact that typical pneumonia has similar characteristics to COVID-19 and that many pneumonia datasets are publicly available, we propose to conduct domain knowledge adaptation from typical pneumonia to COVID-19. There are two main challenges: 1) the discrepancy of data distributions between domains; 2) the task difference between the diagnosis of typical pneumonia and COVID-19. To address them, we propose a new deep domain adaptation method for COVID-19 diagnosis, namely COVID-DA. Specifically, we alleviate the domain discrepancy via feature adversarial adaptation and handle the task difference issue via a novel classifier separation scheme. In this way, COVID-DA is able to diagnose COVID-19 effectively with only a small number of COVID-19 annotations. Extensive experiments verify the effectiveness of COVID-DA and its great potential for real-world applications.

Submitted 29 April, 2020; originally announced May 2020.

arXiv:2003.13969 [pdf, ps, other]

A Thorough Comparison Study on Adversarial Attacks and Defenses for Common Thorax Disease Classification in Chest X-rays

Authors: Chendi Rao, Jiezhang Cao, Runhao Zeng, Qi Chen, Huazhu Fu, Yanwu Xu, Mingkui Tan

Abstract: Recently, deep neural networks (DNNs) have made great progress on automated diagnosis with chest X-ray images. However, DNNs are vulnerable to adversarial examples, which may cause misdiagnoses when DNN-based methods are applied in disease detection. There are few comprehensive studies exploring the influence of attack and defense methods on disease detection, especially for the multi-label classification problem. In this paper, we aim to review various adversarial attack and defense methods on chest X-rays. First, the motivations and the mathematical representations of attack and defense methods are introduced in detail. Second, we evaluate the influence of several state-of-the-art attack and defense methods for common thorax disease classification in chest X-rays. We found that the attack and defense methods have poor performance with excessive iterations and large perturbations. To address this, we propose a new defense method that is robust to different degrees of perturbations. This study could provide new insights into methodological development for the community.

Submitted 31 March, 2020; originally announced March 2020.

arXiv:2002.12588 [pdf, other]

Regional Registration of Whole Slide Image Stacks Containing Highly Deformed Artefacts

Authors: Mahsa Paknezhad, Sheng Yang Michael Loh, Yukti Choudhury, Valerie Koh Cui Koh, Timothy Tay Kwang Yong, Hui Shan Tan, Ravindran Kanesvaran, Puay Hoon Tan, John Yuen Shyi Peng, Weimiao Yu, Yongcheng Benjamin Tan, Yong Zhen Loy, Min-Han Tan, Hwee Kuan Lee

Abstract: Motivation: High resolution 2D whole slide imaging provides rich information about the tissue structure. This information can be a lot richer if these 2D images can be stacked into a 3D tissue volume. A 3D analysis, however, requires accurate reconstruction of the tissue volume from the 2D image stack. This task is not trivial due to the distortions that each individual tissue slice experiences while cutting and mounting the tissue on the glass slide. Performing registration for the whole tissue slices may be adversely affected by the deformed tissue regions. Consequently, regional registration is found to be more effective. In this paper, we propose an accurate and robust regional registration algorithm for whole slide images which incrementally focuses registration on the area around the region of interest. Results: Using mean similarity index as the metric, the proposed algorithm (mean $\pm$ std: $0.84 \pm 0.11$) followed by a fine registration algorithm ($0.86 \pm 0.08$) outperformed the state-of-the-art linear whole tissue registration algorithm ($0.74 \pm 0.19$) and the regional version of this algorithm ($0.81 \pm 0.15$). The proposed algorithm also outperforms the state-of-the-art nonlinear registration algorithm (original: $0.82 \pm 0.12$, regional: $0.77 \pm 0.22$) for whole slide images and a recently proposed patch-based registration algorithm (patch size 256: $0.79 \pm 0.16$, patch size 512: $0.77 \pm 0.16$) for medical images. Availability: The C++ implementation code is available online at the github repository: https://github.com/MahsaPaknezhad/WSIRegistration

Submitted 28 February, 2020; originally announced February 2020.

arXiv:1912.05027 [pdf, other]

SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

Authors: Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song

Abstract: Convolutional neural networks typically encode an input image into a series of intermediate features with decreasing resolutions. While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection). The encoder-decoder architectures are proposed to resolve this by applying a decoder network onto a backbone model designed for classification tasks. In this paper, we argue that the encoder-decoder architecture is ineffective in generating strong multi-scale features because of the scale-decreased backbone. We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search. Using similar building blocks, SpineNet models outperform ResNet-FPN models by ~3% AP at various scales while using 10-20% fewer FLOPs. In particular, SpineNet-190 achieves 52.5% AP with a Mask R-CNN detector and 52.1% AP with a RetinaNet detector on COCO for a single model without test-time augmentation, significantly outperforming prior detectors. SpineNet can transfer to classification tasks, achieving 5% top-1 accuracy improvement on a challenging iNaturalist fine-grained dataset. Code is at: https://github.com/tensorflow/tpu/tree/master/models/official/detection.

Submitted 17 June, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

Comments: CVPR 2020

arXiv:1911.09070 [pdf, other]

EfficientDet: Scalable and Efficient Object Detection

Authors: Mingxing Tan, Ruoming Pang, Quoc V. Le

Abstract: Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multiscale feature fusion. Second, we propose a compound scaling method that uniformly scales the resolution, depth, and width for all backbone, feature network, and box/class prediction networks at the same time. Based on these optimizations and better backbones, we have developed a new family of object detectors, called EfficientDet, which consistently achieve much better efficiency than prior art across a wide spectrum of resource constraints. In particular, with a single model and single scale, our EfficientDet-D7 achieves state-of-the-art 55.1 AP on COCO test-dev with 77M parameters and 410B FLOPs, being 4x - 9x smaller and using 13x - 42x fewer FLOPs than previous detectors. Code is available at https://github.com/google/automl/tree/master/efficientdet.

Submitted 27 July, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

Comments: CVPR 2020

Journal ref: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2020)

arXiv:1910.06282 [pdf]

doi 10.1109/JLT.2019.2946606

Microwave photonic fractional Hilbert transformer with an integrated optical soliton crystal micro-comb

Authors: Mengxi Tan, Xingyuan Xu, Bill Corcoran, Jiayang Wu, Andreas Boes, Thach G. Nguyen, Sai T. Chu, Brent E. Little, Roberto Morandotti, Arnan Mitchell, David J. Moss

Abstract: We report a photonic microwave and RF fractional Hilbert transformer based on an integrated Kerr micro-comb source. The micro-comb source has a free spectral range (FSR) of 50GHz, generating a large number of comb lines that serve as a high-performance multi-wavelength source for the transformer. By programming and shaping the comb lines according to calculated tap weights, we achieve both arbitrary fractional orders and a broad operation bandwidth. We experimentally characterize the RF amplitude and phase response for different fractional orders and perform system demonstrations of real-time fractional Hilbert transforms. We achieve a phase ripple of < 0.15 rad within the 3-dB pass-band, with bandwidths ranging from 5 to 9 octaves, depending on the order. The experimental results show good agreement with theory, confirming the effectiveness of our approach as a new way to implement high-performance fractional Hilbert transformers with broad processing bandwidth, high reconfigurability, and greatly reduced size and complexity.

Submitted 8 October, 2019; originally announced October 2019.

Comments: 12 pages, 7 figures, 61 references

Journal ref: IEEE Journal of Lightwave Technology, Volume 37, (2019)

arXiv:1910.04030 [pdf, other]

Cribriform pattern detection in prostate histopathological images using deep learning models

Authors: Malay Singh, Emarene Mationg Kalaw, Wang Jie, Mundher Al-Shabi, Chin Fong Wong, Danilo Medina Giron, Kian-Tai Chong, Maxine Tan, Zeng Zeng, Hwee Kuan Lee

Abstract: Architecture, size, and shape of glands are most important patterns used by pathologists for assessment of cancer malignancy in prostate histopathological tissue slides. Varying structures of glands along with cumbersome manual observations may result in subjective and inconsistent assessment. Cribriform gland with irregular border is an important feature in Gleason pattern 4. We propose using deep neural networks for cribriform pattern classification in prostate histopathological images. $163708$ Hematoxylin and Eosin (H\&E) stained images were extracted from histopathologic tissue slides of $19$ patients with prostate cancer and annotated for cribriform patterns. Our automated image classification system analyses the H\&E images to classify them as either `Cribriform' or `Non-cribriform'. Our system uses various deep learning approaches and hand-crafted image pixel intensity-based features. We present our results for cribriform pattern detection across various parameters and configuration allowed by our system. The combination of fine-tuned deep learning models outperformed the state-of-art nuclei feature based methods. Our image classification system achieved the testing accuracy of $85.93~\pm~7.54$ (cross-validated) and $88.04~\pm~5.63$ (additional unseen test set) across three folds. In this paper, we present an annotated cribriform dataset along with analysis of deep learning models and hand-crafted features for cribriform pattern detection in prostate histopathological images.

Submitted 9 October, 2019; originally announced October 2019.

Comments: 21 pages, 4 figures, 6 tables

arXiv:1909.03353 [pdf]

doi 10.1109/LPT.2019.2940497

Microwave and RF signal processing based on integrated soliton crystal optical microcombs

Authors: Xingyuan Xu, Mengxi Tan, Jiayang Wu, Roberto Morandotti, Arnan Mitchell, David J. Moss

Abstract: Microcombs are powerful tools as sources of multiple wavelength channels for photonic RF signal processing. They offer a compact device footprint, large numbers of wavelengths, and wide Nyquist bands. Here, we review recent progress on microcomb-based photonic RF signal processors, including true time delays, reconfigurable filters, Hilbert transformers, differentiators, and channelizers. The strong potential of optical micro-combs for RF photonics applications in terms of functions and integrability is also discussed.

Submitted 7 September, 2019; originally announced September 2019.

Comments: 7 pages, 7 figures, 39 references

Journal ref: IEEE Photonics Technology Letters Volume 31 (2019)

arXiv:1907.12930 [pdf, other]

Attention Guided Network for Retinal Image Segmentation

Authors: Shihao Zhang, Huazhu Fu, Yuguang Yan, Yubing Zhang, Qingyao Wu, Ming Yang, Mingkui Tan, Yanwu Xu

Abstract: Learning structural information is critical for producing an ideal result in retinal image segmentation. Recently, convolutional neural networks have shown a powerful ability to extract effective representations. However, convolutional and pooling operations filter out some useful structural information. In this paper, we propose an Attention Guided Network (AG-Net) to preserve the structural information and guide the expanding operation. In our AG-Net, the guided filter is exploited as a structure sensitive expanding path to transfer structural information from previous feature maps, and an attention block is introduced to exclude the noise and reduce the negative influence of background further. The extensive experiments on two retinal image segmentation tasks (i.e., blood vessel segmentation, optic disc and cup segmentation) demonstrate the effectiveness of our proposed method.

Submitted 23 October, 2019; v1 submitted 25 July, 2019; originally announced July 2019.

Comments: Accepted to MICCAI 2019. Project page: (https://github.com/HzFu/AGNet)

arXiv:1808.08828 [pdf]

Photonic single sideband RF generator based on an integrated optical micro-ring resonator

Authors: Xingyuan Xu, Jiayang Wu, Mengxi Tan, Thach G. Nguyen, Sai T. Chu, Brent E. Little, Roberto Morandotti, Arnan Mitchell, David J. Moss

Abstract: We demonstrate narrowband orthogonally polarized optical RF single sideband generation as well as dual-channel equalization based on an integrated dual-polarization-mode high-Q microring resonator. The device operates in the optical communications band and enables narrowband RF operation at either 16.6 GHz or 32.2 GHz, determined by the free spectral range and TE/TM mode interval in the resonator. We achieve a very large dynamic tuning range of over 55 dB for both the optical carrier-to-sideband ratio and the dual-channel RF equalization.

Submitted 7 August, 2018; originally announced August 2018.

Comments: 10 pages, 13 Figures, 53 references

Journal ref: IEEE Journal of Lightwave Technology (JLT) Volume 36 (2018)

arXiv:1508.06927 [pdf, ps, other]

On Convergence Rate of Leader-Following Consensus of Linear Multi-Agent Systems with Communication Noises

Authors: Long Cheng, Yunpeng Wang, Wei Ren, Zeng-Guang Hou, Min Tan

Abstract: This note further studies the previously proposed consensus protocol for linear multi-agent systems with communication noises in [15], [16]. Each agent is allowed to have its own time-varying gain to attenuate the effect of communication noises. Therefore, the common assumption in most references that all agents have the same noise-attenuation gain is not necessary. It has been proved that if all noise-attenuation gains are infinitesimal of the same order, then the mean square leader-following consensus can be reached. Furthermore, the convergence rate of the multi-agent system has been investigated. If the noise-attenuation gains belong to a class of functions which are bounded above and below by $t^{-β}$ $(β\in(0,1))$ asymptotically, then the states of all follower agents are convergent in mean square to the leader's state with the rate characterized by a function bounded above by $t^{-β}$ asymptotically.

Submitted 27 August, 2015; originally announced August 2015.

arXiv:1501.05502 [pdf, ps, other]

doi 10.1155/2017/1048081

Optimizing production scheduling of steel plate hot rolling for economic load dispatch under time-of-use electricity pricing

Authors: Mao Tan, Hua-li Yang, Bin Duan, Yong-xin Su, Feng He

Abstract: Time-of-Use (TOU) electricity pricing provides an opportunity for industrial users to cut electricity costs. Although many methods for Economic Load Dispatch (ELD) under TOU pricing in continuous industrial processing have been proposed, there are still difficulties in batch-type processing since power load units are not directly adjustable and nonlinearly depend on production planning and scheduling. In this paper, for hot rolling, a typical batch-type and energy intensive process in steel industry, a production scheduling optimization model for ELD is proposed under TOU pricing, in which the objective is to minimize electricity costs while considering penalties caused by jumps between adjacent slabs. A NSGA-II based multi-objective production scheduling algorithm is developed to obtain Pareto-optimal solutions, and then TOPSIS based multi-criteria decision-making is performed to recommend an optimal solution to facilitate field operation. Experimental results and analyses show that the proposed method cuts electricity costs in production, especially in case of allowance for penalty score increase in a certain range. Further analyses show that the proposed method has effect on peak load regulation of power grid.

Submitted 10 March, 2017; v1 submitted 22 January, 2015; originally announced January 2015.

Comments: 13 pages, 6 figures, 4 tables

arXiv:1411.4346 [pdf, other]

Containment Control of Multi-Agent Systems with Dynamic Leaders Based on a $PI^n$-Type Approach

Authors: Yunpeng Wang, Long Cheng, Wei Ren, Zeng-Guang Hou, Min Tan

Abstract: This paper studies the containment control problem of multi-agent systems with multiple dynamic leaders in both the discrete-time domain and the continuous-time domain. The leaders' motions are described by $(n-1)$-order polynomial trajectories. This setting makes practical sense because given some critical points, the leaders' trajectories are usually planned by the polynomial interpolations. In order to drive all followers into the convex hull spanned by the leaders, a $PI^n$-type ($P$ and $I$ are short for {\it Proportion} and {\it Integration}, respectively; $I^n$ implies that the algorithm includes high-order integral terms) containment algorithm is proposed. It is theoretically proved that the $PI^n$-type containment algorithm is able to solve the containment problem of multi-agent systems where the followers are described by any order integral dynamics. Compared with the previous results on the multi-agent systems with dynamic leaders, the distinguished features of this paper are that: (1) the containment problem is studied not only in the continuous-time domain but also in the discrete-time domain while most existing results only work in the continuous-time domain; (2) to deal with the leaders with the $(n-1)$-order polynomial trajectories, existing results require the follower's dynamics to be $n$-order integral while the followers considered in this paper can be described by any-order integral; and (3) the "sign" function is not employed in the proposed algorithm, which avoids the chattering phenomenon.
Furthermore, in order to illustrate the practical value of the proposed approach, an application, the containment control of multiple mobile robots is studied. Finally, two simulation examples are given to demonstrate the effectiveness of the proposed algorithm.

Submitted 27 August, 2015; v1 submitted 16 November, 2014; originally announced November 2014.

arXiv:1304.3972 [pdf, ps, other]

Reaching a Consensus in Networks of High-Order Integral Agents under Switching Directed Topology

Authors: Long Cheng, Zeng-Guang Hou, Min Tan

Abstract: Consensus problem of high-order integral multi-agent systems under switching directed topology is considered in this study. Depending on whether the agent's full state is available or not, two distributed protocols are proposed to ensure that states of all agents can be convergent to a same stationary value. In the proposed protocols, the gain vector associated with the agent's (estimated) state and the gain vector associated with the relative (estimated) states between agents are designed in a sophisticated way. By this particular design, the high-order integral multi-agent system can be transformed into a first-order integral multi-agent system. And the convergence of the transformed first-order integral agent's state indicates the convergence of the original high-order integral agent's state if and only if all roots of the polynomial, whose coefficients are the entries of the gain vector associated with the relative (estimated) states between agents, are in the open left-half complex plane. Therefore, many analysis techniques in the first-order integral multi-agent system can be directly borrowed to solve the problems in the high-order integral multi-agent system. Due to this property, it is proved that to reach a consensus, the switching directed topology of multi-agent system is only required to be "uniformly jointly quasi-strongly connected", which seems the mildest connectivity condition in the literature. In addition, the consensus problem of discrete-time high-order integral multi-agent systems is studied.
The corresponding consensus protocol and performance analysis are presented. Finally, three simulation examples are provided to show the effectiveness of the proposed approach.

Submitted 14 April, 2013; originally announced April 2013.

Showing 1–40 of 40 results for author: Tan, M