-
Deep Evidential Learning for Dose Prediction
Authors:
Hai Siong Tan,
Kuancheng Wang,
Rafe Mcbeth
Abstract:
In this work, we present a novel application of an uncertainty-quantification framework called Deep Evidential Learning in the domain of radiotherapy dose prediction. Using medical images of the Open Knowledge-Based Planning Challenge dataset, we found that this model can be effectively harnessed to yield uncertainty estimates that inherited correlations with prediction errors upon completion of network training. This was achieved only after reformulating the original loss function for a stable implementation. We found that (i) epistemic uncertainty was highly correlated with prediction errors, with various association indices comparable or stronger than those for Monte-Carlo Dropout and Deep Ensemble methods, (ii) the median error varied with uncertainty threshold much more linearly for epistemic uncertainty in Deep Evidential Learning relative to these other two conventional frameworks, indicative of a more uniformly calibrated sensitivity to model errors, and (iii) relative to epistemic uncertainty, aleatoric uncertainty demonstrated a more significant shift in its distribution in response to Gaussian noise added to CT intensity, compatible with its interpretation as reflecting data noise. Collectively, our results suggest that Deep Evidential Learning is a promising approach that can endow deep-learning models in radiotherapy dose prediction with statistical robustness. Towards enhancing its clinical relevance, we demonstrate how we can use such a model to construct the predicted Dose-Volume-Histograms' confidence intervals.
Submitted 25 April, 2024;
originally announced April 2024.
-
Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment
Authors:
Tianwei Zhou,
Songbai Tan,
Wei Zhou,
Yu Luo,
Yuan-Gen Wang,
Guanghui Yue
Abstract:
With the increasing maturity of text-to-image and image-to-image generative models, AI-generated images (AGIs) have shown great application potential in advertisement, entertainment, education, social media, etc. Although remarkable advancements have been achieved in generative models, little effort has been devoted to designing relevant quality assessment models. In this paper, we propose a novel blind image quality assessment (IQA) network, named AMFF-Net, for AGIs. AMFF-Net evaluates AGI quality from three dimensions, i.e., "visual quality", "authenticity", and "consistency". Specifically, inspired by the characteristics of the human visual system and motivated by the observation that "visual quality" and "authenticity" are characterized by both local and global aspects, AMFF-Net scales the image up and down and takes the scaled images and original-sized image as the inputs to obtain multi-scale features. After that, an Adaptive Feature Fusion (AFF) block is used to adaptively fuse the multi-scale features with learnable weights. In addition, considering the correlation between the image and prompt, AMFF-Net compares the semantic features from the text encoder and the image encoder to evaluate the text-to-image alignment. We carry out extensive experiments on three AGI quality assessment databases, and the experimental results show that our AMFF-Net obtains better performance than nine state-of-the-art blind IQA methods. The results of ablation experiments further demonstrate the effectiveness of the proposed multi-scale input strategy and AFF block.
Submitted 23 April, 2024;
originally announced April 2024.
-
Transfer CLIP for Generalizable Image Denoising
Authors:
Jun Cheng,
Dong Liang,
Shan Tan
Abstract:
Image denoising is a fundamental task in computer vision. While prevailing deep learning-based supervised and self-supervised methods have excelled in eliminating in-distribution noise, their susceptibility to out-of-distribution (OOD) noise remains a significant challenge. The recent emergence of contrastive language-image pre-training (CLIP) model has showcased exceptional capabilities in open-world image recognition and segmentation. Yet, the potential for leveraging CLIP to enhance the robustness of low-level tasks remains largely unexplored. This paper uncovers that certain dense features extracted from the frozen ResNet image encoder of CLIP exhibit distortion-invariant and content-related properties, which are highly desirable for generalizable denoising. Leveraging these properties, we devise an asymmetrical encoder-decoder denoising network, which incorporates dense features including the noisy image and its multi-scale features from the frozen ResNet encoder of CLIP into a learnable image decoder to achieve generalizable denoising. The progressive feature augmentation strategy is further proposed to mitigate feature overfitting and improve the robustness of the learnable decoder. Extensive experiments and comparisons conducted across diverse OOD noises, including synthetic noise, real-world sRGB noise, and low-dose CT image noise, demonstrate the superior generalization ability of our method.
Submitted 22 March, 2024;
originally announced March 2024.
-
REWIND Dataset: Privacy-preserving Speaking Status Segmentation from Multimodal Body Movement Signals in the Wild
Authors:
Jose Vargas Quiros,
Chirag Raman,
Stephanie Tan,
Ekin Gedik,
Laura Cabrera-Quiros,
Hayley Hung
Abstract:
Recognizing speaking in humans is a central task towards understanding social interactions. Ideally, speaking would be detected from individual voice recordings, as done previously for meeting scenarios. However, individual voice recordings are hard to obtain in the wild, especially in crowded mingling scenarios due to cost, logistics, and privacy concerns. As an alternative, machine learning models trained on video and wearable sensor data make it possible to recognize speech by detecting its related gestures in an unobtrusive, privacy-preserving way. These models themselves should ideally be trained using labels obtained from the speech signal. However, existing mingling datasets do not contain high quality audio recordings. Instead, speaking status annotations have often been inferred by human annotators from video, without validation of this approach against audio-based ground truth. In this paper we revisit no-audio speaking status estimation by presenting the first publicly available multimodal dataset with high-quality individual speech recordings of 33 subjects in a professional networking event. We present three baselines for no-audio speaking status segmentation: a) from video, b) from body acceleration (chest-worn accelerometer), c) from body pose tracks. In all cases we predict a 20Hz binary speaking status signal extracted from the audio, a time resolution not available in previous datasets. In addition to providing the signals and ground truth necessary to evaluate a wide range of speaking status detection methods, the availability of audio in REWIND makes it suitable for cross-modality studies not feasible with previous mingling datasets. Finally, our flexible data consent setup creates new challenges for multimodal systems under missing modalities.
Submitted 2 March, 2024;
originally announced March 2024.
-
Artificial Intelligence and Diabetes Mellitus: An Inside Look Through the Retina
Authors:
Yasin Sadeghi Bazargani,
Majid Mirzaei,
Navid Sobhi,
Mirsaeed Abdollahi,
Ali Jafarizadeh,
Siamak Pedrammehr,
Roohallah Alizadehsani,
Ru San Tan,
Sheikh Mohammed Shariful Islam,
U. Rajendra Acharya
Abstract:
Diabetes mellitus (DM) predisposes patients to vascular complications. Retinal images and vasculature reflect the body's micro- and macrovascular health. They can be used to diagnose DM complications, including diabetic retinopathy (DR), neuropathy, nephropathy, and atherosclerotic cardiovascular disease, as well as forecast the risk of cardiovascular events. Artificial intelligence (AI)-enabled systems developed for high-throughput detection of DR using digitized retinal images have become clinically adopted. Beyond DR screening, AI integration also holds immense potential to address challenges associated with the holistic care of the patient with DM. In this work, we aim to comprehensively review the literature for studies on AI applications based on retinal images related to DM diagnosis, prognostication, and management. We will describe the findings of holistic AI-assisted diabetes care, including but not limited to DR screening, and discuss barriers to implementing such systems, including issues concerning ethics, data privacy, equitable access, and explainability. With the ability to evaluate the patient's health status vis a vis DM complication as well as risk prognostication of future cardiovascular complications, AI-assisted retinal image analysis has the potential to become a central tool for modern personalized medicine in patients with DM.
Submitted 27 February, 2024;
originally announced February 2024.
-
Deep Learning Based Adaptive Joint mmWave Beam Alignment
Authors:
Daniel Tandler,
Marc Gauger,
Ahmet Serdar Tan,
Sebastian Dörner,
Stephan ten Brink
Abstract:
The challenging propagation environment, combined with the hardware limitations of mmWave systems, gives rise to the need for accurate initial access beam alignment strategies with low latency and high achievable beamforming gain. Much of the recent work in this area either focuses on one-sided beam alignment, or, joint beam alignment methods where both sides of the link perform a sequence of fixed channel probing steps. Codebook-based non-adaptive beam alignment schemes have the potential to allow multiple user equipment (UE) to perform initial access beam alignment in parallel whereas adaptive schemes are favourable in achievable beamforming gain. This work introduces a novel deep learning based joint beam alignment scheme that aims to combine the benefits of adaptive, codebook-free beam alignment at the UE side with the advantages of a codebook-sweep based scheme at the base station. The proposed end-to-end trainable scheme is compatible with current cellular standard signaling and can be readily integrated into the standard without requiring significant changes to it. Extensive simulations demonstrate superior performance of the proposed approach over purely codebook-based ones.
Submitted 24 January, 2024;
originally announced January 2024.
-
Swin UNETR++: Advancing Transformer-Based Dense Dose Prediction Towards Fully Automated Radiation Oncology Treatments
Authors:
Kuancheng Wang,
Hai Siong Tan,
Rafe Mcbeth
Abstract:
The field of Radiation Oncology is uniquely positioned to benefit from the use of artificial intelligence to fully automate the creation of radiation treatment plans for cancer therapy. This time-consuming and specialized task combines patient imaging with organ and tumor segmentation to generate a 3D radiation dose distribution to meet clinical treatment goals, similar to voxel-level dense prediction. In this work, we propose Swin UNETR++, which contains a lightweight 3D Dual Cross-Attention (DCA) module to capture the intra- and inter-volume relationships of each patient's unique anatomy, which fully convolutional neural networks lack. Our model was trained, validated, and tested on the Open Knowledge-Based Planning dataset. In addition to metrics of Dose Score $\overline{S_{\text{Dose}}}$ and DVH Score $\overline{S_{\text{DVH}}}$ that quantitatively measure the difference between the predicted and ground-truth 3D radiation dose distribution, we propose the qualitative metrics of average volume-wise acceptance rate $\overline{R_{\text{VA}}}$ and average patient-wise clinical acceptance rate $\overline{R_{\text{PA}}}$ to assess the clinical reliability of the predictions. Swin UNETR++ demonstrates near-state-of-the-art performance on the validation and test datasets (validation: $\overline{S_{\text{DVH}}}$=1.492 Gy, $\overline{S_{\text{Dose}}}$=2.649 Gy, $\overline{R_{\text{VA}}}$=88.58%, $\overline{R_{\text{PA}}}$=100.0%; test: $\overline{S_{\text{DVH}}}$=1.634 Gy, $\overline{S_{\text{Dose}}}$=2.757 Gy, $\overline{R_{\text{VA}}}$=90.50%, $\overline{R_{\text{PA}}}$=98.0%), establishing a basis for future studies to translate 3D dose predictions into a deliverable treatment plan, facilitating full automation.
Submitted 17 March, 2024; v1 submitted 11 November, 2023;
originally announced November 2023.
-
Stain Consistency Learning: Handling Stain Variation for Automatic Digital Pathology Segmentation
Authors:
Michael Yeung,
Todd Watts,
Sean YW Tan,
Pedro F. Ferreira,
Andrew D. Scott,
Sonia Nielles-Vallespin,
Guang Yang
Abstract:
Stain variation is a unique challenge associated with automated analysis of digital pathology. Numerous methods have been developed to improve the robustness of machine learning methods to stain variation, but comparative studies have demonstrated limited benefits to performance. Moreover, methods to handle stain variation were largely developed for H&E stained data, with evaluation generally limited to classification tasks. Here we propose Stain Consistency Learning, a novel framework combining stain-specific augmentation with a stain consistency loss function to learn stain colour invariant features. We perform the first, extensive comparison of methods to handle stain variation for segmentation tasks, comparing ten methods on Masson's trichrome and H&E stained cell and nuclei datasets, respectively. We observed that stain normalisation methods resulted in equivalent or worse performance, while stain augmentation or stain adversarial methods demonstrated improved performance, with the best performance consistently achieved by our proposed approach. The code is available at: https://github.com/mlyg/stain_consistency_learning
Submitted 11 November, 2023;
originally announced November 2023.
-
Unsupervised CT Metal Artifact Reduction by Plugging Diffusion Priors in Dual Domains
Authors:
Xuan Liu,
Yaoqin Xie,
Songhui Diao,
Shan Tan,
Xiaokun Liang
Abstract:
During the process of computed tomography (CT), metallic implants often cause disruptive artifacts in the reconstructed images, impeding accurate diagnosis. Several supervised deep learning-based approaches have been proposed for reducing metal artifacts (MAR). However, these methods heavily rely on training with simulated data, as obtaining paired metal artifact CT and clean CT data in clinical settings is challenging. This limitation can lead to decreased performance when applying these methods in clinical practice. Existing unsupervised MAR methods, whether based on learning or not, typically operate within a single domain, either in the image domain or the sinogram domain. In this paper, we propose an unsupervised MAR method based on the diffusion model, a generative model with a high capacity to represent data distributions. Specifically, we first train a diffusion model using CT images without metal artifacts. Subsequently, we iteratively utilize the priors embedded within the pre-trained diffusion model in both the sinogram and image domains to restore the degraded portions caused by metal artifacts. This dual-domain processing empowers our approach to outperform existing unsupervised MAR methods, including another MAR method based on the diffusion model, which we have qualitatively and quantitatively validated using synthetic datasets. Moreover, our method demonstrates superior visual results compared to both supervised and unsupervised methods on clinical datasets.
Submitted 5 January, 2024; v1 submitted 31 August, 2023;
originally announced August 2023.
-
Diffusion Probabilistic Priors for Zero-Shot Low-Dose CT Image Denoising
Authors:
Xuan Liu,
Yaoqin Xie,
Jun Cheng,
Songhui Diao,
Shan Tan,
Xiaokun Liang
Abstract:
Denoising low-dose computed tomography (CT) images is a critical task in medical image computing. Supervised deep learning-based approaches have made significant advancements in this area in recent years. However, these methods typically require pairs of low-dose and normal-dose CT images for training, which are challenging to obtain in clinical settings. Existing unsupervised deep learning-based methods often require training with a large number of low-dose CT images or rely on specially designed data acquisition processes to obtain training data. To address these limitations, we propose a novel unsupervised method that only utilizes normal-dose CT images during training, enabling zero-shot denoising of low-dose CT images. Our method leverages the diffusion model, a powerful generative model. We begin by training a cascaded unconditional diffusion model capable of generating high-quality normal-dose CT images from low-resolution to high-resolution. The cascaded architecture makes the training of high-resolution diffusion models more feasible. Subsequently, we introduce low-dose CT images into the reverse process of the diffusion model as likelihood, combined with the priors provided by the diffusion model, and iteratively solve multiple maximum a posteriori (MAP) problems to achieve denoising. Additionally, we propose methods to adaptively adjust the coefficients that balance the likelihood and prior in MAP estimations, allowing for adaptation to different noise levels in low-dose CT images. We test our method on low-dose CT datasets of different regions with varying dose levels. The results demonstrate that our method outperforms the state-of-the-art unsupervised method and surpasses several supervised deep learning-based methods. Code is available at https://github.com/DeepXuan/Dn-Dp.
Submitted 13 July, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
RCP-RF: A Comprehensive Road-car-pedestrian Risk Management Framework based on Driving Risk Potential Field
Authors:
Shuhang Tan,
Zhiling Wang,
Yan Zhong
Abstract:
Recent years have witnessed a proliferation of traffic accidents, which has prompted extensive research on Automated Vehicle (AV) technologies to reduce vehicle accidents, especially on risk assessment frameworks for AV technologies. However, existing time-based frameworks cannot handle complex traffic scenarios and ignore the influence of each moving object's motion tendency on the risk distribution, leading to performance degradation. To address this problem, we propose a comprehensive driving risk management framework named RCP-RF, based on potential field theory in a Connected and Automated Vehicles (CAV) environment, in which the pedestrian risk metric is combined into a unified road-vehicle driving risk management framework. Unlike existing algorithms, the proposed framework explicitly considers the motion tendency between ego and obstacle cars as well as the pedestrian factor, which can improve the performance of the driving risk model. Moreover, the proposed method requires only $O(N^2)$ time complexity. Empirical studies validate the superiority of our proposed framework against state-of-the-art methods on the real-world dataset NGSIM and a real AV platform.
Submitted 3 May, 2023;
originally announced May 2023.
-
An Electromagnetic-Information-Theory Based Model for Efficient Characterization of MIMO Systems in Complex Space
Authors:
Ruifeng Li,
Da Li,
Jinyan Ma,
Zhaoyang Feng,
Ling Zhang,
Shurun Tan,
Wei E. I. Sha,
Hongsheng Chen,
Er-Ping Li
Abstract:
It is the pursuit of a multiple-input-multiple-output (MIMO) system to approach and even break the limit of channel capacity. However, it is always a big challenge to efficiently characterize MIMO systems in complex space and obtain better propagation performance than conventional MIMO systems that consider only free space, which is important for guiding the power and phase allocation of antenna units. In this manuscript, an Electromagnetic-Information-Theory (EMIT) based model is developed for efficient characterization of MIMO systems in complex space. The group-T-matrix-based multiple scattering fast algorithm, the mode-decomposition-based characterization method, and their joint theoretical framework in complex space are discussed. First, key informatics parameters in free electromagnetic space based on a dyadic Green's function are derived. Next, a novel group-T-matrix-based multiple scattering fast algorithm is developed to describe a representative inhomogeneous electromagnetic space. All the analytical results are validated by simulations. In addition, the complete form of the EMIT-based model is proposed to derive the informatics parameters frequently used in electromagnetic propagation, through integrating the mode analysis method with the dyadic Green's function matrix. Finally, as a proof-of-concept, microwave anechoic chamber measurements of a cylindrical array are performed, demonstrating the effectiveness of the EMIT-based model. Meanwhile, a case of image transmission with limited power is presented to illustrate how to use this EMIT-based model to guide the power and phase allocation of antenna units for real MIMO applications.
Submitted 13 January, 2023;
originally announced January 2023.
-
Technology Trends for Massive MIMO towards 6G
Authors:
Yiming Huo,
Xingqin Lin,
Boya Di,
Hongliang Zhang,
Francisco Javier Lorca Hernando,
Ahmet Serdar Tan,
Shahid Mumtaz,
Özlem Tuğfe Demir,
Kun Chen-Hu
Abstract:
At the dawn of the next-generation wireless systems and networks, massive multiple-input multiple-output (MIMO) has been envisioned as one of the enabling technologies. With its continued success in 5G and beyond, massive MIMO technology has demonstrated its advantageousness, integrability, and extendibility. Moreover, several evolutionary features and revolutionizing trends for massive MIMO have gradually emerged in recent years, which are expected to reshape the future 6G wireless systems and networks. Specifically, the functions and performance of future massive MIMO systems will be enabled and enhanced by combining other innovative technologies, architectures, and strategies such as intelligent omni-surfaces (IOSs)/intelligent reflecting surfaces (IRSs), artificial intelligence (AI), THz communications, and cell-free architecture. Also, more diverse vertical applications based on massive MIMO will emerge and prosper, such as wireless localization and sensing, vehicular communications, non-terrestrial communications, remote sensing, and inter-planetary communications.
Submitted 5 January, 2023; v1 submitted 4 January, 2023;
originally announced January 2023.
-
A Remote Baby Surveillance System with RFID and GPS Tracking
Authors:
Ruven A/L Sundarajoo,
Gwo Chin Chung,
Wai Leong Pang,
Soo Fun Tan
Abstract:
In the 21st century, sending babies or children to daycare centres has become more and more common among young guardians. The balance between full-time work and child care is increasingly challenging nowadays. In Malaysia, thousands of child abuse cases have been reported from babysitting centres every year, which indeed triggers the anxiety and stress of the guardians. Hence, this paper proposes to construct a remote baby surveillance system with radio-frequency identification (RFID) and global positioning system (GPS) tracking. With the incorporation of the Internet of Things (IoT), a sensor-based microcontroller is used to detect the conditions of the baby as well as the surrounding environment and then display the real-time data as well as notifications to alert the guardians via a mobile application. These conditions include the crying and waking of the baby, as well as temperature, the mattress's wetness, and moving objects around the baby. In addition, RFID and GPS location tracking are implemented to ensure the safety of the baby, while white noise is used to increase the comfort of the baby. In the end, a prototype has been successfully developed for functionality and reliability testing. Several experiments have been conducted to measure the efficiency of the mattress's wetness detection, the RFID transmission range, the frequency spectrum of white noise, and also the output power of the solar panel. The proposed system is expected to assist guardians in ensuring the safety and comfort of their babies remotely, as well as prevent any occurrence of child abuse.
Submitted 26 November, 2022;
originally announced November 2022.
-
Smart Speech Segmentation using Acousto-Linguistic Features with look-ahead
Authors:
Piyush Behre,
Naveen Parihar,
Sharman Tan,
Amy Shah,
Eva Sharma,
Geoffrey Liu,
Shuangyu Chang,
Hosam Khalil,
Chris Basoglu,
Sayan Pathak
Abstract:
Segmentation for continuous Automatic Speech Recognition (ASR) has traditionally used silence timeouts or voice activity detectors (VADs), which are both limited to acoustic features. This segmentation is often overly aggressive, given that people naturally pause to think as they speak. Consequently, segmentation happens mid-sentence, hindering both punctuation and downstream tasks like machine translation for which high-quality segmentation is critical. Model-based segmentation methods that leverage acoustic features are powerful, but without an understanding of the language itself, these approaches are limited. We present a hybrid approach that leverages both acoustic and language information to improve segmentation. Furthermore, we show that including one word as a look-ahead boosts segmentation quality. On average, our models improve segmentation-F0.5 score by 9.8% over baseline. We show that this approach works for multiple languages. For the downstream task of machine translation, it improves the translation BLEU score by an average of 1.05 points.
Submitted 27 October, 2022; v1 submitted 25 October, 2022;
originally announced October 2022.
-
Integrated Sensing and Communication with mmWave Massive MIMO: A Compressed Sampling Perspective
Authors:
Zhen Gao,
Ziwei Wan,
Dezhi Zheng,
Shufeng Tan,
Christos Masouros,
Derrick Wing Kwan Ng,
Sheng Chen
Abstract:
Integrated sensing and communication (ISAC) has opened up numerous game-changing opportunities for realizing future wireless systems. In this paper, we propose an ISAC processing framework relying on millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) systems. Specifically, we provide a compressed sampling (CS) perspective to facilitate ISAC processing, which can not only recover the high-dimensional channel state information and/or radar imaging information, but also significantly reduce pilot overhead. First, an energy-efficient widely spaced array (WSA) architecture is tailored for the radar receiver, which enhances the angular resolution of radar sensing at the cost of angular ambiguity. Then, we propose an ISAC frame structure for time-varying ISAC systems considering different timescales. The pilot waveforms are judiciously designed by taking into account both CS theories and hardware constraints induced by the hybrid beamforming (HBF) architecture. Next, we design a dedicated dictionary for the WSA that serves as a building block for formulating the ISAC processing as sparse signal recovery problems. The orthogonal matching pursuit with support refinement (OMP-SR) algorithm is proposed to effectively solve these problems in the presence of angular ambiguity. We also provide a framework for estimating the Doppler frequencies during payload data transmission to guarantee communication performance. Simulation results demonstrate the good performance of both communications and radar sensing under the proposed ISAC framework.
Submitted 9 September, 2022; v1 submitted 15 January, 2022;
originally announced January 2022.
-
MD Loss: Efficient Training of 3D Seismic Fault Segmentation Network under Sparse Labels by Weakening Anomaly Annotation
Authors:
Yimin Dou,
Kewen Li,
Jianbing Zhu,
Timing Li,
Shaoquan Tan,
Zongchao Huang
Abstract:
Data-driven fault detection has been regarded as a 3D image segmentation task. Models trained on synthetic data are difficult to generalize to some surveys. Recently, training 3D fault segmentation using sparse manual 2D slices has been thought to yield promising results, but manual labeling produces many false negative labels (abnormal annotations), which is detrimental to training and consequently to detection performance. Motivated to train 3D fault segmentation networks under sparse 2D labels while suppressing false negative labels, we analyze the training process gradient and propose the Mask Dice (MD) loss. Moreover, the fault is an edge feature, and current encoder-decoder architectures widely used for fault detection (e.g., U-shape network) are not conducive to edge representation. Consequently, we propose Fault-Net, which is designed for the characteristics of faults, employs high-resolution propagation features, and embeds a MultiScale Compression Fusion block to fuse multi-scale information. This allows the edge information to be fully preserved during propagation and fusion, thus enabling advanced performance with few computational resources. Experiments demonstrate that MD loss supports the inclusion of human experience in training and suppresses false negative labels therein, enabling baseline models to improve performance and generalize to more surveys. Fault-Net is able to provide a more stable and reliable interpretation of faults; it uses extremely low computational resources, and its inference is significantly faster than that of other models. Our method shows the best performance in comparison with several mainstream methods.
Submitted 21 June, 2022; v1 submitted 11 October, 2021;
originally announced October 2021.
-
Dynamic Response and Stability Margin Improvement of Wireless Power Receiver Systems via Right-Half-Plane Zero Elimination
Authors:
Kerui Li,
Siew-Chong Tan,
Ron Shu Yuen Hui
Abstract:
The series-series compensation topology is widely adopted in many wireless power transfer applications. In such systems, the wireless power receiver typically involves a DC-DC converter with a front-stage full-bridge diode rectifier to process the high-frequency transmitted AC power into a DC output voltage for the load. It was recently reported that the current source nature of the series-series compensation introduces right-half-plane (RHP) zeros into the small-signal transfer functions of the DC-DC converter of the wireless power receiver, which severely affects the stability and dynamic response of the system. To resolve this issue, this paper proposes adopting a different rectifier configuration such that the input current to the DC-DC converter becomes controllable, eliminating the RHP zeros from the small-signal transfer functions of the system. This rectifier can be applied to different wireless power receivers using buck, buck-boost, or boost converters. Compared with the original wireless power receivers, the modified ones feature minimum-phase characteristics and hence ease the design of the compensator. Theoretical and experimental results are provided. The comparative experimental results verify the elimination of the RHP zero, improved dynamic responses in reference tracking and against load disturbances, and a larger stability margin.
Submitted 17 April, 2021;
originally announced June 2021.
-
CodedStereo: Learned Phase Masks for Large Depth-of-field Stereo
Authors:
Shiyu Tan,
Yicheng Wu,
Shoou-I Yu,
Ashok Veeraraghavan
Abstract:
Conventional stereo suffers from a fundamental trade-off between imaging volume and signal-to-noise ratio (SNR), due to the conflicting impact of aperture size on both these variables. Inspired by extended depth of field cameras, we propose a novel end-to-end learning-based technique to overcome this limitation by introducing a phase mask at the aperture plane of the cameras in a stereo imaging system. The phase mask creates a depth-dependent point spread function, allowing us to recover sharp image texture and stereo correspondence over a significantly larger extended depth of field (EDOF) than conventional stereo. The phase mask pattern, the EDOF image reconstruction, and the stereo disparity estimation are all trained together using an end-to-end learned deep neural network. We perform theoretical analysis and characterization of the proposed approach and show, in simulation, a 6x increase in the volume that can be imaged. We also build an experimental prototype and validate the approach using real-world results acquired with this prototype system.
Submitted 9 April, 2021;
originally announced April 2021.
-
On Effect of Right-Half-Plane Zero Present in Buck Converters with Input Current Source in Wireless Power Receiver Systems
Authors:
Kerui Li,
Siew-Chong Tan,
Ron Shu Yuen Hui
Abstract:
In wireless power receiver systems, the buck converter is widely used to step down the higher rectified voltage derived from the wireless receiver coil to a lower output voltage for the immediate battery charging process. In this work, we investigate the presence of right-half-plane (RHP) zeros in the small-signal inductor-current-to-duty-ratio and output-voltage-to-duty-ratio transfer functions of the buck converter in the wireless power receiver system, and their effect on the control performance. It is found and mathematically proved that the RHP zeros are introduced by the current source nature of the system, attributed to the series-series compensation and finite DC-link capacitance. The RHP zero not only results in a non-monotonic open-loop dynamic response but also complicates the design of feedback control and causes potential closed-loop instability. Theoretical and experimental results are provided to validate the presence of the RHP zeros and their effect on open-loop and closed-loop dynamic responses.
Submitted 25 November, 2020;
originally announced November 2020.
-
A Modular Approach for Synchronized Wireless Multimodal Multisensor Data Acquisition in Highly Dynamic Social Settings
Authors:
Chirag Raman,
Stephanie Tan,
Hayley Hung
Abstract:
Existing data acquisition literature for human behavior research provides wired solutions, mainly for controlled laboratory setups. In uncontrolled free-standing conversation settings, where participants are free to walk around, these solutions are unsuitable. While wireless solutions are employed in the broadcasting industry, they can be prohibitively expensive. In this work, we propose a modular and cost-effective wireless approach for synchronized multisensor data acquisition of social human behavior. Our core idea involves a cost-accuracy trade-off by using Network Time Protocol (NTP) as a source reference for all sensors. While commonly used as a reference in ubiquitous computing, NTP is widely considered to be insufficiently accurate as a reference for video applications, where Precision Time Protocol (PTP) or Global Positioning System (GPS) based references are preferred. We argue and show, however, that the latency introduced by using NTP as a source reference is adequate for human behavior research, and the subsequent cost and modularity benefits are a desirable trade-off for applications in this domain. We also describe one instantiation of the approach deployed in a real-world experiment to demonstrate the practicality of our setup in-the-wild.
Submitted 9 August, 2020;
originally announced August 2020.
-
Implementation of UAV Coordination Based on a Hierarchical Multi-UAV Simulation Platform
Authors:
Kun Xiao,
Lan Ma,
Shaochang Tan,
Yirui Cong,
Xiangke Wang
Abstract:
In this paper, a hierarchical multi-UAV simulation platform, called XTDrone, is designed for UAV swarms, which is completely open-source. There are six layers in XTDrone: communication, simulator, low-level control, high-level control, coordination, and human interaction layers. XTDrone has three advantages. Firstly, the simulation speed can be adjusted to match the computer performance, based on the lock-step mode. Thus, the simulations can be conducted on a workstation or on a personal laptop, for different purposes. Secondly, a simplified simulator is also developed which enables quick algorithm designing so that the approximated behavior of UAV swarms can be observed in advance. Thirdly, XTDrone is based on ROS, Gazebo, and PX4, and hence the codes in simulations can be easily transplanted to embedded systems. Note that XTDrone can support various types of multi-UAV missions, and we provide two important demos in this paper: one is a ground-station-based multi-UAV cooperative search, and the other is a distributed UAV formation flight, including consensus-based formation control, task assignment, and obstacle avoidance.
Submitted 30 May, 2022; v1 submitted 3 May, 2020;
originally announced May 2020.
-
On Beat Frequency Oscillation of Two-Stage Wireless Power Receivers
Authors:
Kerui Li,
Siew-Chong Tan,
Ron Shu Yuen Hui
Abstract:
Two-stage wireless power receivers, which typically include an AC-DC diode rectifier and a DC-DC regulator, are popular solutions in low-power wireless power transfer applications. However, the interaction between the rectifier and the regulator may introduce beat frequency oscillation on both the DC-link and output capacitors. In this paper, the cause of the beat frequency oscillation and its related issues are investigated, and corresponding design solutions for alleviating the oscillation are discussed. Theoretical and experimental results verifying the presence of beat frequency oscillation in the two-stage wireless receiver system are provided. Our study shows that the beat frequency oscillation can be significantly alleviated if appropriate design solutions are applied.
Submitted 5 October, 2020; v1 submitted 28 April, 2020;
originally announced April 2020.
-
EM-GAN: Fast Stress Analysis for Multi-Segment Interconnect Using Generative Adversarial Networks
Authors:
Wentian **,
Sheriff Sadiqbatcha,
**wei Zhang,
Sheldon X.-D. Tan
Abstract:
In this paper, we propose a fast transient hydrostatic stress analysis for electromigration (EM) failure assessment for multi-segment interconnects using generative adversarial networks (GANs). Our work leverages the image synthesis feature of GAN-based generative deep neural networks. The stress evaluation of multi-segment interconnects, modeled by partial differential equations, can be viewed as a time-varying 2D-images-to-image problem where the input is the multi-segment interconnect topology with current densities and the output is the EM stress distribution in those wire segments at the given aging time. Based on this observation, we train a conditional GAN model using the images of many self-generated multi-segment wires, with wire current densities and aging time as conditions, against the COMSOL simulation results. Different hyperparameters of the GAN were studied and compared. The proposed algorithm, called EM-GAN, can quickly give an accurate stress distribution of a general multi-segment wire tree for a given aging time, which is important for fast full-chip EM failure assessment. Our experimental results show that EM-GAN has a 6.6% average error compared to COMSOL simulation results, with orders of magnitude speedup. It also delivers an 8.3X speedup over a state-of-the-art analytic-based EM analysis solver.
Submitted 27 April, 2020;
originally announced April 2020.
-
Highly-Efficient Single-Switch-Regulated Resonant Wireless Power Receiver with Hybrid Modulation
Authors:
Kerui Li,
Albert Ting Leung Lee,
Siew-Chong Tan,
Ron Shu Yuen Hui
Abstract:
In this paper, a highly-efficient single-switch-regulated resonant wireless power receiver with hybrid modulation is proposed. To achieve both high efficiency and good output voltage regulation, phase shift and pulse width hybrid modulation are simultaneously applied. The soft switching operation in this topology is achieved by the cycle-by-cycle phase shift adjustment between the input current and the gate drive signal, and is also attributed to reactive components such as the series-compensated secondary coil and the parasitic capacitor of the active switch. The soft switching operation also leads to high efficiency and low EMI. By adjusting the duty ratio of the switch, tight regulation of the output voltage can be attained. The steady-state and dynamic models of the resonant receiver with hybrid modulation are analytically derived in order to properly design the feedback controller. An experimental setup of a two-coil wireless power transfer system, including the hardware prototype of the proposed receiver, is constructed for experimental verification. The experimental results show the effectiveness of the soft-switching operation in the receiver, with high efficiency while maintaining good regulation of the output voltage, regardless of line and load variations.
Submitted 5 January, 2021; v1 submitted 9 April, 2020;
originally announced April 2020.
-
Regional Registration of Whole Slide Image Stacks Containing Highly Deformed Artefacts
Authors:
Mahsa Paknezhad,
Sheng Yang Michael Loh,
Yukti Choudhury,
Valerie Koh Cui Koh,
Timothy Tay Kwang Yong,
Hui Shan Tan,
Ravindran Kanesvaran,
Puay Hoon Tan,
John Yuen Shyi Peng,
Weimiao Yu,
Yongcheng Benjamin Tan,
Yong Zhen Loy,
Min-Han Tan,
Hwee Kuan Lee
Abstract:
Motivation: High resolution 2D whole slide imaging provides rich information about the tissue structure. This information can be a lot richer if these 2D images can be stacked into a 3D tissue volume. A 3D analysis, however, requires accurate reconstruction of the tissue volume from the 2D image stack. This task is not trivial due to the distortions that each individual tissue slice experiences while cutting and mounting the tissue on the glass slide. Performing registration for the whole tissue slices may be adversely affected by the deformed tissue regions. Consequently, regional registration is found to be more effective. In this paper, we propose an accurate and robust regional registration algorithm for whole slide images which incrementally focuses registration on the area around the region of interest. Results: Using mean similarity index as the metric, the proposed algorithm (mean $\pm$ std: $0.84 \pm 0.11$) followed by a fine registration algorithm ($0.86 \pm 0.08$) outperformed the state-of-the-art linear whole tissue registration algorithm ($0.74 \pm 0.19$) and the regional version of this algorithm ($0.81 \pm 0.15$). The proposed algorithm also outperforms the state-of-the-art nonlinear registration algorithm (original: $0.82 \pm 0.12$, regional: $0.77 \pm 0.22$) for whole slide images and a recently proposed patch-based registration algorithm (patch size 256: $0.79 \pm 0.16$, patch size 512: $0.77 \pm 0.16$) for medical images. Availability: The C++ implementation code is available online at the GitHub repository: https://github.com/MahsaPaknezhad/WSIRegistration
Submitted 28 February, 2020;
originally announced February 2020.
-
Single-Switch-Regulated Resonant WPT Receiver
Authors:
Kerui Li,
Siew Chong Tan,
Ron Shu Yuen Hui
Abstract:
A single-switch-regulated wireless power transfer (WPT) receiver is presented in this letter. Aiming at low-cost applications, the system involves only a single-switch class-E resonant rectifier, a frequency synchronization circuit, and a microcontroller. The number of power semiconductor devices required in this circuit is minimal. Only one active switch is used and no diode is required. As a single-switch solution, this simplifies circuit implementation, improves reliability, and lowers hardware cost. The single-switch resonant rectifier provides a relatively constant quasi-sinusoidal voltage waveform to pick up the wireless power from the receiver coil. Due to the resonant nature of the rectifier, ZVS turn on and turn off are achieved. The steady-state analysis and discussions on the component sizing and the control design are provided. A prototype is built and experimental works are performed to verify the features.
Submitted 18 December, 2019; v1 submitted 12 December, 2019;
originally announced December 2019.
-
Single-Stage Regulated Resonant WPT Receiver with Low Input Harmonic Distortion
Authors:
Kerui Li,
Siew Chong Tan,
Ron Shu Yuen Hui
Abstract:
Resonant rectifier topologies would be promising candidates for achieving a simple, compact, and reliable single-stage wireless power transfer (WPT) receiver if not for their lack of good DC regulation capability. This paper investigates the problems that prevent single-stage DC regulation in resonant rectifier topologies. A possible solution is the proposed differential resonant rectifier topology, in which the rectifier is designed to have a relatively constant AC voltage and phase shift control is used to achieve relatively good output regulation. Design considerations on the reactive component sizing, magnetic component design, frequency and phase synchronization, small signal modelling, and closed-loop feedback control design are discussed. Experimental results verify that the proposed WPT receiver system can achieve single-stage AC rectification and DC regulation while attaining the key features of low harmonic distortion in its AC output voltage, continuous DC current, and zero-voltage-switching (ZVS) operation over a wide operating range.
Submitted 6 January, 2020; v1 submitted 12 December, 2019;
originally announced December 2019.
-
Light-weight Calibrator: a Separable Component for Unsupervised Domain Adaptation
Authors:
Shaokai Ye,
Kailu Wu,
Mu Zhou,
Yunfei Yang,
Sia huat Tan,
Kaidi Xu,
Jiebo Song,
Chenglong Bao,
Kaisheng Ma
Abstract:
Existing domain adaptation methods aim at learning features that can be generalized among domains. These methods commonly require updating the source classifier to adapt to the target domain and do not properly handle the trade-off between the source domain and the target domain. In this work, instead of training a classifier to adapt to the target domain, we use a separable component called a data calibrator to help the fixed source classifier recover discrimination power in the target domain, while preserving the source domain's performance. When the difference between two domains is small, the source classifier's representation is sufficient to perform well in the target domain and outperforms GAN-based methods on digits. Otherwise, the proposed method can leverage synthetic images generated by GANs to boost performance and achieve state-of-the-art performance on digits datasets and driving scene semantic segmentation. Our method empirically reveals that certain intriguing hints, which can be mitigated by adversarial attack to domain discriminators, are one of the sources for performance degradation under the domain shift.
Submitted 28 February, 2020; v1 submitted 28 November, 2019;
originally announced November 2019.
-
CALPA-NET: Channel-pruning-assisted Deep Residual Network for Steganalysis of Digital Images
Authors:
Shunquan Tan,
Weilong Wu,
Zilong Shao,
Qiushi Li,
Bin Li,
Jiwu Huang
Abstract:
Over the past few years, detection performance improvements of deep-learning based steganalyzers have usually been achieved through structure expansion. However, an excessively expanded structure results in huge computational cost, storage overheads, and consequently difficulty in training and deployment. In this paper we propose CALPA-NET, a ChAnneL-Pruning-Assisted deep residual network architecture search approach to shrink the network structure of existing vast, over-parameterized deep-learning based steganalyzers. We observe that the broad inverted-pyramid structure of existing deep-learning based steganalyzers might contradict the well-established model diversity oriented philosophy, and therefore is not suitable for steganalysis. Then a hybrid criterion combined with two network pruning schemes is introduced to adaptively shrink every involved convolutional layer in a data-driven manner. The resulting network architecture presents a slender bottleneck-like structure. We have conducted extensive experiments on the BOSSBase+BOWS2 dataset, the more diverse ALASKA dataset, and even a large-scale subset extracted from the ImageNet CLS-LOC dataset. The experimental results show that the model structure generated by our proposed CALPA-NET can achieve comparable performance with less than two percent of the parameters and about one third of the FLOPs of the original steganalytic model. The new model possesses even better adaptivity, transferability, and scalability.
Submitted 23 June, 2020; v1 submitted 11 November, 2019;
originally announced November 2019.
-
Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery
Authors:
Shawn Tan,
Guillaume Androz,
Ahmad Chamseddine,
Pierre Fecteau,
Aaron Courville,
Yoshua Bengio,
Joseph Paul Cohen
Abstract:
We release the largest public ECG dataset of continuous raw signals for representation learning containing 11 thousand patients and 2 billion labelled beats. Our goal is to enable semi-supervised ECG models to be made as well as to discover unknown subtypes of arrhythmia and anomalous ECG signal events. To this end, we propose an unsupervised representation learning task, evaluated in a semi-supervised fashion. We provide a set of baselines for different feature extractors that can be built upon. Additionally, we perform qualitative evaluations on results from PCA embeddings, where we identify some clustering of known subtypes indicating the potential for representation learning in arrhythmia sub-type discovery.
Submitted 21 October, 2019;
originally announced October 2019.
-
Multi-Objective Optimization for Drone Delivery
Authors:
Suttinee Sawadsitang,
Dusit Niyato,
Puay Siew Tan,
Sarana Nutanong
Abstract:
Recently, the unmanned aerial vehicle (UAV), also known as a drone, has become an alternative means of package delivery. Although drone delivery scheduling has been studied in recent years, most existing models are formulated as a single-objective optimization problem. However, in practice, drone delivery scheduling has multiple objectives that the shipper has to achieve. Moreover, drone delivery typically faces unexpected events, e.g., breakdown or inability to take off, that can significantly affect the scheduling problem. Therefore, in this paper, we propose a multi-objective and three-stage stochastic optimization model for drone delivery scheduling, called the multi-objective optimization for drone delivery (MODD) system. To handle the multi-objective optimization in the MODD system, we apply the $\varepsilon$-constraint method. The performance evaluation is performed using a real dataset from Singapore delivery services.
Submitted 24 July, 2019;
originally announced August 2019.