-
MTS-Net: Dual-Enhanced Positional Multi-Head Self-Attention for 3D CT Diagnosis of May-Thurner Syndrome
Authors:
Yixin Huang,
Yiqi **,
Ke Tao,
Kaijian Xia,
Jianfeng Gu,
Lei Yu,
Lan Du,
Cunjian Chen
Abstract:
May-Thurner Syndrome (MTS), also known as iliac vein compression syndrome or Cockett's syndrome, is a condition potentially impacting over 20 percent of the population, leading to an increased risk of iliofemoral deep venous thrombosis. In this paper, we present a 3D-based deep learning approach called MTS-Net for diagnosing May-Thurner Syndrome using CT scans. To effectively capture the spatial-temporal relationship among CT scans and emulate the clinical process of diagnosing MTS, we propose a novel attention module called the dual-enhanced positional multi-head self-attention (DEP-MHSA). The proposed DEP-MHSA reconsiders the role of positional embedding and incorporates a dual-enhanced positional embedding in both attention weights and residual connections. Further, we establish a new dataset, termed MTS-CT, consisting of 747 subjects. Experimental results demonstrate that our proposed approach achieves state-of-the-art MTS diagnosis results, and our self-attention design facilitates the spatial-temporal modeling. We believe that our DEP-MHSA is more suitable to handle CT image sequence modeling and the proposed dataset enables future research on MTS diagnosis. We make our code and dataset publicly available at: https://github.com/Nutingnon/MTS_dep_mhsa.
Submitted 7 June, 2024;
originally announced June 2024.
-
NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results
Authors:
Xin Li,
Kun Yuan,
Ya**g Pei,
Yiting Lu,
Ming Sun,
Chao Zhou,
Zhibo Chen,
Radu Timofte,
Wei Sun,
Haoning Wu,
Zicheng Zhang,
Jun Jia,
Zhichao Zhang,
Linhan Cao,
Qiubo Chen,
Xiongkuo Min,
Weisi Lin,
Guangtao Zhai,
Jianhui Sun,
Tianyi Wang,
Lei Li,
Han Kong,
Wenxuan Wang,
Bing Li,
Cheng Luo
, et al. (43 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions were submitted and evaluated on the collected dataset KVQ from a popular short-form video platform, i.e., the Kuaishou/Kwai platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The purpose is to build new benchmarks and advance the development of S-UGC VQA. The competition had 200 participants, and 13 teams submitted valid solutions for the final testing phase. The proposed solutions achieved state-of-the-art performance for S-UGC VQA. The project can be found at https://github.com/lixinustc/KVQChallenge-CVPR-NTIRE2024.
Submitted 17 April, 2024;
originally announced April 2024.
-
Accel-NASBench: Sustainable Benchmarking for Accelerator-Aware NAS
Authors:
Afzal Ahmad,
Linfeng Du,
Zhiyao Xie,
Wei Zhang
Abstract:
One of the primary challenges impeding the progress of Neural Architecture Search (NAS) is its extensive reliance on exorbitant computational resources. NAS benchmarks aim to simulate runs of NAS experiments at zero cost, remediating the need for extensive compute. However, existing NAS benchmarks use synthetic datasets and model proxies that make simplified assumptions about the characteristics of these datasets and models, leading to unrealistic evaluations. We present a technique that allows searching for training proxies that reduce the cost of benchmark construction by significant margins, making it possible to construct realistic NAS benchmarks for large-scale datasets. Using this technique, we construct an open-source bi-objective NAS benchmark for the ImageNet2012 dataset combined with the on-device performance of accelerators, including GPUs, TPUs, and FPGAs. Through extensive experimentation with various NAS optimizers and hardware platforms, we show that the benchmark is accurate and allows searching for state-of-the-art hardware-aware models at zero cost.
Submitted 18 June, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
FFPN: Fourier Feature Pyramid Network for Ultrasound Image Segmentation
Authors:
Chaoyu Chen,
Xin Yang,
Rusi Chen,
Junxuan Yu,
Liwei Du,
Jian Wang,
Xindi Hu,
Yan Cao,
Yingying Liu,
Dong Ni
Abstract:
Ultrasound (US) image segmentation is an active research area that requires real-time and highly accurate analysis in many scenarios. Detect-to-segment (DTS) frameworks have recently been proposed to balance accuracy and efficiency. However, existing approaches may suffer from inadequate contour encoding or fail to effectively leverage the encoded results. In this paper, we introduce a novel Fourier-anchor-based DTS framework called Fourier Feature Pyramid Network (FFPN) to address the aforementioned issues. The contributions of this paper are twofold. First, the FFPN utilizes Fourier Descriptors to adequately encode contours. Specifically, it maps Fourier series with similar amplitudes and frequencies into the same layer of the feature map, thereby effectively utilizing the encoded Fourier information. Second, we propose a Contour Sampling Refinement (CSR) module based on the contour proposals and refined features produced by the FFPN. This module extracts rich features around the predicted contours to further capture detailed information and refine the contours. Extensive experimental results on three large and challenging datasets demonstrate that our method outperforms other DTS methods in terms of accuracy and efficiency. Furthermore, our framework can generalize well to other detection or segmentation tasks.
Submitted 26 August, 2023;
originally announced August 2023.
-
Bilateral boundary control of an input delayed 2-D reaction-diffusion equation
Authors:
Dandan Guan,
Yanmei Chen,
Jie Qi,
Linglong Du
Abstract:
In this paper, a delay compensation design method based on PDE backstepping is developed for a two-dimensional reaction-diffusion partial differential equation (PDE) with bilateral input delays. The PDE is defined in a rectangular domain, and the bilateral control is imposed on a pair of opposite sides of the rectangle. To represent the delayed bilateral inputs, we introduce two 2-D transport PDEs that form a cascade system with the original PDE. A novel set of backstepping transformations is proposed for delay compensator design, including one Volterra integral transformation and two affine Volterra integral transformations. Unlike the kernel equation for 1-D PDE systems with delayed boundary input, the resulting kernel equations for the 2-D system have singular initial conditions governed by the Dirac delta function. Consequently, the kernel solutions are written as a double trigonometric series with singularities. To address the challenge of stability analysis posed by the singularities, we prove a set of inequalities by using the Cauchy-Schwarz inequality, the 2-D Fourier series, and Parseval's theorem. A numerical simulation illustrates the effectiveness of the proposed delay-compensation method.
Submitted 7 July, 2023;
originally announced July 2023.
-
Continuous and Noninvasive Measurement of Arterial Pulse Pressure and Pressure Waveform using an Image-free Ultrasound System
Authors:
Lirui Xu,
Pang Wu,
Pan Xia,
Fanglin Geng,
Peng Wang,
Xianxiang Chen,
Zhenfeng Li,
Lidong Du,
Shu** Liu,
Li Li,
Hongbo Chang,
Zhen Fang
Abstract:
The local beat-to-beat pulse pressure (PP) and blood pressure waveform of arteries, especially central arteries, are important indicators of the course of cardiovascular diseases (CVDs). Nevertheless, their noninvasive measurement remains a challenge in the clinic. This work presents a three-element image-free ultrasound system with a low-computational method for real-time measurement of local pulse wave velocity (PWV) and diameter waveforms, enabling real-time and noninvasive continuous PP and blood pressure waveform measurement without calibration. The developed system has been well-validated in vitro and in vivo. In in vitro cardiovascular phantom experiments, the results demonstrated high accuracy in the measurement of PP (error < 3 mmHg) and blood pressure waveform (root-mean-square error (RMSE) < 2 mmHg, correlation coefficient (r) > 0.99). In subsequent human carotid experiments, the system was compared with an arterial tonometer, which showed excellent PP accuracy (mean absolute error (MAE) = 3.7 ± 3.4 mmHg) and pressure waveform similarity (RMSE = 3.7 ± 1.6 mmHg, r = 0.98 ± 0.01). Furthermore, comparative experiments with the volume clamp device demonstrated the system's ability to accurately trace blood pressure changes (induced by deep breathing) over a period of one minute, with the MAE of DBP, MAP, and SBP within 5 ± 8 mmHg. The present results demonstrate the accuracy and reliability of the developed system for continuous and noninvasive measurement of arterial PP and blood pressure waveform, with potential applications in the diagnosis and prevention of CVDs.
Submitted 29 May, 2023;
originally announced May 2023.
-
SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing
Authors:
Weidong Chen,
Xiaofen Xing,
Xiangmin Xu,
Jianxin Pang,
Lan Du
Abstract:
Paralinguistic speech processing is important in addressing many issues, such as sentiment and neurocognitive disorder analyses. Recently, Transformer has achieved remarkable success in the natural language processing field and has demonstrated its adaptation to speech. However, previous works on Transformer in the speech field have not incorporated the properties of speech, leaving the full potential of Transformer unexplored. In this paper, we consider the characteristics of speech and propose a general structure-based framework, called SpeechFormer++, for paralinguistic speech processing. More concretely, following the component relationship in the speech signal, we design a unit encoder to model the intra- and inter-unit information (i.e., frames, phones, and words) efficiently. According to the hierarchical relationship, we utilize merging blocks to generate features at different granularities, which is consistent with the structural pattern in the speech signal. Moreover, a word encoder is introduced to integrate word-grained features into each unit encoder, which effectively balances fine-grained and coarse-grained information. SpeechFormer++ is evaluated on the speech emotion recognition (IEMOCAP & MELD), depression classification (DAIC-WOZ) and Alzheimer's disease detection (Pitt) tasks. The results show that SpeechFormer++ outperforms the standard Transformer while greatly reducing the computational cost. Furthermore, it delivers superior results compared to the state-of-the-art approaches.
Submitted 27 February, 2023;
originally announced February 2023.
-
DST: Deformable Speech Transformer for Emotion Recognition
Authors:
Weidong Chen,
Xiaofen Xing,
Xiangmin Xu,
Jianxin Pang,
Lan Du
Abstract:
Enabled by multi-head self-attention, Transformer has exhibited remarkable results in speech emotion recognition (SER). Compared to the original full attention mechanism, window-based attention is more effective in learning fine-grained features while greatly reducing model redundancy. However, emotional cues are present in a multi-granularity manner such that the pre-defined fixed window can severely degrade the model flexibility. In addition, it is difficult to obtain the optimal window settings manually. In this paper, we propose a Deformable Speech Transformer, named DST, for the SER task. DST determines the usage of window sizes conditioned on input speech via a light-weight decision network. Meanwhile, data-dependent offsets derived from acoustic features are utilized to adjust the positions of the attention windows, allowing DST to adaptively discover and attend to the valuable information embedded in the speech. Extensive experiments on IEMOCAP and MELD demonstrate the superiority of DST.
Submitted 27 February, 2023;
originally announced February 2023.
-
An efficient deep neural network to find small objects in large 3D images
Authors:
Jungkyu Park,
Jakub Chłędowski,
Stanisław Jastrzębski,
Jan Witowski,
Yanqi Xu,
Linda Du,
Sushma Gaddam,
Eric Kim,
Alana Lewin,
Ujas Parikh,
Anastasia Plaunova,
Sardius Chen,
Alexandra Millet,
James Park,
Kristine Pysarenko,
Shalin Patel,
Julia Goldberg,
Melanie Wegener,
Linda Moy,
Laura Heacock,
Beatriu Reig,
Krzysztof J. Geras
Abstract:
3D imaging enables accurate diagnosis by providing spatial information about organ anatomy. However, using 3D images to train AI models is computationally challenging because they consist of 10x or 100x more pixels than their 2D counterparts. To be trained with high-resolution 3D images, convolutional neural networks resort to downsampling them or projecting them to 2D. We propose an effective alternative, a neural network that enables efficient classification of full-resolution 3D medical images. Compared to off-the-shelf convolutional neural networks, our network, 3D Globally-Aware Multiple Instance Classifier (3D-GMIC), uses 77.98%-90.05% less GPU memory and 91.23%-96.02% less computation. While it is trained only with image-level labels, without segmentation labels, it explains its predictions by providing pixel-level saliency maps. On a dataset collected at NYU Langone Health, including 85,526 patients with full-field 2D mammography (FFDM), synthetic 2D mammography, and 3D mammography, 3D-GMIC achieves an AUC of 0.831 (95% CI: 0.769-0.887) in classifying breasts with malignant findings using 3D mammography. This is comparable to the performance of GMIC on FFDM (0.816, 95% CI: 0.737-0.878) and synthetic 2D (0.826, 95% CI: 0.754-0.884), which demonstrates that 3D-GMIC successfully classified large 3D images despite focusing computation on a smaller percentage of its input compared to GMIC. Therefore, 3D-GMIC identifies and utilizes extremely small regions of interest from 3D images consisting of hundreds of millions of pixels, dramatically reducing associated computational challenges. 3D-GMIC generalizes well to BCS-DBT, an external dataset from Duke University Hospital, achieving an AUC of 0.848 (95% CI: 0.798-0.896).
Submitted 26 February, 2023; v1 submitted 16 October, 2022;
originally announced October 2022.
-
Jamming Modulation: An Active Anti-Jamming Scheme
Authors:
Jianhui Ma,
Qiang Li,
Zilong Liu,
Linsong Du,
Hongyang Chen,
Nirwan Ansari
Abstract:
Providing quality communications under adversarial electronic attacks, e.g., broadband jamming attacks, is a challenging task. Unlike state-of-the-art approaches which treat jamming signals as destructive interference, this paper presents a novel active anti-jamming (AAJ) scheme for a jammed channel to enhance the communication quality between a transmitter node (TN) and receiver node (RN), where the TN actively exploits the jamming signal as a carrier to send messages. Specifically, the TN is equipped with a programmable-gain amplifier, which is capable of re-modulating the jamming signals for jamming modulation. Considering four typical jamming types, we derive both the bit error rates (BER) and the corresponding optimal detection thresholds of the AAJ scheme. The asymptotic performances of the AAJ scheme are discussed under the high jamming-to-noise ratio (JNR) and sampling rate cases. Our analysis shows that there exists a BER floor for sufficiently large JNR. Simulation results indicate that the proposed AAJ scheme allows the TN to communicate with the RN reliably even under extremely strong and/or broadband jamming. Additionally, we investigate the channel capacity of the proposed AAJ scheme and show that the channel capacity of the AAJ scheme outperforms that of the direct transmission when the JNR is relatively high.
Submitted 5 September, 2022;
originally announced September 2022.
-
Deep Motion Network for Freehand 3D Ultrasound Reconstruction
Authors:
Mingyuan Luo,
Xin Yang,
Hongzhang Wang,
Liwei Du,
Dong Ni
Abstract:
Freehand 3D ultrasound (US) has important clinical value due to its low cost and unrestricted field of view. Recently, deep learning algorithms have removed its dependence on bulky and expensive external positioning devices. However, improving reconstruction accuracy is still hampered by difficult elevational displacement estimation and large cumulative drift. In this context, we propose a novel deep motion network (MoNet) that integrates images and a lightweight sensor known as the inertial measurement unit (IMU) from a velocity perspective to alleviate the obstacles mentioned above. Our contribution is two-fold. First, we introduce IMU acceleration for the first time to estimate elevational displacements outside the plane. We propose a temporal and multi-branch structure to mine the valuable information of low signal-to-noise ratio (SNR) acceleration. Second, we propose a multi-modal online self-supervised strategy that leverages IMU information as weak labels for adaptive optimization to reduce drift errors and further ameliorate the impacts of acceleration noise. Experiments show that our proposed method achieves superior reconstruction performance, exceeding state-of-the-art methods across the board.
Submitted 30 June, 2022;
originally announced July 2022.
-
SpeechFormer: A Hierarchical Efficient Framework Incorporating the Characteristics of Speech
Authors:
Weidong Chen,
Xiaofen Xing,
Xiangmin Xu,
Jianxin Pang,
Lan Du
Abstract:
Transformer has obtained promising results in the cognitive speech signal processing field, which is of interest in various applications ranging from emotion to neurocognitive disorder analysis. However, most works treat the speech signal as a whole, leading to the neglect of the pronunciation structure that is unique to speech and reflects the cognitive process. Meanwhile, Transformer has a heavy computational burden due to its full attention operation. In this paper, a hierarchical efficient framework, called SpeechFormer, which considers the structural characteristics of speech, is proposed and can serve as a general-purpose backbone for cognitive speech signal processing. The proposed SpeechFormer consists of frame, phoneme, word and utterance stages in succession, each performing a neighboring attention according to the structural pattern of speech with high computational efficiency. SpeechFormer is evaluated on speech emotion recognition (IEMOCAP & MELD) and neurocognitive disorder detection (Pitt & DAIC-WOZ) tasks, and the results show that SpeechFormer outperforms the standard Transformer-based framework while greatly reducing the computational cost. Furthermore, our SpeechFormer achieves comparable results to the state-of-the-art approaches.
Submitted 9 March, 2022; v1 submitted 7 March, 2022;
originally announced March 2022.
-
Inferring Network Structure with Unobservable Nodes from Time Series Data
Authors:
Mengyuan Chen,
Yan Zhang,
Zhang Zhang,
Lun Du,
Jiang Zhang
Abstract:
Network structures play important roles in social, technological and biological systems. However, the observable nodes and connections in real cases are often incomplete or unavailable due to measurement errors, privacy protection issues, or other problems. Therefore, inferring the complete network structure is useful for understanding human interactions and complex dynamics. The existing studies have not fully solved the problem of inferring network structure with partial information about connections or nodes. In this paper, we tackle the problem by utilizing time-series data generated by network dynamics. We regard the network inference problem based on dynamical time series data as a problem of minimizing errors for predicting states of observable nodes and propose a novel data-driven deep learning model called Gumbel-softmax Inference for Network (GIN) to solve the problem under incomplete information. The GIN framework includes three modules: a dynamics learner, a network generator, and an initial state generator to infer the unobservable parts of the network. We implement experiments on artificial and empirical social networks with discrete and continuous dynamics. The experiments show that our method can infer the unknown parts of the structure and the initial states of the observable nodes with up to 90% accuracy. The accuracy declines linearly with the increase of the fractions of unobservable nodes. Our framework may have wide applications where the network structure is hard to obtain and the time series data is rich.
Submitted 21 February, 2022;
originally announced February 2022.
-
A Data-Driven Democratized Control Architecture for Regional Transmission Operators
Authors:
Xiaoyuan Fan,
Daniel Moscovitz,
Liang Du,
Walid Saad
Abstract:
As probably the most complicated and critical infrastructure system, U.S. power grids are becoming increasingly vulnerable to extreme events such as cyber-attacks and severe weather, as well as higher DER penetrations and growing information mismatch among system operators, utilities (transmission or generation owners), and end-users. This paper proposes a data-driven democratized control architecture considering two democratization pathways to assist transmission system operators, with a targeted use case of developing online proactive islanding strategies. Detailed discussions on load capability profiling at transmission buses and disaggregation of DER generations are provided and illustrated with real-world utility data. By combining network and operational constraints, transmission system operators can be equipped with new tools built on top of this architecture, to derive accurate, proactive, and strategic islanding decisions to incorporate the wide range of dynamic portfolios and needs when facing extreme events or unseen grid contingencies.
Submitted 20 September, 2021;
originally announced September 2021.
-
Random vector functional link neural network based ensemble deep learning for short-term load forecasting
Authors:
Ruobin Gao,
Liang Du,
P. N. Suganthan,
Qin Zhou,
Kum Fai Yuen
Abstract:
Electricity load forecasting is crucial for the planning and maintenance of power systems. However, its non-stationary and non-linear characteristics impose significant difficulties in anticipating future demand. This paper proposes a novel ensemble deep Random Vector Functional Link (edRVFL) network for electricity load forecasting. The weights of hidden layers are randomly initialized and kept fixed during the training process. The hidden layers are stacked to enforce deep representation learning. Then, the model generates the forecasts by ensembling the outputs of each layer. Moreover, we also propose to augment the random enhancement features by empirical wavelet transformation (EWT). The raw load data is decomposed by EWT in a walk-forward fashion, not introducing future data leakage problems in the decomposition process. Finally, all the sub-series generated by the EWT, including raw data, are fed into the edRVFL for forecasting purposes. The proposed model is evaluated on twenty publicly available time series from the Australian Energy Market Operator of the year 2020. The simulation results demonstrate the proposed model's superior performance over eleven forecasting methods in three error metrics and statistical tests on electricity load forecasting tasks.
Submitted 29 July, 2021;
originally announced July 2021.
-
Differences between human and machine perception in medical diagnosis
Authors:
Taro Makino,
Stanislaw Jastrzebski,
Witold Oleszkiewicz,
Celin Chacko,
Robin Ehrenpreis,
Naziya Samreen,
Chloe Chhor,
Eric Kim,
Jiyon Lee,
Kristine Pysarenko,
Beatriu Reig,
Hildegard Toth,
Divya Awal,
Linda Du,
Alice Kim,
James Park,
Daniel K. Sodickson,
Laura Heacock,
Linda Moy,
Kyunghyun Cho,
Krzysztof J. Geras
Abstract:
Deep neural networks (DNNs) show promise in image-based medical diagnosis, but cannot be fully trusted since their performance can be severely degraded by dataset shifts to which human perception remains invariant. If we can better understand the differences between human and machine perception, we can potentially characterize and mitigate this effect. We therefore propose a framework for comparing human and machine perception in medical diagnosis. The two are compared with respect to their sensitivity to the removal of clinically meaningful information, and to the regions of an image deemed most suspicious. Drawing inspiration from the natural image domain, we frame both comparisons in terms of perturbation robustness. The novelty of our framework is that separate analyses are performed for subgroups with clinically meaningful differences. We argue that this is necessary in order to avert Simpson's paradox and draw correct conclusions. We demonstrate our framework with a case study in breast cancer screening, and reveal significant differences between radiologists and DNNs. We compare the two with respect to their robustness to Gaussian low-pass filtering, performing a subgroup analysis on microcalcifications and soft tissue lesions. For microcalcifications, DNNs use a different set of high-frequency components than radiologists, some of which lie outside the image regions considered most suspicious by radiologists. These features run the risk of being spurious, but if not, could represent potential new biomarkers. For soft tissue lesions, the divergence between radiologists and DNNs is even starker, with DNNs relying heavily on spurious high-frequency components ignored by radiologists. Importantly, this deviation in soft tissue lesions was only observable through subgroup analysis, which highlights the importance of incorporating medical domain knowledge into our comparison framework.
Submitted 27 November, 2020;
originally announced November 2020.
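The Gaussian low-pass filtering perturbation named in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `gaussian_low_pass`, the `sigma` value, and the random test image are assumptions chosen for illustration, and the filter is applied in the frequency domain with NumPy only.

```python
import numpy as np

def gaussian_low_pass(image, sigma):
    """Attenuate high spatial frequencies of a 2D image (illustrative helper).

    sigma is the standard deviation of the Gaussian transfer function,
    expressed in cycles per pixel (an assumed parameterization).
    """
    rows, cols = image.shape
    # Frequency coordinates laid out the same way as np.fft.fft2 output
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    # Gain is 1 at zero frequency and decays with radial frequency
    mask = np.exp(-(fx**2 + fy**2) / (2.0 * sigma**2))
    filtered = np.fft.ifft2(np.fft.fft2(image) * mask)
    # The mask is real and even, so the result is real up to round-off
    return filtered.real

rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64))          # stand-in for a real image
smooth = gaussian_low_pass(image, sigma=0.05)
```

Lowering `sigma` removes more high-frequency content; comparing a classifier's output on `image` and `smooth` is one way to probe its reliance on those components.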
-
Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection
Authors:
Liang Du,
Xiaoqing Ye,
Xiao Tan,
Jianfeng Feng,
Zhenbo Xu,
Errui Ding,
Shilei Wen
Abstract:
Object detection from 3D point clouds remains a challenging task, though recent studies have pushed the envelope with deep learning techniques. Owing to severe spatial occlusion and the inherent variance of point density with distance to the sensors, the appearance of the same object varies considerably in point cloud data. Designing a feature representation that is robust to such appearance changes is hence the key issue in a 3D object detection method. In this paper, we propose a domain-adaptation-like approach to enhance the robustness of the feature representation. More specifically, we bridge the gap between the perceptual domain, where the feature comes from a real scene, and the conceptual domain, where the feature is extracted from an augmented scene consisting of non-occluded point clouds rich in detailed information. This domain adaptation approach mimics the functionality of the human brain when performing object perception. Extensive experiments demonstrate that our simple yet effective approach fundamentally boosts the performance of 3D point cloud object detection and achieves state-of-the-art results.
Submitted 8 June, 2020;
originally announced June 2020.
-
Capacity Characterization for Reconfigurable Intelligent Surfaces Assisted Multiple-Antenna Multicast
Authors:
Linsong Du,
Shihai Shao,
Gang Yang,
Jianhui Ma,
Qinpeng Liang,
Youxi Tang
Abstract:
The reconfigurable intelligent surface (RIS), which consists of a large number of passive and low-cost reflecting elements, has been recognized as a revolutionary technology to enhance the performance of future wireless networks. This paper considers an RIS-assisted multicast transmission, where a base station (BS) with multiple antennas multicasts a common message to multiple single-antenna mobile users (MUs) with the assistance of an RIS. An equivalent channel model for the considered multicast transmission is analyzed, and then an optimization problem for the corresponding channel capacity is formulated to obtain the optimal covariance matrix and phase shifts. To solve this non-convex and non-differentiable problem, this paper first exploits the gradient descent method and alternating optimization to approach a locally optimal solution for any number of MUs. Then, this paper considers a special case in which the globally optimal solution can be obtained, and gives the necessary and sufficient condition for this special case. Finally, the order of growth of the maximal capacity is obtained when the numbers of reflecting elements, BS antennas, and MUs go to infinity.
Submitted 24 May, 2021; v1 submitted 17 December, 2019;
originally announced December 2019.
-
Interference and Efficient Transmission Range via V2V Communication at Roads Traffic Intersections
Authors:
Ala Alobeidyeen,
Lili Du
Abstract:
Vehicle-to-Vehicle (V2V) communication technology has dramatically promoted many promising applications to enhance traffic safety, mobility, and sustainability. However, we still lack an understanding of some fundamental properties of V2V technology under urban traffic conditions, such as interference at traffic intersections. Motivated by this view, this study develops mathematical formulations to capture the worst-case interference at traffic intersections, considering the macroscopic traffic flow conditions and critical road geometric features, including the intersection diameter D and the intersection angle α. Building upon these formulations, we develop a mathematical model to approximate a conservative transmission range that sustains successful V2V transmission at a traffic intersection. Our experiments illustrate that the proposed analytical formulations can provide accurate approximations for the interference and the corresponding transmission range at orthogonal (non-orthogonal) traffic intersections under various traffic congestion levels. Furthermore, this study conducts additional experiments to understand how intersection geometric features (such as (D, α)) impact V2V communication at traffic intersections. The results illustrate that more severe interference and a smaller transmission range occur at a smaller intersection (with smaller diameter D) under heavy traffic congestion. Moreover, the orthogonal intersection gives critical thresholds (such as the most severe interference and the minimum transmission range) under all traffic conditions, which helps in understanding V2V communication performance at an urban traffic intersection. These findings will potentially help to develop efficient MAC algorithms adaptive to urban traffic conditions, and further support various ITS applications using V2V communication.
Submitted 11 November, 2019;
originally announced November 2019.
-
Visualization of Multi-Objective Switched Reluctance Machine Optimization at Multiple Operating Conditions with t-SNE
Authors:
Shen Zhang,
Shibo Zhang,
Sufei Li,
Liang Du,
Thomas G. Habetler
Abstract:
The optimization of electric machines at multiple operating points is crucial for applications that require frequent changes in speed and load, such as electric vehicles, in order to achieve optimal machine performance across the entire driving cycle. However, the number of objectives to be optimized increases significantly with the number of operating points considered, which poses a potential problem for the visualization techniques currently in use, such as scatter plots of Pareto fronts, parallel coordinates, and principal component analysis (PCA). These techniques become less able to provide machine designers with intuitive and informative visualizations of all of the design candidates, and designers become less able to pick a few candidates for further fine-tuning with performance verification. Therefore, this paper proposes the use of t-distributed stochastic neighbor embedding (t-SNE) to visualize all of the optimization objectives of various electric machine design candidates under various operating conditions, which constitute a high-dimensional data set that lies on several different, but related, low-dimensional manifolds. Finally, two case studies of switched reluctance machines (SRMs) are presented to illustrate the superiority of t-SNE over the traditional visualization techniques used in electric machine optimization.
Submitted 3 November, 2019;
originally announced November 2019.
-
Privacy-preserving Distributed Machine Learning via Local Randomization and ADMM Perturbation
Authors:
Xin Wang,
Hideaki Ishii,
Linkang Du,
Peng Cheng,
Jiming Chen
Abstract:
With the proliferation of training data, distributed machine learning (DML) is becoming more competent for large-scale learning tasks. However, privacy concerns have to be given priority in DML, since training data may contain sensitive information about users. In this paper, we propose a privacy-preserving ADMM-based DML framework with two novel features: First, we remove the assumption commonly made in the literature that the users trust the server collecting their data. Second, the framework provides heterogeneous privacy for users depending on the data's sensitivity levels and the servers' trust degrees. The challenging issue is to keep the accumulation of privacy losses over ADMM iterations minimal. In the proposed framework, a local randomization approach, which is differentially private, is adopted to provide users with a self-controlled privacy guarantee for the most sensitive information. Further, the ADMM algorithm is perturbed through a combined noise-adding method, which simultaneously preserves privacy for users' less sensitive information and strengthens the privacy protection of the most sensitive information. We provide detailed analyses of the performance of the trained model in terms of its generalization error. Finally, we conduct extensive experiments using real-world datasets to validate the theoretical results and evaluate the classification performance of the proposed framework.
Submitted 9 September, 2019; v1 submitted 30 July, 2019;
originally announced August 2019.
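As a generic illustration of the kind of differentially private local randomization the abstract above refers to, the sketch below applies the standard Laplace mechanism on the user's side before any data leaves the device. The abstract does not specify the paper's actual randomization, so the mechanism choice, the function name `laplace_randomize`, and the values of `epsilon` and `sensitivity` are all assumptions.

```python
import numpy as np

def laplace_randomize(x, epsilon, sensitivity, rng):
    """Perturb x with Laplace noise of scale sensitivity / epsilon.

    This is the textbook Laplace mechanism, used here as a stand-in;
    it is not claimed to be the randomization used in the paper.
    """
    scale = sensitivity / epsilon
    return x + rng.laplace(loc=0.0, scale=scale, size=np.shape(x))

rng = np.random.default_rng(0)
true_value = 0.7                            # hypothetical sensitive feature in [0, 1]
reports = laplace_randomize(np.full(10000, true_value),
                            epsilon=0.5, sensitivity=1.0, rng=rng)
estimate = reports.mean()                   # aggregate stays informative
```

A smaller `epsilon` gives each user a stronger guarantee at the cost of noisier individual reports, which is the trade-off a user-controlled privacy level would tune.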
-
Robust Vector Perturbation Precoding Design for MIMO Broadcast Channel
Authors:
Liutong Du,
Lihua Li,
** Zhang
Abstract:
We consider a vector perturbation (VP) precoder design for multiuser multiple-input single-output (MU-MISO) broadcast channel systems that is robust to power scaling factor errors. VP precoding has so far been developed and analyzed under the assumption that receivers know the power scaling factor perfectly in advance of transmission, which is hard to achieve due to the large dynamic range and limited feedforward. However, as demonstrated in our results, the performance of VP precoding is quite sensitive to the accuracy of the power scaling factor and always encounters an error floor in the mid-to-high signal-to-noise ratio (SNR) regimes. Motivated by these observations, we propose a robust VP precoder based on the minimum mean square error (MMSE) criterion. Simulation results show that the robust VP precoder outperforms conventional VP precoding designs, as it is less sensitive to power scaling factor errors.
Submitted 26 January, 2019;
originally announced January 2019.
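To make the role of the power scaling factor in the abstract above concrete, here is a minimal sketch of conventional (non-robust) VP precoding with a brute-force perturbation search over a noiseless channel. It is not the robust MMSE design proposed in the paper; the 2-user channel, the QPSK symbols, the modulo base `tau = 4`, and the helper names `mod_tau` and `vp_precode` are illustrative assumptions.

```python
import itertools
import numpy as np

def mod_tau(z, tau):
    """Fold real and imaginary parts into [-tau/2, tau/2)."""
    def fold(v):
        return v - tau * np.floor(v / tau + 0.5)
    return fold(z.real) + 1j * fold(z.imag)

def vp_precode(H, s, tau, search=(-1, 0, 1)):
    """Pick the Gaussian-integer offset l that minimizes transmit energy."""
    H_pinv = np.linalg.pinv(H)              # channel inversion
    offsets = [a + 1j * b for a in search for b in search]
    best_x, best_gamma = None, np.inf
    for l in itertools.product(offsets, repeat=len(s)):
        x = H_pinv @ (s + tau * np.array(l))
        gamma = np.vdot(x, x).real          # unnormalized transmit energy
        if gamma < best_gamma:
            best_x, best_gamma = x, gamma
    # Normalize to unit power; gamma is the power scaling factor
    return best_x / np.sqrt(best_gamma), best_gamma

rng = np.random.default_rng(1)
H = (rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))) / np.sqrt(2)
s = np.array([1 + 1j, -1 + 1j])             # QPSK symbols
tau = 4.0
x, gamma = vp_precode(H, s, tau)
received = H @ x                            # noiseless channel
recovered = mod_tau(np.sqrt(gamma) * received, tau)
```

The receiver must multiply by `np.sqrt(gamma)` before the modulo step, so an inaccurate estimate of `gamma` distorts the folded symbols; replacing `np.sqrt(gamma)` with a mismatched value in the last line is a simple way to explore that sensitivity.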