-
Expansive Synthesis: Generating Large-Scale Datasets from Minimal Samples
Authors:
Vahid Jebraeeli,
Bo Jiang,
Hamid Krim,
Derya Cansever
Abstract:
The challenge of limited availability of data for training in machine learning arises in many applications and the impact on performance and generalization is serious. Traditional data augmentation methods aim to enhance training with a moderately sufficient data set. Generative models like Generative Adversarial Networks (GANs) often face problematic convergence when generating significant and diverse data samples. Diffusion models, though effective, still struggle with high computational cost and long training times. This paper introduces an innovative Expansive Synthesis model that generates large-scale, high-fidelity datasets from minimal samples. The proposed approach exploits expander graph mappings and feature interpolation to synthesize expanded datasets while preserving the intrinsic data distribution and feature structural relationships. The rationale of the model is rooted in the non-linear property of neural networks' latent space and in its capture by a Koopman operator to yield a linear space of features to facilitate the construction of larger and enriched consistent datasets starting with a much smaller dataset. This process is optimized by an autoencoder architecture enhanced with self-attention layers and further refined for distributional consistency by optimal transport. We validate our Expansive Synthesis by training classifiers on the generated datasets and comparing their performance to classifiers trained on larger, original datasets. Experimental results demonstrate that classifiers trained on synthesized data achieve performance metrics on par with those trained on full-scale datasets, showcasing the model's potential to effectively augment training data. This work represents a significant advancement in data generation, offering a robust solution to data scarcity and paving the way for enhanced data availability in machine learning applications.
Submitted 24 June, 2024;
originally announced June 2024.
-
CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework
Authors:
Yiyang Zhao,
Yunzhuo Liu,
Bo Jiang,
Tian Guo
Abstract:
This work presents a novel approach to neural architecture search (NAS) that aims to increase carbon efficiency for the model design process. The proposed framework CE-NAS addresses the key challenge of high carbon cost associated with NAS by exploring the carbon emission variations of energy and energy differences of different NAS algorithms. At the high level, CE-NAS leverages a reinforcement-learning agent to dynamically adjust GPU resources based on carbon intensity, predicted by a time-series transformer, to balance energy-efficient sampling and energy-intensive evaluation tasks. Furthermore, CE-NAS leverages a recently proposed multi-objective optimizer to effectively reduce the NAS search space. We demonstrate the efficacy of CE-NAS in lowering carbon emissions while achieving SOTA results for both NAS datasets and open-domain NAS tasks. For example, on the HW-NasBench dataset, CE-NAS reduces carbon emissions by up to 7.22X while maintaining a search efficiency comparable to vanilla NAS. For open-domain NAS tasks, CE-NAS achieves SOTA results with 97.35% top-1 accuracy on CIFAR-10 with only 1.68M parameters and a carbon consumption of 38.53 lbs of CO2. On ImageNet, our searched model achieves 80.6% top-1 accuracy with a 0.78 ms TensorRT latency using FP16 on NVIDIA V100, consuming only 909.86 lbs of CO2, making it comparable to other one-shot-based NAS baselines.
Submitted 3 June, 2024;
originally announced June 2024.
-
Design, Calibration, and Control of Compliant Force-sensing Gripping Pads for Humanoid Robots
Authors:
Yuanfeng Han,
Boren Jiang,
Gregory S. Chirikjian
Abstract:
This paper introduces a pair of low-cost, lightweight and compliant force-sensing gripping pads used for manipulating box-like objects with smaller-sized humanoid robots. These pads measure normal gripping forces and center of pressure (CoP). A calibration method is developed to improve the CoP measurement accuracy. A hybrid force-alignment-position control framework is proposed to regulate the gripping forces and to ensure the surface alignment between the grippers and the object. Limit surface theory is incorporated as a contact friction modeling approach to determine the magnitude of gripping forces for slippage avoidance. The integrated hardware and software system is demonstrated with a NAO humanoid robot. Experiments show the effectiveness of the overall approach.
Submitted 31 May, 2024;
originally announced May 2024.
-
Global and local observability of hypergraphs
Authors:
Chencheng Zhang,
Hao Yang,
Shaoxuan Cui,
Bin Jiang,
Ming Cao
Abstract:
This paper studies observability for non-uniform hypergraphs with inputs and outputs. To capture higher-order interactions, we define a canonical non-homogeneous dynamical system with nonlinear outputs on hypergraphs. We then construct algebraic necessary and sufficient conditions based on polynomial ideals and varieties for global observability at an initial state of hypergraphs. An example is given to illustrate the proposed criteria for observability. Further, necessary and sufficient conditions for local observability are derived based on rank conditions of observability matrices, which provide a framework to study local observability for non-uniform hypergraphs. Finally, the similarity of observability for hypergraphs is proposed using similarity of tensors, which reveals the relation of observability between two hypergraphs and helps to check the observability intuitively.
Submitted 29 May, 2024;
originally announced May 2024.
-
Koopcon: A new approach towards smarter and less complex learning
Authors:
Vahid Jebraeeli,
Bo Jiang,
Derya Cansever,
Hamid Krim
Abstract:
In the era of big data, the sheer volume and complexity of datasets pose significant challenges in machine learning, particularly in image processing tasks. This paper introduces an innovative Autoencoder-based Dataset Condensation Model backed by Koopman operator theory that effectively packs large datasets into compact, information-rich representations. Inspired by the predictive coding mechanisms of the human brain, our model leverages a novel approach to encode and reconstruct data, maintaining essential features and label distributions. The condensation process utilizes an autoencoder neural network architecture, coupled with Optimal Transport theory and Wasserstein distance, to minimize the distributional discrepancies between the original and synthesized datasets. We present a two-stage implementation strategy: first, condensing the large dataset into a smaller synthesized subset; second, evaluating the synthesized data by training a classifier and comparing its performance with a classifier trained on an equivalent subset of the original data. Our experimental results demonstrate that the classifiers trained on condensed data exhibit comparable performance to those trained on the original datasets, thus affirming the efficacy of our condensation model. This work not only contributes to the reduction of computational resources but also paves the way for efficient data handling in constrained environments, marking a significant step forward in data-efficient machine learning.
Submitted 22 May, 2024;
originally announced May 2024.
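The Koopcon abstract above mentions using the Wasserstein distance to measure the discrepancy between the original and synthesized datasets. As a rough illustration of that quantity only, and not of the paper's model, here is a minimal sketch for the simplest case: two one-dimensional samples of equal size, where the 1-Wasserstein distance reduces to the mean absolute difference between sorted values. The function name `wasserstein_1d` and the equal-size restriction are assumptions made for this example.

```python
def wasserstein_1d(a, b):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    Illustrative helper (hypothetical name). For equal-size samples with
    uniform weights, the optimal transport plan matches the i-th smallest
    value of `a` with the i-th smallest value of `b`.
    """
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    xs, ys = sorted(a), sorted(b)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    original = [0.0, 1.0, 2.0]
    shifted = [1.0, 2.0, 3.0]
    # Every point has to move by exactly one unit.
    print(wasserstein_1d(original, shifted))
```

For image features the distance would be computed in a higher-dimensional space, where no such closed form exists and an optimal transport solver is needed.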
-
Pre-training on High Definition X-ray Images: An Experimental Study
Authors:
Xiao Wang,
Yuehang Li,
Wentao Wu,
Jiandong Jin,
Yao Rong,
Bo Jiang,
Chuanfu Li,
Jin Tang
Abstract:
Existing X-ray based pre-trained vision models are usually trained on a relatively small-scale dataset (less than 500k samples) with limited resolution (e.g., 224 $\times$ 224). However, the key to the success of self-supervised pre-training of large models lies in massive training data, and maintaining high resolution in X-ray images is essential for effective solutions to difficult miscellaneous diseases. In this paper, we address these issues by proposing the first high-definition (1280 $\times$ 1280) X-ray based pre-trained foundation vision model on our newly collected large-scale dataset which contains more than 1 million X-ray images. Our model follows the masked auto-encoder framework: the tokens that remain after masking (at a high masking rate) are used as input, and the masked image patches are reconstructed by the Transformer encoder-decoder network. More importantly, we introduce a novel context-aware masking strategy that utilizes the chest contour as a boundary for adaptive masking operations. We validate the effectiveness of our model on two downstream tasks, including X-ray report generation and disease recognition. Extensive experiments demonstrate that our pre-trained medical foundation vision model achieves comparable or even new state-of-the-art performance on downstream benchmark datasets. The source code and pre-trained models of this paper will be released on https://github.com/Event-AHU/Medical_Image_Analysis.
Submitted 27 April, 2024;
originally announced April 2024.
-
Ultrasound SAM Adapter: Adapting SAM for Breast Lesion Segmentation in Ultrasound Images
Authors:
Zhengzheng Tu,
Le Gu,
Xixi Wang,
Bo Jiang
Abstract:
Segment Anything Model (SAM) has recently achieved amazing results in the field of natural image segmentation. However, it is not effective for medical image segmentation, owing to the large domain gap between natural and medical images. In this paper, we mainly focus on ultrasound image segmentation. It is very difficult to train a foundation model for ultrasound image data because large-scale annotated ultrasound image data is lacking. To address these issues, we develop a novel Breast Ultrasound SAM Adapter, termed Breast Ultrasound Segment Anything Model (BUSSAM), which migrates SAM to the field of breast ultrasound image segmentation by using the adapter technique. To be specific, we first design a novel CNN image encoder, which is fully trained on the BUS dataset. Our CNN image encoder is more lightweight and focuses more on features of the local receptive field, which provides complementary information to the ViT branch in SAM. Then, we design a novel Cross-Branch Adapter to allow the CNN image encoder to fully interact with the ViT image encoder in the SAM module. Finally, we add both the Position Adapter and the Feature Adapter to the ViT branch to fine-tune the original SAM. The experimental results on the AMUBUS and BUSI datasets demonstrate that our proposed model outperforms other medical image segmentation models significantly. Our code will be available at: https://github.com/bscs12/BUSSAM.
Submitted 23 April, 2024;
originally announced April 2024.
-
A Spatial-Temporal Progressive Fusion Network for Breast Lesion Segmentation in Ultrasound Videos
Authors:
Zhengzheng Tu,
Zigang Zhu,
Yayang Duan,
Bo Jiang,
Qishun Wang,
Chaoxue Zhang
Abstract:
Ultrasound video-based breast lesion segmentation provides valuable assistance in early breast lesion detection and treatment. However, existing works mainly focus on lesion segmentation based on ultrasound breast images, and usually cannot be adapted well to obtain desirable results on ultrasound videos. The main challenge for ultrasound video-based breast lesion segmentation is how to exploit intra-frame and inter-frame lesion cues simultaneously. To address this problem, we propose a novel Spatial-Temporal Progressive Fusion Network (STPFNet) for video-based breast lesion segmentation. The main aspects of the proposed STPFNet are threefold. First, we propose to adopt a unified network architecture to capture both spatial dependences within each ultrasound frame and temporal correlations between different frames together for ultrasound data representation. Second, we propose a new fusion module, termed Multi-Scale Feature Fusion (MSFF), to fuse spatial and temporal cues together for lesion detection. MSFF can help to determine the boundary contour of the lesion region to overcome the issue of lesion boundary blurring. Third, we propose to exploit the segmentation result of the previous frame as prior knowledge to suppress the noisy background and learn a more robust representation. In particular, we introduce a new publicly available ultrasound video breast lesion segmentation dataset, termed UVBLS200, which is specifically dedicated to breast lesion segmentation. It contains 200 videos, including 80 videos of benign lesions and 120 videos of malignant lesions. Experiments on the proposed dataset demonstrate that the proposed STPFNet achieves better breast lesion detection performance than state-of-the-art methods.
Submitted 18 March, 2024;
originally announced March 2024.
-
Budget Recycling Differential Privacy
Authors:
Bo Jiang,
Jian Du,
Sagar Shamar,
Qiang Yan
Abstract:
Differential Privacy (DP) mechanisms usually force a reduction in data utility by producing "out-of-bound" noisy results for a tight privacy budget. We introduce the Budget Recycling Differential Privacy (BR-DP) framework, designed to provide soft-bounded noisy outputs for a broad range of existing DP mechanisms. By "soft-bounded," we refer to the mechanism's ability to release most outputs within a predefined error boundary, thereby improving utility and maintaining privacy simultaneously. The core of BR-DP consists of two components: a DP kernel responsible for generating a noisy answer per iteration, and a recycler that probabilistically recycles/regenerates or releases the noisy answer. We delve into the privacy accounting of BR-DP, culminating in the development of a budgeting principle that optimally sub-allocates the available budget between the DP kernel and the recycler. Furthermore, we introduce algorithms for tight BR-DP accounting in composition scenarios, and our findings indicate that BR-DP achieves reduced privacy leakage post-composition compared to DP. Additionally, we explore the concept of privacy amplification via subsampling within the BR-DP framework and propose optimal sampling rates for BR-DP across various queries. We experiment with real data, and the results demonstrate BR-DP's effectiveness in lifting the utility-privacy tradeoff provided by DP mechanisms.
Submitted 16 April, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Multilingual Turn-taking Prediction Using Voice Activity Projection
Authors:
Koji Inoue,
Bing'er Jiang,
Erik Ekstedt,
Tatsuya Kawahara,
Gabriel Skantze
Abstract:
This paper investigates the application of voice activity projection (VAP), a predictive turn-taking model for spoken dialogue, on multilingual data, encompassing English, Mandarin, and Japanese. The VAP model continuously predicts the upcoming voice activities of participants in dyadic dialogue, leveraging a cross-attention Transformer to capture the dynamic interplay between participants. The results show that a monolingual VAP model trained on one language does not make good predictions when applied to other languages. However, a multilingual model, trained on all three languages, demonstrates predictive performance on par with monolingual models across all languages. Further analyses show that the multilingual model has learned to discern the language of the input signal. We also analyze the sensitivity to pitch, a prosodic cue that is thought to be important for turn-taking. Finally, we compare two different audio encoders, contrastive predictive coding (CPC) pre-trained on English, with a recent model based on multilingual wav2vec 2.0 (MMS).
Submitted 14 March, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Real-time and Continuous Turn-taking Prediction Using Voice Activity Projection
Authors:
Koji Inoue,
Bing'er Jiang,
Erik Ekstedt,
Tatsuya Kawahara,
Gabriel Skantze
Abstract:
A demonstration of a real-time and continuous turn-taking prediction system is presented. The system is based on a voice activity projection (VAP) model, which directly maps dialogue stereo audio to future voice activities. The VAP model includes contrastive predictive coding (CPC) and self-attention transformers, followed by a cross-attention transformer. We examine the effect of the input context audio length and demonstrate that the proposed system can operate in real-time with CPU settings, with minimal performance degradation.
Submitted 9 January, 2024;
originally announced January 2024.
-
Adaptive Boosting with Fairness-aware Reweighting Technique for Fair Classification
Authors:
Xiaobin Song,
Zeyuan Liu,
Benben Jiang
Abstract:
Machine learning methods based on AdaBoost have been widely applied to various classification problems across many mission-critical applications including healthcare, law and finance. However, there is a growing concern about the unfairness and discrimination of data-driven classification models, which is inevitable for classical algorithms including AdaBoost. In order to achieve fair classification, a novel fair AdaBoost (FAB) approach is proposed that is an interpretable fairness-improving variant of AdaBoost. We mainly investigate binary classification problems and focus on the fairness of three different indicators (i.e., accuracy, false positive rate and false negative rate). By utilizing a fairness-aware reweighting technique for base classifiers, the proposed FAB approach can achieve fair classification while maintaining the advantage of AdaBoost with negligible sacrifice of predictive performance. In addition, a hyperparameter is introduced in FAB to show preferences for the fairness-accuracy trade-off. An upper bound for the target loss function that quantifies error rate and unfairness is theoretically derived for FAB, which provides a strict theoretical support for the fairness-improving methods designed for AdaBoost. The effectiveness of the proposed method is demonstrated on three real-world datasets (i.e., Adult, COMPAS and HSLS) with respect to the three fairness indicators. The results are accordant with theoretic analyses, and show that (i) FAB significantly improves classification fairness at a small cost of accuracy compared with AdaBoost; and (ii) FAB outperforms state-of-the-art fair classification methods including equalized odds method, exponentiated gradient method, and disparate mistreatment method in terms of the fairness-accuracy trade-off.
Submitted 5 January, 2024;
originally announced January 2024.
-
Hybrid Aerodynamics-Based Model Predictive Control for a Tail-Sitter UAV
Authors:
Bailun Jiang,
Boyang Li,
Ching-Wei Chang,
Chih-Yung Wen
Abstract:
It is challenging to model and control a tail-sitter unmanned aerial vehicle (UAV) because its blended wing body generates complicated nonlinear aerodynamic effects, such as wing lift, fuselage drag, and propeller-wing interactions. We therefore devised a hybrid aerodynamic modeling method and model predictive control (MPC) design for a quadrotor tail-sitter UAV. The hybrid model consists of the Newton-Euler equation, which describes quadrotor dynamics, and a feedforward neural network, which learns residual aerodynamic effects. This hybrid model exhibits high predictive accuracy at a low computational cost and was used to implement hybrid MPC, which optimizes the throttle, pitch angle, and roll angle for position tracking. The controller performance was validated in real-world experiments, which obtained a 57% tracking error reduction compared with conventional nonlinear MPC. External wind disturbance was also introduced and the experimental results confirmed the robustness of the controller to these conditions.
Submitted 22 December, 2023;
originally announced December 2023.
-
Parallel Bayesian Optimization Using Satisficing Thompson Sampling for Time-Sensitive Black-Box Optimization
Authors:
Xiaobin Song,
Benben Jiang
Abstract:
Bayesian optimization (BO) is widely used for black-box optimization problems, and has been shown to perform well in various real-world tasks. However, most of the existing BO methods aim to learn the optimal solution, which may become infeasible when the parameter space is extremely large or the problem is time-sensitive. In these contexts, switching to a satisficing solution that requires less information can result in better performance. In this work, we focus on time-sensitive black-box optimization problems and propose satisficing Thompson sampling-based parallel Bayesian optimization (STS-PBO) approaches, including synchronous and asynchronous versions. We shift the target from an optimal solution to a satisficing solution that is easier to learn. Rate-distortion theory is introduced to construct a loss function that balances the amount of information that needs to be learned with sub-optimality, and the Blahut-Arimoto algorithm is adopted to compute the target solution that reaches the minimum information rate under the distortion limit at each step. Both discounted and undiscounted Bayesian cumulative regret bounds are theoretically derived for the proposed STS-PBO approaches. The effectiveness of the proposed methods is demonstrated on a fast-charging design problem of lithium-ion batteries. The results are consistent with the theoretical analyses, and show that our STS-PBO methods outperform both sequential counterparts and parallel BO with traditional Thompson sampling in both synchronous and asynchronous settings.
Submitted 19 October, 2023;
originally announced October 2023.
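The STS-PBO abstract above refers to the Blahut-Arimoto algorithm for finding the minimum information rate under a distortion limit. The following is a minimal sketch of the generic textbook rate-distortion iteration for a discrete source, not the authors' STS-PBO procedure. The function name `blahut_arimoto`, the trade-off parameter `beta`, and the binary example are assumptions chosen for illustration.

```python
import math


def blahut_arimoto(p_x, dist, beta, iters=200):
    """Textbook Blahut-Arimoto iteration for a rate-distortion problem.

    p_x  : source probabilities p(x)
    dist : dist[i][j] is the distortion of reproducing x_i as xhat_j
    beta : assumed trade-off parameter (larger beta favors low distortion)
    Returns (q, rate_bits, distortion), where q[i][j] = q(xhat_j | x_i).
    """
    n, m = len(p_x), len(dist[0])
    r = [1.0 / m] * m  # output marginal r(xhat), initialized uniformly
    q = [[1.0 / m] * m for _ in range(n)]
    for _ in range(iters):
        # Update the conditional: q(xhat|x) proportional to r(xhat) * exp(-beta * d).
        for i in range(n):
            w = [r[j] * math.exp(-beta * dist[i][j]) for j in range(m)]
            s = sum(w)
            q[i] = [v / s for v in w]
        # Update the marginal so that it matches the new conditional.
        r = [sum(p_x[i] * q[i][j] for i in range(n)) for j in range(m)]
    distortion = sum(p_x[i] * q[i][j] * dist[i][j]
                     for i in range(n) for j in range(m))
    rate = sum(p_x[i] * q[i][j] * math.log2(q[i][j] / r[j])
               for i in range(n) for j in range(m) if q[i][j] > 0.0)
    return q, rate, distortion


if __name__ == "__main__":
    # Uniform binary source with Hamming distortion.
    hamming = [[0.0, 1.0], [1.0, 0.0]]
    _, rate, distortion = blahut_arimoto([0.5, 0.5], hamming, beta=math.log(9.0))
    print(rate, distortion)
```

For a uniform binary source with Hamming distortion the result can be checked against the known closed form R(D) = 1 - H_b(D) bits, which makes this a convenient sanity test before applying the iteration to a larger alphabet.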
-
Joint Beamforming and Compression Design for Per-Antenna Power Constrained Cooperative Cellular Networks
Authors:
Xilai Fan,
Ya-Feng Liu,
Bo Jiang
Abstract:
In the cooperative cellular network, relay-like base stations are connected to the central processor (CP) via rate-limited fronthaul links and the joint processing is performed at the CP, which thus can effectively mitigate the multiuser interference. In this paper, we consider the joint beamforming and compression problem with per-antenna power constraints in the cooperative cellular network. We first establish the equivalence between the considered problem and its semidefinite relaxation (SDR). Then we further derive the partial Lagrangian dual of the SDR problem and show that the objective function of the obtained dual problem is differentiable. Based on the differentiability, we propose two efficient projected gradient ascent algorithms for solving the dual problem, which are projected exact gradient ascent (PEGA) and projected inexact gradient ascent (PIGA). While PEGA is guaranteed to find the global solution of the dual problem (and hence the global solution of the original problem), PIGA is more computationally efficient due to the lower complexity in inexactly computing the gradient. Global optimality and high efficiency of the proposed algorithms are demonstrated via numerical experiments.
Submitted 23 December, 2023; v1 submitted 11 September, 2023;
originally announced September 2023.
-
Flocking control against the malicious agent
Authors:
Chencheng Zhang,
Hao Yang,
Bin Jiang,
Ming Cao
Abstract:
This paper investigates the flocking control of a swarm with a malicious agent that falsifies its controller parameters to cause collision, division, and escape of agents in the swarm. A novel geometric flocking condition is established by designing the configuration of the malicious agent and its neighbors, under which we propose a hierarchical geometric configuration-based flocking control method. To help detect the malicious agent, a parameter estimation mechanism is also provided. The proposed method can achieve the flocking control goal while containing the malicious agent in the swarm without removing it. Experimental results demonstrate the effectiveness of the theoretical result.
Submitted 8 August, 2023;
originally announced August 2023.
-
Implicit Bayes Adaptation: A Collaborative Transport Approach
Authors:
Bo Jiang,
Hamid Krim,
Tianfu Wu,
Derya Cansever
Abstract:
The power and flexibility of Optimal Transport (OT) have pervaded a wide spectrum of problems, including recent Machine Learning challenges such as unsupervised domain adaptation. Its essence of quantitatively relating two probability distributions by some optimal metric, has been creatively exploited and shown to hold promise for many real-world data challenges. In a related theme in the present work, we posit that domain adaptation robustness is rooted in the intrinsic (latent) representations of the respective data, which are inherently lying in a non-linear submanifold embedded in a higher dimensional Euclidean space. We account for the geometric properties by refining the $l^2$ Euclidean metric to better reflect the geodesic distance between two distinct representations. We integrate a metric correction term as well as a prior cluster structure in the source data of the OT-driven adaptation. We show that this is tantamount to an implicit Bayesian framework, which we demonstrate to be viable for a more robust and better-performing approach to domain adaptation. Substantiating experiments are also included for validation purposes.
Submitted 17 April, 2023;
originally announced April 2023.
-
Fast Charging of Lithium-Ion Batteries Using Deep Bayesian Optimization with Recurrent Neural Network
Authors:
Benben Jiang,
Yixing Wang,
Zhenghua Ma,
Qiugang Lu
Abstract:
Fast charging has attracted increasing attention from the battery community for electric vehicles (EVs) as a way to alleviate range anxiety and reduce charging time. However, inappropriate charging strategies can cause severe battery degradation or even hazardous accidents. To optimize fast-charging strategies under various constraints, particularly safety limits, we propose a novel deep Bayesian optimization (BO) approach that utilizes a Bayesian recurrent neural network (BRNN) as the surrogate model, given its capability in handling sequential data. In addition, a combined acquisition function of expected improvement (EI) and upper confidence bound (UCB) is developed to better balance exploitation and exploration. The effectiveness of the proposed approach is demonstrated on PETLION, a porous electrode theory-based battery simulator. Our method is also compared with state-of-the-art BO methods that use a Gaussian process (GP) or a non-recurrent network as the surrogate model. The results verify the superior performance of the proposed fast-charging approach, which stems mainly from two factors: (i) the BRNN-based surrogate model provides a more precise prediction of battery lifetime than one based on a GP or a non-recurrent network; and (ii) the combined acquisition function outperforms the traditional EI or UCB criteria in exploring the optimal charging protocol that maintains the longest battery lifetime.
Submitted 9 April, 2023;
originally announced April 2023.
-
Pi-ViMo: Physiology-inspired Robust Vital Sign Monitoring using mmWave Radars
Authors:
Bo Zhang,
Boyu Jiang,
Rong Zheng,
** Zhang,
Jun Li,
Qiang Xu
Abstract:
Continuous monitoring of human vital signs using non-contact mmWave radars is attractive due to their ability to penetrate garments and operate under different lighting conditions. Unfortunately, most prior research requires subjects to stay at a fixed distance from radar sensors and to remain still during monitoring. These restrictions limit the applications of radar vital sign monitoring in real-life scenarios. In this paper, we address these limitations and present "Pi-ViMo", a non-contact Physiology-inspired Robust Vital Sign Monitoring system, using mmWave radars. We first derive a multi-scattering point model for the human body, and introduce a coherent combining of multiple scatterings to enhance the quality of estimated chest-wall movements. This enables vital sign estimation for subjects at any location in a radar's field of view. We then propose a template matching method to extract human vital signs by adopting physical models of respiration and cardiac activities. The proposed method is capable of separating respiration and heartbeat in the presence of micro-level random body movements (RBM) when a subject is at any location within the field of view of a radar. Experiments in a radar testbed show average respiration rate errors of 6% and heart rate errors of 11.9% for stationary subjects, and average errors of 13.5% for respiration rate and 13.6% for heart rate for subjects under different RBMs.
Submitted 24 March, 2023;
originally announced March 2023.
-
Uncertainty-weighted Multi-tasking for $T_{1ρ}$ and T$_2$ Mapping in the Liver with Self-supervised Learning
Authors:
Chaoxing Huang,
Yurui Qian,
Jian Hou,
Baiyan Jiang,
Queenie Chan,
Vincent WS Wong,
Winnie CW Chu,
Weitian Chen
Abstract:
Multi-parametric mapping of MRI relaxations in the liver has the potential to reveal pathological information about the liver. A self-supervised learning-based multi-parametric mapping method is proposed to map $T_{1ρ}$ and T$_2$ simultaneously by utilising the relaxation constraint in the learning process. Data noise of the different mapping tasks is utilised to make the model uncertainty-aware, so that it adaptively weights the different mapping tasks during learning. The method was examined on a dataset of 51 patients with non-alcoholic fatty liver disease. Results showed that the proposed method can produce parametric maps comparable to those of the traditional multi-contrast pixel-wise fitting method, with a reduced number of images and less computation time. The uncertainty weighting also improves the model performance. The method has the potential to accelerate quantitative MRI.
Submitted 14 March, 2023;
originally announced March 2023.
-
Control Co-design of a Hydrokinetic Turbine: A Comparative Study of Open-loop Optimal Control and Feedback Control
Authors:
Mohammad Reza Amini,
Boxi Jiang,
Yingqian Liao,
Kartik Naik,
Joaquim R. R. A. Martins,
**g Sun
Abstract:
Control co-design (CCD) explores physical and control design spaces simultaneously to optimize a system's performance. A commonly used CCD framework aims to achieve open-loop optimal control (OLOC) trajectory while optimizing the physical design variables subject to constraints on control and design parameters. In this study, in contrast with the conventional CCD methods based on OLOC schemes, we present a CCD formulation that explicitly considers a feedback controller. In the formulation, we consider two control laws based on proportional linear and quadratic state feedback, where the control gain is optimized. The simulation results show that the OLOC trajectory could be approximated by a feedback controller. While the total energy generated from the CCD with a feedback controller is slightly lower than that of the CCD with OLOC, it results in a much simpler control structure and more robust performance in the presence of uncertainties and disturbances, making it suitable for real-time control. The study in this paper investigates the performance of optimal hydrokinetic turbine design with a feedback controller in the presence of uncertainties and disturbances to demonstrate the benefits and highlight challenges associated with incorporating the feedback controller explicitly in the CCD stage.
Submitted 6 February, 2023;
originally announced February 2023.
-
Learning to Dub Movies via Hierarchical Prosody Models
Authors:
Gaoxiang Cong,
Liang Li,
Yuankai Qi,
Zhengjun Zha,
Qi Wu,
Wenyu Wang,
Bin Jiang,
Ming-Hsuan Yang,
Qingming Huang
Abstract:
Given a piece of text, a video clip and a reference audio, the movie dubbing (also known as visual voice clone V2C) task aims to generate speeches that match the speaker's emotion presented in the video using the desired speaker voice as reference. V2C is more challenging than conventional text-to-speech tasks as it additionally requires the generated speech to exactly match the varying emotions and speaking speed presented in the video. Unlike previous works, we propose a novel movie dubbing architecture to tackle these problems via hierarchical prosody modelling, which bridges the visual information to corresponding speech prosody from three aspects: lip, face, and scene. Specifically, we align lip movement to the speech duration, and convey facial expression to speech energy and pitch via attention mechanism based on valence and arousal representations inspired by recent psychology findings. Moreover, we design an emotion booster to capture the atmosphere from global video scenes. All these embeddings together are used to generate mel-spectrogram and then convert to speech waves via existing vocoder. Extensive experimental results on the Chem and V2C benchmark datasets demonstrate the favorable performance of the proposed method. The source code and trained models will be released to the public.
Submitted 4 April, 2023; v1 submitted 7 December, 2022;
originally announced December 2022.
-
Gradient and Channel Aware Dynamic Scheduling for Over-the-Air Computation in Federated Edge Learning Systems
Authors:
Jun Du,
Bingqing Jiang,
Chunxiao Jiang,
Yuanming Shi,
Zhu Han
Abstract:
To satisfy the expected plethora of computation-heavy applications, federated edge learning (FEEL) is a new paradigm featuring distributed learning that offers low latency and privacy preservation. To further improve the efficiency of wireless data aggregation and model learning, over-the-air computation (AirComp) is emerging as a promising solution that exploits the superposition characteristics of wireless channels. However, the fading and noise of wireless channels can cause aggregate distortions in AirComp-enabled federated learning. In addition, the quality of collected data and energy consumption of edge devices may also impact the accuracy and efficiency of model aggregation as well as convergence. To solve these problems, this work proposes a dynamic device scheduling mechanism, which can select qualified edge devices to transmit their local models with a proper power control policy so as to participate in the model training at the server in federated learning via AirComp. In this mechanism, the data importance is measured jointly by the gradient of the local model parameter, the channel condition, and the energy consumption of the device. In particular, to fully use distributed datasets and accelerate the convergence rate of federated learning, the local updates of unselected devices are also retained and accumulated for potential future transmission, instead of being discarded directly. Furthermore, a Lyapunov drift-plus-penalty optimization problem is formulated to search for the optimal device selection strategy. Simulation results validate that the proposed scheduling mechanism can achieve higher test accuracy and a faster convergence rate, and is robust against different channel conditions.
Submitted 1 December, 2022;
originally announced December 2022.
-
Feature-aggregated spatiotemporal spine surface estimation for wearable patch ultrasound volumetric imaging
Authors:
Baichuan Jiang,
Keshuai Xu,
Ahbay Moghekar,
Peter Kazanzides,
Emad Boctor
Abstract:
Clear identification of bone structures is crucial for ultrasound-guided lumbar interventions, but it can be challenging due to the complex shapes of the self-shadowing vertebra anatomy and the extensive background speckle noise from the surrounding soft tissue structures. Therefore, we propose to use a patch-like wearable ultrasound solution to capture the reflective bone surfaces from multiple imaging angles and create 3D bone representations for interventional guidance. In this work, we will present our method for estimating the vertebra bone surfaces by using a spatiotemporal U-Net architecture learning from the B-Mode image and aggregated feature maps of hand-crafted filters. The methods are evaluated on spine phantom image data collected by our proposed miniaturized wearable "patch" ultrasound device, and the results show that a significant improvement on baseline method can be achieved with promising accuracy. Equipped with this surface estimation framework, our wearable ultrasound system can potentially provide intuitive and accurate interventional guidance for clinicians in augmented reality setting.
Submitted 10 November, 2022;
originally announced November 2022.
-
An Efficient Alternating Riemannian/Projected Gradient Descent Ascent Algorithm for Fair Principal Component Analysis
Authors:
Meng Xu,
Bo Jiang,
Wenqiang Pu,
Ya-Feng Liu,
Anthony Man-Cho So
Abstract:
Fair principal component analysis (FPCA), a ubiquitous dimensionality reduction technique in signal processing and machine learning, aims to find a low-dimensional representation for a high-dimensional dataset in view of fairness. The FPCA problem involves optimizing a non-convex and non-smooth function over the Stiefel manifold. The state-of-the-art methods for solving the problem are subgradient methods and semidefinite relaxation-based methods. However, these two types of methods have their obvious limitations and thus are only suitable for efficiently solving the FPCA problem in special scenarios. This paper aims at developing efficient algorithms for solving the FPCA problem in general, especially large-scale, settings. In this paper, we first transform FPCA into a smooth non-convex linear minimax optimization problem over the Stiefel manifold. To solve the above general problem, we propose an efficient alternating Riemannian/projected gradient descent ascent (ARPGDA) algorithm, which performs a Riemannian gradient descent step and an ordinary projected gradient ascent step at each iteration. We prove that ARPGDA can find an $\varepsilon$-stationary point of the above problem within $\mathcal{O}(\varepsilon^{-3})$ iterations. Simulation results show that, compared with the state-of-the-art methods, our proposed ARPGDA algorithm can achieve a better performance in terms of solution quality and speed for solving the FPCA problems.
Submitted 23 December, 2023; v1 submitted 28 October, 2022;
originally announced October 2022.
-
Efficient Quantized Constant Envelope Precoding for Multiuser Downlink Massive MIMO Systems
Authors:
Zheyu Wu,
Ya-Feng Liu,
Bo Jiang,
Yu-Hong Dai
Abstract:
Quantized constant envelope (QCE) precoding, a new transmission scheme that only discrete QCE transmit signals are allowed at each antenna, has gained growing research interests due to its ability of reducing the hardware cost and the energy consumption of massive multiple-input multiple-output (MIMO) systems. However, the discrete nature of QCE transmit signals greatly complicates the precoding design. In this paper, we consider the QCE precoding problem for a massive MIMO system with phase shift keying (PSK) modulation and develop an efficient approach for solving the constructive interference (CI) based problem formulation. Our approach is based on a custom-designed (continuous) penalty model that is equivalent to the original discrete problem. Specifically, the penalty model relaxes the discrete QCE constraint and penalizes it in the objective with a negative $\ell_2$-norm term, which leads to a non-smooth non-convex optimization problem. To tackle it, we resort to our recently proposed alternating optimization (AO) algorithm. We show that the AO algorithm admits closed-form updates at each iteration when applied to our problem and thus can be efficiently implemented. Simulation results demonstrate the superiority of the proposed approach over the existing algorithms.
Submitted 20 February, 2023; v1 submitted 26 October, 2022;
originally announced October 2022.
-
Physics-informed Deep Diffusion MRI Reconstruction with Synthetic Data: Break Training Data Bottleneck in Artificial Intelligence
Authors:
Chen Qian,
Yuncheng Gao,
Mingyang Han,
Zi Wang,
Dan Ruan,
Yu Shen,
Ya** Wu,
Yirong Zhou,
Chengyan Wang,
Boyu Jiang,
Ran Tao,
Zhigang Wu,
Jiazheng Wang,
Liuhong Zhu,
Yi Guo,
Taishan Kang,
Jianzhong Lin,
Tao Gong,
Chen Yang,
Guoqiang Fei,
Mei** Lin,
Di Guo,
Jianjun Zhou,
Meiyun Wang,
Xiaobo Qu
Abstract:
Diffusion magnetic resonance imaging (MRI) is the only imaging modality for non-invasive movement detection of in vivo water molecules, with significant clinical and research applications. Diffusion MRI (DWI) acquired by multi-shot techniques can achieve higher resolution, better signal-to-noise ratio, and lower geometric distortion than single-shot, but suffers from inter-shot motion-induced artifacts. These artifacts cannot be removed prospectively, leading to the absence of artifact-free training labels. Thus, the potential of deep learning in multi-shot DWI reconstruction remains largely untapped. To break the training data bottleneck, here, we propose a Physics-Informed Deep DWI reconstruction method (PIDD) to synthesize high-quality paired training data by leveraging the physical diffusion model (magnitude synthesis) and inter-shot motion-induced phase model (motion phase synthesis). The network is trained only once with 100,000 synthetic samples, achieving encouraging results on multiple realistic in vivo data reconstructions. Advantages over conventional methods include: (a) Better motion artifact suppression and reconstruction stability; (b) Outstanding generalization to multi-scenario reconstructions, including multi-resolution, multi-b-value, multi-undersampling, multi-vendor, and multi-center; (c) Excellent clinical adaptability to patients with verifications by seven experienced doctors (p<0.001). In conclusion, PIDD presents a novel deep learning framework by exploiting the power of MRI physics, providing a cost-effective and explainable way to break the data bottleneck in deep learning medical imaging.
Submitted 5 February, 2024; v1 submitted 20 October, 2022;
originally announced October 2022.
-
Flexible Alignment Super-Resolution Network for Multi-Contrast MRI
Authors:
Yiming Liu,
Mengxi Zhang,
Weiqin Zhang,
Bo Jiang,
Bo Hou,
Dan Liu,
Jie Chen,
Heqing Lian
Abstract:
Magnetic resonance imaging plays an essential role in clinical diagnosis by acquiring the structural information of biological tissue. Recently, many multi-contrast MRI super-resolution networks have achieved good results. However, most studies ignore the impact of the inappropriate foreground scale and patch size of multi-contrast MRI, which probably leads to inappropriate feature alignment. To tackle this problem, we propose the Flexible Alignment Super-Resolution Network (FASR-Net) for multi-contrast MRI super-resolution. The Flexible Alignment module of FASR-Net consists of two modules for feature alignment. (1) The Single-Multi Pyramid Alignment (S-A) module handles the situation where low-resolution (LR) images and reference (Ref) images have different scales. (2) The Multi-Multi Pyramid Alignment (M-A) module handles the situation where LR and Ref images have the same scale. In addition, we propose the Cross-Hierarchical Progressive Fusion (CHPF) module, which aims to fuse the features effectively and further improve the image quality. Compared with other state-of-the-art methods, FASR-Net achieves the most competitive results on the FastMRI and IXI datasets. Our code will be available at https://github.com/yimingliu123/FASR-Net.
Submitted 8 January, 2023; v1 submitted 7 October, 2022;
originally announced October 2022.
-
On Embeddings and Inverse Embeddings of Input Design for Regularized System Identification
Authors:
Biqiang Mu,
Tianshi Chen,
He Kong,
Bo Jiang,
Lei Wang,
Junfeng Wu
Abstract:
Input design is an important problem for system identification and has been well studied for classical system identification, i.e., the maximum likelihood/prediction error method. For the emerging regularized system identification, the study of input design has just started, and it is often formulated as a non-convex optimization problem that minimizes a scalar measure of the Bayesian mean squared error matrix subject to certain constraints. The state-of-the-art method is the so-called quadratic mapping and inverse embedding (QMIE) method, where a time domain inverse embedding (TDIE) is proposed to find the inverse of the quadratic mapping. In this paper, we report some new results on the embeddings/inverse embeddings of the QMIE method. Firstly, we present a general result on the frequency domain inverse embedding (FDIE), which finds the inverse of the quadratic mapping described by the discrete-time Fourier transform. Then we show the relation between the TDIE and the FDIE from a graph signal processing perspective. Finally, motivated by this perspective, we further propose a graph-induced embedding and its inverse, which include the previously introduced embeddings as special cases. This deepens the understanding of input design from a new viewpoint beyond the real domain and the frequency domain viewpoints.
Submitted 27 September, 2022;
originally announced September 2022.
-
Uncertainty-Aware Self-supervised Neural Network for Liver $T_{1ρ}$ Mapping with Relaxation Constraint
Authors:
Chaoxing Huang,
Yurui Qian,
Simon Chun Ho Yu,
Jian Hou,
Baiyan Jiang,
Queenie Chan,
Vincent Wai-Sun Wong,
Winnie Chiu-Wing Chu,
Weitian Chen
Abstract:
$T_{1ρ}$ mapping is a promising quantitative MRI technique for the non-invasive assessment of tissue properties. Learning-based approaches can map $T_{1ρ}$ from a reduced number of $T_{1ρ}$-weighted images, but require significant amounts of high-quality training data. Moreover, existing methods do not provide the confidence level of the $T_{1ρ}$ estimation. To address these problems, we proposed a self-supervised learning neural network that learns a $T_{1ρ}$ mapping using the relaxation constraint in the learning process. Epistemic uncertainty and aleatoric uncertainty are modelled for the $T_{1ρ}$ quantification network to provide a Bayesian confidence estimation of the $T_{1ρ}$ mapping. The uncertainty estimation can also regularize the model to prevent it from learning imperfect data. We conducted experiments on $T_{1ρ}$ data collected from 52 patients with non-alcoholic fatty liver disease. The results showed that our method outperformed the existing methods for $T_{1ρ}$ quantification of the liver using as few as two $T_{1ρ}$-weighted images. Our uncertainty estimation provided a feasible way of modelling the confidence of the self-supervised learning based $T_{1ρ}$ estimation, which is consistent with the reality in liver $T_{1ρ}$ imaging.
Submitted 25 October, 2022; v1 submitted 7 July, 2022;
originally announced July 2022.
-
Mixed-UNet: Refined Class Activation Mapping for Weakly-Supervised Semantic Segmentation with Multi-scale Inference
Authors:
Yang Liu,
Ersi Zhang,
Lulu Xu,
Chufan Xiao,
Xiaoyun Zhong,
Li** Lian,
Fang Li,
Bin Jiang,
Yuhan Dong,
Lan Ma,
Qiming Huang,
Ming Xu,
Yongbing Zhang,
Dongmei Yu,
Chenggang Yan,
Peiwu Qin
Abstract:
Deep learning techniques have shown great potential in medical image processing, particularly through accurate and reliable image segmentation on magnetic resonance imaging (MRI) scans or computed tomography (CT) scans, which allow the localization and diagnosis of lesions. However, training these segmentation models requires a large number of manually annotated pixel-level labels, which are time-consuming and labor-intensive, in contrast to image-level labels that are easier to obtain. It is imperative to resolve this problem through weakly-supervised semantic segmentation models using image-level labels as supervision since it can significantly reduce human annotation efforts. Most of the advanced solutions exploit class activation mapping (CAM). However, the original CAMs rarely capture the precise boundaries of lesions. In this study, we propose the strategy of multi-scale inference to refine CAMs by reducing the detail loss in single-scale reasoning. For segmentation, we develop a novel model named Mixed-UNet, which has two parallel branches in the decoding phase. The results can be obtained after fusing the extracted features from two branches. We evaluate the designed Mixed-UNet against several prevalent deep learning-based segmentation approaches on our dataset collected from the local hospital and public datasets. The validation results demonstrate that our model surpasses available methods under the same supervision level in the segmentation of various lesions from brain imaging.
Submitted 6 May, 2022;
originally announced May 2022.
-
Control Co-design of a Hydrokinetic Turbine with Open-loop Optimal Control
Authors:
Boxi Jiang,
Mohammad Reza Amini,
Yingqian Liao,
Joaquim R. R. A. Martins,
**g Sun
Abstract:
This paper introduces a control co-design (CCD) framework to simultaneously explore the physical parameters and control spaces for a hydro-kinetic turbine (HKT) rotor optimization. The optimization formulation incorporates a coupled dynamic-hydrodynamic model to maximize the rotor power efficiency for various time-variant flow profiles. The open-loop optimal control is applied for maximum power tracking, and the blade element momentum theory (BEMT) is used to model the hydrodynamics. Case studies with different control constraints are investigated for CCD. Sensitivity analyses were conducted with respect to different flow profiles and initial geometries. Comparisons are made between CCD and the sequential process, with physical design followed by a control design process under the same conditions. The results demonstrate the benefits of CCD and reveal that, with control constraints, CCD leads to increased energy production compared to the design obtained from the sequential design process.
Submitted 3 April, 2022;
originally announced April 2022.
-
A Paired Phase and Magnitude Reconstruction for Advanced Diffusion-Weighted Imaging
Authors:
Chen Qian,
Zi Wang,
Xinlin Zhang,
Boxuan Shi,
Boyu Jiang,
Ran Tao,
**g Li,
Yuwei Ge,
Taishan Kang,
Jianzhong Lin,
Di Guo,
Xiaobo Qu
Abstract:
Objective: Multi-shot interleaved echo planar imaging can obtain diffusion-weighted images (DWI) with high spatial resolution and low distortion, but suffers from ghost artifacts introduced by phase variations between shots. In this work, we aim at solving the challenging reconstructions under inter-shot motions and a low signal-to-noise ratio. Methods: An explicit phase model with paired phase and magnitude priors is proposed to regularize the reconstruction (PAIR). The former prior is derived from the smoothness of the shot phase and enforced with low-rankness in the k-space domain. The latter explores similar edges among multi-b-value and multi-direction DWI with weighted total variation in the image domain. Results: Extensive simulation and in vivo results show that PAIR can remove ghost artifacts very well under a high number of shots (8 shots) and significantly suppress the noise under the ultra-high b-value (4000 s/mm$^2$). Conclusion: The explicit phase model PAIR with complementary priors has a good performance on challenging reconstructions under inter-shot motions and a low signal-to-noise ratio. Significance: PAIR has great potential in advanced clinical DWI applications and brain function research.
Submitted 8 December, 2022; v1 submitted 28 March, 2022;
originally announced March 2022.
-
Refining Self-Supervised Learning in Imaging: Beyond Linear Metric
Authors:
Bo Jiang,
Hamid Krim,
Tianfu Wu,
Derya Cansever
Abstract:
We introduce in this paper a new statistical perspective, exploiting the Jaccard similarity metric, as a measure-based metric to effectively invoke non-linear features in the loss of self-supervised contrastive learning. Specifically, our proposed metric may be interpreted as a dependence measure between two adapted projections learned from the so-called latent representations. This is in contrast to the cosine similarity measure in the conventional contrastive learning model, which accounts for correlation information. To the best of our knowledge, the non-linearly fused information embedded in the Jaccard similarity is novel to self-supervised learning and yields promising results. The proposed approach is compared to two state-of-the-art self-supervised contrastive learning methods on three image datasets. We demonstrate not only its applicability to current ML problems, but also its improved performance and training efficiency.
Submitted 13 October, 2022; v1 submitted 25 February, 2022;
originally announced February 2022.
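The entry above contrasts a Jaccard-type similarity with the cosine similarity used in conventional contrastive learning. As a rough illustration only, the sketch below computes the weighted (Ruzicka) form of Jaccard similarity for non-negative vectors next to cosine similarity. The choice of the weighted form, the function names, and the toy vectors are assumptions made for illustration; the listing does not state how the paper defines its metric.

```python
import math


def weighted_jaccard(u, v):
    """Weighted (Ruzicka) Jaccard: sum of minima over sum of maxima.

    Assumes non-negative entries; this is an illustrative choice,
    not necessarily the metric used in the paper.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if any(x < 0 for x in u) or any(x < 0 for x in v):
        raise ValueError("weighted Jaccard assumes non-negative entries")
    num = sum(min(a, b) for a, b in zip(u, v))
    den = sum(max(a, b) for a, b in zip(u, v))
    # Two all-zero vectors are treated as identical.
    return 1.0 if den == 0 else num / den


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 0.0 if norm_u == 0 or norm_v == 0 else dot / (norm_u * norm_v)


if __name__ == "__main__":
    u, v = [1.0, 2.0], [2.0, 1.0]
    print(weighted_jaccard(u, v), cosine(u, v))
```

The min/max ratio is not a function of the inner product alone, which is one way a similarity can depend non-linearly on the two representations.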
-
Cerebrovascular morphology in aging and disease -- imaging biomarkers for ischemic stroke and Alzheimers disease
Authors:
Aditi Deshpande,
Nitya Kari,
Jordan Elliott McKenzie,
Bin Jiang,
Patrik Michel,
Nima Toosizadeh,
Pouya Tahsili Fahadan,
Chelsea Kidwell,
Max Wintermark,
Kaveh Laksari
Abstract:
Background and Purpose: Altered brain vasculature is a key phenomenon in several neurologic disorders. This paper presents a quantitative assessment of vascular morphology in healthy and diseased adults including changes during aging and the anatomical variations in the Circle of Willis (CoW). Methods: We used our automatic method to segment and extract novel geometric features of the cerebral vasculature from MRA scans of 175 healthy subjects, 45 AIS, and 50 AD patients after spatial registration. This is followed by quantification and statistical analysis of vascular alterations in acute ischemic stroke (AIS) and Alzheimer's disease (AD), the biggest cerebrovascular and neurodegenerative disorders. Results: We determined that the CoW is fully formed in only 35 percent of healthy adults and found significantly increased tortuosity and fractality, with increasing age and with disease -- both AIS and AD. We also found significantly decreased vessel length, volume and number of branches in AIS patients. Lastly, we found that AD cerebral vessels exhibited significantly smaller diameter and more complex branching patterns, compared to age-matched healthy adults. These changes were significantly heightened with progression of AD from early onset to moderate-severe dementia. Conclusion: Altered vessel geometry in AIS patients shows that there is pathological morphology coupled with stroke. In AD due to pathological alterations in the endothelium or amyloid depositions leading to neuronal damage and hypoperfusion, vessel geometry is significantly altered even in mild or early dementia. The specific geometric features and quantitative comparisons demonstrate potential for using vascular morphology as a non-invasive imaging biomarker for neurologic disorders.
Submitted 14 February, 2022;
originally announced February 2022.
-
EVBattery: A Large-Scale Electric Vehicle Dataset for Battery Health and Capacity Estimation
Authors:
Haowei He,
Jingzhao Zhang,
Yanan Wang,
Benben Jiang,
Shaobo Huang,
Chen Wang,
Yang Zhang,
Gengang Xiong,
Xuebing Han,
Dongxu Guo,
Guannan He,
Minggao Ouyang
Abstract:
Electric vehicles (EVs) play an important role in reducing carbon emissions. As EV adoption accelerates, safety issues caused by EV batteries have become an important research topic. In order to benchmark and develop data-driven methods for this task, we introduce a large and comprehensive dataset of EV batteries. Our dataset includes charging records collected from hundreds of EVs from three manufacturers over several years. Our dataset is the first large-scale public dataset on real-world battery data, as existing datasets either include only a few vehicles or are collected in a lab environment. Meanwhile, our dataset features two types of labels, corresponding to two key tasks: battery health estimation and battery capacity estimation. In addition to demonstrating how existing deep learning algorithms can be applied to this task, we further develop an algorithm that exploits the data structure of battery systems. Our algorithm achieves better results and shows that a customized method can improve model performance. We hope that this public dataset provides valuable resources for researchers, policymakers, and industry professionals to better understand the dynamics of EV battery aging and support the transition toward a sustainable transportation system.
Submitted 1 November, 2023; v1 submitted 28 January, 2022;
originally announced January 2022.
-
Automatic ultrasound vessel segmentation with deep spatiotemporal context learning
Authors:
Baichuan Jiang,
Alvin Chen,
Shyam Bharat,
Mingxin Zheng
Abstract:
Accurate, real-time segmentation of vessel structures in ultrasound image sequences can aid in the measurement of lumen diameters and assessment of vascular diseases. This, however, remains a challenging task, particularly for extremely small vessels that are difficult to visualize. We propose to leverage the rich spatiotemporal context available in ultrasound to improve segmentation of small-scale lower-extremity arterial vasculature. We describe efficient deep learning methods that incorporate temporal, spatial, and feature-aware contextual embeddings at multiple resolution scales while jointly utilizing information from B-mode and Color Doppler signals. Evaluating on femoral and tibial artery scans performed on healthy subjects by an expert ultrasonographer, and comparing to consensus expert ground-truth annotations of inner lumen boundaries, we demonstrate real-time segmentation using the context-aware models and show that they significantly outperform comparable baseline approaches.
Submitted 3 November, 2021;
originally announced November 2021.
-
Efficient CI-Based One-Bit Precoding for Multiuser Downlink Massive MIMO Systems with PSK Modulation
Authors:
Zheyu Wu,
Bo Jiang,
Ya-Feng Liu,
Mingjie Shao,
Yu-Hong Dai
Abstract:
In this paper, we consider the one-bit precoding problem for the multiuser downlink massive multiple-input multiple-output (MIMO) system with phase shift keying (PSK) modulation. We focus on the celebrated constructive interference (CI)-based problem formulation. We first establish the NP-hardness of the problem (even in the single-user case), which reveals the intrinsic difficulty of globally solving the problem. Then, we propose a novel negative $\ell_1$ penalty model for the considered problem, which penalizes the one-bit constraint into the objective by a negative $\ell_1$-norm term, and show the equivalence between (global and local) solutions of the original problem and the penalty problem when the penalty parameter is sufficiently large. We further transform the penalty model into an equivalent min-max problem and propose an efficient alternating proximal/projection gradient descent ascent (APGDA) algorithm for solving it, which alternately performs a proximal gradient descent over one block of variables and a projection gradient ascent over the other block of variables. The APGDA algorithm enjoys a low per-iteration complexity and is guaranteed to converge to a stationary point of the min-max problem and a local minimizer of the penalty problem. To further reduce the computational cost, we also propose a low-complexity implementation of the APGDA algorithm, where the values of the variables will be fixed in later iterations once they satisfy the one-bit constraint. Numerical results show that, compared to the state-of-the-art CI-based algorithms, both of the proposed algorithms generally achieve better bit-error-rate (BER) performance with lower computational cost.
Submitted 10 October, 2023; v1 submitted 22 October, 2021;
originally announced October 2021.
-
A Novel Negative $\ell_1$ Penalty Approach for Multiuser One-Bit Massive MIMO Downlink with PSK Signaling
Authors:
Zheyu Wu,
Bo Jiang,
Ya-Feng Liu,
Yu-Hong Dai
Abstract:
This paper considers the one-bit precoding problem for the multiuser downlink massive multiple-input multiple-output (MIMO) system with phase shift keying (PSK) modulation and focuses on the celebrated constructive interference (CI)-based problem formulation. The existence of the discrete one-bit constraint makes the problem generally hard to solve. In this paper, we propose an efficient negative $\ell_1$ penalty approach for finding a high-quality solution of the considered problem. Specifically, we first propose a novel negative $\ell_1$ penalty model, which penalizes the one-bit constraint into the objective with a negative $\ell_1$-norm term, and show the equivalence between (global and local) solutions of the original problem and the penalty problem when the penalty parameter is sufficiently large. We further transform the penalty model into an equivalent min-max problem and propose an efficient alternating optimization (AO) algorithm for solving it. The AO algorithm enjoys low per-iteration complexity and is guaranteed to converge to the stationary point of the min-max problem. Numerical results show that, compared against the state-of-the-art CI-based algorithms, the proposed algorithm generally achieves better bit-error-rate (BER) performance with lower computational cost.
Submitted 7 February, 2022; v1 submitted 10 October, 2021;
originally announced October 2021.
-
S2Looking: A Satellite Side-Looking Dataset for Building Change Detection
Authors:
Li Shen,
Yao Lu,
Hao Chen,
Hao Wei,
Donghai Xie,
Jiabao Yue,
Rui Chen,
Shouye Lv,
Bitao Jiang
Abstract:
Building-change detection underpins many important applications, especially in the military and crisis-management domains. Recent methods used for change detection have shifted towards deep learning, which depends on the quality of its training data. The assembly of large-scale annotated satellite imagery datasets is therefore essential for global building-change surveillance. Existing datasets almost exclusively offer near-nadir viewing angles. This limits the range of changes that can be detected. By offering larger observation ranges, the scroll imaging mode of optical satellites presents an opportunity to overcome this restriction. This paper therefore introduces S2Looking, a building-change-detection dataset that contains large-scale side-looking satellite images captured at various off-nadir angles. The dataset consists of 5000 bitemporal image pairs of rural areas and more than 65,920 annotated instances of changes throughout the world. The dataset can be used to train deep-learning-based change-detection algorithms. It expands upon existing datasets by providing (1) larger viewing angles; (2) large illumination variances; and (3) the added complexity of rural images. To facilitate the use of the dataset, a benchmark task has been established, and preliminary tests suggest that deep-learning algorithms find the dataset significantly more challenging than the closest-competing near-nadir dataset, LEVIR-CD+. S2Looking may therefore promote important advances in existing building-change-detection algorithms. The dataset is available at https://github.com/S2Looking/.
Submitted 11 January, 2022; v1 submitted 19 July, 2021;
originally announced July 2021.
-
A new method for vehicle system safety design based on data mining with uncertainty modeling
Authors:
** Du,
Binhui Jiang,
Feng Zhu
Abstract:
In this research, a new data mining-based design approach has been developed for designing complex mechanical systems such as a crashworthy passenger car with uncertainty modeling. The method allows exploring the big crash simulation dataset to design the vehicle at multi-levels in a top-down manner (main energy absorbing system, components, and geometric features) and derive design rules based on the whole vehicle body safety requirements to make decisions towards the component and sub-component level design. Full vehicle and component simulation datasets are mined to build decision trees, where the interrelationship among parameters can be revealed and the design rules are derived to produce designs with good performance. This method has been extended by accounting for the uncertainty in the design variables. A new decision tree algorithm for uncertain data (DTUD) is developed to produce the desired designs and evaluate the design performance variations due to the uncertainty in design variables. The framework of this method is implemented by combining the design of experiments (DOE) and crash finite element analysis (FEA) and then demonstrated by designing a passenger car subject to front impact. The results show that the new methodology could achieve the design objectives efficiently and effectively. By applying the new method, the reliability of the final designs is also increased greatly. This approach has the potential to be applied as a general design methodology for a wide range of complex structures and mechanical systems.
Submitted 12 July, 2021;
originally announced July 2021.
-
Similarity Embedding Networks for Robust Human Activity Recognition
Authors:
Chenglin Li,
Carrie Lu Tong,
Di Niu,
Bei Jiang,
Xiao Zuo,
Lei Cheng,
Jian Xiong,
Jianming Yang
Abstract:
Deep learning models for human activity recognition (HAR) based on sensor data have been heavily studied recently. However, the generalization ability of deep models on complex real-world HAR data is limited by the availability of high-quality labeled activity data, which are hard to obtain. In this paper, we design a similarity embedding neural network that maps input sensor signals onto real vectors through carefully designed convolutional and LSTM layers. The embedding network is trained with a pairwise similarity loss, encouraging the clustering of samples from the same class in the embedded real space, and can be effectively trained on a small dataset and even on a noisy dataset with mislabeled samples. Based on the learned embeddings, we further propose both nonparametric and parametric approaches for activity recognition. Extensive evaluation based on two public datasets has shown that the proposed similarity embedding network significantly outperforms state-of-the-art deep models on HAR classification tasks, is robust to mislabeled samples in the training set, and can also be used to effectively denoise a noisy dataset.
Submitted 31 May, 2021;
originally announced June 2021.
-
Meta-HAR: Federated Representation Learning for Human Activity Recognition
Authors:
Chenglin Li,
Di Niu,
Bei Jiang,
Xiao Zuo,
Jianming Yang
Abstract:
Human activity recognition (HAR) based on mobile sensors plays an important role in ubiquitous computing. However, the rise of data regulatory constraints precludes collecting private and labeled signal data from personal devices at scale. Federated learning has emerged as a decentralized alternative solution to model training, which iteratively aggregates locally updated models into a shared global model, therefore being able to leverage decentralized, private data without central collection. However, the effectiveness of federated learning for HAR is affected by the fact that each user has different activity types and even a different signal distribution for the same activity type. Furthermore, it is uncertain if a single global model trained can generalize well to individual users or new users with heterogeneous data. In this paper, we propose Meta-HAR, a federated representation learning framework, in which a signal embedding network is meta-learned in a federated manner, while the learned signal representations are further fed into a personalized classification network at each user for activity prediction. In order to boost the representation ability of the embedding network, we treat the HAR problem at each user as a different task and train the shared embedding network through a Model-Agnostic Meta-learning framework, such that the embedding network can generalize to any individual user. Personalization is further achieved on top of the robustly learned representations in an adaptation procedure. We conducted extensive experiments based on two publicly available HAR datasets as well as a newly created HAR dataset. Results verify that Meta-HAR is effective at maintaining high test accuracies for individual users, including new users, and significantly outperforms several baselines, including Federated Averaging, Reptile and even centralized learning in certain cases.
Submitted 31 May, 2021;
originally announced June 2021.
-
Tightness and Equivalence of Semidefinite Relaxations for MIMO Detection
Authors:
Ruichen Jiang,
Ya-Feng Liu,
Chenglong Bao,
Bo Jiang
Abstract:
The multiple-input multiple-output (MIMO) detection problem, a fundamental problem in modern digital communications, is to detect a vector of transmitted symbols from the noisy outputs of a fading MIMO channel. The maximum likelihood detector can be formulated as a complex least-squares problem with discrete variables, which is NP-hard in general. Various semidefinite relaxation (SDR) methods have been proposed in the literature to solve the problem due to their polynomial-time worst-case complexity and good detection error rate performance. In this paper, we consider two popular classes of SDR-based detectors and study the conditions under which the SDRs are tight and the relationship between different SDR models. For the enhanced complex and real SDRs proposed recently by Lu et al., we refine their analysis and derive the necessary and sufficient condition for the complex SDR to be tight, as well as a necessary condition for the real SDR to be tight. In contrast, we also show that another SDR proposed by Mobasher et al. is not tight with high probability under mild conditions. Moreover, we establish a general theorem that shows the equivalence between two subsets of positive semidefinite matrices in different dimensions by exploiting a special "separable" structure in the constraints. Our theorem recovers two existing equivalence results of SDRs defined in different settings and has the potential to find other applications due to its generality.
Submitted 8 February, 2021;
originally announced February 2021.
-
Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction
Authors:
Anzhu Yu,
Wenyue Guo,
Bing Liu,
Xin Chen,
Xin Wang,
Xuefeng Cao,
Bingchuan Jiang
Abstract:
We present an efficient multi-view stereo (MVS) network for 3D reconstruction from multiview images. While previous learning-based reconstruction approaches performed quite well, most of them estimate depth maps at a fixed resolution using plane sweep volumes with a fixed depth hypothesis at each plane, which requires densely sampled planes for the desired accuracy and therefore makes high-resolution depth maps difficult to achieve. In this paper we introduce a coarse-to-fine depth inference strategy to achieve high-resolution depth. This strategy estimates the depth map at the coarsest level, while the depth maps at finer levels are considered as the upsampled depth map from the previous level with a pixel-wise depth residual. Thus, we narrow the depth searching range with prior information from the previous level and construct new cost volumes from the pixel-wise depth residual to perform depth map refinement. The final depth map can then be obtained iteratively, since all the parameters are shared between different levels. At each level, a self-attention layer is introduced into the feature extraction block to capture the long-range dependencies for the depth inference task, and the cost volume is generated using a similarity measurement instead of the variance-based methods used in previous work. Experiments were conducted on both the DTU benchmark dataset and the recently released BlendedMVS dataset. The results demonstrated that our model could outperform most state-of-the-art (SOTA) methods. The codebase of this project is at https://github.com/ArthasMil/AACVP-MVSNet.
Submitted 25 November, 2020;
originally announced November 2020.
-
Dimensionality Reduction via Diffusion Map Improved with Supervised Linear Projection
Authors:
Bowen Jiang,
Maohao Shen
Abstract:
When performing classification tasks, raw high dimensional features often contain redundant information, and lead to increased computational complexity and overfitting. In this paper, we assume the data samples lie on a single underlying smooth manifold, and define intra-class and inter-class similarities using pairwise local kernel distances. We aim to find a linear projection to maximize the intra-class similarities and minimize the inter-class similarities simultaneously, so that the projected low dimensional data has optimized pairwise distances based on the label information, which is more suitable for a Diffusion Map to do further dimensionality reduction. Numerical experiments on several benchmark datasets show that our proposed approaches are able to extract low dimensional discriminative features that could help us achieve higher classification accuracy.
Submitted 8 August, 2020;
originally announced August 2020.
-
DCAF: A Dynamic Computation Allocation Framework for Online Serving System
Authors:
Biye Jiang,
Pengye Zhang,
Rihan Chen,
Binding Dai,
Xinchen Luo,
Yin Yang,
Guan Wang,
Guorui Zhou,
Xiaoqiang Zhu,
Kun Gai
Abstract:
Modern large-scale systems such as recommender systems and online advertising systems are built upon computation-intensive infrastructure. The typical objective in these applications is to maximize the total revenue, e.g. GMV (Gross Merchandise Volume), under a limited computation resource. Usually, the online serving system follows a multi-stage cascade architecture, which consists of several stages including retrieval, pre-ranking, ranking, etc. These stages usually allocate resource manually with specific computing power budgets, which requires the serving configuration to adapt accordingly. As a result, the existing system easily falls into suboptimal solutions with respect to maximizing the total revenue. The limitation is due to the fact that, although the value of traffic requests varies greatly, the online serving system still spends equal computing power among them.
In this paper, we introduce a novel idea that the online serving system could treat each traffic request differently and allocate "personalized" computation resource based on its value. We formulate this resource allocation problem as a knapsack problem and propose a Dynamic Computation Allocation Framework (DCAF). Under some general assumptions, DCAF can theoretically guarantee that the system can maximize the total revenue within a given computation budget. DCAF brings significant improvement and has been deployed in the display advertising system of Taobao for serving the main traffic. With DCAF, we are able to maintain the same business performance with a 20% reduction in computation resource.
Submitted 17 June, 2020;
originally announced June 2020.
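The DCAF entry above frames per-request computation allocation as a knapsack problem. The sketch below shows one common way to attack such a problem, a Lagrangian-relaxation heuristic: for a multiplier `lam`, each request picks the action that maximizes value minus `lam` times cost, and `lam` is found by bisection so that the total cost fits the budget. The function names, the toy numbers, and the bisection scheme are all hypothetical; this is not claimed to be the algorithm deployed in the paper, and the heuristic is not guaranteed to be optimal for discrete problems.

```python
def allocate(values, costs, lam):
    """For each request, pick the action index maximizing value - lam * cost."""
    n_actions = len(costs)
    return [
        max(range(n_actions), key=lambda j: row[j] - lam * costs[j])
        for row in values
    ]


def total_cost(choice, costs):
    """Total computation cost of a per-request action choice."""
    return sum(costs[j] for j in choice)


def total_value(choice, values):
    """Total expected value of a per-request action choice."""
    return sum(row[j] for row, j in zip(values, choice))


def solve(values, costs, budget, iters=50):
    """Bisect on the multiplier; `hi` always corresponds to a feasible choice."""
    lo, hi = 0.0, 1.0
    # Grow hi until the induced allocation fits the budget.
    while total_cost(allocate(values, costs, hi), costs) > budget:
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("budget is infeasible even for the cheapest actions")
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if total_cost(allocate(values, costs, mid), costs) > budget:
            lo = mid  # still over budget: penalize cost more
        else:
            hi = mid  # feasible: try a smaller penalty
    return allocate(values, costs, hi)


if __name__ == "__main__":
    # Hypothetical toy data: 3 requests, 3 actions (skip, cheap, expensive).
    values = [[0.0, 5.0, 6.0], [0.0, 1.0, 1.5], [0.0, 3.0, 8.0]]
    costs = [0.0, 1.0, 3.0]
    choice = solve(values, costs, budget=4.0)
    print(choice, total_cost(choice, costs), total_value(choice, values))
```

In this toy case the low-value request is skipped while the high-value request receives the expensive action, which is the "personalized" allocation idea described in the abstract.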
-
Automatic Segmentation, Feature Extraction and Comparison of Healthy and Stroke Cerebral Vasculature
Authors:
Aditi Deshpande,
Nima Jamilpour,
Bin Jiang,
Chelsea Kidwell,
Max Wintermark,
Kaveh Laksari
Abstract:
Accurate segmentation of cerebral vasculature and a quantitative assessment of cerebrovascular morphology is critical to various diagnostic and therapeutic purposes and is pertinent to studying brain health and disease. However, this is still a challenging task due to the complexity of the vascular imaging data. We propose an automated method for cerebral vascular segmentation without the need of any manual intervention as well as a method to skeletonize the binary volume to extract vascular geometric features which can characterize vessel structure. We combine a probabilistic vessel-enhancing filtering with an active-contour technique to segment magnetic resonance and computed tomography angiograms (MRA and CTA) and subsequently extract the vessel centerlines and diameters to calculate the geometrical properties of the vasculature. Our method was validated using a 3D phantom of the Circle-of-Willis region with 84% mean Dice Similarity and 85% mean Pearson Correlation with minimal modified Hausdorff distance error. We applied this method to a dataset of healthy subjects and stroke patients and present a quantitative comparison between them. We found significant differences in the geometric features including total length (2.88 +/- 0.38 m for healthy and 2.20 +/- 0.67 m for stroke), volume (40.18 +/- 25.55 ml for healthy and 34.43 +/- 21.83 ml for stroke), tortuosity (3.24 +/- 0.88 rad/cm for healthy and 5.80 +/- 0.92 rad/cm for stroke) and fractality (box dimension 1.36 +/- 0.28 for healthy vs. 1.69 +/- 0.20 for stroke). This technique can be applied on any imaging modality and can be used in the future to automatically obtain the 3D segmented vasculature for diagnosis and treatment planning of Stroke and other cerebrovascular diseases (CVD) in the clinic and also to study the morphological changes caused by various CVD.
Submitted 25 February, 2020;
originally announced February 2020.
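The entry above reports validation in terms of Dice similarity. For reference, the Dice coefficient of two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for flattened 0/1 masks; the function name and the convention of returning 1.0 for two empty masks are assumptions for illustration.

```python
def dice(mask_a, mask_b):
    """Dice similarity of two binary masks given as flat sequences of 0/1."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Convention (an assumption here): two empty masks count as a perfect match.
    return 1.0 if size == 0 else 2.0 * intersection / size


if __name__ == "__main__":
    print(dice([1, 1, 0, 0], [1, 0, 1, 0]))
```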
-
Dynamic Graph Learning based on Graph Laplacian
Authors:
Bo Jiang,
Ashkan Panahi,
Hamid Krim,
Yiyi Yu,
Spencer L. Smith
Abstract:
The purpose of this paper is to infer a global (collective) model of time-varying responses of a set of nodes as a dynamic graph, where an individual time series is observed at each node. The motivation of this work lies in the search for a connectome model which properly captures brain functionality upon observing activities in different regions of the brain and possibly of individual neurons. We formulate the problem as a quadratic objective functional of observed node signals over short time intervals, subject to regularization reflecting the graph smoothness and other dynamics involving the underlying graph's Laplacian, as well as the smoothness of the graph's evolution over time. The resulting joint optimization is solved by a continuous relaxation and a novel gradient-projection scheme. We apply our algorithm to a real-world dataset comprising recorded activities of individual brain cells. The resulting model is shown to be not only viable but also efficiently computable.
Submitted 17 February, 2020;
originally announced February 2020.
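The abstract above does not state its objective explicitly. As a hedged illustration of the graph-smoothness idea it mentions, the sketch below computes the standard Laplacian quadratic form x^T L x, which equals the sum over edges of w_ij (x_i - x_j)^2 and is small when connected nodes carry similar signal values. The helper names `laplacian` and `smoothness` are hypothetical, and this is only the textbook smoothness term, not the paper's full formulation.

```python
def laplacian(weights):
    """Combinatorial Laplacian L = D - W of a symmetric weight matrix W."""
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def smoothness(lap, signal):
    """Quadratic form x^T L x for a node signal x."""
    n = len(signal)
    return sum(
        signal[i] * sum(lap[i][j] * signal[j] for j in range(n))
        for i in range(n)
    )


if __name__ == "__main__":
    # Path graph on three nodes with unit edge weights (illustrative data).
    w = [
        [0.0, 1.0, 0.0],
        [1.0, 0.0, 1.0],
        [0.0, 1.0, 0.0],
    ]
    lap = laplacian(w)
    # (1 - 2)^2 + (2 - 4)^2 = 5
    print(smoothness(lap, [1.0, 2.0, 4.0]))
    # A constant signal is perfectly smooth on any graph.
    print(smoothness(lap, [3.0, 3.0, 3.0]))
```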
-
Local Information Privacy and Its Application to Privacy-Preserving Data Aggregation
Authors:
Bo Jiang,
Ming Li,
Ravi Tandon
Abstract:
In this paper, we study local information privacy (LIP), and design LIP-based mechanisms for statistical aggregation while protecting users' privacy without relying on a trusted third party. The notion of context-awareness is incorporated in LIP, which can be viewed as explicit modeling of the adversary's background knowledge. It enables the design of privacy-preserving mechanisms leveraging the prior distribution, which can potentially achieve a better utility-privacy tradeoff than context-free notions such as Local Differential Privacy (LDP). We present an optimization framework to minimize the mean square error in the data aggregation while protecting the privacy of each individual user's input data or a correlated latent variable, subject to LIP constraints. Then, we study two different types of applications, (weighted) summation and histogram estimation, and derive the optimal context-aware data perturbation parameters for each case, based on a randomized-response type of mechanism. We further compare the utility-privacy tradeoff between LIP and LDP and theoretically explain why the incorporation of prior knowledge enlarges the feasible regions of the perturbation parameters, which thereby leads to higher utility. We also extend the LIP-based privacy mechanisms to the more general case when exact prior knowledge is not available. Finally, we validate our analysis by simulations using both synthetic and real-world data. Results show that our LIP-based privacy mechanism provides better utility-privacy tradeoffs than LDP, and the advantage of LIP is even more significant when the prior distribution is more skewed.
Submitted 28 November, 2020; v1 submitted 8 January, 2020;
originally announced January 2020.
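To make the randomized-response idea mentioned in the abstract above concrete, here is a minimal sketch of the classic binary mechanism under the context-free LDP baseline: each user reports the true bit with probability e^eps / (1 + e^eps) and the flipped bit otherwise, and the aggregator debiases the reported mean. This is not the paper's context-aware LIP mechanism or its optimal parameters; the function names and the simulated data are illustrative assumptions.

```python
import math
import random


def keep_probability(eps):
    """Probability of reporting the true bit under eps-LDP randomized response."""
    return math.exp(eps) / (1.0 + math.exp(eps))


def perturb(bit, p_keep, rng):
    """Report the true bit with probability p_keep, otherwise flip it."""
    return bit if rng.random() < p_keep else 1 - bit


def debias(reported_mean, p_keep):
    """Unbiased estimate of the true mean from the mean of perturbed bits.

    E[reported] = f * p_keep + (1 - f) * (1 - p_keep), solved for f.
    Requires p_keep != 0.5 (that is, eps > 0).
    """
    return (reported_mean - (1.0 - p_keep)) / (2.0 * p_keep - 1.0)


def simulate(true_fraction, eps, n, seed=0):
    """Estimate the fraction of ones among n users from perturbed reports."""
    rng = random.Random(seed)
    p_keep = keep_probability(eps)
    bits = [1 if rng.random() < true_fraction else 0 for _ in range(n)]
    reports = [perturb(b, p_keep, rng) for b in bits]
    return debias(sum(reports) / n, p_keep)


if __name__ == "__main__":
    # Illustrative run: 40% ones, eps = ln 3 (keep probability 0.75).
    print(simulate(true_fraction=0.4, eps=math.log(3), n=20000))
```

A larger `eps` keeps more true bits and lowers the estimator's variance at the cost of weaker privacy, which is the utility-privacy tradeoff the abstract refers to.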