Search | arXiv e-print repository

arXiv:2406.09931 [pdf, other]

SCKansformer: Fine-Grained Classification of Bone Marrow Cells via Kansformer Backbone and Hierarchical Attention Mechanisms

Authors: Yifei Chen, Zhu Zhu, Shenghao Zhu, Linwei Qiu, Binfeng Zou, Fan Jia, Yunpeng Zhu, Chenyan Zhang, Zhaojie Fang, Feiwei Qin, ** Fan, Changmiao Wang, Yu Gao, Gang Yu

Abstract: The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redundant feature extraction when processing high-dimensional microimage data. We propose a novel fine-grained classification model, SCKansformer, for bone marrow blood cells, which addresses these challenges and enhances classification accuracy and efficiency. The model integrates the Kansformer Encoder, SCConv Encoder, and Global-Local Attention Encoder. The Kansformer Encoder replaces the traditional MLP layer with the KAN, improving nonlinear feature representation and interpretability. The SCConv Encoder, with its Spatial and Channel Reconstruction Units, enhances feature representation and reduces redundancy. The Global-Local Attention Encoder combines Multi-head Self-Attention with a Local Part module to capture both global and local features. We validated our model using the Bone Marrow Blood Cell Fine-Grained Classification Dataset (BMCD-FGCD), comprising over 10,000 samples and nearly 40 classifications, developed with a partner hospital. Comparative experiments on our private dataset, as well as the publicly available PBC and ALL-IDB datasets, demonstrate that SCKansformer outperforms both typical and advanced microcell classification methods across all datasets. Our source code and private BMCD-FGCD dataset are available at https://github.com/JustlfC03/SCKansformer.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 15 pages, 6 figures

arXiv:2406.03875 [pdf, other]

Energy-storing analysis and fishtail stiffness optimization for a wire-driven elastic robotic fish

Authors: Xiaocun Liao, Chao Zhou, Junfeng Fan, Zhuoliang Zhang, Zhaoran Yin, Liangwei Deng

Abstract: The robotic fish with high propulsion efficiency and good maneuverability achieves underwater fishlike propulsion by commonly adopting the motor to drive the fishtail, causing the significant fluctuations of the motor power due to the uneven swing speed of the fishtail in one swing cycle. Hence, we propose a wire-driven robotic fish with a spring-steel-based active-segment elastic spine. This bionic spine can produce elastic deformation to store energy under the action of the wire driving and motor for responding to the fluctuations of the motor power. Further, we analyze the effects of the energy-storing of the active-segment elastic spine on the smoothness of motor power. Based on the developed Lagrangian dynamic model and cantilever beam model, the power-variance-based nonlinear optimization model for the stiffness of the active-segment elastic spine is established to respond to the sharp fluctuations of motor power during each fishtail swing cycle. Results validate that the energy-storing of the active-segment elastic spine plays a vital role in improving the power fluctuations and maximum frequency of the motor by adjusting its stiffness reasonably, which is beneficial to achieving high propulsion and high speed for robotic fish. Compared with the active-segment rigid spine that is incapable of storing energy, the energy-storing of the active-segment elastic spine is beneficial to increase the maximum frequency of the motor and the average thrust of the fishtail by 0.41 Hz, and 0.06 N, respectively.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 14 pages, 19 figures

arXiv:2404.17400 [pdf, other]

Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement

Authors: Zishu Yao, Guodong Fan, **fu Fan, Min Gan, C. L. Philip Chen

Abstract: Low-light remote sensing images generally feature high resolution and high spatial complexity, with continuously distributed surface features in space. This continuity in scenes leads to extensive long-range correlations in spatial domains within remote sensing images. Convolutional Neural Networks, which rely on local correlations for long-distance modeling, struggle to establish long-range correlations in such images. On the other hand, transformer-based methods that focus on global information face high computational complexities when processing high-resolution remote sensing images. From another perspective, Fourier transform can compute global information without introducing a large number of parameters, enabling the network to more efficiently capture the overall image structure and establish long-range correlations. Therefore, we propose a Dual-Domain Feature Fusion Network (DFFN) for low-light remote sensing image enhancement. Specifically, this challenging task of low-light enhancement is divided into two more manageable sub-tasks: the first phase learns amplitude information to restore image brightness, and the second phase learns phase information to refine details. To facilitate information exchange between the two phases, we designed an information fusion affine block that combines data from different phases and scales. Additionally, we have constructed two dark light remote sensing datasets to address the current lack of datasets in dark light remote sensing image enhancement. Extensive evaluations show that our method outperforms existing state-of-the-art methods. The code is available at https://github.com/iijjlk/DFFN.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 14 pages

arXiv:2403.13236 [pdf, other]

Safety-Aware Reinforcement Learning for Electric Vehicle Charging Station Management in Distribution Network

Authors: Jiarong Fan, Ariel Liebman, Hao Wang

Abstract: The increasing integration of electric vehicles (EVs) into the grid can pose a significant risk to the distribution system operation in the absence of coordination. In response to the need for effective coordination of EVs within the distribution network, this paper presents a safety-aware reinforcement learning (RL) algorithm designed to manage EV charging stations while ensuring the satisfaction of system constraints. Unlike existing methods, our proposed algorithm does not rely on explicit penalties for constraint violations, eliminating the need for penalty coefficient tuning. Furthermore, managing EV charging stations is further complicated by multiple uncertainties, notably the variability in solar energy generation and energy prices. To address this challenge, we develop an off-policy RL algorithm to efficiently utilize data to learn patterns in such uncertain environments. Our algorithm also incorporates a maximum entropy framework to enhance the RL algorithm's exploratory process, preventing convergence to local optimal solutions. Simulation results demonstrate that our algorithm outperforms traditional RL algorithms in managing EV charging in the distribution network.

Submitted 19 March, 2024; originally announced March 2024.

Comments: 2024 IEEE Power & Energy Society General Meeting (PESGM)

arXiv:2310.00384 [pdf, ps, other]

Joint Power and 3D Trajectory Optimization for UAV-enabled Wireless Powered Communication Networks with Obstacles

Authors: Hongyang Pan, Yanheng Liu, Geng Sun, Junsong Fan, Shuang Liang, Chau Yuen

Abstract: Unmanned aerial vehicle (UAV)-enabled wireless powered communication networks (WPCNs) are promising technologies in 5G/6G wireless communications, while there are several challenges about UAV power allocation and scheduling to enhance the energy utilization efficiency, considering the existence of obstacles. In this work, we consider a UAV-enabled WPCN scenario in which a UAV needs to cover the ground wireless devices (WDs). During the coverage process, the UAV needs to collect data from the WDs and charge them simultaneously. To this end, we formulate a joint-UAV power and three-dimensional (3D) trajectory optimization problem (JUPTTOP) to simultaneously increase the total number of the covered WDs, increase the time efficiency, and reduce the total flying distance of the UAV so as to improve the energy utilization efficiency in the network. Due to the difficulties and complexities, we decompose it into two sub-optimization problems, which are the UAV power allocation optimization problem (UPAOP) and UAV 3D trajectory optimization problem (UTTOP), respectively. Then, we propose an improved non-dominated sorting genetic algorithm-II with K-means initialization operator and Variable dimension mechanism (NSGA-II-KV) for solving the UPAOP. For UTTOP, we first introduce a pretreatment method, and then use an improved particle swarm optimization with Normal distribution initialization, Genetic mechanism, Differential mechanism and Pursuit operator (PSO-NGDP) to deal with this sub-optimization problem. Simulation results verify the effectiveness of the proposed strategies under different scales and settings of the networks.

Submitted 30 September, 2023; originally announced October 2023.

arXiv:2308.14111 [pdf, other]

MARL for Decentralized Electric Vehicle Charging Coordination with V2V Energy Exchange

Authors: Jiarong Fan, Hao Wang, Ariel Liebman

Abstract: Effective energy management of electric vehicle (EV) charging stations is critical to supporting the transport sector's sustainable energy transition. This paper addresses the EV charging coordination by considering vehicle-to-vehicle (V2V) energy exchange as the flexibility to harness in EV charging stations. Moreover, this paper takes into account EV user experiences, such as charging satisfaction and fairness. We propose a Multi-Agent Reinforcement Learning (MARL) approach to coordinate EV charging with V2V energy exchange while considering uncertainties in the EV arrival time, energy price, and solar energy generation. The exploration capability of MARL is enhanced by introducing parameter noise into MARL's neural network models. Experimental results demonstrate the superior performance and scalability of our proposed method compared to traditional optimization baselines. The decentralized execution of the algorithm enables it to effectively deal with partial system faults in the charging station.

Submitted 27 August, 2023; originally announced August 2023.

Comments: IEEE IECON 2023 (The 49th Annual Conference of the IEEE Industrial Electronics Society)

arXiv:2308.09103 [pdf, other]

Efficient collision avoidance for autonomous vehicles in polygonal domains

Authors: Jiayu Fan, Nikolce Murgovski, Jun Liang

Abstract: This research focuses on trajectory planning problems for autonomous vehicles utilizing numerical optimal control techniques. The study reformulates the constrained optimization problem into a nonlinear programming problem, incorporating explicit collision avoidance constraints. We present three novel, exact formulations to describe collision constraints. The first formulation is derived from a proposition concerning the separation of a point and a convex set. We prove the separating proposition through De Morgan's laws. Then, leveraging the hyperplane separation theorem, we propose two efficient reformulations. Compared with the existing dual formulations and the first formulation, they significantly reduce the number of auxiliary variables to be optimized and inequality constraints within the nonlinear programming problem. Finally, the efficacy of the proposed formulations is demonstrated in the context of typical autonomous parking scenarios compared with the state of the art. For generality, we design three initial guesses to assess the computational effort required for convergence to solutions when using the different collision formulations. The results illustrate that the scheme employing De Morgan's laws performs equally well with those utilizing dual formulations, while the other two schemes based on the hyperplane separation theorem exhibit the added benefit of requiring lower computational resources.
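The point-versus-convex-set separation idea named in this abstract can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes an obstacle given as a convex polygon in half-space form, with rows `a_i` and offsets `b_i` such that a point `p` is inside when every inequality `a_i . p <= b_i` holds. By De Morgan's laws, the point is outside exactly when at least one inequality is violated. The names `point_outside_polygon`, `A`, `b`, and `margin` are hypothetical.

```python
from typing import Sequence


def point_outside_polygon(
    A: Sequence[Sequence[float]],
    b: Sequence[float],
    p: Sequence[float],
    margin: float = 0.0,
) -> bool:
    """Return True if p violates at least one half-space a_i . p <= b_i.

    Inside  : all(a_i . p <= b_i)
    Outside : any(a_i . p >  b_i)   (negation of the line above, by De Morgan)
    `margin` is an assumed safety distance added to the right-hand side.
    """
    return any(a[0] * p[0] + a[1] * p[1] > bi + margin for a, bi in zip(A, b))


# Assumed example obstacle: the unit square [0, 1] x [0, 1],
# written as -x <= 0, x <= 1, -y <= 0, y <= 1.
A = [(-1.0, 0.0), (1.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
b = [0.0, 1.0, 0.0, 1.0]

outside = point_outside_polygon(A, b, (2.0, 0.5))  # violates x <= 1
inside = point_outside_polygon(A, b, (0.5, 0.5))   # satisfies every inequality
```

In an optimization setting the `any(...)` disjunction is not smooth, which is one reason exact reformulations such as those described in the abstract are of interest; this sketch only shows the underlying logical condition.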

Submitted 12 December, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

Comments: 10 pages, 2 figures

arXiv:2308.07946 [pdf, other]

DSFNet: Dual-GCN and Location-fused Self-attention with Weighted Fast Normalized Fusion for Polyps Segmentation

Authors: Juntong Fan, Debesh Jha, Tieyong Zeng, Dayang Wang

Abstract: Polyps segmentation poses a significant challenge in medical imaging due to the flat surface of polyps and their texture similarity to surrounding tissues. This similarity gives rise to difficulties in establishing a clear boundary between polyps and the surrounding mucosa, leading to complications such as local overexposure and the presence of bright spot reflections in imaging. To counter this problem, we propose a new dual graph convolution network (Dual-GCN) and location self-attention mechanisms with weighted fast normalization fusion model, named DSFNet. First, we introduce a feature enhancement block module based on Dual-GCN module to enhance local spatial and structural information extraction with fine granularity. Second, we introduce a location fused self-attention module to enhance the model's awareness and capacity to capture global information. Finally, the weighted fast normalized fusion method with trainable weights is introduced to efficiently integrate the feature maps from encoder, bottleneck, and decoder, thus promoting information transmission and facilitating the semantic consistency. Experimental results show that the proposed model surpasses other state-of-the-art models in gold standard indicators, such as Dice, MAE, and IoU. Both quantitative and qualitative analysis indicate that the proposed model demonstrates exceptional capability in polyps segmentation and has great potential clinical significance. We have shared our code on an anonymous website for evaluation.
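As a rough illustration of what a weighted fast normalized fusion with trainable weights can look like, the sketch below follows the form popularized by EfficientDet's BiFPN: each weight is clamped to be non-negative and divided by the sum of all weights plus a small constant. Whether DSFNet uses exactly this form is an assumption; the function name, the flat-list feature representation, and the default `eps` are illustrative only.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse equally sized feature vectors as sum_i (w_i / (eps + sum_j w_j)) * f_i.

    Weights are clamped at zero (a ReLU) so the normalized coefficients stay
    in [0, 1]; in a trained network the weights would be learnable parameters.
    """
    clamped = [max(0.0, w) for w in weights]
    total = sum(clamped) + eps
    length = len(features[0])
    return [
        sum(w * f[k] for w, f in zip(clamped, features)) / total
        for k in range(length)
    ]


# Assumed toy inputs standing in for encoder, bottleneck, and decoder maps.
encoder = [1.0, 2.0]
bottleneck = [3.0, 4.0]
decoder = [5.0, 6.0]
fused = fast_normalized_fusion([encoder, bottleneck, decoder], [1.0, 1.0, 1.0], eps=0.0)
```

Compared with a softmax over the weights, this normalization avoids the exponential, which is the usual motivation for calling it "fast".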

Submitted 27 November, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

Comments: 10 pages, 6 figures, 3 tables

arXiv:2307.10166 [pdf, other]

Adversarial Latent Autoencoder with Self-Attention for Structural Image Synthesis

Authors: Jiajie Fan, Laure Vuaille, Hao Wang, Thomas Bäck

Abstract: Generative Engineering Design approaches driven by Deep Generative Models (DGM) have been proposed to facilitate industrial engineering processes. In such processes, designs often come in the form of images, such as blueprints, engineering drawings, and CAD models depending on the level of detail. DGMs have been successfully employed for synthesis of natural images, e.g., displaying animals, human faces and landscapes. However, industrial design images are fundamentally different from natural scenes in that they contain rich structural patterns and long-range dependencies, which are challenging for convolution-based DGMs to generate. Moreover, DGM-driven generation process is typically triggered based on random noisy inputs, which outputs unpredictable samples and thus cannot perform an efficient industrial design exploration. We tackle these challenges by proposing a novel model Self-Attention Adversarial Latent Autoencoder (SA-ALAE), which allows generating feasible design images of complex engineering parts. With SA-ALAE, users can not only explore novel variants of an existing design, but also control the generation process by operating in latent space. The potential of SA-ALAE is shown by generating engineering blueprints in a real automotive design task.

Submitted 19 July, 2023; originally announced July 2023.

Comments: 18 pages, 8 figures

arXiv:2306.16342 [pdf, other]

Specific Beamforming for Multi-UAV Networks: A Dual Identity-based ISAC Approach

Authors: Yanpeng Cui, Qixun Zhang, Zhiyong Feng, Fan Liu, Ce Shi, **po Fan, ** Zhang

Abstract: Beam alignment is essential to compensate for the high path loss in the millimeter-wave (mmWave) Unmanned Aerial Vehicle (UAV) network. The integrated sensing and communication (ISAC) technology has been envisioned as a promising solution to enable efficient beam alignment in the dynamic UAV network. However, since the digital identity (D-ID) is not contained in the reflected echoes, the conventional ISAC solution has to either periodically feed back the D-ID to distinguish beams for multi-UAVs or suffer the beam errors induced by the separation of D-ID and physical identity (P-ID). This paper presents a novel dual identity association (DIA)-based ISAC approach, the first solution that enables specific, fast, and accurate beamforming towards multiple UAVs. In particular, the P-IDs extracted from echo signals are distinguished dynamically by calculating the feature similarity according to their prevalence, and thus the DIA is accurately achieved. We also present the extended Kalman filtering scheme to track and predict P-IDs, and the specific beam is thereby effectively aligned toward the intended UAVs in dynamic networks. Numerical results show that the proposed DIA-based ISAC solution significantly outperforms the conventional methods in association accuracy and communication performance.

Submitted 28 June, 2023; originally announced June 2023.

Comments: 7 pages, 8 figures

arXiv:2304.07278 [pdf, ps, other]

Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning

Authors: Gen Li, Yuling Yan, Yuxin Chen, Jianqing Fan

Abstract: This paper studies reward-agnostic exploration in reinforcement learning (RL) -- a scenario where the learner is unaware of the reward functions during the exploration stage -- and designs an algorithm that improves over the state of the art. More precisely, consider a finite-horizon inhomogeneous Markov decision process with $S$ states, $A$ actions, and horizon length $H$, and suppose that there are no more than a polynomial number of given reward functions of interest. By collecting on the order of $\frac{SAH^3}{\varepsilon^2}$ sample episodes (up to log factor) without guidance of the reward information, our algorithm is able to find $\varepsilon$-optimal policies for all these reward functions, provided that $\varepsilon$ is sufficiently small. This forms the first reward-agnostic exploration scheme in this context that achieves provable minimax optimality. Furthermore, once the sample size exceeds $\frac{S^2AH^3}{\varepsilon^2}$ episodes (up to log factor), our algorithm is able to yield $\varepsilon$ accuracy for arbitrarily many reward functions (even when they are adversarially designed), a task commonly dubbed as ``reward-free exploration.'' The novelty of our algorithm design draws on insights from offline RL: the exploration scheme attempts to maximize a critical reward-agnostic quantity that dictates the performance of offline RL, while the policy learning paradigm leverages ideas from sample-optimal offline RL paradigms.
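To make the two sample-size orders in this abstract concrete, the short sketch below evaluates them for one assumed problem size. It ignores the logarithmic factors and constants that the abstract leaves unspecified, so the outputs are orders of magnitude only. The function names and the values chosen for `S`, `A`, `H`, and `eps` are hypothetical.

```python
def reward_agnostic_order(S: int, A: int, H: int, eps: float) -> float:
    """Order S*A*H^3 / eps^2 (log factors and constants omitted)."""
    return S * A * H**3 / eps**2


def reward_free_order(S: int, A: int, H: int, eps: float) -> float:
    """Order S^2*A*H^3 / eps^2 (log factors and constants omitted)."""
    return S**2 * A * H**3 / eps**2


# Assumed toy problem: 10 states, 5 actions, horizon 20, accuracy 0.1.
S, A, H, eps = 10, 5, 20, 0.1
agnostic = reward_agnostic_order(S, A, H, eps)
free = reward_free_order(S, A, H, eps)

# The two orders differ by exactly a factor of S.
ratio = free / agnostic
```

The ratio shows why the distinction matters: handling arbitrarily many (possibly adversarial) reward functions costs an extra factor of the state-space size relative to handling polynomially many given ones.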

Submitted 23 May, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

Comments: accepted for presentation in COLT 2024

arXiv:2304.01218 [pdf, other]

POLAR-Express: Efficient and Precise Formal Reachability Analysis of Neural-Network Controlled Systems

Authors: Yixuan Wang, Weichao Zhou, Jiameng Fan, Zhilu Wang, Jiajun Li, Xin Chen, Chao Huang, Wenchao Li, Qi Zhu

Abstract: Neural networks (NNs) playing the role of controllers have demonstrated impressive empirical performances on challenging control problems. However, the potential adoption of NN controllers in real-life applications also gives rise to a growing concern over the safety of these neural-network controlled systems (NNCSs), especially when used in safety-critical applications. In this work, we present POLAR-Express, an efficient and precise formal reachability analysis tool for verifying the safety of NNCSs. POLAR-Express uses Taylor model arithmetic to propagate Taylor models (TMs) across a neural network layer-by-layer to compute an overapproximation of the neural-network function. It can be applied to analyze any feed-forward neural network with continuous activation functions. We also present a novel approach to propagate TMs more efficiently and precisely across ReLU activation functions. In addition, POLAR-Express provides parallel computation support for the layer-by-layer propagation of TMs, thus significantly improving the efficiency and scalability over its earlier prototype POLAR. Across the comparison with six other state-of-the-art tools on a diverse set of benchmarks, POLAR-Express achieves the best verification efficiency and tightness in the reachable set analysis.

Submitted 5 April, 2023; v1 submitted 31 March, 2023; originally announced April 2023.

arXiv:2301.01997 [pdf, ps, other]

Data-Driven Inverse Reinforcement Learning for Expert-Learner Zero-Sum Games

Authors: Wenqian Xue, Bosen Lian, Jialu Fan, Tianyou Chai, Frank L. Lewis

Abstract: In this paper, we formulate inverse reinforcement learning (IRL) as an expert-learner interaction whereby the optimal performance intent of an expert or target agent is unknown to a learner agent. The learner observes the states and controls of the expert and hence seeks to reconstruct the expert's cost function intent and thus mimics the expert's optimal response. Next, we add non-cooperative disturbances that seek to disrupt the learning and stability of the learner agent. This leads to the formulation of a new interaction we call zero-sum game IRL. We develop a framework to solve the zero-sum game IRL problem that is a modified extension of RL policy iteration (PI) to allow unknown expert performance intentions to be computed and non-cooperative disturbances to be rejected. The framework has two parts: a value function and control action update based on an extension of PI, and a cost function update based on standard inverse optimal control. Then, we eventually develop an off-policy IRL algorithm that does not require knowledge of the expert and learner agent dynamics and performs single-loop learning. Rigorous proofs and analyses are given. Finally, simulation experiments are presented to show the effectiveness of the new approach.

Submitted 5 January, 2023; originally announced January 2023.

Comments: 9 pages, 3 figures

arXiv:2211.14548 [pdf, other]

Contextual Expressive Text-to-Speech

Authors: Jianhong Tu, Zeyu Cui, Xiaohuan Zhou, Siqi Zheng, Kai Hu, Ju Fan, Chang Zhou

Abstract: The goal of expressive Text-to-speech (TTS) is to synthesize natural speech with desired content, prosody, emotion, or timbre, in high expressiveness. Most previous studies attempt to generate speech from given labels of styles and emotions, which over-simplifies the problem by classifying styles and emotions into a fixed number of pre-defined categories. In this paper, we introduce a new task setting, Contextual TTS (CTTS). The main idea of CTTS is that how a person speaks depends on the particular context she is in, where the context can typically be represented as text. Thus, in the CTTS task, we propose to utilize such context to guide the speech synthesis process instead of relying on explicit labels of styles and emotions. To achieve this task, we construct a synthetic dataset and develop an effective framework. Experiments show that our framework can generate high-quality expressive speech based on the given context both in synthetic datasets and real-world scenarios.

Submitted 26 November, 2022; originally announced November 2022.

Comments: Submitted to ICASSP 2023

arXiv:2211.02419 [pdf, other]

High-Resolution Boundary Detection for Medical Image Segmentation with Piece-Wise Two-Sample T-Test Augmented Loss

Authors: Yucong Lin, **hua Su, Yuhang Li, Yuhao Wei, Hanchao Yan, Saining Zhang, Jiaan Luo, Danni Ai, Hong Song, **gfan Fan, Tianyu Fu, Deqiang Xiao, Feifei Wang, Jue Hou, Jian Yang

Abstract: Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on the suitable design of loss functions. Popular loss functions, including the cross-entropy and dice losses, often fall short of boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We developed a novel loss function that is tailored to reflect the boundary information to enhance the boundary detection. As the contrast between segmentation and background regions along the classification boundary naturally induces heterogeneity over the pixels, we propose the piece-wise two-sample t-test augmented (PTA) loss that is infused with the statistical test for such heterogeneity. We demonstrate the improved boundary detection power of the PTA loss compared to benchmark losses without a t-test component.

Submitted 4 November, 2022; originally announced November 2022.

arXiv:2206.15069 [pdf, other]

PVT-COV19D: Pyramid Vision Transformer for COVID-19 Diagnosis

Authors: Lilang Zheng, Jiaxuan Fang, Xiaorun Tang, Hanzhang Li, Jiaxin Fan, Tianyi Wang, Rui Zhou, Zhaoyan Yan

Abstract: With the outbreak of COVID-19, a large number of relevant studies have emerged in recent years. We propose an automatic COVID-19 diagnosis framework based on lung CT scan images, the PVT-COV19D. In order to accommodate the different dimensions of the image input, we first classified the images using Transformer models, then sampled the images in the dataset according to normal distribution, and fed the sampling results into the modified PVTv2 model for training. A large number of experiments on the COV19-CT-DB dataset demonstrate the effectiveness of the proposed method.

Submitted 30 June, 2022; originally announced June 2022.

Comments: 8 pages, 1 figure

arXiv:2203.12296 [pdf, ps, other]

Active Intelligent Reflecting Surface Assisted Secure Air-to-Ground Communication with UAV Jittering

Authors: Yimeng Ge, Jiancun Fan

Abstract: Unmanned Aerial Vehicles (UAV)-enabled communication is a promising solution for secure air-to-ground (A2G) networks due to the additional secure degrees of freedom afforded by mobility. However, the jittering characteristics caused by the random airflow and the body vibration of the UAV itself have a non-negligible impact on the performance of UAV communication. Considering the impact of UAV jittering, this paper proposes a robust and secure transmission design assisted by a novel active intelligent reflecting surface (IRS), where the reflecting elements in the IRS not only adjust the phase shift but also amplify the amplitude of signals. Specifically, under the worst-case secrecy rate constraints, we aim to minimize the transmission power by the robust joint design of the active IRS's reflecting coefficient and beamforming at the UAV-borne base station (UBS). However, it is challenging to solve the joint optimization problem due to its non-convexity. To tackle this problem, the non-convex problem is reformulated with linear approximation for the channel variations and linear matrix inequality transformed by S-procedure and Schur's complement. Then, we decouple this problem into two sub-problems, namely, passive beamforming and active IRS's reflecting coefficient optimization, and solve them through alternate optimization (AO). Finally, the numerical results demonstrate the potential of active IRS on power saving under secure transmission constraints and the impact of UAV jittering.

Submitted 23 March, 2022; originally announced March 2022.

arXiv:2201.00767 [pdf, other]

doi 10.1117/12.2606785

BDG-Net: Boundary Distribution Guided Network for Accurate Polyp Segmentation

Authors: Zihuan Qiu, Zhichuan Wang, Miaomiao Zhang, Ziyong Xu, Jie Fan, Linfeng Xu

Abstract: Colorectal cancer (CRC) is one of the most common fatal cancers in the world. Polypectomy can effectively interrupt the progression of adenoma to adenocarcinoma, thus reducing the risk of CRC development. Colonoscopy is the primary method to find colonic polyps. However, due to the different sizes of polyps and the unclear boundary between polyps and their surrounding mucosa, it is challenging to segment polyps accurately. To address this problem, we design a Boundary Distribution Guided Network (BDG-Net) for accurate polyp segmentation. Specifically, under the supervision of the ideal Boundary Distribution Map (BDM), we use the Boundary Distribution Generate Module (BDGM) to aggregate high-level features and generate the BDM. Then, the BDM is sent to the Boundary Distribution Guided Decoder (BDGD) as complementary spatial information to guide the polyp segmentation. Moreover, a multi-scale feature interaction strategy is adopted in the BDGD to improve the segmentation accuracy of polyps with different sizes. Extensive quantitative and qualitative evaluations demonstrate the effectiveness of our model, which outperforms state-of-the-art models remarkably on five public polyp datasets while maintaining low computational complexity. Code: https://github.com/zihuanqiu/BDG-Net

Submitted 17 April, 2022; v1 submitted 3 January, 2022; originally announced January 2022.

Comments: Accepted by SPIE Medical Imaging 2022

Journal ref: Proc. SPIE 12032, Medical Imaging 2022: Image Processing, 1203230 (4 April 2022)

arXiv:2111.15638 [pdf, other]

Radio-Frequency Multi-Mode OAM Detection Based on UCA Samples Learning

Authors: Jiabei Fan, Rui Chen, Wen-Xuan Long, Marco Moretti, Jiandong Li

Abstract: Orbital angular momentum (OAM) at radio-frequency provides a novel approach of multiplexing a set of orthogonal modes on the same frequency channel to achieve high spectral efficiencies. However, classical phase gradient-based OAM mode detection methods require perfect alignment of transmit and receive antennas, which greatly challenges the practical application of OAM communications. In this paper, we first show the effect of non-parallel misalignment on the OAM phase structure, and then propose the OAM mode detection method based on uniform circular array (UCA) samples learning for the more general alignment or non-parallel misalignment case. Specifically, we applied three classifiers: K-nearest neighbor (KNN), support vector machine (SVM), and back-propagation neural network (BPNN) to both single-mode and multi-mode OAM detection. The simulation results validate that the proposed learning-based OAM mode detection methods are robust to misalignment errors and that the BPNN classifier in particular has the best generalization performance.

Submitted 29 November, 2021; originally announced November 2021.

arXiv:2106.14591 [pdf, other]

ACN: Adversarial Co-training Network for Brain Tumor Segmentation with Missing Modalities

Authors: Yixin Wang, Yang Zhang, Yang Liu, Zihao Lin, Jiang Tian, Cheng Zhong, Zhongchao Shi, Jianping Fan, Zhiqiang He

Abstract: Accurate segmentation of brain tumors from magnetic resonance imaging (MRI) is clinically relevant in diagnoses, prognoses and surgery treatment, which requires multiple modalities to provide complementary morphological and physiopathologic information. However, missing modality commonly occurs due to image corruption, artifacts, different acquisition protocols or allergies to certain contrast agents in clinical practice. Though existing efforts demonstrate the possibility of a unified model for all missing situations, most of them perform poorly when more than one modality is missing. In this paper, we propose a novel Adversarial Co-training Network (ACN) to solve this issue, in which a series of independent yet related models are trained dedicated to each missing situation with significantly better results. Specifically, ACN adopts a novel co-training network, which enables a coupled learning process for both full modality and missing modality to supplement each other's domain and feature representations, and more importantly, to recover the "missing" information of absent modalities. Then, two unsupervised modules, i.e., entropy and knowledge adversarial learning modules are proposed to minimize the domain gap while enhancing prediction reliability and encouraging the alignment of latent representations, respectively. We also adapt modality-mutual information knowledge transfer learning to ACN to retain the rich mutual information among modalities. Extensive experiments on BraTS2018 dataset show that our proposed method significantly outperforms all state-of-the-art methods under any missing situation.

Submitted 29 June, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

Comments: MICCAI 2021

arXiv:2106.13867 [pdf, other]

POLAR: A Polynomial Arithmetic Framework for Verifying Neural-Network Controlled Systems

Authors: Chao Huang, Jiameng Fan, Zhilu Wang, Yixuan Wang, Weichao Zhou, Jiajun Li, Xin Chen, Wenchao Li, Qi Zhu

Abstract: We present POLAR, a polynomial arithmetic-based framework for efficient bounded-time reachability analysis of neural-network controlled systems (NNCSs). Existing approaches that leverage the standard Taylor Model (TM) arithmetic for approximating the neural-network controller cannot deal with non-differentiable activation functions and suffer from rapid explosion of the remainder when propagating the TMs. POLAR overcomes these shortcomings by integrating TM arithmetic with Bernstein Bézier Form and symbolic remainder. The former enables TM propagation across non-differentiable activation functions and local refinement of TMs, and the latter reduces error accumulation in the TM remainder for linear mappings in the network. Experimental results show that POLAR significantly outperforms the current state-of-the-art tools in terms of both efficiency and tightness of the reachable set overapproximation. The source code can be found in https://github.com/ChaoHuang2018/POLAR_Tool

Submitted 24 December, 2022; v1 submitted 25 June, 2021; originally announced June 2021.

Comments: Accepted by ATVA 2022

arXiv:2012.08496 [pdf, other]

doi 10.1561/2200000079

Spectral Methods for Data Science: A Statistical Perspective

Authors: Yuxin Chen, Yuejie Chi, Jianqing Fan, Cong Ma

Abstract: Spectral methods have emerged as a simple yet surprisingly effective approach for extracting information from massive, noisy and incomplete data. In a nutshell, spectral methods refer to a collection of algorithms built upon the eigenvalues (resp. singular values) and eigenvectors (resp. singular vectors) of some properly designed matrices constructed from data. A diverse array of applications have been found in machine learning, data science, and signal processing. Due to their simplicity and effectiveness, spectral methods are not only used as a stand-alone estimator, but also frequently employed to initialize other more sophisticated algorithms to improve performance. While the studies of spectral methods can be traced back to classical matrix perturbation theory and methods of moments, the past decade has witnessed tremendous theoretical advances in demystifying their efficacy through the lens of statistical modeling, with the aid of non-asymptotic random matrix theory. This monograph aims to present a systematic, comprehensive, yet accessible introduction to spectral methods from a modern statistical perspective, highlighting their algorithmic implications in diverse large-scale applications. In particular, our exposition gravitates around several central questions that span various applications: how to characterize the sample efficiency of spectral methods in reaching a target level of statistical accuracy, and how to assess their stability in the face of random noise, missing data, and adversarial corruptions? In addition to conventional $\ell_2$ perturbation analysis, we present a systematic $\ell_{\infty}$ and $\ell_{2,\infty}$ perturbation theory for eigenspace and singular subspaces, which has only recently become available owing to a powerful "leave-one-out" analysis framework.

Submitted 18 September, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

Journal ref: Foundations and Trends in Machine Learning: Vol. 14: No. 5, pp. 566-806, 2021

arXiv:2011.14611 [pdf, other]

SIR: Self-supervised Image Rectification via Seeing the Same Scene from Multiple Different Lenses

Authors: Jinlong Fan, Jing Zhang, Dacheng Tao

Abstract: Deep learning has demonstrated its power in image rectification by leveraging the representation capacity of deep neural networks via supervised training based on a large-scale synthetic dataset. However, the model may overfit the synthetic images and not generalize well on real-world fisheye images due to the limited universality of a specific distortion model and the lack of explicitly modeling the distortion and rectification process. In this paper, we propose a novel self-supervised image rectification (SIR) method based on an important insight that the rectified results of distorted images of the same scene from different lenses should be the same. Specifically, we devise a new network architecture with a shared encoder and several prediction heads, each of which predicts the distortion parameter of a specific distortion model. We further leverage a differentiable warping module to generate the rectified images and re-distorted images from the distortion parameters and exploit the intra- and inter-model consistency between them during training, thereby leading to a self-supervised learning scheme without the need for ground-truth distortion parameters or normal images. Experiments on synthetic dataset and real-world fisheye images demonstrate that our method achieves comparable or even better performance than the supervised baseline method and representative state-of-the-art methods. Self-supervised learning also improves the universality of distortion models while keeping their self-consistency.

Submitted 18 June, 2021; v1 submitted 30 November, 2020; originally announced November 2020.

arXiv:2011.12108 [pdf, other]

Wide-angle Image Rectification: A Survey

Authors: Jinlong Fan, Jing Zhang, Stephen J. Maybank, Dacheng Tao

Abstract: Wide field-of-view (FOV) cameras, which capture a larger scene area than narrow FOV cameras, are used in many applications including 3D reconstruction, autonomous driving, and video surveillance. However, wide-angle images contain distortions that violate the assumptions underlying pinhole camera models, resulting in object distortion, difficulties in estimating scene distance, area, and direction, and preventing the use of off-the-shelf deep models trained on undistorted images for downstream computer vision tasks. Image rectification, which aims to correct these distortions, can solve these problems. In this paper, we comprehensively survey progress in wide-angle image rectification from transformation models to rectification methods. Specifically, we first present a detailed description and discussion of the camera models used in different approaches. Then, we summarize several distortion models including radial distortion and projection distortion. Next, we review both traditional geometry-based image rectification methods and deep learning-based methods, where the former formulate distortion parameter estimation as an optimization problem and the latter treat it as a regression problem by leveraging the power of deep neural networks. We evaluate the performance of state-of-the-art methods on public datasets and show that although both kinds of methods can achieve good results, these methods only work well for specific camera models and distortion types. We also provide a strong baseline model and carry out an empirical study of different distortion models on synthetic datasets and real-world wide-angle images. Finally, we discuss several potential research directions that are expected to further advance this area in the future.

Submitted 1 December, 2021; v1 submitted 30 October, 2020; originally announced November 2020.

Comments: Accepted by the International Journal of Computer Vision (IJCV). Both the datasets and source code are available at https://github.com/loong8888/WAIR

arXiv:2010.14009 [pdf, other]

Long Short-Term Memory Neuron Equalizer

Authors: Zihao Wang, Zhifei Xu, Jiayi He, Chulsoon Hwang, Jun Fan, Hervé Delingette

Abstract: In this work we propose a neuromorphic hardware-based signal equalizer built on a deep learning implementation. The proposed neural equalizer is a plasticity-trainable equalizer, which differs from the traditional model-based DFE design. A trainable Long Short-Term Memory neural network based DFE architecture is proposed for signal recovery, and its digital implementation is evaluated on an FPGA. In contrast to modelling-based equalization methods, the proposed approach is compatible with multiple-frequency signal equalization rather than single-type signal equalization. We show quantitatively that the neuromorphic equalizer, which is amenable to both analog and digital implementation, outperforms benchmark approaches on several metrics. The proposed method is adaptable to both general neuromorphic computing and ASIC instruments.

Submitted 26 October, 2020; originally announced October 2020.

arXiv:2008.11659 [pdf]

doi 10.1038/s41566-021-00796-w

Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit

Authors: Tiankuang Zhou, Xing Lin, Jiamin Wu, Yitong Chen, Hao Xie, Yipeng Li, Jintao Fan, Huaqiang Wu, Lu Fang, Qionghai Dai

Abstract: Application-specific optical processors have been considered disruptive technologies for modern computing that can fundamentally accelerate the development of artificial intelligence (AI) by offering substantially improved computing performance. Recent advancements in optical neural network architectures for neural information processing have been applied to perform various machine learning tasks. However, the existing architectures have limited complexity and performance; and each of them requires its own dedicated design that cannot be reconfigured to switch between different neural network models for different applications after deployment. Here, we propose an optoelectronic reconfigurable computing paradigm by constructing a diffractive processing unit (DPU) that can efficiently support different neural networks and achieve a high model complexity with millions of neurons. It allocates almost all of its computational operations optically and achieves extremely high speed of data modulation and large-scale network parameter updating by dynamically programming optical modulators and photodetectors. We demonstrated the reconfiguration of the DPU to implement various diffractive feedforward and recurrent neural networks and developed a novel adaptive training approach to circumvent the system imperfections. We applied the trained networks for high-speed classifying of handwritten digit images and human action videos over benchmark datasets, and the experimental results revealed a comparable classification accuracy to the electronic computing approaches. Furthermore, our prototype system built with off-the-shelf optoelectronic components surpasses the performance of state-of-the-art graphics processing units (GPUs) by several times on computing speed and more than an order of magnitude on system energy efficiency.

Submitted 26 August, 2020; originally announced August 2020.

arXiv:2008.01724 [pdf, other]

Convex and Nonconvex Optimization Are Both Minimax-Optimal for Noisy Blind Deconvolution under Random Designs

Authors: Yuxin Chen, Jianqing Fan, Bingyan Wang, Yuling Yan

Abstract: We investigate the effectiveness of convex relaxation and nonconvex optimization in solving bilinear systems of equations under two different designs (i.e., a sort of random Fourier design and Gaussian design). Despite the wide applicability, the theoretical understanding about these two paradigms remains largely inadequate in the presence of random noise. The current paper makes two contributions by demonstrating that: (1) a two-stage nonconvex algorithm attains minimax-optimal accuracy within a logarithmic number of iterations; (2) convex relaxation also achieves minimax-optimal statistical accuracy vis-à-vis random noise. Both results significantly improve upon the state-of-the-art theoretical guarantees.

Submitted 12 July, 2021; v1 submitted 4 August, 2020; originally announced August 2020.

arXiv:2007.05220 [pdf, other]

Efficient Unpaired Image Dehazing with Cyclic Perceptual-Depth Supervision

Authors: Chen Liu, Jiaqi Fan, Guosheng Yin

Abstract: Image dehazing without paired haze-free images is of immense importance, as acquiring paired images often entails significant cost. However, we observe that previous unpaired image dehazing approaches tend to suffer from performance degradation near depth borders, where depth tends to vary abruptly. Hence, we propose to anneal the depth border degradation in unpaired image dehazing with cyclic perceptual-depth supervision. Coupled with the dual-path feature re-using backbones of the generators and discriminators, our model achieves 20.36 Peak Signal-to-Noise Ratio (PSNR) on NYU Depth V2 dataset, significantly outperforming its predecessors with reduced Floating Point Operations (FLOPs).

Submitted 10 July, 2020; originally announced July 2020.

arXiv:2007.00084 [pdf, other]

doi 10.1038/s41578-020-00260-1

Deep neural networks for the evaluation and design of photonic devices

Authors: Jiaqi Jiang, Mingkun Chen, Jonathan A. Fan

Abstract: The data sciences revolution is poised to transform the way photonic systems are simulated and designed. Photonics are in many ways an ideal substrate for machine learning: the objective of much of computational electromagnetics is the capture of non-linear relationships in high dimensional spaces, which is the core strength of neural networks. Additionally, the mainstream availability of Maxwell solvers makes the training and evaluation of neural networks broadly accessible and tailorable to specific problems. In this Review, we will show how deep neural networks, configured as discriminative networks, can learn from training sets and operate as high-speed surrogate electromagnetic solvers. We will also examine how deep generative networks can learn geometric features in device distributions and even be configured to serve as robust global optimizers. Fundamental data sciences concepts framed within the context of photonics will also be discussed, including the network training process, delineation of different network classes and architectures, and dimensionality reduction.

Submitted 30 June, 2020; originally announced July 2020.

Comments: Review paper

arXiv:2005.00966 [pdf, other]

doi 10.1016/j.media.2022.102395

Boundary-aware Context Neural Network for Medical Image Segmentation

Authors: Ruxin Wang, Shuyuan Chen, Chaojie Ji, Jianping Fan, Ye Li

Abstract: Medical image segmentation can provide a reliable basis for further clinical analysis and disease diagnosis. The performance of medical image segmentation has been significantly advanced with the convolutional neural networks (CNNs). However, most existing CNNs-based methods often produce unsatisfactory segmentation mask without accurate object boundaries. This is caused by the limited context information and inadequate discriminative feature maps after consecutive pooling and convolution operations. In that the medical image is characterized by the high intra-class variation, inter-class indistinction and noise, extracting powerful context and aggregating discriminative features for fine-grained segmentation are still challenging today. In this paper, we formulate a boundary-aware context neural network (BA-Net) for 2D medical image segmentation to capture richer context and preserve fine spatial information. BA-Net adopts encoder-decoder architecture. In each stage of encoder network, pyramid edge extraction module is proposed for obtaining edge information with multiple granularities firstly. Then we design a mini multi-task learning module for jointly learning to segment object masks and detect lesion boundaries. In particular, a new interactive attention is proposed to bridge two tasks for achieving information complementarity between different tasks, which effectively leverages the boundary information for offering a strong cue to better segmentation prediction. At last, a cross feature fusion module aims to selectively aggregate multi-level features from the whole encoder network. By cascaded three modules, richer context and fine-grain features of each stage are encoded. Extensive experiments on five datasets show that the proposed BA-Net outperforms state-of-the-art approaches.

Submitted 2 May, 2020; originally announced May 2020.

Journal ref: Medical Image Analysis, 2022

arXiv:2004.03064 [pdf, other]

Coarse-to-Fine Gaze Redirection with Numerical and Pictorial Guidance

Authors: Jingjing Chen, Jichao Zhang, Enver Sangineto, Jiayuan Fan, Tao Chen, Nicu Sebe

Abstract: Gaze redirection aims at manipulating the gaze of a given face image with respect to a desired direction (i.e., a reference angle) and it can be applied to many real life scenarios, such as video-conferencing or taking group photos. However, previous work on this topic mainly suffers from two limitations: (1) low-quality image generation and (2) low redirection precision. In this paper, we propose to alleviate these problems by means of a novel gaze redirection framework which exploits both a numerical and a pictorial direction guidance, jointly with a coarse-to-fine learning strategy. Specifically, the coarse branch learns the spatial transformation which warps the input image according to the desired gaze. On the other hand, the fine-grained branch consists of a generator network with conditional residual image learning and a multi-task discriminator. This second branch reduces the gap between the previously warped image and the ground-truth image and recovers finer texture details. Moreover, we propose a numerical and pictorial guidance module (NPG) which uses a pictorial gazemap description and numerical angles as an extra guide to further improve the precision of gaze redirection. Extensive experiments on a benchmark dataset show that the proposed method outperforms the state-of-the-art approaches in terms of both image quality and redirection precision. The code is available at https://github.com/**g**gchen777/CFGR

Submitted 26 November, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

Comments: 12 pages, accepted by WACV 2021

arXiv:2003.10620 [pdf, ps, other]

Joint Optimization of Spectrum and Energy Efficiency Considering the C-V2X Security: A Deep Reinforcement Learning Approach

Authors: Zhipeng Liu, Yinhui Han, Jianwei Fan, Lin Zhang, Yunzhi Lin

Abstract: Cellular vehicle-to-everything (C-V2X) communication, as a part of 5G wireless communication, has been considered one of the most significant techniques for Smart City. Vehicles platooning is an application of Smart City that improves traffic capacity and safety by C-V2X. However, different from vehicles platooning travelling on highways, C-V2X could be more easily eavesdropped and the spectrum resource could be limited when they converge at an intersection. Satisfying the secrecy rate of C-V2X, how to increase the spectrum efficiency (SE) and energy efficiency (EE) in the platooning network is a big challenge. In this paper, to solve this problem, we propose a Security-Aware Approach to Enhancing SE and EE Based on Deep Reinforcement Learning, named SEED. The SEED formulates an objective optimization function considering both SE and EE, and the secrecy rate of C-V2X is treated as a critical constraint of this function. The optimization problem is transformed into the spectrum and transmission power selections of V2V and V2I links using deep Q network (DQN). The heuristic result of SE and EE is obtained by the DQN policy based on rewards. Finally, we simulate the traffic and communication environments using Python. The evaluation results demonstrate that the SEED outperforms the DQN-wopa algorithm and the baseline algorithm by 31.83% and 68.40% in efficiency. Source code for the SEED is available at https://github.com/BandaidZ/OptimizationofSEandEEBasedonDRL.

Submitted 23 March, 2020; originally announced March 2020.

arXiv:2003.03007 [pdf, ps, other]

Unifying Graph Embedding Features with Graph Convolutional Networks for Skeleton-based Action Recognition

Authors: Dong Yang, Monica Mengqi Li, Hong Fu, Jicong Fan, Zhao Zhang, Howard Leung

Abstract: Combining skeleton structure with graph convolutional networks has achieved remarkable performance in human action recognition. Since current research focuses on designing basic graph for representing skeleton data, these embedding features contain basic topological information, which cannot learn more systematic perspectives from skeleton data. In this paper, we overcome this limitation by proposing a novel framework, which unifies 15 graph embedding features into the graph convolutional network for human action recognition, aiming to best take advantage of graph information to distinguish key joints, bones, and body parts in human action, instead of being exclusive to a single feature or domain. Additionally, we fully investigate how to find the best graph features of skeleton structure for improving human action recognition. Besides, the topological information of the skeleton sequence is explored to further enhance the performance in a multi-stream framework. Moreover, the unified graph features are extracted by the adaptive methods on the training process, which further yields improvements. Our model is validated by three large-scale datasets, namely NTU-RGB+D, Kinetics and SYSU-3D, and outperforms the state-of-the-art methods. Overall, our work unifies graph embedding features to promote systematic research on human action recognition.

Submitted 11 October, 2022; v1 submitted 5 March, 2020; originally announced March 2020.

arXiv:2002.09026 [pdf]

Multi-label Sound Event Retrieval Using a Deep Learning-based Siamese Structure with a Pairwise Presence Matrix

Authors: Jianyu Fan, Eric Nichols, Daniel Tompkins, Ana Elisa Mendez Mendez, Benjamin Elizalde, Philippe Pasquier

Abstract: Realistic recordings of soundscapes often have multiple sound events co-occurring, such as car horns, engine and human voices. Sound event retrieval is a type of content-based search aiming at finding audio samples, similar to an audio query based on their acoustic or semantic content. State of the art sound event retrieval models have focused on single-label audio recordings, with only one sound event occurring, rather than on multi-label audio recordings (i.e., multiple sound events occur in one recording). To address this latter problem, we propose different Deep Learning architectures with a Siamese-structure and a Pairwise Presence Matrix. The networks are trained and evaluated using the SONYC-UST dataset containing both single- and multi-label soundscape recordings. The performance results show the effectiveness of our proposed model.

Submitted 20 February, 2020; originally announced February 2020.

Comments: Paper accepted for 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)

arXiv:2002.09021 [pdf]

A Comparative Study of Western and Chinese Classical Music based on Soundscape Models

Authors: Jianyu Fan, Yi-Hsuan Yang, Kui Dong, Philippe Pasquier

Abstract: Whether literally or suggestively, the concept of soundscape is alluded in both modern and ancient music. In this study, we examine whether we can analyze and compare Western and Chinese classical music based on soundscape models. We addressed this question through a comparative study. Specifically, corpora of Western classical music excerpts (WCMED) and Chinese classical music excerpts (CCMED) were curated and annotated with emotional valence and arousal through a crowdsourcing experiment. We used a sound event detection (SED) and soundscape emotion recognition (SER) models with transfer learning to predict the perceived emotion of WCMED and CCMED. The results show that both SER and SED models could be used to analyze Chinese and Western classical music. The fact that SER and SED work better on Chinese classical music emotion recognition provides evidence that certain similarities exist between Chinese classical music and soundscape recordings, which permits transferability between machine learning models.

Submitted 20 February, 2020; originally announced February 2020.

Comments: Paper accepted for 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)

arXiv:2001.05484 [pdf, other]

Bridging Convex and Nonconvex Optimization in Robust PCA: Noise, Outliers, and Missing Data

Authors: Yuxin Chen, Jianqing Fan, Cong Ma, Yuling Yan

Abstract: This paper delivers improved theoretical guarantees for the convex programming approach in low-rank matrix estimation, in the presence of (1) random noise, (2) gross sparse outliers, and (3) missing data. This problem, often dubbed as robust principal component analysis (robust PCA), finds applications in various domains. Despite the wide applicability of convex relaxation, the available statistical support (particularly the stability analysis vis-à-vis random noise) remains highly suboptimal, which we strengthen in this paper. When the unknown matrix is well-conditioned, incoherent, and of constant rank, we demonstrate that a principled convex program achieves near-optimal statistical accuracy, in terms of both the Euclidean loss and the $\ell_{\infty}$ loss. All of this happens even when nearly a constant fraction of observations are corrupted by outliers with arbitrary magnitudes. The key analysis idea lies in bridging the convex program in use and an auxiliary nonconvex optimization algorithm, and hence the title of this paper.

Submitted 28 February, 2021; v1 submitted 15 January, 2020; originally announced January 2020.

Comments: accepted to the Annals of Statistics

Journal ref: Annals of Statistics, vol. 49, no. 5, pp. 2948-2971, 2021

arXiv:1912.03661 [pdf]

Adaptive Trajectory Estimation with Power Limited Steering Model under Perturbation Compensation

Authors: Weipeng Li, Xiaogang Yang, Ruitao Lu, Jiwei Fan, Tao Zhang, Chuan He

Abstract: Trajectory estimation of maneuvering objects is applied in numerous tasks like navigation, path planning and visual tracking. Many previous works get impressive results in the strictly controlled condition with accurate prior statistics and dedicated dynamic model for certain object. But in challenging conditions without dedicated dynamic model and precise prior statistics, the performance of these methods significantly declines. To solve the problem, a dynamic model called the power-limited steering model (PLS) is proposed to describe the motion of non-cooperative object. It is a natural combination of instantaneous power and instantaneous angular velocity, which relies on the nonlinearity instead of the state switching probability to achieve switching of states. And the renormalization group is introduced to compensate the nonlinear effect of perturbation in PLS model. For robust and efficient trajectory estimation, an adaptive trajectory estimation (AdaTE) algorithm is proposed. By updating the statistics and truncation time online, it corrects the estimation error caused by biased prior statistics and observation drift, while reducing the computational complexity lower than O(n). The experiment of trajectory estimation demonstrates the convergence of AdaTE, and the better robust to the biased prior statistics and the observation drift compared with EKF, UKF and sparse MAP. Other experiments demonstrate through slight modification, AdaTE can also be applied to local navigation in random obstacle environment, and trajectory optimization in visual tracking.

Submitted 1 July, 2020; v1 submitted 8 December, 2019; originally announced December 2019.

Comments: 19 pages, 7 figures

ACM Class: G.3.13; J.2.7

arXiv:1911.13029 [pdf, other]

Progressive-Growing of Generative Adversarial Networks for Metasurface Optimization

Authors: Fufang Wen, Jiaqi Jiang, Jonathan A. Fan

Abstract: Generative adversarial networks, which can generate metasurfaces based on a training set of high performance device layouts, have the potential to significantly reduce the computational cost of the metasurface design process. However, basic GAN architectures are unable to fully capture the detailed features of topologically complex metasurfaces, and generated devices therefore require additional computationally-expensive design refinement. In this Letter, we show that GANs can better learn spatially fine features from high-resolution training data by progressively growing its network architecture and training set. Our results indicate that with this training methodology, the best generated devices have performances that compare well with the best devices produced by gradient-based topology optimization, thereby eliminating the need for additional design refinement. We envision that this network training method can generalize to other physical systems where device performance is strongly correlated with fine geometric structuring.

Submitted 2 December, 2019; v1 submitted 29 November, 2019; originally announced November 2019.

arXiv:1911.09762 [pdf, other]

Speech Sentiment Analysis via Pre-trained Features from End-to-end ASR Models

Authors: Zhiyun Lu, Liangliang Cao, Yu Zhang, Chung-Cheng Chiu, James Fan

Abstract: In this paper, we propose to use pre-trained features from end-to-end ASR models to solve speech sentiment analysis as a down-stream task. We show that end-to-end ASR features, which integrate both acoustic and text information from speech, achieve promising results. We use RNN with self-attention as the sentiment classifier, which also provides an easy visualization through attention weights to help interpret model predictions. We use well benchmarked IEMOCAP dataset and a new large-scale speech sentiment dataset SWBD-sentiment for evaluation. Our approach improves the state-of-the-art accuracy on IEMOCAP from 66.6% to 71.7%, and achieves an accuracy of 70.10% on SWBD-sentiment with more than 49,500 utterances.

Submitted 4 March, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

arXiv:1910.03729 [pdf, other]

Large-scale Gastric Cancer Screening and Localization Using Multi-task Deep Neural Network

Authors: Hong Yu, Xiaofan Zhang, Lingjun Song, Liren Jiang, Xiaodi Huang, Wen Chen, Chenbin Zhang, Jiahui Li, Jiji Yang, Zhiqiang Hu, Qi Duan, Wanyuan Chen, Xianglei He, **shuang Fan, Weihai Jiang, Li Zhang, Chengmin Qiu, Minmin Gu, Weiwei Sun, Yangqiong Zhang, Guangyin Peng, Weiwei Shen, Guohui Fu

Abstract: Gastric cancer is one of the most common cancers, which ranks third among the leading causes of cancer death. Biopsy of gastric mucosa is a standard procedure in gastric cancer screening test. However, manual pathological inspection is labor-intensive and time-consuming. Besides, it is challenging for an automated algorithm to locate the small lesion regions in the gigapixel whole-slide image and make the decision correctly. To tackle these issues, we collected large-scale whole-slide image dataset with detailed lesion region annotation and designed a whole-slide image analyzing framework consisting of 3 networks which could not only determine the screening result but also present the suspicious areas to the pathologist for reference. Experiments demonstrated that our proposed framework achieves sensitivity of 97.05% and specificity of 92.72% in screening task and Dice coefficient of 0.8331 in segmentation task. Furthermore, we tested our best model in real-world scenario on 10,315 whole-slide images collected from 4 medical centers.

Submitted 19 September, 2020; v1 submitted 8 October, 2019; originally announced October 2019.

Comments: under minor revision

arXiv:1906.10654 [pdf, other]

ReachNN: Reachability Analysis of Neural-Network Controlled Systems

Authors: Chao Huang, Jiameng Fan, Wenchao Li, Xin Chen, Qi Zhu

Abstract: Applying neural networks as controllers in dynamical systems has shown great promises. However, it is critical yet challenging to verify the safety of such control systems with neural-network controllers in the loop. Previous methods for verifying neural network controlled systems are limited to a few specific activation functions. In this work, we propose a new reachability analysis approach based on Bernstein polynomials that can verify neural-network controlled systems with a more general form of activation functions, i.e., as long as they ensure that the neural networks are Lipschitz continuous. Specifically, we consider abstracting feedforward neural networks with Bernstein polynomials for a small subset of inputs. To quantify the error introduced by abstraction, we provide both theoretical error bound estimation based on the theory of Bernstein polynomials and more practical sampling based error bound estimation, following a tight Lipschitz constant estimation approach based on forward reachability analysis. Compared with previous methods, our approach addresses a much broader set of neural networks, including heterogeneous neural networks that contain multiple types of activation functions. Experiment results on a variety of benchmarks show the effectiveness of our approach.

Submitted 25 June, 2019; originally announced June 2019.

arXiv:1906.04159 [pdf, other]

doi 10.1073/pnas.1910053116

Inference and Uncertainty Quantification for Noisy Matrix Completion

Authors: Yuxin Chen, Jianqing Fan, Cong Ma, Yuling Yan

Abstract: Noisy matrix completion aims at estimating a low-rank matrix given only partial and corrupted entries. Despite substantial progress in designing efficient estimation algorithms, it remains largely unclear how to assess the uncertainty of the obtained estimates and how to perform statistical inference on the unknown matrix (e.g., constructing a valid and short confidence interval for an unseen entry). This paper takes a step towards inference and uncertainty quantification for noisy matrix completion. We develop a simple procedure to compensate for the bias of the widely used convex and nonconvex estimators. The resulting de-biased estimators admit nearly precise non-asymptotic distributional characterizations, which in turn enable optimal construction of confidence intervals/regions for, say, the missing entries and the low-rank factors. Our inferential procedures do not rely on sample splitting, thus avoiding unnecessary loss of data efficiency. As a byproduct, we obtain a sharp characterization of the estimation accuracy of our de-biased estimators, which, to the best of our knowledge, are the first tractable algorithms that provably achieve full statistical efficiency (including the preconstant). The analysis herein is built upon the intimate link between convex and nonconvex optimization, an appealing feature recently discovered by \cite{chen2019noisy}.

Submitted 14 November, 2019; v1 submitted 10 June, 2019; originally announced June 2019.

Comments: published at Proceedings of the National Academy of Sciences Nov 2019, 116 (46) 22931-22937

arXiv:1901.06776 [pdf]

An Improved Dipole Extraction Method From Magnitude-Only Electromagnetic-Field Data

Authors: Chunyu Wu, Ze Sun, Xu Wang, Yansheng Wang, Ben Kim, Jun Fan

Abstract: Infinitesimal electric and magnetic dipoles are widely used as an equivalent radiating source model. In this paper, an improved method for dipole extraction from magnitude-only electromagnetic-field data based on genetic algorithm and back-and-forth iteration algorithm [1] is proposed. Compared with conventional back-and-forth iteration algorithm, this method offers an automatic flow to extract the equivalent dipoles without prior decision of the type, position, orientation and number of dipoles. It can be easily applied to electromagnetic-field data on arbitrarily shaped surfaces and minimize the number of required dipoles. The extracted dipoles can be close to original radiating structure, thus being physical. Compared with conventional genetic algorithm based method, this method reduces the optimization time and will not easily get trapped into local minima during optimization, thus being more robust. This method is validated by both simulation data and measurement data and its advantages are proved. The potential application of this method in phase retrieval is also discussed.

Submitted 20 January, 2019; originally announced January 2019.

arXiv:1811.12804 [pdf, other]

Asymmetry Helps: Eigenvalue and Eigenvector Analyses of Asymmetrically Perturbed Low-Rank Matrices

Authors: Yuxin Chen, Chen Cheng, Jianqing Fan

Abstract: This paper is concerned with the interplay between statistical asymmetry and spectral methods. Suppose we are interested in estimating a rank-1 and symmetric matrix $\mathbf{M}^{\star}\in \mathbb{R}^{n\times n}$, yet only a randomly perturbed version $\mathbf{M}$ is observed. The noise matrix $\mathbf{M}-\mathbf{M}^{\star}$ is composed of zero-mean independent (but not necessarily homoscedastic) entries and is, therefore, not symmetric in general. This might arise, for example, when we have two independent samples for each entry of $\mathbf{M}^{\star}$ and arrange them into an asymmetric data matrix $\mathbf{M}$. The aim is to estimate the leading eigenvalue and eigenvector of $\mathbf{M}^{\star}$. We demonstrate that the leading eigenvalue of the data matrix $\mathbf{M}$ can be $O(\sqrt{n})$ times more accurate (up to some log factor) than its (unadjusted) leading singular value in eigenvalue estimation. Further, the perturbation of any linear form of the leading eigenvector of $\mathbf{M}$ (say, entrywise eigenvector perturbation) is provably well-controlled. This eigen-decomposition approach is fully adaptive to heteroscedasticity of noise without the need of careful bias correction or any prior knowledge about the noise variance. We also provide partial theory for the more general rank-$r$ case. The takeaway message is this: arranging the data samples in an asymmetric manner and performing eigen-decomposition could sometimes be beneficial.

Submitted 23 February, 2020; v1 submitted 30 November, 2018; originally announced November 2018.

Comments: accepted to Annals of Statistics, 2020. 37 pages

Journal ref: Annals of Statistics, 49(1): 435-458, February 2021

arXiv:1810.11137 [pdf, other]

Towards improved lossy image compression: Human image reconstruction with public-domain images

Authors: Ashutosh Bhown, Soham Mukherjee, Sean Yang, Shubham Chandak, Irena Fischer-Hwang, Kedar Tatwawadi, Judith Fan, Tsachy Weissman

Abstract: Lossy image compression has been studied extensively in the context of typical loss functions such as RMSE, MS-SSIM, etc. However, compression at low bitrates generally produces unsatisfying results. Furthermore, the availability of massive public image datasets appears to have hardly been exploited in image compression. Here, we present a paradigm for eliciting human image reconstruction in order to perform lossy image compression. In this paradigm, one human describes images to a second human, whose task is to reconstruct the target image using publicly available images and text instructions. The resulting reconstructions are then evaluated by human raters on the Amazon Mechanical Turk platform and compared to reconstructions obtained using state-of-the-art compressor WebP. Our results suggest that prioritizing semantic visual elements may be key to achieving significant improvements in image compression, and that our paradigm can be used to develop a more human-centric loss function. The images, results and additional data are available at https://compression.stanford.edu/human-compression

Submitted 24 June, 2019; v1 submitted 25 October, 2018; originally announced October 2018.

arXiv:1810.10192 [pdf]

Solving Poisson's Equation using Deep Learning in Particle Simulation of PN Junction

Authors: Zhongyang Zhang, Ling Zhang, Ze Sun, Nicholas Erickson, Ryan From, Jun Fan

Abstract: Simulating the dynamic characteristics of a PN junction at the microscopic level requires solving the Poisson's equation at every time step. Solving at every time step is a necessary but time-consuming process when using the traditional finite difference (FDM) approach. Deep learning is a powerful technique to fit complex functions. In this work, deep learning is utilized to accelerate solving Poisson's equation in a PN junction. The role of the boundary condition is emphasized in the loss function to ensure a better fitting. The resulting I-V curve for the PN junction, using the deep learning solver presented in this work, shows a perfect match to the I-V curve obtained using the finite difference method, with the advantage of being 10 times faster at every time step.

Submitted 24 October, 2018; v1 submitted 24 October, 2018; originally announced October 2018.

Showing 1–46 of 46 results for author: Fan, J