Search | arXiv e-print repository

arXiv:2406.07857 [pdf, other]

Toward Enhanced Reinforcement Learning-Based Resource Management via Digital Twin: Opportunities, Applications, and Challenges

Authors: Nan Cheng, Xiucheng Wang, Zan Li, Zhisheng Yin, Tom Luan, Xuemin Shen

Abstract: This article presents a digital twin (DT)-enhanced reinforcement learning (RL) framework aimed at optimizing performance and reliability in network resource management, since the traditional RL methods face several unified challenges when applied to physical networks, including limited exploration efficiency, slow convergence, poor long-term performance, and safety concerns during the exploration phase. To deal with the above challenges, a comprehensive DT-based framework is proposed to enhance the convergence speed and performance for unified RL-based resource management. The proposed framework provides safe action exploration, more accurate estimates of long-term returns, faster training convergence, higher convergence performance, and real-time adaptation to varying network conditions. Then, two case studies on ultra-reliable and low-latency communication (URLLC) services and multiple unmanned aerial vehicles (UAV) network are presented, demonstrating improvements of the proposed framework in performance, convergence speed, and training cost reduction both on traditional RL and neural network based Deep RL (DRL). Finally, the article identifies and explores some of the research challenges and open issues in this rapidly evolving field.

Submitted 15 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 7 pages, 6 figures

arXiv:2406.03875 [pdf, other]

Energy-storing analysis and fishtail stiffness optimization for a wire-driven elastic robotic fish

Authors: Xiaocun Liao, Chao Zhou, Junfeng Fan, Zhuoliang Zhang, Zhaoran Yin, Liangwei Deng

Abstract: The robotic fish with high propulsion efficiency and good maneuverability achieves underwater fishlike propulsion by commonly adopting the motor to drive the fishtail, causing the significant fluctuations of the motor power due to the uneven swing speed of the fishtail in one swing cycle. Hence, we propose a wire-driven robotic fish with a spring-steel-based active-segment elastic spine. This bionic spine can produce elastic deformation to store energy under the action of the wire driving and motor for responding to the fluctuations of the motor power. Further, we analyze the effects of the energy-storing of the active-segment elastic spine on the smoothness of motor power. Based on the developed Lagrangian dynamic model and cantilever beam model, the power-variance-based nonlinear optimization model for the stiffness of the active-segment elastic spine is established to respond to the sharp fluctuations of motor power during each fishtail swing cycle. Results validate that the energy-storing of the active-segment elastic spine plays a vital role in improving the power fluctuations and maximum frequency of the motor by adjusting its stiffness reasonably, which is beneficial to achieving high propulsion and high speed for robotic fish. Compared with the active-segment rigid spine that is incapable of storing energy, the energy-storing of the active-segment elastic spine is beneficial to increase the maximum frequency of the motor and the average thrust of the fishtail by 0.41 Hz, and 0.06 N, respectively.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 14 pages, 19 figures

arXiv:2405.05498 [pdf, other]

The RoyalFlush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge

Authors: Jingguang Tian, Shuaishuai Ye, Shunfei Chen, Yang Xiang, Zhaohui Yin, Xinhui Hu, Xinkang Xu

Abstract: This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. To address these challenges, we develop end-to-end speaker diarization models that notably decrease the diarization error rate (DER) by 49.58% compared to the official baseline on the development set. For speech recognition, we utilize self-supervised learning representations to train end-to-end ASR models. By integrating these models, we achieve a character error rate (CER) of 16.93% on the track 1 evaluation set, and a concatenated minimum permutation character error rate (cpCER) of 25.88% on the track 2 evaluation set.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.02809 [pdf, other]

Does Optimal Control Always Benefit from Better Prediction? An Analysis Framework for Predictive Optimal Control

Authors: Xiangrui Zeng, Cheng Yin, Zhouping Yin

Abstract: The "prediction + optimal control" scheme has shown good performance in many applications of automotive, traffic, robot, and building control. In practice, the prediction results are simply considered correct in the optimal control design process. However, in reality, these predictions may never be perfect. Under a conventional stochastic optimal control formulation, it is difficult to answer questions like "what if the predictions are wrong". This paper presents an analysis framework for predictive optimal control where the subjective belief about the future is no longer considered perfect. A novel concept called the hidden prediction state is proposed to establish connections among the predictors, the subjective beliefs, the control policies and the objective control performance. Based on this framework, the predictor evaluation problem is analyzed. Three commonly-used predictor evaluation measures, including the mean squared error, the regret and the log-likelihood, are considered. It is shown that neither using the mean square error nor using the likelihood can guarantee a monotonic relationship between the predictor error and the optimal control cost. To guarantee control cost improvement, it is suggested the predictor should be evaluated with the control performance, e.g., using the optimal control cost or the regret to evaluate predictors. Numerical examples and examples from automotive applications with real-world driving data are provided to illustrate the ideas and the results.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2401.06230 [pdf, other]

doi 10.1190/geo2023-0744.1

WISE: full-Waveform variational Inference via Subsurface Extensions

Authors: Ziyi Yin, Rafael Orozco, Mathias Louboutin, Felix J. Herrmann

Abstract: We introduce a probabilistic technique for full-waveform inversion, employing variational inference and conditional normalizing flows to quantify uncertainty in migration-velocity models and its impact on imaging. Our approach integrates generative artificial intelligence with physics-informed common-image gathers, reducing reliance on accurate initial velocity models. Considered case studies demonstrate its efficacy producing realizations of migration-velocity models conditioned by the data. These models are used to quantify amplitude and positioning effects during subsequent imaging.

Submitted 10 December, 2023; originally announced January 2024.

arXiv:2312.09620 [pdf, other]

A Deep Representation Learning-based Speech Enhancement Method Using Complex Convolution Recurrent Variational Autoencoder

Authors: Yang Xiang, Jingguang Tian, Xinhui Hu, Xinkang Xu, ZhaoHui Yin

Abstract: Generally, the performance of deep neural networks (DNNs) heavily depends on the quality of data representation learning. Our preliminary work has emphasized the significance of deep representation learning (DRL) in the context of speech enhancement (SE) applications. Specifically, our initial SE algorithm employed a gated recurrent unit variational autoencoder (VAE) with a Gaussian distribution to enhance the performance of certain existing SE systems. Building upon our preliminary framework, this paper introduces a novel approach for SE using deep complex convolutional recurrent networks with a VAE (DCCRN-VAE). DCCRN-VAE assumes that the latent variables of signals follow complex Gaussian distributions that are modeled by DCCRN, as these distributions can better capture the behaviors of complex signals. Additionally, we propose the application of a residual loss in DCCRN-VAE to further improve the quality of the enhanced speech. Compared to our preliminary work, DCCRN-VAE introduces a more sophisticated DCCRN structure and probability distribution for DRL. Furthermore, in comparison to DCCRN, DCCRN-VAE employs a more advanced DRL strategy. The experimental results demonstrate that the proposed SE algorithm outperforms both our preliminary SE framework and the state-of-the-art DCCRN SE method in terms of scale-invariant signal-to-distortion ratio, speech quality, and speech intelligibility.

Submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by ICASSP 2024

arXiv:2310.16302 [pdf, other]

Imperfect Digital Twin Assisted Low Cost Reinforcement Training for Multi-UAV Networks

Authors: Xiucheng Wang, Nan Cheng, Longfei Ma, Zhisheng Yin, Tom. Luan, Ning Lu

Abstract: Deep Reinforcement Learning (DRL) is widely used to optimize the performance of multi-UAV networks. However, the training of DRL relies on the frequent interactions between the UAVs and the environment, which consumes lots of energy due to the flying and communication of UAVs in practical experiments. Inspired by the growing digital twin (DT) technology, which can simulate the performance of algorithms in the digital space constructed by copying features of the physical space, the DT is introduced to reduce the costs of practical training, e.g., energy and hardware purchases. Different from previous DT-assisted works with an assumption of perfect reflecting real physics by virtual digital, we consider an imperfect DT model with deviations for assisting the training of multi-UAV networks. Remarkably, to trade off the training cost, DT construction cost, and the impact of deviations of DT on training, the natural and virtually generated UAV mixing deployment method is proposed. Two cascade neural networks (NN) are used to optimize the joint number of virtually generated UAVs, the DT construction cost, and the performance of multi-UAV networks. These two NNs are trained by unsupervised and reinforcement learning, both low-cost label-free training methods. Simulation results show the training cost can significantly decrease while guaranteeing the training performance. This implies that an efficient decision can be made with imperfect DTs in multi-UAV networks.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.03749 [pdf]

SCVCNet: Sliding cross-vector convolution network for cross-task and inter-individual-set EEG-based cognitive workload recognition

Authors: Qi Wang, Li Chen, Zhiyuan Zhan, Jianhua Zhang, Zhong Yin

Abstract: This paper presents a generic approach for applying the cognitive workload recognizer by exploiting common electroencephalogram (EEG) patterns across different human-machine tasks and individual sets. We propose a neural network called SCVCNet, which eliminates task- and individual-set-related interferences in EEGs by analyzing finer-grained frequency structures in the power spectral densities. The SCVCNet utilizes a sliding cross-vector convolution (SCVC) operation, where paired input layers representing the theta and alpha power are employed. By extracting the weights from a kernel matrix's central row and column, we compute the weighted sum of the two vectors around a specified scalp location. Next, we introduce an inter-frequency-point feature integration module to fuse the SCVC feature maps. Finally, we combined the two modules with the output-channel pooling and classification layers to construct the model. To train the SCVCNet, we employ the regularized least-square method with ridge regression and the extreme learning machine theory. We validate its performance using three databases, each consisting of distinct tasks performed by independent participant groups. The average accuracy (0.6813 and 0.6229) and F1 score (0.6743 and 0.6076) achieved in two different validation paradigms show partially higher performance than the previous works. All features and algorithms are available on website: https://github.com/7ohnKeats/SCVCNet.

Submitted 21 September, 2023; originally announced October 2023.

Comments: 12 pages

arXiv:2308.14348 [pdf, other]

Label-free Deep Learning Driven Secure Access Selection in Space-Air-Ground Integrated Networks

Authors: Zhaowei Wang, Zhisheng Yin, Xiucheng Wang, Nan Cheng, Yuan Zhang, Tom H. Luan

Abstract: In Space-air-ground integrated networks (SAGIN), the inherent openness and extensive broadcast coverage expose these networks to significant eavesdropping threats. Considering the inherent co-channel interference due to spectrum sharing among multi-tier access networks in SAGIN, it can be leveraged to assist the physical layer security among heterogeneous transmissions. However, it is challenging to conduct a secrecy-oriented access strategy due to both heterogeneous resources and different eavesdropping models. In this paper, we explore secure access selection for a scenario involving multi-mode users capable of accessing satellites, unmanned aerial vehicles, or base stations in the presence of eavesdroppers. Particularly, we propose a Q-network approximation based deep learning approach for selecting the optimal access strategy for maximizing the sum secrecy rate. Meanwhile, the power optimization is also carried out by an unsupervised learning approach to improve the secrecy performance. Remarkably, two neural networks are trained by unsupervised learning and Q-network approximation which are both label-free methods without knowing the optimal solution as labels. Numerical results verify the efficiency of our proposed power optimization approach and access strategy, leading to enhanced secure transmission performance.

Submitted 28 August, 2023; originally announced August 2023.

arXiv:2308.07511 [pdf, other]

Distilling Knowledge from Resource Management Algorithms to Neural Networks: A Unified Training Assistance Approach

Authors: Longfei Ma, Nan Cheng, Xiucheng Wang, Zhisheng Yin, Haibo Zhou, Wei Quan

Abstract: As a fundamental problem, numerous methods are dedicated to the optimization of signal-to-interference-plus-noise ratio (SINR), in a multi-user setting. Although traditional model-based optimization methods achieve strong performance, the high complexity raises the research of neural network (NN) based approaches to trade-off the performance and complexity. To fully leverage the high performance of traditional model-based methods and the low complexity of the NN-based method, a knowledge distillation (KD) based algorithm distillation (AD) method is proposed in this paper to improve the performance and convergence speed of the NN-based method, where traditional SINR optimization methods are employed as "teachers" to assist the training of NNs, which are "students", thus enhancing the performance of unsupervised and reinforcement learning techniques. This approach aims to alleviate common issues encountered in each of these training paradigms, including the infeasibility of obtaining optimal solutions as labels and overfitting in supervised learning, ensuring higher convergence performance in unsupervised learning, and improving training efficiency in reinforcement learning. Simulation results demonstrate the enhanced performance of the proposed AD-based methods compared to traditional learning methods. Remarkably, this research paves the way for the integration of traditional optimization insights and emerging NN techniques in wireless communication system optimization.

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2308.05987 [pdf, other]

Large-Scale Learning on Overlapped Speech Detection: New Benchmark and New General System

Authors: Zhaohui Yin, Jingguang Tian, Xinhui Hu, Xinkang Xu, Yang Xiang

Abstract: Overlapped Speech Detection (OSD) is an important part of speech applications involving analysis of multi-party conversations. However, most existing OSD systems are trained and evaluated on small datasets with limited application domains; as a result, their robustness lacks a benchmark for evaluation and their accuracy remains inadequate in realistic acoustic environments. To solve these problems, we conduct a study of large-scale learning (LSL) in OSD tasks and propose a new general OSD system named CF-OSD with LSL based on Conformer network and LSL. In our study, a large-scale test set consisting of 151h labeled speech of different styles, languages and sound-source distances is produced and used as a new benchmark for evaluating the generality of OSD systems. Rigorous comparative experiments are designed and used to evaluate the effectiveness of LSL in OSD tasks and define the OSD model of our general OSD system. The experiment results show that LSL can significantly improve the accuracy and robustness of OSD systems, and the CF-OSD with LSL system significantly outperforms other OSD systems on our proposed benchmark. Moreover, our system has also achieved state-of-the-art performance on existing small dataset benchmarks, reaching 81.6% and 53.8% in the Alimeeting testset and DIHARD II evaluation set, respectively.

Submitted 7 September, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

arXiv:2307.13945 [pdf, ps, other]

Learning-based Control for PMSM Using Distributed Gaussian Processes with Optimal Aggregation Strategy

Authors: Zhenxiao Yin, Xiaobing Dai, Zewen Yang, Yang Shen, Georges Hattab, Hang Zhao

Abstract: The growing demand for accurate control in varying and unknown environments has sparked a corresponding increase in the requirements for power supply components, including permanent magnet synchronous motors (PMSMs). To infer the unknown part of the system, machine learning techniques are widely employed, especially Gaussian process regression (GPR) due to its flexibility of continuous system modeling and its guaranteed performance. For practical implementation, distributed GPR is adopted to alleviate the high computational complexity. However, the study of distributed GPR from a control perspective remains an open problem. In this paper, a control-aware optimal aggregation strategy of distributed GPR for PMSMs is proposed based on the Lyapunov stability theory. This strategy exclusively leverages the posterior mean, thereby obviating the need for computationally intensive calculations associated with posterior variance in alternative approaches. Moreover, the straightforward calculation process of our proposed strategy lends itself to seamless implementation in high-frequency PMSM control. The effectiveness of the proposed strategy is demonstrated in the simulations.

Submitted 25 July, 2023; originally announced July 2023.

arXiv:2307.02002 [pdf, other]

Interpretable and Secure Trajectory Optimization for UAV-Assisted Communication

Authors: Yunhao Quan, Nan Cheng, Xiucheng Wang, Jinglong Shen, Longfei Ma, Zhisheng Yin

Abstract: Unmanned aerial vehicles (UAVs) have gained popularity due to their flexible mobility, on-demand deployment, and the ability to establish high probability line-of-sight wireless communication. As a result, UAVs have been extensively used as aerial base stations (ABSs) to supplement ground-based cellular networks for various applications. However, existing UAV-assisted communication schemes mainly focus on trajectory optimization and power allocation, while ignoring the issue of collision avoidance during UAV flight. To address this issue, this paper proposes an interpretable UAV-assisted communication scheme that decomposes reliable UAV services into two sub-problems. The first is the constrained UAV coordinates and power allocation problem, which is solved using the Dueling Double DQN (D3QN) method. The second is the constrained UAV collision avoidance and trajectory optimization problem, which is addressed through the Monte Carlo tree search (MCTS) method. This approach ensures both reliable and efficient operation of UAVs. Moreover, we propose a scalable interpretable artificial intelligence (XAI) framework that enables more transparent and reliable system decisions. The proposed scheme's interpretability generates explainable and trustworthy results, making it easier to comprehend, validate, and control UAV-assisted communication solutions. Through extensive experiments, we demonstrate that our proposed algorithm outperforms existing techniques in terms of performance and generalization. The proposed model improves the reliability, efficiency, and safety of UAV-assisted communication systems, making it a promising solution for future UAV-assisted communication applications.

Submitted 4 July, 2023; originally announced July 2023.

arXiv:2306.06144 [pdf, other]

doi 10.1109/JSEN.2023.3272907

Bayesian Calibration of MEMS Accelerometers

Authors: Oliver Dürr, Po-Yu Fan, Zong-Xian Yin

Abstract: This study aims to investigate the utilization of Bayesian techniques for the calibration of micro-electro-mechanical systems (MEMS) accelerometers. These devices have garnered substantial interest in various practical applications and typically require calibration through error-correcting functions. The parameters of these error-correcting functions are determined during a calibration process. However, due to various sources of noise, these parameters cannot be determined with precision, making it desirable to incorporate uncertainty in the calibration models. Bayesian modeling offers a natural and complete way of reflecting uncertainty by treating the model parameters as variables rather than fixed values. Additionally, Bayesian modeling enables the incorporation of prior knowledge, making it an ideal choice for calibration. Nevertheless, it is infrequently used in sensor calibration. This study introduces Bayesian methods for the calibration of MEMS accelerometer data in a straightforward manner using recent advances in probabilistic programming.

Submitted 9 June, 2023; originally announced June 2023.

Comments: Accepted in IEEE Sensors

arXiv:2305.15719 [pdf, other]

Efficient Neural Music Generation

Authors: Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang

Abstract: Recent progress in music generation has been remarkably advanced by the state-of-the-art MusicLM, which comprises a hierarchy of three LMs, respectively, for semantic, coarse acoustic, and fine acoustic modelings. Yet, sampling with the MusicLM requires processing through these LMs one by one to obtain the fine-grained acoustic tokens, making it computationally expensive and prohibitive for a real-time generation. Efficient music generation with a quality on par with MusicLM remains a significant challenge. In this paper, we present MeLoDy (M for music; L for LM; D for diffusion), an LM-guided diffusion model that generates music audios of state-of-the-art quality meanwhile reducing 95.7% or 99.6% forward passes in MusicLM, respectively, for sampling 10s or 30s music. MeLoDy inherits the highest-level LM from MusicLM for semantic modeling, and applies a novel dual-path diffusion (DPD) model and an audio VAE-GAN to efficiently decode the conditioning semantic tokens into waveform. DPD is proposed to simultaneously model the coarse and fine acoustics by incorporating the semantic information into segments of latents effectively via cross-attention at each denoising step. Our experimental results suggest the superiority of MeLoDy, not only in its practical advantages on sampling speed and infinitely continuable generation, but also in its state-of-the-art musicality, audio quality, and text correlation. Our samples are available at https://Efficient-MeLoDy.github.io/.

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2305.07662 [pdf, other]

Self-information Domain-based Neural CSI Compression with Feature Coupling

Authors: Ziqing Yin, Renjie Xie, Wei Xu, Zhaohui Yang, Xiaohu You

Abstract: Deep learning (DL)-based channel state information (CSI) feedback methods compressed the CSI matrix by exploiting its delay and angle features straightforwardly, while the measure in terms of information contained in the CSI matrix has rarely been considered. Based on this observation, we introduce self-information as an informative CSI representation from the perspective of information theory, which reflects the amount of information of the original CSI matrix in an explicit way. Then, a novel DL-based network is proposed for temporal CSI compression in the self-information domain, namely SD-CsiNet. The proposed SD-CsiNet projects the raw CSI onto a self-information matrix in the newly-defined self-information domain, extracts both temporal and spatial features of the self-information matrix, and then couples these two features for effective compression. Experimental results verify the effectiveness of the proposed SD-CsiNet by exploiting the self-information of CSI. Particularly for compression ratios 1/8 and 1/16, the SD-CsiNet respectively achieves 7.17 dB and 3.68 dB performance gains compared to state-of-the-art methods.

Submitted 30 April, 2023; originally announced May 2023.

arXiv:2304.02968 [pdf, other]

doi 10.1145/3583781.3590235

Technology-Circuit-Algorithm Tri-Design for Processing-in-Pixel-in-Memory (P2M)

Authors: Md Abdullah-Al Kaiser, Gourav Datta, Sreetama Sarkar, Souvik Kundu, Zihan Yin, Manas Garg, Ajey P. Jacob, Peter A. Beerel, Akhilesh R. Jaiswal

Abstract: The massive amounts of data generated by camera sensors motivate data processing inside pixel arrays, i.e., at the extreme-edge. Several critical developments have fueled recent interest in the processing-in-pixel-in-memory paradigm for a wide range of visual machine intelligence tasks, including (1) advances in 3D integration technology to enable complex processing inside each pixel in a 3D integrated manner while maintaining pixel density, (2) analog processing circuit techniques for massively parallel low-energy in-pixel computations, and (3) algorithmic techniques to mitigate non-idealities associated with analog processing through hardware-aware training schemes. This article presents a comprehensive technology-circuit-algorithm landscape that connects technology capabilities, circuit design strategies, and algorithmic optimizations to power, performance, area, bandwidth reduction, and application-level accuracy metrics. We present our results using a comprehensive co-design framework incorporating hardware and algorithmic optimizations for various complex real-life visual intelligence tasks mapped onto our P2M paradigm.

Submitted 6 April, 2023; originally announced April 2023.

Journal ref: GLSVLSI '23: Great Lakes Symposium on VLSI 2023 Proceedings

arXiv:2303.14095 [pdf, other]

PanoVPR: Towards Unified Perspective-to-Equirectangular Visual Place Recognition via Sliding Windows across the Panoramic View

Authors: Ze Shi, Hao Shi, Kailun Yang, Zhe Yin, Yining Lin, Kaiwei Wang

Abstract: Visual place recognition has gained significant attention in recent years as a crucial technology in autonomous driving and robotics. Currently, the two main approaches are the perspective view retrieval (P2P) paradigm and the equirectangular image retrieval (E2E) paradigm. However, it is practical and natural to assume that users only have consumer-grade pinhole cameras to obtain query perspective images and retrieve them in panoramic database images from map providers. To address this, we propose PanoVPR, a perspective-to-equirectangular (P2E) visual place recognition framework that employs sliding windows to eliminate feature truncation caused by hard cropping. Specifically, PanoVPR slides windows over the entire equirectangular image and computes feature descriptors for each window, which are then compared to determine place similarity. Notably, our unified framework enables direct transfer of the backbone from P2P methods without any modification, supporting not only CNNs but also Transformers. To facilitate training and evaluation, we derive the Pitts250k-P2E dataset from the Pitts250k and establish YQ360, the latter being the first P2E visual place recognition dataset collected by a mobile robot platform aiming to simulate real-world task scenarios better. Extensive experiments demonstrate that PanoVPR achieves state-of-the-art performance and obtains 3.8% and 8.0% performance gain on Pitts250k-P2E and YQ360 compared to the previous best method, respectively. Code and datasets will be publicly available at https://github.com/zafirshi/PanoVPR.

Submitted 28 July, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

Comments: Accepted to ITSC 2023. Code and datasets will be made available at https://github.com/zafirshi/PanoVPR

arXiv:2302.14751 [pdf]

High speed free-space optical communication using standard fiber communication component without optical amplification

Authors: Yao Zhang, Hua-Ying Liu, Xiaoyi Liu, Peng Xu, Xiang Dong, Pengfei Fan, Xiaohui Tian, Hua Yu, Dong Pan, Zhijun Yin, Guilu Long, Shi-Ning Zhu, Zhenda Xie

Abstract: Free-space optical communication (FSO) can achieve fast, secure and license-free communication without need for physical cables, making it a cost-effective, energy-efficient and flexible solution when the fiber connection is unavailable. To establish FSO connection on-demand, it is essential to build portable FSO devices with compact structure and light weight. Here, we develop a miniaturized FSO system and realize 9.16 Gbps FSO between two nodes that are 1 km apart, using a commercial single-mode-fiber-coupled optical transceiver module without optical amplification. Using our 4-stage acquisition, pointing and tracking (APT) systems, the tracking error is within 3 μrad and results in an average link loss of 13.7 dB, which is the key for this high-bandwidth FSO demonstration without optical amplification. Our FSO link has been tested up to 4 km, with link loss of 18 dB that is limited by the foggy weather during the test. Longer FSO distances can be expected with better weather condition and optical amplification. With single FSO device weight of only 9.5 kg, this result arouses massive applications of field-deployable high-speed wireless communication.

Submitted 16 April, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: 7 pages, 5 figures

arXiv:2212.04248 [pdf, other]

Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors

Authors: Zhentao Yu, Zixin Yin, Deyu Zhou, Duomin Wang, Finn Wong, Baoyuan Wang

Abstract: In this paper, we introduce a simple and novel framework for one-shot audio-driven talking head generation. Unlike prior works that require additional driving sources for controlled synthesis in a deterministic manner, we instead probabilistically sample all the holistic lip-irrelevant facial motions (i.e. pose, expression, blink, gaze, etc.) to semantically match the input audio while still maintaining both the photo-realism of audio-lip synchronization and the overall naturalness. This is achieved by our newly proposed audio-to-visual diffusion prior trained on top of the mapping between audio and disentangled non-lip facial representations. Thanks to the probabilistic nature of the diffusion prior, one big advantage of our framework is it can synthesize diverse facial motion sequences given the same audio clip, which is quite user-friendly for many real applications. Through comprehensive evaluations on public benchmarks, we conclude that (1) our diffusion prior outperforms auto-regressive prior significantly on almost all the concerned metrics; (2) our overall system is competitive with prior works in terms of audio-lip synchronization but can effectively sample rich and natural-looking lip-irrelevant facial motions while still semantically harmonized with the audio input.

Submitted 7 December, 2022; originally announced December 2022.

Comments: 16 pages

arXiv:2212.02918 [pdf, other]

doi 10.1016/j.pmcj.2022.101625

Thermal Dissipation Resulting from Everyday Interactions as a Sensing Modality -- The MIDAS Touch

Authors: Farooq Dar, Hilary Emenike, Zhigang Yin, Mohan Liyanage, Rajesh Sharma, Agustin Zuniga, Mohammad A. Hoque, Marko Radeta, Petteri Nurmi, Huber Flores

Abstract: We contribute MIDAS as a novel sensing solution for characterizing everyday objects using thermal dissipation. MIDAS takes advantage of the fact that anytime a person touches an object it results in heat transfer. By capturing and modeling the dissipation of the transferred heat, e.g., through the decrease in the captured thermal radiation, MIDAS can characterize the object and determine its material. We validate MIDAS through extensive empirical benchmarks and demonstrate that MIDAS offers an innovative sensing modality that can recognize a wide range of materials with up to 83% accuracy and generalize to variations in the people interacting with objects. We also demonstrate that MIDAS can detect thermal dissipation through objects, up to 2 mm thickness, and support analysis of multiple objects that are interacted with

Submitted 6 December, 2022; originally announced December 2022.

Journal ref: Pervasive and Mobile Computing, Volume 84, 2022

arXiv:2211.16791 [pdf, other]

Adaptive adversarial training method for improving multi-scale GAN based on generalization bound theory

Authors: Jing Tang, Bo Tao, Zeyu Gong, Zhouping Yin

Abstract: In recent years, multi-scale generative adversarial networks (GANs) have been proposed to build generalized image processing models based on single sample. Constraining on the sample size, multi-scale GANs have much difficulty converging to the global optimum, which ultimately leads to limitations in their capabilities. In this paper, we pioneered the introduction of PAC-Bayes generalized bound theory into the training analysis of specific models under different adversarial training methods, which can obtain a non-vacuous upper bound on the generalization error for the specified multi-scale GAN structure. Based on the drastic changes we found of the generalization error bound under different adversarial attacks and different training states, we proposed an adaptive training method which can greatly improve the image manipulation ability of multi-scale GANs. The final experimental results show that our adaptive training method in this paper has greatly contributed to the improvement of the quality of the images generated by multi-scale GANs on several image manipulation tasks. In particular, for the image super-resolution restoration task, the multi-scale GAN model trained by the proposed method achieves a 100% reduction in natural image quality evaluator (NIQE) and a 60% reduction in root mean squared error (RMSE), which is better than many models trained on large-scale datasets.

Submitted 30 November, 2022; originally announced November 2022.

arXiv:2211.10661 [pdf, other]

Phonemic Adversarial Attack against Audio Recognition in Real World

Authors: Jiakai Wang, Zhendong Chen, Zixin Yin, Qinghong Yang, Xianglong Liu

Abstract: Recently, adversarial attacks for audio recognition have attracted much attention. However, most of the existing studies mainly rely on the coarse-grain audio features at the instance level to generate adversarial noises, which leads to expensive generation time costs and weak universal attacking ability. Motivated by the observations that all audio speech consists of fundamental phonemes, this paper proposes a phonemic adversarial tack (PAT) paradigm, which attacks the fine-grain audio features at the phoneme level commonly shared across audio instances, to generate phonemic adversarial noises, enjoying the more general attacking ability with fast generation speed. Specifically, for accelerating the generation, a phoneme density balanced sampling strategy is introduced to sample quantity less but phonemic features abundant audio instances as the training data via estimating the phoneme density, which substantially alleviates the heavy dependency on the large training dataset. Moreover, for promoting universal attacking ability, the phonemic noise is optimized in an asynchronous way with a sliding window, which enhances the phoneme diversity and thus well captures the critical fundamental phonemic patterns. By conducting extensive experiments, we comprehensively investigate the proposed PAT framework and demonstrate that it outperforms the SOTA baselines by large margins (i.e., at least 11X speed up and 78% attacking ability improvement).

Submitted 19 November, 2022; originally announced November 2022.

arXiv:2211.08880 [pdf]

Temporal-spatial Representation Learning Transformer for EEG-based Emotion Recognition

Authors: Zhe Wang, Yongxiong Wang, Chuanfei Hu, Zhong Yin, Yu Song

Abstract: Both the temporal dynamics and spatial correlations of Electroencephalogram (EEG), which contain discriminative emotion information, are essential for the emotion recognition. However, some redundant information within the EEG signals would degrade the performance. Specifically, the subjects reach prospective intense emotions for only a fraction of the stimulus duration. Besides, it is a challenge to extract discriminative features from the complex spatial correlations among a number of electrodes. To deal with the problems, we propose a transformer-based model to robustly capture temporal dynamics and spatial correlations of EEG. Especially, temporal feature extractors which share the weight among all the EEG channels are designed to adaptively extract dynamic context information from raw signals. Furthermore, multi-head self-attention mechanism within the transformers could adaptively localize the vital EEG fragments and emphasize the essential brain regions which contribute to the performance. To verify the effectiveness of the proposed method, we conduct the experiments on two public datasets, DEAP and MAHNOBHCI. The results demonstrate that the proposed method achieves outstanding performance on arousal and valence classification.

Submitted 16 November, 2022; originally announced November 2022.

arXiv:2211.03527 [pdf, other]

doi 10.1190/tle42010069.1

De-risking geological carbon storage from high resolution time-lapse seismic to explainable leakage detection

Authors: Ziyi Yin, Huseyin Tuna Erdinc, Abhinav Prakash Gahlot, Mathias Louboutin, Felix J. Herrmann

Abstract: Geological carbon storage represents one of the few truly scalable technologies capable of reducing the CO2 concentration in the atmosphere. While this technology has the potential to scale, its success hinges on our ability to mitigate its risks. An important aspect of risk mitigation concerns assurances that the injected CO2 remains within the storage complex. Amongst the different monitoring modalities, seismic imaging stands out with its ability to attain high resolution and high fidelity images. However, these superior features come, unfortunately, at prohibitive costs and time-intensive efforts potentially rendering extensive seismic monitoring undesirable. To overcome this shortcoming, we present a methodology where time-lapse images are created by inverting non-replicated time-lapse monitoring data jointly. By no longer insisting on replication of the surveys to obtain high fidelity time-lapse images and differences, extreme costs and time-consuming labor are averted. To demonstrate our approach, hundreds of noisy time-lapse seismic datasets are simulated that contain imprints of regular CO2 plumes and irregular plumes that leak. These time-lapse datasets are subsequently inverted to produce time-lapse difference images used to train a deep neural classifier. The testing results show that the classifier is capable of detecting CO2 leakage automatically on unseen data and with a reasonable accuracy.

Submitted 7 October, 2022; originally announced November 2022.

arXiv:2210.11153 [pdf, other]

Reversed Image Signal Processing and RAW Reconstruction. AIM 2022 Challenge Report

Authors: Marcos V. Conde, Radu Timofte, Yibin Huang, Jingyang Peng, Chang Chen, Cheng Li, Eduardo Pérez-Pellitero, Fenglong Song, Furui Bai, Shuai Liu, Chaoyu Feng, Xiaotao Wang, Lei Lei, Yu Zhu, Chenghua Li, Yingying Jiang, Yong A, Peisong Wang, Cong Leng, Jian Cheng, Xiaoyu Liu, Zhicun Yin, Zhilu Zhang, Junyi Li, Ming Liu, et al. (18 additional authors not shown)

Abstract: Cameras capture sensor RAW images and transform them into pleasant RGB images, suitable for the human eyes, using their integrated Image Signal Processor (ISP). Numerous low-level vision tasks operate in the RAW domain (e.g. image denoising, white balance) due to its linear relationship with the scene irradiance, wide-range of information at 12bits, and sensor designs. Despite this, RAW image datasets are scarce and more expensive to collect than the already large and public RGB datasets. This paper introduces the AIM 2022 Challenge on Reversed Image Signal Processing and RAW Reconstruction. We aim to recover raw sensor images from the corresponding RGBs without metadata and, by doing this, "reverse" the ISP transformation. The proposed methods and benchmark establish the state-of-the-art for this low-level vision inverse problem, and generating realistic raw sensor readings can potentially benefit other tasks such as denoising and super-resolution.

Submitted 20 October, 2022; originally announced October 2022.

Comments: ECCV 2022 Advances in Image Manipulation (AIM) workshop

arXiv:2210.05451 [pdf, other]

Enabling ISP-less Low-Power Computer Vision

Authors: Gourav Datta, Zeyu Liu, Zihan Yin, Linyu Sun, Akhilesh R. Jaiswal, Peter A. Beerel

Abstract: In order to deploy current computer vision (CV) models on resource-constrained low-power devices, recent works have proposed in-sensor and in-pixel computing approaches that try to partly/fully bypass the image signal processor (ISP) and yield significant bandwidth reduction between the image sensor and the CV processing unit by downsampling the activation maps in the initial convolutional neural network (CNN) layers. However, direct inference on the raw images degrades the test accuracy due to the difference in covariance of the raw images captured by the image sensors compared to the ISP-processed images used for training. Moreover, it is difficult to train deep CV models on raw images, because most (if not all) large-scale open-source datasets consist of RGB images. To mitigate this concern, we propose to invert the ISP pipeline, which can convert the RGB images of any dataset to its raw counterparts, and enable model training on raw images. We release the raw version of the COCO dataset, a large-scale benchmark for generic high-level vision tasks. For ISP-less CV systems, training on these raw images results in a 7.1% increase in test accuracy on the visual wake words (VWW) dataset compared to relying on training with traditional ISP-processed RGB datasets. To further improve the accuracy of ISP-less CV models and to increase the energy and bandwidth benefits obtained by in-sensor/in-pixel computing, we propose an energy-efficient form of analog in-pixel demosaicing that may be coupled with in-pixel CNN computations. When evaluated on raw images captured by real sensors from the PASCALRAW dataset, our approach results in an 8.1% increase in mAP. Lastly, we demonstrate a further 20.5% increase in mAP by using a novel application of few-shot learning with thirty shots each for the novel PASCALRAW dataset, constituting 3 classes.

Submitted 11 October, 2022; originally announced October 2022.

Comments: Accepted to WACV 2023

arXiv:2208.12707 [pdf, other]

IRIS: Integrated Retinal Functionality in Image Sensors

Authors: Zihan Yin, Md Abdullah-Al Kaiser, Lamine Ousmane Camara, Mark Camarena, Maryam Parsa, Ajey Jacob, Gregory Schwartz, Akhilesh Jaiswal

Abstract: Neuromorphic image sensors draw inspiration from the biological retina to implement visual computations in electronic hardware. Gain control in phototransduction and temporal differentiation at the first retinal synapse inspired the first generation of neuromorphic sensors, but processing in downstream retinal circuits, much of which has been discovered in the past decade, has not been implemented in image sensor technology. We present a technology-circuit co-design solution that implements two motion computations occurring at the output of the retina that could have wide applications for vision based decision making in dynamic environments. Our simulations on Globalfoundries 22nm technology node show that, by taking advantage of the recent advances in semiconductor chip stacking technology, the proposed retina-inspired circuits can be fabricated on image sensing platforms in existing semiconductor foundries. Integrated Retinal Functionality in Image Sensors (IRIS) technology could drive advances in machine vision applications that demand robust, high-speed, energy-efficient and low-bandwidth real-time decision making.

Submitted 14 August, 2022; originally announced August 2022.

Comments: 18 pages

arXiv:2208.11087 [pdf]

Locally temporal-spatial pattern learning with graph attention mechanism for EEG-based emotion recognition

Authors: Yiwen Zhu, Kaiyu Gan, Zhong Yin

Abstract: Technique of emotion recognition enables computers to classify human affective states into discrete categories. However, the emotion may fluctuate instead of maintaining a stable state even within a short time interval. There is also a difficulty to take the full use of the EEG spatial distribution due to its 3-D topology structure. To tackle the above issues, we proposed a locally temporal-spatial pattern learning graph attention network (LTS-GAT) in the present study. In the LTS-GAT, a divide-and-conquer scheme was used to examine local information on temporal and spatial dimensions of EEG patterns based on the graph attention mechanism. A dynamical domain discriminator was added to improve the robustness against inter-individual variations of the EEG statistics to learn robust EEG feature representations across different participants. We evaluated the LTS-GAT on two public datasets for affective computing studies under individual-dependent and independent paradigms. The effectiveness of LTS-GAT model was demonstrated when compared to other existing mainstream methods. Moreover, visualization methods were used to illustrate the relations of different brain regions and emotion recognition. Meanwhile, the weights of different time segments were also visualized to investigate emotion sparsity problems.

Submitted 19 August, 2022; originally announced August 2022.

arXiv:2208.01781 [pdf, other]

Digital Twin-Assisted Efficient Reinforcement Learning for Edge Task Scheduling

Authors: Xiucheng Wang, Longfei Ma, Haocheng Li, Zhisheng Yin, Tom. Luan, Nan Cheng

Abstract: Task scheduling is a critical problem when one user offloads multiple different tasks to the edge server. When a user has multiple tasks to offload and only one task can be transmitted to server at a time, while server processes tasks according to the transmission order, the problem is NP-hard. However, it is difficult for traditional optimization methods to quickly obtain the optimal solution, while approaches based on reinforcement learning face the challenge of excessively large action space and slow convergence. In this paper, we propose a Digital Twin (DT)-assisted RL-based task scheduling method in order to improve the performance and convergence of the RL. We use DT to simulate the results of different decisions made by the agent, so that one agent can try multiple actions at a time, or, similarly, multiple agents can interact with environment in parallel in DT. In this way, the exploration efficiency of RL can be significantly improved via DT, and thus RL can converge faster and local optimality is less likely to happen. Particularly, two algorithms are designed to make task scheduling decisions, i.e., DT-assisted asynchronous Q-learning (DTAQL) and DT-assisted exploring Q-learning (DTEQL). Simulation results show that both algorithms significantly improve the convergence speed of Q-learning by increasing the exploration efficiency.

Submitted 2 August, 2022; originally announced August 2022.

arXiv:2206.02407 [pdf, other]

Green Interference Based Symbiotic Security in Integrated Satellite-terrestrial Communications

Authors: Zhisheng Yin, Nan Cheng, Tom H. Luan, Yilong Hui, Wei Wang

Abstract: In this paper, we investigate secure transmissions in integrated satellite-terrestrial communications and the green interference based symbiotic security scheme is proposed. Particularly, the co-channel interference induced by the spectrum sharing between satellite and terrestrial networks and the inter-beam interference due to frequency reuse among satellite multi-beam serve as the green interference to assist the symbiotic secure transmission, where the secure transmissions of both satellite and terrestrial links are guaranteed simultaneously. Specifically, to realize the symbiotic security, we formulate a problem to maximize the sum secrecy rate of satellite users by cooperatively beamforming optimizing and a constraint of secrecy rate of each terrestrial user is guaranteed. Since the formulated problem is non-convex and intractable, the Taylor expansion and semi-definite relaxation (SDR) are adopted to further reformulate this problem, and the successive convex approximation (SCA) algorithm is designed to solve it. Finally, the tightness of the relaxation is proved. In addition, numerical results verify the efficiency of our proposed approach.

Submitted 6 June, 2022; originally announced June 2022.

arXiv:2205.14285 [pdf, other]

P2M-DeTrack: Processing-in-Pixel-in-Memory for Energy-efficient and Real-Time Multi-Object Detection and Tracking

Authors: Gourav Datta, Souvik Kundu, Zihan Yin, Joe Mathai, Zeyu Liu, Zixu Wang, Mulin Tian, Shunlin Lu, Ravi T. Lakkireddy, Andrew Schmidt, Wael Abd-Almageed, Ajey P. Jacob, Akhilesh R. Jaiswal, Peter A. Beerel

Abstract: Today's high resolution, high frame rate cameras in autonomous vehicles generate a large volume of data that needs to be transferred and processed by a downstream processor or machine learning (ML) accelerator to enable intelligent computing tasks, such as multi-object detection and tracking. The massive amount of data transfer incurs significant energy, latency, and bandwidth bottlenecks, which hinders real-time processing. To mitigate this problem, we propose an algorithm-hardware co-design framework called Processing-in-Pixel-in-Memory-based object Detection and Tracking (P2M-DeTrack). P2M-DeTrack is based on a custom faster R-CNN-based model that is distributed partly inside the pixel array (front-end) and partly in a separate FPGA/ASIC (back-end). The proposed front-end in-pixel processing down-samples the input feature maps significantly with judiciously optimized strided convolution and pooling. Compared to a conventional baseline design that transfers frames of RGB pixels to the back-end, the resulting P2M-DeTrack designs reduce the data bandwidth between sensor and back-end by up to 24x. The designs also reduce the sensor and total energy (obtained from in-house circuit simulations at Globalfoundries 22nm technology node) per frame by 5.7x and 1.14x, respectively. Lastly, they reduce the sensing and total frame latency by an estimated 1.7x and 3x, respectively. We evaluate our approach on the multi-object detection (tracking) task of the large-scale BDD100K dataset and observe only a 0.5% reduction in the mean average precision (0.8% reduction in the identification F1 score) compared to the state-of-the-art.

Submitted 27 May, 2022; originally announced May 2022.

Comments: 6 pages, 4 figures, 4 tables

arXiv:2204.11567 [pdf, other]

Deep CSI Compression for Massive MIMO: A Self-information Model-driven Neural Network

Authors: Ziqing Yin, Wei Xu, Renjie Xie, Shaoqing Zhang, Derrick Wing Kwan Ng, Xiaohu You

Abstract: In order to fully exploit the advantages of massive multiple-input multiple-output (mMIMO), it is critical for the transmitter to accurately acquire the channel state information (CSI). Deep learning (DL)-based methods have been proposed for CSI compression and feedback to the transmitter. Although most existing DL-based methods consider the CSI matrix as an image, structural features of the CSI image are rarely exploited in neural network design. As such, we propose a model of self-information that dynamically measures the amount of information contained in each patch of a CSI image from the perspective of structural features. Then, by applying the self-information model, we propose a model-and-data-driven network for CSI compression and feedback, namely IdasNet. The IdasNet includes the design of a module of self-information deletion and selection (IDAS), an encoder of informative feature compression (IFC), and a decoder of informative feature recovery (IFR). In particular, the model-driven module of IDAS pre-compresses the CSI image by removing informative redundancy in terms of the self-information. The encoder of IFC then conducts feature compression on the pre-compressed CSI image and generates a feature codeword which contains two components, i.e., codeword values and position indices of the codeword values. Subsequently, the IFR decoder decouples the codeword values as well as position indices to recover the CSI image. Experimental results verify that the proposed IdasNet noticeably outperforms existing DL-based networks under various compression ratios while reducing the number of network parameters by orders of magnitude compared with various existing methods.

Submitted 25 April, 2022; originally announced April 2022.

arXiv:2203.05696 [pdf, other]

Toward Efficient Hyperspectral Image Processing inside Camera Pixels

Authors: Gourav Datta, Zihan Yin, Ajey Jacob, Akhilesh R. Jaiswal, Peter A. Beerel

Abstract: Hyperspectral cameras generate a large amount of data due to the presence of hundreds of spectral bands as opposed to only three channels (red, green, and blue) in traditional cameras. This requires a significant amount of data transmission between the hyperspectral image sensor and a processor used to classify/detect/track the images, frame by frame, expending high energy and causing bandwidth and security bottlenecks. To mitigate this problem, we propose a form of processing-in-pixel (PIP) that leverages advanced CMOS technologies to enable the pixel array to perform a wide range of complex operations required by modern convolutional neural networks (CNN) for hyperspectral image recognition (HSI). Consequently, our PIP-optimized custom CNN layers effectively compress the input data, significantly reducing the bandwidth required to transmit the data downstream to the HSI processing unit. This reduces the average energy consumption associated with the pixel array of cameras and the CNN processing unit by 25.06x and 3.90x respectively, compared to existing hardware implementations. Our custom models yield average test accuracies within 0.56% of the baseline models for the standard HSI benchmarks.

Submitted 10 March, 2022; originally announced March 2022.

Comments: 6 pages, 3 figures

arXiv:2202.13388 [pdf, other]

PanoFlow: Learning 360° Optical Flow for Surrounding Temporal Understanding

Authors: Hao Shi, Yifan Zhou, Kailun Yang, Xiaoting Yin, Ze Wang, Yaozu Ye, Zhe Yin, Shi Meng, Peng Li, Kaiwei Wang

Abstract: Optical flow estimation is a basic task in self-driving and robotics systems, which enables temporal interpretation of traffic scenes. Autonomous vehicles clearly benefit from the ultra-wide Field of View (FoV) offered by 360° panoramic sensors. However, due to the unique imaging process of panoramic cameras, models designed for pinhole images do not directly generalize satisfactorily to 360° panoramic images. In this paper, we put forward a novel network framework, PanoFlow, to learn optical flow for panoramic images. To overcome the distortions introduced by equirectangular projection in panoramic transformation, we design a Flow Distortion Augmentation (FDA) method, which contains radial flow distortion (FDA-R) or equirectangular flow distortion (FDA-E). We further look into the definition and properties of cyclic optical flow for panoramic videos, and hereby propose a Cyclic Flow Estimation (CFE) method by leveraging the cyclicity of spherical images to infer 360° optical flow and converting large displacement to relatively small displacement. PanoFlow is applicable to any existing flow estimation method and benefits from the progress of narrow-FoV flow estimation. In addition, we create and release a synthetic panoramic dataset FlowScape based on CARLA to facilitate training and quantitative analysis. PanoFlow achieves state-of-the-art performance on the public OmniFlowNet and the established FlowScape benchmarks. Our proposed approach reduces the End-Point-Error (EPE) on FlowScape by 27.3%. On OmniFlowNet, PanoFlow achieves a 55.5% error reduction from the best published result. We also qualitatively validate our method via a collection vehicle and a public real-world OmniPhotos dataset, indicating strong potential and robustness for real-world navigation applications. Code and dataset are publicly available at https://github.com/MasterHow/PanoFlow.

Submitted 29 November, 2022; v1 submitted 27 February, 2022; originally announced February 2022.

Comments: Code and dataset are publicly available at https://github.com/MasterHow/PanoFlow

arXiv:2109.03488 [pdf, ps, other]

Partial Symbol Recovery for Interference Resilience in Low-Power Wide Area Networks

Authors: Kai Sun, Zhimeng Yin, Weiwei Chen, Shuai Wang, Zeyu Zhang, Tian He

Abstract: Recent years have witnessed the proliferation of Low-power Wide Area Networks (LPWANs) in the unlicensed band for various Internet-of-Things (IoT) applications. Due to the ultra-low transmission power and long transmission duration, LPWAN devices inevitably suffer from high power Cross Technology Interference (CTI), such as interference from Wi-Fi, coexisting in the same spectrum. To alleviate this issue, this paper introduces the Partial Symbol Recovery (PSR) scheme for improving the CTI resilience of LPWAN. We verify our idea on LoRa, a widely adopted LPWAN technique, as a proof of concept. At the PHY layer, although CTI has much higher power, its duration is relatively shorter compared with LoRa symbols, leaving part of a LoRa symbol uncorrupted. Moreover, due to its high redundancy, LoRa chips within a symbol are highly correlated. This opens the possibility of detecting a LoRa symbol with only part of the chips. By examining the unique frequency patterns in LoRa symbols with time-frequency analysis, our design effectively detects the clean LoRa chips that are free of CTI. This enables PSR to only rely on clean LoRa chips for successfully recovering from communication failures. We evaluate our PSR design with real-world testbeds, including SX1280 LoRa chips and USRP B210, under Wi-Fi interference in various scenarios. Extensive experiments demonstrate that our design offers reliable packet recovery performance, successfully boosting the LoRa packet reception ratio from 45.2% to 82.2% with a performance gain of 1.8 times.

Submitted 8 September, 2021; originally announced September 2021.

arXiv:2108.00129 [pdf]

Point-wise posteriori phase estimation in high-precision fringe projection profilometry

Authors: Cong Liu, Chuang Zhang, Zhuoyi Yin, Xiaopeng Liu, Zhihong Xu

Abstract: In fringe projection profilometry, the high-order harmonics information of non-sinusoidal fringes will lead to errors in the phase estimation. In order to solve this problem, a point-wise posterior phase estimation (PWPPE) method based on a deep learning technique is proposed in this paper. The complex nonlinear mapping relationship between the multiple gray values and the sine / cosine value of the phase is constructed by using a feedforward neural network model. After training, the model can estimate the phase value at each pixel location, and its accuracy is higher than that of the point-wise least-square (PWLS) method. To further verify the effectiveness of this method, a face mask is measured with both the traditional PWLS method and the proposed PWPPE method. The comparison results show that the traditional method exhibits periodic phase errors, while the proposed PWPPE method can effectively eliminate such phase errors caused by non-sinusoidal fringes.

Submitted 30 July, 2021; originally announced August 2021.

arXiv:2104.08337 [pdf]

Identification of mental fatigue in language comprehension tasks based on EEG and deep learning

Authors: Chunhua Ye, Zhong Yin, Chenxi Wu, Xiayidai Abulaiti, Yixing Zhang, Zhenqi Sun, Jianhua Zhang

Abstract: Mental fatigue increases the risk of operator error in language comprehension tasks. In order to prevent operator performance degradation, we used EEG signals to assess the mental fatigue of operators in human-computer systems. This study presents an experimental design for fatigue detection in language comprehension tasks. We obtained EEG signals from a 14-channel wireless EEG detector in 15 healthy participants. Each participant was given a cognitive test of a language comprehension task, in the form of multiple choice questions, in which pronoun references were selected between nominal and surrogate sentences. In this paper, the 2400 EEG fragments collected are divided into three data sets according to different utilization rates, namely a 1200s data set with 50% utilization rate, a 1500s data set with 62.5% utilization rate, and an 1800s data set with 75% utilization rate. For feature extraction, different EEG features were extracted, including time domain features, frequency domain features and entropy features, and the effects of different features and feature combinations on classification accuracy were explored. For classification, we introduced the Convolutional Neural Network (CNN) as the preferred method. It was compared with Least Squares Support Vector Machines (LSSVM), Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), K-Nearest Neighbor (KNN) and Decision Tree (DT). According to the results, the classification accuracy of the CNN is higher than that of the other classification methods. The classification results show that the classification accuracy of the 1200s data set is higher than that of the other two data sets. The combination of frequency and entropy features with the CNN has the highest classification accuracy, which is 85.34%.

Submitted 14 April, 2021; originally announced April 2021.

arXiv:2103.06725 [pdf, other]

Duplex Contextual Relation Network for Polyp Segmentation

Authors: Zijin Yin, Kongming Liang, Zhanyu Ma, Jun Guo

Abstract: Polyp segmentation is of great importance in the early diagnosis and treatment of colorectal cancer. Since polyps vary in their shape, size, color, and texture, accurate polyp segmentation is very challenging. One promising way to mitigate the diversity of polyps is to model the contextual relation for each pixel, for example by using an attention mechanism. However, previous methods only focus on learning the dependencies between the positions within an individual image and ignore the contextual relation across different images. In this paper, we propose Duplex Contextual Relation Network (DCRNet) to capture both within-image and cross-image contextual relations. Specifically, we first design an Interior Contextual-Relation Module to estimate the similarity between each position and all the positions within the same image. Then an Exterior Contextual-Relation Module is incorporated to estimate the similarity between each position and the positions across different images. Based on the above two types of similarity, the feature at one position can be further enhanced by the contextual region embedding within and across images. To store the characteristic region embedding from all the images, a memory bank is designed and operates as a queue. Therefore, the proposed method can relate similar features even though they come from different images. We evaluate the proposed method on the EndoScene, Kvasir-SEG and the recently released large-scale PICCOLO dataset. Experimental results show that the proposed DCRNet outperforms the state-of-the-art methods in terms of the widely-used evaluation metrics.

Submitted 19 January, 2022; v1 submitted 11 March, 2021; originally announced March 2021.

Comments: Accepted to ISBI2022

arXiv:2012.04701 [pdf, other]

3D Graph Anatomy Geometry-Integrated Network for Pancreatic Mass Segmentation, Diagnosis, and Quantitative Patient Management

Authors: Tianyi Zhao, Kai Cao, Jiawen Yao, Isabella Nogues, Le Lu, Lingyun Huang, Jing Xiao, Zhaozheng Yin, Ling Zhang

Abstract: The pancreatic disease taxonomy includes ten types of masses (tumors or cysts) [20,8]. Previous work focuses on developing segmentation or classification methods only for certain mass types. Differential diagnosis of all mass types is clinically highly desirable [20] but has not been investigated using an automated image understanding approach. We exploit the feasibility to distinguish pancreatic ductal adenocarcinoma (PDAC) from the nine other nonPDAC masses using multi-phase CT imaging. Both image appearance and the 3D organ-mass geometry relationship are critical. We propose a holistic segmentation-mesh-classification network (SMCN) to provide patient-level diagnosis, by fully utilizing the geometry and location information, which is accomplished by combining the anatomical structure and the semantic detection-by-segmentation network. SMCN learns the pancreas and mass segmentation task and builds an anatomical correspondence-aware organ mesh model by progressively deforming a pancreas prototype on the raw segmentation mask (i.e., mask-to-mesh). A new graph-based residual convolutional network (Graph-ResNet), whose nodes fuse the information of the mesh model and feature vectors extracted from the segmentation network, is developed to produce the patient-level differential classification results. Extensive experiments on 661 patients' CT scans (five phases per patient) show that SMCN can improve the mass segmentation and detection accuracy compared to the strong baseline method nnUNet (e.g., for nonPDAC, Dice: 0.611 vs. 0.478; detection rate: 89% vs. 70%), achieve similar sensitivity and specificity in differentiating PDAC and nonPDAC as expert radiologists (i.e., 94% and 90%), and obtain results comparable to a multimodality test [20] that combines clinical, imaging, and molecular testing for clinical management of patients.

Submitted 8 December, 2020; originally announced December 2020.

arXiv:2009.12525 [pdf]

Cross-individual Recognition of Emotions by a Dynamic Entropy based on Pattern Learning with EEG features

Authors: Xiaolong Zhong, Zhong Yin

Abstract: Use of the electroencephalogram (EEG) and machine learning approaches to recognize emotions can facilitate affective human computer interactions. However, the type of EEG data constitutes an obstacle for cross-individual EEG feature modelling and classification. To address this issue, we propose a deep-learning framework denoted as dynamic entropy-based pattern learning (DEPL) to abstract informative indicators pertaining to the neurophysiological features among multiple individuals. DEPL enhances the capability of representations generated by a deep convolutional neural network by modelling the interdependencies between the cortical locations of dynamical entropy based features. The effectiveness of DEPL has been validated with two public databases, commonly referred to as the DEAP and MAHNOB-HCI multimodal tagging databases. Specifically, the leave-one-subject-out training and testing paradigm has been applied. Numerous experiments on EEG emotion recognition demonstrate that the proposed DEPL is superior to traditional machine learning (ML) methods and can learn between-electrode dependencies w.r.t. different emotions, which is meaningful for developing effective human-computer interaction systems that adapt to human emotions in real-world applications.

Submitted 25 May, 2021; v1 submitted 26 September, 2020; originally announced September 2020.

arXiv:2006.10358 [pdf]

Cloud detection in Landsat-8 imagery in Google Earth Engine based on a deep neural network

Authors: Zhixiang Yin, Feng Ling, Giles M. Foody, Xinyan Li, Yun Du

Abstract: Google Earth Engine (GEE) provides a convenient platform for applications based on optical satellite imagery of large areas. With such data sets, the detection of cloud is often a necessary prerequisite step. Recently, deep learning-based cloud detection methods have shown their potential for cloud detection but they can only be applied locally, leading to inefficient data downloading time and storage problems. This letter proposes a method to directly perform cloud detection in Landsat-8 imagery in GEE based on deep learning (DeepGEE-CD). A deep neural network (DNN) was first trained locally, and then the trained DNN was deployed in the JavaScript client of GEE. An experiment was undertaken to validate the proposed method with a set of Landsat-8 images and the results show that DeepGEE-CD outperformed the widely used function of mask (Fmask) algorithm. The proposed DeepGEE-CD approach can accurately detect cloud in Landsat-8 imagery without downloading it, making it a promising method for routine cloud detection of Landsat-8 imagery in GEE.

Submitted 1 October, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

arXiv:2004.07389 [pdf, other]

doi 10.1190/segam2020-3426999.1

Extended source imaging, a unifying framework for seismic & medical imaging

Authors: Ziyi Yin, Rafael Orozco, Philipp Witte, Mathias Louboutin, Gabrio Rizzuti, Felix J. Herrmann

Abstract: We present three imaging modalities that live on the crossroads of seismic and medical imaging. Through the lens of extended source imaging, we can draw deep connections among the fields of wave-equation based seismic and medical imaging, despite first appearances. From the seismic perspective, we underline the importance of working with the correct physics and spatially varying velocity fields. Medical imaging, on the other hand, opens the possibility for new imaging modalities where outside stimuli, such as laser or radar pulses, can be used not only to identify endogenous optical or thermal contrasts but also to insonify the medium so that images of the whole specimen can in principle be created.

Submitted 15 April, 2020; originally announced April 2020.

Comments: Submitted to the Society of Exploration Geophysicists Annual Meeting 2020

arXiv:1911.09275 [pdf, other]

A Machine Learning-enhanced Robust P-Phase Picker for Real-time Seismic Monitoring

Authors: Dazhong Shen, Qi Zhang, Tong Xu, Hengshu Zhu, Wenjia Zhao, Zikai Yin, Peilun Zhou, Lihua Fang, Enhong Chen, Hui Xiong

Abstract: Identifying the arrival times of seismic P-phases plays a significant role in real-time seismic monitoring, which provides critical guidance for emergency response activities. While considerable research has been conducted on this topic, efficiently capturing the arrival times of seismic P-phases hidden within intensively distributed and noisy seismic waves, such as those generated by the aftershocks of destructive earthquakes, remains a real challenge since most common existing methods in seismology rely on laborious expert supervision. To this end, in this paper, we present a machine learning-enhanced framework based on ensemble learning strategy, EL-Picker, for the automatic identification of seismic P-phase arrivals on continuous and massive waveforms. More specifically, EL-Picker consists of three modules, namely, Trigger, Classifier, and Refiner, and an ensemble learning strategy is exploited to integrate several machine learning classifiers. An evaluation of the aftershocks following the MS 8.0 Wenchuan earthquake demonstrates that EL-Picker can not only achieve the best identification performance but also identify 120% more seismic P-phase arrivals as complementary data. Meanwhile, experimental results also reveal both the applicability of different machine learning models for waveforms collected from different seismic stations and the regularities of seismic P-phase arrivals that might be neglected during manual inspection. These findings clearly validate the effectiveness, efficiency, flexibility and stability of EL-Picker.

Submitted 20 August, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

Comments: Note that this paper is the English version of our work published in SCIENTIA SINICA Informationis (http://engine.scichina.com/doi/10.1360/SSI-2020-0214), which is suggested to be cited if needed

arXiv:1911.02360 [pdf, other]

Reversible Adversarial Attack based on Reversible Image Transformation

Authors: Zhaoxia Yin, Hua Wang, Li Chen, Jie Wang, Weiming Zhang

Abstract: In order to prevent illegal or unauthorized access to image data such as human faces and ensure that legitimate users can use authorization-protected data, reversible adversarial attack techniques are on the rise. Reversible adversarial examples (RAE) combine attack capability and reversibility. However, the existing technique cannot meet application requirements because of serious distortion and failure of image recovery when adversarial perturbations get strong. In this paper, we take advantage of the Reversible Image Transformation technique to generate RAE and achieve a reversible adversarial attack. Experimental results show that the proposed RAE generation scheme can ensure imperceptible image distortion and that the original image can be reconstructed error-free. Moreover, neither the attack ability nor the image quality is limited by the perturbation amplitude.

Submitted 25 May, 2021; v1 submitted 6 November, 2019; originally announced November 2019.

Comments: 2021 International Workshop on Safety & Security of Deep Learning

arXiv:1909.09316 [pdf]

doi 10.1109/MGRS.2021.3050782

Spatially Continuous and High-resolution Land Surface Temperature: A Review of Reconstruction and Spatiotemporal Fusion Techniques

Authors: Penghai Wu, Zhixiang Yin, Chao Zeng, Sibo Duan, Frank-Michael Gottsche, Xiaoshaung Ma, Xinghua Li, Hui Yang, Huanfeng Shen

Abstract: Remotely sensed, spatially continuous and high spatiotemporal resolution (hereafter referred to as high resolution) land surface temperature (LST) is a key parameter for studying the thermal environment and has important applications in many fields. However, difficult atmospheric conditions, sensor malfunctioning and scanning gaps between orbits frequently introduce spatial discontinuities into satellite-retrieved LST products. For a single sensor, there is also a trade-off between temporal and spatial resolution and, therefore, it is impossible to obtain high temporal and spatial resolution simultaneously. In recent years the reconstruction and spatiotemporal fusion of LST products have become active research topics that aim at overcoming this limitation. They are two of the most investigated approaches in thermal remote sensing and attract increasing attention, which has resulted in a number of different algorithms. However, to the best of our knowledge, currently no review exists that expatiates and summarizes the available LST reconstruction and spatiotemporal fusion methods and algorithms. This paper introduces the principles and theories behind LST reconstruction and spatiotemporal fusion and provides an overview of the published research and algorithms. We summarized three kinds of reconstruction methods for missing pixels (spatial, temporal and spatiotemporal methods), two kinds of reconstruction methods for cloudy pixels (Satellite Passive Microwave (PMW)-based and Surface Energy Balance (SEB)-based methods) and three kinds of spatiotemporal fusion methods (weighted function-based, unmixing-based and hybrid methods). The review concludes by summarizing validation methods and by identifying some promising future research directions for generating spatially continuous and high resolution LST products.

Submitted 20 September, 2019; originally announced September 2019.

Comments: 41 pages, 7 figures, 2 tables

arXiv:1908.07519 [pdf, other]

Multi-Modal Recognition of Worker Activity for Human-Centered Intelligent Manufacturing

Authors: Wenjin Tao, Ming C. Leu, Zhaozheng Yin

Abstract: In a human-centered intelligent manufacturing system, sensing and understanding of the worker's activity are the primary tasks. In this paper, we propose a novel multi-modal approach for worker activity recognition by leveraging information from different sensors and in different modalities. Specifically, a smart armband and a visual camera are applied to capture Inertial Measurement Unit (IMU) signals and videos, respectively. For the IMU signals, we design two novel feature transform mechanisms, in both frequency and spatial domains, to assemble the captured IMU signals as images, which allow using convolutional neural networks to learn the most discriminative features. Along with the above two modalities, we propose two other modalities for the video data, at the video frame and video clip levels, respectively. Each of the four modalities returns a probability distribution on activity prediction. Then, these probability distributions are fused to output the worker activity classification result. A worker activity dataset is established, which at present contains 6 common activities in assembly tasks, i.e., grab a tool/part, hammer a nail, use a power-screwdriver, rest arms, turn a screwdriver, and use a wrench. The developed multi-modal approach is evaluated on this dataset and achieves recognition accuracies as high as 97% and 100% in the leave-one-out and half-half experiments, respectively.

Submitted 20 August, 2019; originally announced August 2019.

Comments: 17 pages, 8 figures, 6 tables

arXiv:1905.08967

Multiple reconstruction compression framework based on PNG image

Authors: Zhiqing Lu, Zhaoxia Yin, Bin Luo

Abstract: It is shown that neural networks (NNs) achieve excellent performance in image compression and reconstruction. However, there are still many shortcomings in practical application, which eventually lead to the loss of neural network image processing ability. Based on this, this paper proposes a joint framework based on a neural network and zoom compression. The framework first encodes the incoming PNG or JPEG image information, and then the image is converted into binary and input to the decoder to reconstruct the intermediate state image. Next, we import the intermediate state image into the zooming compressor, recompress it, and reconstruct the final image. From the experimental results, this method can better process the digital image and suppress the reverse expansion problem, and the compression effect can be improved by 4 to 10 times compared with using an RNN alone, showing better ability in application. When the method is applied to a digital image, the effect is far better than that of the existing compression method alone, and the human visual system cannot perceive the change.

Submitted 14 November, 2019; v1 submitted 22 May, 2019; originally announced May 2019.

Comments: The experimental results cannot be reproduced

Showing 1–48 of 48 results for author: Yin, Z