Search | arXiv e-print repository

Beyond the Visible: Jointly Attending to Spectral and Spatial Dimensions with HSI-Diffusion for the FINCH Spacecraft

Authors: Ian Vyse, Rishit Dagli, Dav Vrat Chadha, John P. Ma, Hector Chen, Isha Ruparelia, Prithvi Seran, Matthew Xie, Eesa Aamer, Aidan Armstrong, Naveen Black, Ben Borstein, Kevin Caldwell, Orrin Dahanaggamaarachchi, Joe Dai, Abeer Fatima, Stephanie Lu, Maxime Michet, Anoushka Paul, Carrie Ann Po, Shivesh Prakash, Noa Prosser, Riddhiman Roy, Mirai Shinjo, Iliya Shofman, et al. (4 additional authors not shown)

Abstract: Satellite remote sensing missions have gained popularity over the past fifteen years due to their ability to cover large swaths of land at regular intervals, making them ideal for monitoring environmental trends. The FINCH mission, a 3U+ CubeSat equipped with a hyperspectral camera, aims to monitor crop residue cover in agricultural fields. Although hyperspectral imaging captures both spectral and spatial information, it is prone to various types of noise, including random noise, stripe noise, and dead pixels. Effective denoising of these images is crucial for downstream scientific tasks. Traditional methods, including hand-crafted techniques encoding strong priors, learned 2D image denoising methods applied across different hyperspectral bands, or diffusion generative models applied independently on bands, often struggle with varying noise strengths across spectral bands, leading to significant spectral distortion. This paper presents a novel approach to hyperspectral image denoising using latent diffusion models that integrate spatial and spectral information. We particularly do so by building a 3D diffusion model and presenting a 3-stage training approach on real and synthetically crafted datasets. The proposed method preserves image structure while reducing noise. Evaluations on both popular hyperspectral denoising datasets and synthetically crafted datasets for the FINCH mission demonstrate the effectiveness of this approach.

Submitted 15 June, 2024; originally announced June 2024.

Comments: To appear in 38th Annual Small Satellite Conference

arXiv:2403.15156 [pdf, other]

Infrastructure-Assisted Collaborative Perception in Automated Valet Parking: A Safety Perspective

Authors: Yukuan Jia, Jiawen Zhang, Shimeng Lu, Baokang Fan, Ruiqing Mao, Sheng Zhou, Zhisheng Niu

Abstract: Environmental perception in Automated Valet Parking (AVP) has been a challenging task due to severe occlusions in parking garages. Although Collaborative Perception (CP) can be applied to broaden the field of view of connected vehicles, the limited bandwidth of vehicular communications restricts its application. In this work, we propose a BEV feature-based CP network architecture for infrastructure-assisted AVP systems. The model takes the roadside camera and LiDAR as optional inputs and adaptively fuses them with onboard sensors in a unified BEV representation. Autoencoder and downsampling are applied for channel-wise and spatial-wise dimension reduction, while sparsification and quantization further compress the feature map with little loss in data precision. Combining these techniques, the size of a BEV feature map is effectively compressed to fit in the feasible data rate of the NR-V2X network. With the synthetic AVP dataset, we observe that CP can effectively increase perception performance, especially for pedestrians. Moreover, the advantage of infrastructure-assisted CP is demonstrated in two typical safety-critical scenarios in the AVP setting, increasing the maximum safe cruising speed by up to 3 m/s in both scenarios.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 7 pages, 7 figures, 4 tables, accepted by IEEE VTC2024-Spring

arXiv:2403.15029 [pdf]

On the Solution Uniqueness of Data-Driven Modeling of Flexible Loads

Authors: Shuai Lu, Jiayi Ding, Wei Gu, Junpeng Zhu, Yijun Xu, Zhaoyang Dong, Zezheng Sun

Abstract: This letter first explores the solution uniqueness of the data-driven modeling of price-responsive flexible loads (PFL). The PFL on the demand side is critical in modern power systems. An accurate PFL model is fundamental for system operations. Yet, whether the PFL model can be uniquely and correctly identified from operational data remains unclear. To address this, we analyze the structural and practical identifiability of the PFL model, deriving the condition for the solution uniqueness. Based on this, we point out the implications for selecting physical models of PFL to enhance the identification results. Numerical results validate this work.

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2312.02809 [pdf, other]

Semi-implicit Continuous Newton Method for Power Flow Analysis

Authors: Ruizhi Yu, Wei Gu, Shuai Lu, Yijun Xu

Abstract: This paper proposes a semi-implicit version of the continuous Newton method (CNM) for power flow analysis. The proposed method inherits the numerical robustness of the implicit CNM (ICNM) framework while avoiding the iterative solution of nonlinear systems, hence achieving higher convergence speed and computational efficiency. The intractability of ICNM lies in its nonlinear implicit ordinary-differential-equation (ODE) nature. We circumvent this by introducing intermediate variables, hence converting the implicit ODEs into differential algebraic equations (DAEs), and solve the DAEs with a linear scheme, the stiffly accurate Rosenbrock type method (SARM). A new 4-stage 3rd-order hyper-stable SARM, together with a 2nd-order embedded formula to control the step size, is constructed. Case studies on system 9241pegase verify the claimed performance.

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.07157 [pdf, other]

Communication-Assisted Sensing in 6G Networks

Authors: Fuwang Dong, Fan Liu, Shihang Lu, Yifeng Xiong, Qixun Zhang, Zhiyong Feng

Abstract: Exploring the mutual benefit and reciprocity of sensing and communication (S&C) functions is fundamental to realizing deeper integration for integrated sensing and communication (ISAC) systems. This paper investigates a novel communication-assisted sensing (CAS) system within 6G perceptive networks, where the base station actively senses the targets through device-free wireless sensing and simultaneously transmits the estimated information to end-users. In such a CAS system, we first establish an optimal waveform design framework based on the rate-distortion (RD) and source-channel separation (SCT) theorems. After analyzing the relationships between the sensing distortion, coding rate, and communication channel capacity, we propose two distinct waveform design strategies in the scenario of target impulse response estimation. In the separated S&C waveforms scheme, we equivalently transform the original problem into a power allocation problem and develop a low-complexity one-dimensional search algorithm, shedding light on a notable power allocation tradeoff between the S&C waveform. In the dual-functional waveform scheme, we conceive a heuristic mutual information optimization algorithm for the general case, alongside a modified gradient projection algorithm tailored for the scenarios with independent sensing sub-channels. Additionally, we identify the presence of both subspace tradeoff and water-filling tradeoff in this scheme. Finally, we validate the effectiveness of the proposed algorithms through numerical simulations.

Submitted 15 March, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.01822 [pdf, other]

Random ISAC Signals Deserve Dedicated Precoding

Authors: Shihang Lu, Fan Liu, Fuwang Dong, Yifeng Xiong, Jie Xu, Ya-Feng Liu, Shi **

Abstract: Radar systems typically employ well-designed deterministic signals for target sensing, while integrated sensing and communications (ISAC) systems have to adopt random signals to convey useful information. This paper analyzes the sensing and ISAC performance relying on random signaling in a multi-antenna system. Towards this end, we define a new sensing performance metric, namely, ergodic linear minimum mean square error (ELMMSE), which characterizes the estimation error averaged over random ISAC signals. Then, we investigate a data-dependent precoding (DDP) scheme to minimize the ELMMSE in sensing-only scenarios, which attains the optimized performance at the cost of high implementation overhead. To reduce the cost, we present an alternative data-independent precoding (DIP) scheme by stochastic gradient projection (SGP). Moreover, we shed light on the optimal structures of both sensing-only DDP and DIP precoders. As a further step, we extend the proposed DDP and DIP approaches to ISAC scenarios, which are solved via a tailored penalty-based alternating optimization algorithm. Our numerical results demonstrate that the proposed DDP and DIP methods achieve substantial performance gains over conventional ISAC signaling schemes that treat the signal sample covariance matrix as deterministic, which proves that random ISAC signals deserve dedicated precoding designs.

Submitted 31 March, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

Comments: 15 pages, 12 figures

arXiv:2310.08418 [pdf, ps, other]

doi 10.1109/TSG.2024.3420743

Privacy-Preserved Aggregate Thermal Dynamic Model of Buildings

Authors: Zeyin Hou, Shuai Lu, Yijun Xu, Haifeng Qiu, Wei Gu, Zhaoyang Dong, Shixing Ding

Abstract: The thermal inertia of buildings brings considerable flexibility to the heating and cooling load, which is known to be a promising demand response resource. The aggregate model that can describe the thermal dynamics of the building cluster is an important interference for energy systems to exploit its intrinsic thermal inertia. However, the private information of users, such as the indoor temperature and heating/cooling power, needs to be collected in the parameter estimation procedure to obtain the aggregate model, causing severe privacy concerns. In light of this, we propose a novel privacy-preserved parameter estimation approach to infer the aggregate model for the thermal dynamics of the building cluster for the first time. Using it, the parameters of the aggregate thermal dynamic model (ATDM) can be obtained by the load aggregator without accessing the individual's privacy information. More specifically, this method not only exploits the block coordinate descent (BCD) method to resolve its non-convexity in the estimation but investigates the transformation-based encryption (TE) associated with its secure aggregation protocol (SAP) techniques to realize privacy-preserved computation. Its capability of preserving privacy is also theoretically proven. Finally, simulation results using real-world data demonstrate the accuracy and privacy-preserved performance of our proposed method.

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2309.02375 [pdf, other]

Sensing With Random Signals

Authors: Shihang Lu, Fan Liu, Fuwang Dong, Yifeng Xiong, Jie Xu, Ya-Feng Liu

Abstract: Radar systems typically employ well-designed deterministic signals for target sensing. In contrast to that, integrated sensing and communications (ISAC) systems have to use random signals to convey useful information, potentially causing sensing performance degradation. In this paper, we define a new sensing performance metric, namely, ergodic linear minimum mean square error (ELMMSE), accounting for the randomness of ISAC signals. Then, we investigate a data-dependent precoding scheme to minimize the ELMMSE, which attains the optimized sensing performance at the price of high computational complexity. To reduce the complexity, we present an alternative data-independent precoding scheme and propose a stochastic gradient projection (SGP) algorithm for ELMMSE minimization, which can be trained offline by locally generated signal samples. Finally, we demonstrate the superiority of the proposed methods by simulations.

Submitted 14 January, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: 5 pages, 4 figures, accepted by ICASSP 2024

arXiv:2309.00853 [pdf]

Correlated and Multi-frequency Diffusion Modeling for Highly Under-sampled MRI Reconstruction

Authors: Yu Guan, Chuanming Yu, Shiyu Lu, Zhuoxu Cui, Dong Liang, Qiegen Liu

Abstract: Most existing MRI reconstruction methods perform targeted reconstruction of the entire MR image without taking specific tissue regions into consideration. This may fail to emphasize the reconstruction accuracy on important tissues for diagnosis. In this study, leveraging a combination of the properties of k-space data and the diffusion process, our novel scheme focuses on mining the multi-frequency prior with different strategies to preserve fine texture details in the reconstructed image. In addition, a diffusion process can converge more quickly if its target distribution closely resembles the noise distribution in the process. This can be accomplished through various high-frequency prior extractors. The finding further solidifies the effectiveness of the score-based generative model. On top of all the advantages, our method improves the accuracy of MRI reconstruction and accelerates the sampling process. Experimental results verify that the proposed method successfully obtains more accurate reconstruction and outperforms state-of-the-art methods.

Submitted 2 September, 2023; originally announced September 2023.

arXiv:2308.15942 [pdf]

Stage-by-stage Wavelet Optimization Refinement Diffusion Model for Sparse-View CT Reconstruction

Authors: Kai Xu, Shiyu Lu, Bin Huang, Weiwen Wu, Qiegen Liu

Abstract: Diffusion models have emerged as potential tools to tackle the challenge of sparse-view CT reconstruction, displaying superior performance compared to conventional methods. Nevertheless, these prevailing diffusion models predominantly focus on the sinogram or image domains, which can lead to instability during model training, potentially culminating in convergence towards local minimal solutions. The wavelet transform serves to disentangle image contents and features into distinct frequency-component bands at varying scales, adeptly capturing diverse directional structures. Employing the wavelet transform as a guiding sparsity prior significantly enhances the robustness of diffusion models. In this study, we present an innovative approach named the Stage-by-stage Wavelet Optimization Refinement Diffusion (SWORD) model for sparse-view CT reconstruction. Specifically, we establish a unified mathematical model integrating low-frequency and high-frequency generative models, achieving the solution with an optimization procedure. Furthermore, we perform the low-frequency and high-frequency generative models on the wavelet's decomposed components rather than the sinogram or image domains, ensuring the stability of model training. Our method is rooted in established optimization theory and comprises three distinct stages: low-frequency generation, high-frequency refinement, and domain transform. Our experimental results demonstrate that the proposed method outperforms existing state-of-the-art methods both quantitatively and qualitatively.

Submitted 3 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

arXiv:2308.11268 [pdf, ps, other]

Orthogonal Constant-Amplitude Sequence Families for System Parameter Identification in Spectrally Compact OFDM

Authors: Shih-Hao Lu, Char-Dir Chung, Wei-Chang Chen, **-Feng Tsou

Abstract: In rectangularly-pulsed orthogonal frequency division multiplexing (OFDM) systems, constant-amplitude (CA) sequences are desirable to construct preamble/pilot waveforms to facilitate system parameter identification (SPI). Orthogonal CA sequences are generally preferred in various SPI applications like random-access channel identification. However, the number of conventional orthogonal CA sequences (e.g., Zadoff-Chu sequences) that can be adopted in cellular communication without causing sequence identification ambiguity is insufficient. Such insufficiency causes heavy performance degradation for SPI requiring a large number of identification sequences. Moreover, rectangularly-pulsed OFDM preamble/pilot waveforms carrying conventional CA sequences suffer from large power spectral sidelobes and thus exhibit low spectral compactness. This paper is thus motivated to develop several order-I CA sequence families which contain more orthogonal CA sequences while endowing the corresponding OFDM preamble/pilot waveforms with fast-decaying spectral sidelobes. Since more orthogonal sequences are provided, the developed order-I CA sequence families can enhance the performance characteristics in SPI requiring a large number of identification sequences over multipath channels exhibiting short-delay channel profiles, while composing spectrally compact OFDM preamble/pilot waveforms.

Submitted 22 August, 2023; originally announced August 2023.

Comments: 15 pages, 4 figures

arXiv:2308.10483 [pdf]

doi 10.1109/TSTE.2024.3383062

Aggregate Model of District Heating Network for Integrated Energy Dispatch: A Physically Informed Data-Driven Approach

Authors: Shuai Lu, Zihang Gao, Yong Sun, Suhan Zhang, Baoju Li, Chengliang Hao, Yijun Xu, Wei Gu

Abstract: The district heating network (DHN) is essential in enhancing the operational flexibility of integrated energy systems (IES). Yet, it is hard to obtain an accurate and concise DHN model for the operation owing to complicated network features and imperfect measurements. Considering this, this paper proposes a physically informed data-driven aggregate model (AGM) for the DHN, providing a concise description of the source-load relationship of the DHN without exposing network details. First, we derive the analytical relationship between the state variables of the source and load nodes of the DHN, offering a physical foundation for the AGM. Second, we propose a physics-informed estimator for the AGM that is robust to low-quality measurements, in which the physical constraints associated with the parameter normalization and sparsity are embedded to improve the accuracy and robustness. Finally, we propose a physics-enhanced algorithm to solve the nonlinear estimator with non-closed constraints efficiently. Simulation results verify the effectiveness of the proposed method.

Submitted 27 March, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.08185 [pdf, other]

Sensing as a Service in 6G Perceptive Mobile Networks: Architecture, Advances, and the Road Ahead

Authors: Fuwang Dong, Fan Liu, Yuanhao Cui, Shihang Lu, Yunxin Li

Abstract: Sensing-as-a-service is anticipated to be the core feature of 6G perceptive mobile networks (PMN), where high-precision real-time sensing will become an inherent capability rather than being an auxiliary function as before. With the proliferation of wireless connected devices, resource allocation (RA) in terms of the users' specific quality-of-service (QoS) requirements plays a pivotal role in enhancing interference management ability and resource utilization efficiency. In this article, we comprehensively introduce the concept of sensing service in PMN, including the types of tasks, the distinctions/advantages compared to conventional networks, and the definitions of sensing QoS. Subsequently, we provide a unified RA framework in sensing-centric PMN and elaborate on the unique challenges. Furthermore, we present a typical case study named "communication-assisted sensing" and evaluate the performance trade-off between sensing and communication procedures. Finally, we shed light on several open problems and opportunities deserving further investigation in the future.

Submitted 8 November, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

arXiv:2305.11399 [pdf, other]

Waveform Design for Communication-Assisted Sensing in 6G Perceptive Networks

Authors: Fuwang Dong, Fan Liu, Shihang Lu, Weijie Yuan, Yuanhao Cui, Yifeng Xiong, Feifei Gao

Abstract: The integrated sensing and communication (ISAC) technique has the potential to achieve coordination gain by exploiting the mutual assistance between sensing and communication (S&C) functions. While the sensing-assisted communications (SAC) technology has been extensively studied for high-mobility scenarios, the communication-assisted sensing (CAS) counterpart remains widely unexplored. This paper presents a waveform design framework for CAS in 6G perceptive networks, aiming to attain an optimal sensing quality of service (QoS) at the user after the target's parameters successively "pass-through" the S&C channels. In particular, a pair of transmission schemes, namely, separated S&C and dual-functional waveform designs, are proposed to optimize the sensing QoS under the constraints of the rate-distortion and power budget. The first scheme reveals a power allocation trade-off, while the latter presents a water-filling trade-off. Numerical results demonstrate the effectiveness of the proposed algorithms, where the dual-functional scheme exhibits approximately 25% performance gain compared to its separated waveform design counterpart.

Submitted 20 July, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.00179 [pdf, other]

Integrated Sensing and Communications: Recent Advances and Ten Open Challenges

Authors: Shihang Lu, Fan Liu, Yunxin Li, Kecheng Zhang, Hongjia Huang, Jiaqi Zou, Xinyu Li, Yuxiang Dong, Fuwang Dong, Jia Zhu, Yifeng Xiong, Weijie Yuan, Yuanhao Cui, Lajos Hanzo

Abstract: It is anticipated that integrated sensing and communications (ISAC) would be one of the key enablers of next-generation wireless networks (such as beyond 5G (B5G) and 6G) for supporting a variety of emerging applications. In this paper, we provide a comprehensive review of the recent advances in ISAC systems, with a particular focus on their foundations, system design, networking aspects and ISAC applications. Furthermore, we discuss the corresponding open questions of the above that emerged in each issue. Hence, we commence with the information theory of sensing and communications (S&C), followed by the information-theoretic limits of ISAC systems by shedding light on the fundamental performance metrics. Next, we discuss their clock synchronization and phase offset problems, the associated Pareto-optimal signaling strategies, as well as the associated super-resolution ISAC system design. Moreover, we envision that ISAC ushers in a paradigm shift for the future cellular networks relying on network sensing, transforming the classic cellular architecture, cross-layer resource management methods, and transmission protocols. In ISAC applications, we further highlight the security and privacy issues of wireless sensing. Finally, we close by studying the recent advances in a representative ISAC use case, namely the multi-object multi-task (MOMT) recognition problem using wireless signals.

Submitted 17 December, 2023; v1 submitted 29 April, 2023; originally announced May 2023.

Comments: 26 pages, 22 figures, resubmitted to IEEE Journal. Appreciation for the outstanding contributions of coauthors in the paper!

arXiv:2303.11857 [pdf, other]

Rethinking Estimation Rate for Wireless Sensing: A Rate-Distortion Perspective

Authors: Fuwang Dong, Fan Liu, Shihang Lu, Yifeng Xiong

Abstract: Wireless sensing has been recognized as a key enabling technology for numerous emerging applications. For decades, the sensing performance was mostly evaluated from a reliability perspective, with the efficiency aspect widely unexplored. Motivated from both backgrounds of rate-distortion theory and optimal sensing waveform design, a novel efficiency metric, namely, the sensing estimation rate (SER), is defined to unify the information- and estimation-theoretic perspectives of wireless sensing. Specifically, the active sensing process is characterized as a virtual lossy data transmission through non-cooperative joint source-channel coding. The bounds of SER are analyzed based on the data processing inequality, followed by a detailed derivation of achievable bounds under the special cases of the Gaussian linear model (GLM) and semi-controllable GLM. As for the intractable non-linear model, a computable upper bound is also given in terms of the Bayesian Cramér-Rao bound (BCRB). Finally, we show the rationality and effectiveness of the SER defined by comparing to the related works.

Submitted 12 June, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

arXiv:2302.14677 [pdf, other]

Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger

Authors: Yi Yu, Yufei Wang, Wenhan Yang, Shijian Lu, Yap-peng Tan, Alex C. Kot

Abstract: Recent deep-learning-based compression methods have achieved superior performance compared with traditional approaches. However, deep learning models have proven to be vulnerable to backdoor attacks, where some specific trigger patterns added to the input can lead to malicious behavior of the models. In this paper, we present a novel backdoor attack with multiple triggers against learned image compression models. Motivated by the widely used discrete cosine transform (DCT) in existing compression systems and standards, we propose a frequency-based trigger injection model that adds triggers in the DCT domain. In particular, we design several attack objectives for various attacking scenarios, including: 1) attacking compression quality in terms of bit-rate and reconstruction quality; 2) attacking task-driven measures, such as down-stream face recognition and semantic segmentation. Moreover, a novel simple dynamic loss is designed to balance the influence of different loss terms adaptively, which helps achieve more efficient training. Extensive experiments show that with our trained trigger injection models and simple modification of encoder parameters (of the compression model), the proposed attack can successfully inject several backdoors with corresponding triggers in a single image compression model.

Submitted 28 February, 2023; originally announced February 2023.

Comments: Accepted by CVPR 2023

ACM Class: I.4

arXiv:2302.02922 [pdf, other]

Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks

Authors: Shuai Zhang, Meng Wang, Pin-Yu Chen, Sijia Liu, Songtao Lu, Miao Liu

Abstract: Due to the significant computational challenge of training large-scale graph neural networks (GNNs), various sparse learning techniques have been exploited to reduce memory and storage costs. Examples include \textit{graph sparsification} that samples a subgraph to reduce the amount of data aggregation and \textit{model sparsification} that prunes the neural network to reduce the number of trainable weights. Despite the empirical successes in reducing the training cost while maintaining the test accuracy, the theoretical generalization analysis of sparse learning for GNNs remains elusive. To the best of our knowledge, this paper provides the first theoretical characterization of joint edge-model sparse learning from the perspective of sample complexity and convergence rate in achieving zero generalization error. It proves analytically that both sampling important nodes and pruning neurons with the lowest-magnitude can reduce the sample complexity and improve convergence without compromising the test accuracy. Although the analysis is centered on two-layer GNNs with structural constraints on data, the insights are applicable to more general setups and justified by both synthetic and practical citation datasets.

Submitted 6 February, 2023; originally announced February 2023.

Journal ref: The Eleventh International Conference on Learning Representations, 2023

arXiv:2212.03630 [pdf]

One Sample Diffusion Model in Projection Domain for Low-Dose CT Imaging

Authors: Bin Huang, Liu Zhang, Shiyu Lu, Boyu Lin, Weiwen Wu, Qiegen Liu

Abstract: Low-dose computed tomography (CT) plays a significant role in reducing the radiation risk in clinical applications. However, lowering the radiation dose will significantly degrade the image quality. The rapid development and wide application of deep learning have brought new directions for the development of low-dose CT imaging algorithms. Therefore, we propose a fully unsupervised one sample diffusion model (OSDM) in the projection domain for low-dose CT reconstruction. To extract sufficient prior information from a single sample, the Hankel matrix formulation is employed. In addition, penalized weighted least-squares and total variation are introduced to achieve superior image quality. Specifically, we first train a score-based generative model on one sinogram by extracting a great number of tensors from the structural-Hankel matrix as the network input to capture the prior distribution. Then, at the inference stage, the stochastic differential equation solver and data consistency step are performed iteratively to obtain the sinogram data. Finally, the final image is obtained through the filtered back-projection algorithm. The reconstructed results approach their normal-dose counterparts. The results show that OSDM is a practical and effective model for reducing artifacts and preserving image quality.

Submitted 7 December, 2022; originally announced December 2022.

Comments: 11 pages, 11 figures. arXiv admin note: text overlap with arXiv:2211.13926

arXiv:2211.00434 [pdf, other]

On the Performance Gain of Integrated Sensing and Communications: A Subspace Correlation Perspective

Authors: Shihang Lu, Xiao Meng, Zhen Du, Yifeng Xiong, Fan Liu

Abstract: In this paper, we shed light on the performance gain of integrated sensing and communications (ISAC) from the perspective of channel correlations between radar sensing and communication (S&C), namely ISAC subspace correlation. To begin with, we consider a multi-input multi-output (MIMO) ISAC system and reveal that the optimal ISAC signal is in the subspace spanned by the transmitted steering vectors of the sensing channel and the right singular matrix of the communication channel. By leveraging this result, we study a basic ISAC scenario with a single target and a single-antenna communication user, and derive the optimal waveform covariance matrix for minimizing the estimation error under a given communication rate constraint. To quantify the integration gain of ISAC systems, we define the subspace "correlation coefficient" to characterize the coupling effect between S&C channels. Finally, numerical results are provided to validate the effectiveness of the proposed approaches.

Submitted 2 November, 2022; v1 submitted 1 November, 2022; originally announced November 2022.

Comments: 6 pages, 5 figures, submitted to IEEE conference

arXiv:2210.17408 [pdf, ps, other]

Accelerating Diffusion Models via Pre-segmentation Diffusion Sampling for Medical Image Segmentation

Authors: Xutao Guo, Yanwu Yang, Chenfei Ye, Shang Lu, Yang Xiang, Ting Ma

Abstract: Based on the Denoising Diffusion Probabilistic Model (DDPM), medical image segmentation can be described as a conditional image generation task, which allows to compute pixel-wise uncertainty maps of the segmentation and allows an implicit ensemble of segmentations to boost the segmentation performance. However, DDPM requires many iterative denoising steps to generate segmentations from Gaussian noise, resulting in extremely inefficient inference. To mitigate the issue, we propose a principled acceleration strategy, called pre-segmentation diffusion sampling DDPM (PD-DDPM), which is specially used for medical image segmentation. The key idea is to obtain pre-segmentation results based on a separately trained segmentation network, and construct noise predictions (non-Gaussian distribution) according to the forward diffusion rule. We can then start with noisy predictions and use fewer reverse steps to generate segmentation results. Experiments show that PD-DDPM yields better segmentation results over representative baseline methods even if the number of reverse steps is significantly reduced. Moreover, PD-DDPM is orthogonal to existing advanced segmentation models, which can be combined to further improve the segmentation performance.

Submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.13987 [pdf, other]

RIS-assisted Integrated Sensing and Communications: A Subspace Rotation Approach

Authors: Xiao Meng, Fan Liu, Shihang Lu, Sundeep Prabhakar Chepuri, Christos Masouros

Abstract: In this paper, we propose a novel joint active and passive beamforming approach for integrated sensing and communication (ISAC) transmission with assistance of reconfigurable intelligent surfaces (RISs) to simultaneously detect a target and communicate with a communication user. We first show that the sensing and communication (S&C) performance can be jointly improved due to the capability of the RISs to control the ISAC channel. In particular, we show that RISs can favourably enhance both the channel gain and the coupling degree of S&C channels by modifying the underlying subspaces. In light of this, we develop a heuristic algorithm that expands and rotates the S&C subspaces that is able to attain significantly improved ISAC performance. To verify the effectiveness of the subspace rotation scheme, we further provide a benchmark scheme which maximizes the signal-to-noise ratio (SNR) at the sensing receiver while guaranteeing the SNR at the communication user. Finally, numerical simulations are provided to validate the proposed approaches.

Submitted 23 October, 2022; originally announced October 2022.

arXiv:2209.06261 [pdf, other]

Real2Sim2Real Transfer for Control of Cable-driven Robots via a Differentiable Physics Engine

Authors: Kun Wang, William R. Johnson III, Shiyang Lu, Xiaonan Huang, Joran Booth, Rebecca Kramer-Bottiglio, Mridul Aanjaneya, Kostas Bekris

Abstract: Tensegrity robots, composed of rigid rods and flexible cables, exhibit high strength-to-weight ratios and significant deformations, which enable them to navigate unstructured terrains and survive harsh impacts. They are hard to control, however, due to high dimensionality, complex dynamics, and a coupled architecture. Physics-based simulation is a promising avenue for developing locomotion policies that can be transferred to real robots. Nevertheless, modeling tensegrity robots is a complex task due to a substantial sim2real gap. To address this issue, this paper describes a Real2Sim2Real (R2S2R) strategy for tensegrity robots. This strategy is based on a differentiable physics engine that can be trained given limited data from a real robot. These data include offline measurements of physical properties, such as mass and geometry for various robot components, and the observation of a trajectory using a random control policy. With the data from the real robot, the engine can be iteratively refined and used to discover locomotion policies that are directly transferable to the real robot. Beyond the R2S2R pipeline, key contributions of this work include computing non-zero gradients at contact points, a loss function for matching tensegrity locomotion gaits, and a trajectory segmentation technique that avoids conflicts in gradient evaluation during training. Multiple iterations of the R2S2R process are demonstrated and evaluated on a real 3-bar tensegrity robot.

Submitted 17 September, 2023; v1 submitted 13 September, 2022; originally announced September 2022.

Comments: Accepted to IROS2023; https://sites.google.com/view/sim2real

arXiv:2208.12923 [pdf, other]

Global RTK Positioning in Graphical State Space

Authors: Yihong Ge, Sudan Yan, Shaolin Lü, Cong Li

Abstract: This paper proposes a new method for RTK post-processing. Different from the traditional forward-backward Kalman filter, in our method, the whole system equation is built on a graphical state space model and solved by factor graph optimization. The position solution provided by the forward Kalman filter is used as the linearization points of the graphical state space model. Constant variables, such as double-difference ambiguity, will exist as constants in the graphical state space model, not as time-series variables. It is shown by experiment results that factor graph optimization with a graphical state space model is more effective than Kalman filter with a traditional discrete-time state space model for RTK post-processing problem.

Submitted 8 November, 2022; v1 submitted 26 August, 2022; originally announced August 2022.

arXiv:2205.14285 [pdf, other]

P2M-DeTrack: Processing-in-Pixel-in-Memory for Energy-efficient and Real-Time Multi-Object Detection and Tracking

Authors: Gourav Datta, Souvik Kundu, Zihan Yin, Joe Mathai, Zeyu Liu, Zixu Wang, Mulin Tian, Shunlin Lu, Ravi T. Lakkireddy, Andrew Schmidt, Wael Abd-Almageed, Ajey P. Jacob, Akhilesh R. Jaiswal, Peter A. Beerel

Abstract: Today's high resolution, high frame rate cameras in autonomous vehicles generate a large volume of data that needs to be transferred and processed by a downstream processor or machine learning (ML) accelerator to enable intelligent computing tasks, such as multi-object detection and tracking. The massive amount of data transfer incurs significant energy, latency, and bandwidth bottlenecks, which hinders real-time processing. To mitigate this problem, we propose an algorithm-hardware co-design framework called Processing-in-Pixel-in-Memory-based object Detection and Tracking (P2M-DeTrack). P2M-DeTrack is based on a custom faster R-CNN-based model that is distributed partly inside the pixel array (front-end) and partly in a separate FPGA/ASIC (back-end). The proposed front-end in-pixel processing down-samples the input feature maps significantly with judiciously optimized strided convolution and pooling. Compared to a conventional baseline design that transfers frames of RGB pixels to the back-end, the resulting P2M-DeTrack designs reduce the data bandwidth between sensor and back-end by up to 24x. The designs also reduce the sensor and total energy (obtained from in-house circuit simulations at Globalfoundries 22nm technology node) per frame by 5.7x and 1.14x, respectively. Lastly, they reduce the sensing and total frame latency by an estimated 1.7x and 3x, respectively. We evaluate our approach on the multi-object detection (tracking) task of the large-scale BDD100K dataset and observe only a 0.5% reduction in the mean average precision (0.8% reduction in the identification F1 score) compared to the state-of-the-art.

Submitted 27 May, 2022; originally announced May 2022.

Comments: 6 pages, 4 figures, 4 tables

arXiv:2205.06225 [pdf, ps, other]

doi 10.1109/TSP.2023.3244104

Rethinking WMMSE: Can Its Complexity Scale Linearly With the Number of BS Antennas?

Authors: Xiaotong Zhao, Siyuan Lu, Qingjiang Shi, Zhi-Quan Luo

Abstract: Precoding design for maximizing weighted sum-rate (WSR) is a fundamental problem for downlink of massive multi-user multiple-input multiple-output (MU-MIMO) systems. It is well-known that this problem is generally NP-hard due to the presence of multi-user interference. The weighted minimum mean-square error (WMMSE) algorithm is a popular approach for WSR maximization. However, its computational complexity is cubic in the number of base station (BS) antennas, which is unaffordable when the BS is equipped with a large antenna array. In this paper, we consider the WSR maximization problem with either a sum-power constraint (SPC) or per-antenna power constraints (PAPCs). For the former, we prove that any nontrivial stationary point must have a low-dimensional subspace structure, and then propose a reduced-WMMSE (R-WMMSE) with linear complexity by exploiting the solution structure. For the latter, we propose a linear-complexity WMMSE approach, named PAPC-WMMSE, by using a novel recursive design of the algorithm. Both R-WMMSE and PAPC-WMMSE have simple closed-form updates and guaranteed convergence to stationary points. Simulation results verify the efficacy of the proposed designs, especially the much lower complexity as compared to the state-of-the-art approaches for massive MU-MIMO systems.

Submitted 22 May, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

arXiv:2205.04010 [pdf, other]

doi 10.1109/TVT.2022.3210307

The Degrees-of-Freedom in Monostatic ISAC Channels: NLoS Exploitation vs. Reduction

Authors: Shihang Lu, Fan Liu, Lajos Hanzo

Abstract: The degrees of freedom (DoFs) attained in monostatic integrated sensing and communications (ISAC) are analyzed. Specifically, monostatic sensing aims for extracting target-orientation information from the line of sight (LoS) channel between the transmitter and the target, since the Non-LoS (NLoS) paths only contain clutter or interference. By contrast, in wireless communications, typically, both the LoS and NLoS paths are exploited for achieving diversity or multiplexing gains. Hence, we shed light on the NLoS exploitation vs. reduction tradeoffs in a monostatic ISAC scenario. In particular, we optimize the transmit power of each signal path to maximize the communication rate, while guaranteeing the sensing performance for the target. The non-convex problem formulated is firstly solved in closed form for a single-NLoS-link scenario, then we harness the popular successive convex approximation (SCA) method for a general multiple-NLoS-link scenario. Our simulation results characterize the fundamental performance tradeoffs between sensing and communication, demonstrating that the available DoFs in the ISAC channel should be efficiently exploited in a way that is distinctly different from that of communication-only scenarios.

Submitted 8 May, 2022; originally announced May 2022.

Comments: Submit to IEEE Journal. 5 pages, 4 figures

arXiv:2204.00442 [pdf, other]

Marginal Contrastive Correspondence for Guided Image Generation

Authors: Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu, Changgong Zhang

Abstract: Exemplar-based image translation establishes dense correspondences between a conditional input and an exemplar (from two different domains) for leveraging detailed exemplar styles to achieve realistic image translation. Existing work builds the cross-domain correspondences implicitly by minimizing feature-wise distances across the two domains. Without explicit exploitation of domain-invariant features, this approach may not reduce the domain gap effectively which often leads to sub-optimal correspondences and image translation. We design a Marginal Contrastive Learning Network (MCL-Net) that explores contrastive learning to learn domain-invariant features for realistic exemplar-based image translation. Specifically, we design an innovative marginal contrastive loss that guides to establish dense correspondences explicitly. Nevertheless, building correspondence with domain-invariant semantics alone may impair the texture patterns and lead to degraded texture generation. We thus design a Self-Correlation Map (SCM) that incorporates scene structures as auxiliary information which improves the built correspondences substantially. Quantitative and qualitative experiments on multifarious image translation tasks show that the proposed method outperforms the state-of-the-art consistently.

Submitted 1 April, 2022; originally announced April 2022.

Comments: Accepted to CVPR 2022 (Oral Presentation)

arXiv:2203.13875 [pdf]

doi 10.1039/D2DD00066K

Semi-supervised machine learning model for analysis of nanowire morphologies from transmission electron microscopy images

Authors: Shizhao Lu, Brian Montz, Todd Emrick, Arthi Jayaraman

Abstract: In the field of materials science, microscopy is the first and often only accessible method for structural characterization. There is a growing interest in the development of machine learning methods that can automate the analysis and interpretation of microscopy images. Typically, training of machine learning models requires large numbers of images with associated structural labels; however, manual labeling of images requires domain knowledge and is prone to human error and subjectivity. To overcome these limitations, we present a semi-supervised transfer learning approach that uses a small number of labeled microscopy images for training and performs as effectively as methods trained on significantly larger image datasets. Specifically, we train an image encoder with unlabeled images using self-supervised learning methods and use that encoder for transfer learning of different downstream image tasks (classification and segmentation) with a minimal number of labeled images for training. We test the transfer learning ability of two self-supervised learning methods: SimCLR and Barlow-Twins on transmission electron microscopy (TEM) images. We demonstrate in detail how this machine learning workflow applied to TEM images of protein nanowires enables automated classification of nanowire morphologies (e.g., single nanowires, nanowire bundles, phase separated) as well as segmentation tasks that can serve as groundwork for quantification of nanowire domain sizes and shape analysis. We also extend the application of the machine learning workflow to classification of nanoparticle morphologies and identification of different types of viruses from TEM images.

Submitted 27 September, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

Comments: 18 pages, 10 figures

arXiv:2202.06303 [pdf, other]

On the Exactness of an Energy-efficient Train Control model based on Convex Optimization

Authors: Shaofeng Lu, Minling Feng, Kunpeng Wu

Abstract: In this paper, we demonstrate the exactness proof for the energy-efficient train control (EETC) model based on convex optimization. The proof of exactness shows that the convex optimization model will share the same optimization results with the initial model on which the convex relaxations are conducted. We first show how the relaxation on the initial non-convex model is conducted and provide analysis to show that the relaxations are convex constraints and the relaxed model is thus a convex model. Subsequently, we prove that the relaxed convex model will always achieve its optimal solution on the initial equality constraints and the optimal solution achieved by convex optimization will be the same as the one obtained by the initial non-convex model and the relaxations applied are exact. A numerical verification has been conducted based on a typical urban rail system with a steep gradient. The results of this paper shed light on further applications of convex optimization to energy-efficient train control and relevant areas related to operation and control of low-carbon transportation systems.

Submitted 13 February, 2022; originally announced February 2022.

Comments: 11 pages and 4 figures

arXiv:2201.11999 [pdf, other]

doi 10.1145/3474085.3475180

Dual Learning Music Composition and Dance Choreography

Authors: Shuang Wu, Zhenguang Li, Shijian Lu, Li Cheng

Abstract: Music and dance have always co-existed as pillars of human activities, contributing immensely to the cultural, social, and entertainment functions in virtually all societies. Notwithstanding the gradual systematization of music and dance into two independent disciplines, their intimate connection is undeniable and one art-form often appears incomplete without the other. Recent research works have studied generative models for dance sequences conditioned on music. The dual task of composing music for given dances, however, has been largely overlooked. In this paper, we propose a novel extension, where we jointly model both tasks in a dual learning approach. To leverage the duality of the two modalities, we introduce an optimal transport objective to align feature embeddings, as well as a cycle consistency loss to foster overall consistency. Experimental results demonstrate that our dual learning framework improves individual task performance, delivering generated music compositions and dance choreographs that are realistic and faithful to the conditioned inputs.

Submitted 28 January, 2022; originally announced January 2022.

Comments: ACMMM 2021 (Oral)

arXiv:2201.10731 [pdf, other]

A fast-solved model for energy-efficient train control based on convex optimization

Authors: Minling Feng, Kunpeng Wu, Shaofeng Lu

Abstract: In modern rail transportation, energy-efficient train control (EETC) is concerned with the optimal train speed trajectory or control strategies to achieve the minimum energy cost under various operation and traction constraints. This paper proposes an EETC model based on convex optimization so that the model can be rapidly solved by convex optimization algorithms. The high computational efficiency and robustness of the convex model can be verified by comparing the results achieved by the method proposed by this paper and other mainstream mathematical programming methods including mixed-integer linear programming (MILP) and Radau pseudospectral method (RPM). Based on the characteristics of convex optimization, the proposed method boasts more significant advantages over its counterparts in terms of computational efficiency in the promising online applications for automatic train control systems of various types of rail transportation.

Submitted 25 January, 2022; originally announced January 2022.

Comments: 10 pages, 5 figures

arXiv:2112.01806 [pdf, other]

Music-to-Dance Generation with Optimal Transport

Authors: Shuang Wu, Shijian Lu, Li Cheng

Abstract: Dance choreography for a piece of music is a challenging task, having to be creative in presenting distinctive stylistic dance elements while taking into account the musical theme and rhythm. It has been tackled by different approaches such as similarity retrieval, sequence-to-sequence modeling and generative adversarial networks, but their generated dance sequences are often short of motion realism, diversity and music consistency. In this paper, we propose a Music-to-Dance with Optimal Transport Network (MDOT-Net) for learning to generate 3D dance choreographies from music. We introduce an optimal transport distance for evaluating the authenticity of the generated dance distribution and a Gromov-Wasserstein distance to measure the correspondence between the dance distribution and the input music. This gives a well defined and non-divergent training objective that mitigates the limitation of standard GAN training which is frequently plagued with instability and divergent generator loss issues. Extensive experiments demonstrate that our MDOT-Net can synthesize realistic and diverse dances which achieve an organic unity with the input music, reflecting the shared intentionality and matching the rhythmic articulation. Sample results are found at https://www.youtube.com/watch?v=dErfBkrlUO8.

Submitted 4 May, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

Comments: IJCAI 2022

arXiv:2111.02581 [pdf, ps, other]

doi 10.1109/TWC.2021.3125954

Optimal Discrete Constellation Inputs for Aggregated LiFi-WiFi Networks

Authors: Shuai Ma, Fan Zhang, Songtao Lu, Hang Li, Ruixin Yang, Sihua Shao, Jiaheng Wang, Shiyin Li

Abstract: In this paper, we investigate the performance of a practical aggregated LiFi-WiFi system with the discrete constellation inputs from a practical view. We derive the achievable rate expressions of the aggregated LiFi-WiFi system for the first time. Then, we study the rate maximization problem via optimizing the constellation distribution and power allocation jointly. Specifically, a multilevel mercy-filling power allocation scheme is proposed by exploiting the relationship between the mutual information and minimum mean-squared error (MMSE) of discrete inputs. Meanwhile, an inexact gradient descent method is proposed for obtaining the optimal probability distributions. To strike a balance between the computational complexity and the transmission performance, we further develop a framework that maximizes the lower bound of the achievable rate where the optimal power allocation can be obtained in closed forms and the constellation distributions problem can be solved efficiently by Frank-Wolfe method. Extensive numerical results show that the optimized strategies are able to provide significant gains over the state-of-the-art schemes in terms of the achievable rate.

Submitted 3 November, 2021; originally announced November 2021.

Comments: 14 pages, 13 figures, accepted by IEEE Transactions on Wireless Communications

arXiv:2109.08839 [pdf, other]

SpeechNAS: Towards Better Trade-off between Latency and Accuracy for Large-Scale Speaker Verification

Authors: Wentao Zhu, Tianlong Kong, Shun Lu, Jixiang Li, Dawei Zhang, Feng Deng, Xiaorui Wang, Sen Yang, Ji Liu

Abstract: Recently, x-vector has been a successful and popular approach for speaker verification, which employs a time delay neural network (TDNN) and statistics pooling to extract speaker characterizing embedding from variable-length utterances. Improvement upon the x-vector has been an active research area, and enormous neural networks have been elaborately designed based on the x-vector, e.g., extended TDNN (E-TDNN), factorized TDNN (F-TDNN), and densely connected TDNN (D-TDNN). In this work, we try to identify the optimal architectures from a TDNN based search space employing neural architecture search (NAS), named SpeechNAS. Leveraging the recent advances in speaker recognition, such as high-order statistics pooling, multi-branch mechanism, D-TDNN and angular additive margin softmax (AAM) loss with a minimum hyper-spherical energy (MHE), SpeechNAS automatically discovers five network architectures, from SpeechNAS-1 to SpeechNAS-5, of various numbers of parameters and GFLOPs on the large-scale text-independent speaker recognition dataset VoxCeleb1. Our derived best neural network achieves an equal error rate (EER) of 1.02% on the standard test set of VoxCeleb1, which surpasses previous TDNN based state-of-the-art approaches by a large margin. Code and trained weights are in https://github.com/wentaozhu/speechnas.git

Submitted 18 September, 2021; originally announced September 2021.

Comments: 8 pages, 3 figures, 3 tables. Accepted by ASRU2021

arXiv:2107.11027 [pdf, other]

WaveFill: A Wavelet-based Generation Network for Image Inpainting

Authors: Yingchen Yu, Fangneng Zhan, Shijian Lu, Jianxiong Pan, Feiying Ma, Xuansong Xie, Chunyan Miao

Abstract: Image inpainting aims to complete the missing or corrupted regions of images with realistic contents. The prevalent approaches adopt a hybrid objective of reconstruction and perceptual quality by using generative adversarial networks. However, the reconstruction loss and adversarial loss focus on synthesizing contents of different frequencies, and simply applying them together often leads to inter-frequency conflicts and compromised inpainting. This paper presents WaveFill, a wavelet-based inpainting network that decomposes images into multiple frequency bands and fills the missing regions in each frequency band separately and explicitly. WaveFill decomposes images by using discrete wavelet transform (DWT) that preserves spatial information naturally. It applies L1 reconstruction loss to the decomposed low-frequency bands and adversarial loss to high-frequency bands, hence effectively mitigating inter-frequency conflicts while completing images in the spatial domain. To address the inpainting inconsistency in different frequency bands and fuse features with distinct statistics, we design a novel normalization scheme that aligns and fuses the multi-frequency features effectively. Extensive experiments over multiple datasets show that WaveFill achieves superior image inpainting qualitatively and quantitatively.

Submitted 23 July, 2021; originally announced July 2021.

Comments: 10 pages, 7 figures

arXiv:2107.01602 [pdf, other]

Graphical State Space Model

Authors: Shaolin Lü

Abstract: In this paper, a new framework, named the graphical state space model, is proposed for the real-time optimal estimation of a class of nonlinear state space models. By discretizing this kind of system model as an equation which cannot be solved by the Extended Kalman filter, factor graph optimization can outperform the Extended Kalman filter in some cases. A simple nonlinear example is given to demonstrate the efficiency of this framework.

Submitted 8 November, 2022; v1 submitted 4 July, 2021; originally announced July 2021.

arXiv:2106.04392 [pdf, other]

Signal Transformer: Complex-valued Attention and Meta-Learning for Signal Recognition

Authors: Yihong Dong, Ying Peng, Muqiao Yang, Songtao Lu, Qingjiang Shi

Abstract: Deep neural networks have been shown as a class of useful tools for addressing signal recognition issues in recent years, especially for identifying the nonlinear feature structures of signals. However, this power of most deep learning techniques heavily relies on an abundant amount of training data, so the performance of classic neural nets decreases sharply when the number of training data samples is small or unseen data are presented in the testing phase. This calls for an advanced strategy, i.e., model-agnostic meta-learning (MAML), which is able to capture the invariant representation of the data samples or signals. In this paper, inspired by the special structure of the signal, i.e., the real and imaginary parts contained in practical time-series signals, we propose a Complex-valued Attentional MEta Learner (CAMEL) for the problem of few-shot signal recognition by leveraging attention and meta-learning in the complex domain. To the best of our knowledge, this is also the first complex-valued MAML that can find the first-order stationary points of general nonconvex problems with theoretical convergence guarantees. Extensive experimental results showcase the superiority of the proposed CAMEL compared with the state-of-the-art methods.

Submitted 11 June, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

arXiv:2102.05011 [pdf, other]

Mars Image Content Classification: Three Years of NASA Deployment and Recent Advances

Authors: Kiri Wagstaff, Steven Lu, Emily Dunkel, Kevin Grimes, Brandon Zhao, Jesse Cai, Shoshanna B. Cole, Gary Doran, Raymond Francis, Jake Lee, Lukas Mandrake

Abstract: The NASA Planetary Data System hosts millions of images acquired from the planet Mars. To help users quickly find images of interest, we have developed and deployed content-based classification and search capabilities for Mars orbital and surface images. The deployed systems are publicly accessible using the PDS Image Atlas. We describe the process of training, evaluating, calibrating, and deploying updates to two CNN classifiers for images collected by Mars missions. We also report on three years of deployment including usage statistics, lessons learned, and plans for the future.

Submitted 9 February, 2021; originally announced February 2021.

Comments: Published at the Thirty-Third Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-21). IAAI Innovative Application Award. 10 pages, 11 figures, 6 tables

arXiv:2102.04429 [pdf, other]

Federated Acoustic Modeling For Automatic Speech Recognition

Authors: Xiaodong Cui, Songtao Lu, Brian Kingsbury

Abstract: Data privacy and protection is a crucial issue for any automatic speech recognition (ASR) service provider when dealing with clients. In this paper, we investigate federated acoustic modeling using data from multiple clients. A client's data is stored on a local data server and the clients communicate only model parameters with a central server, and not their data. The communication happens infrequently to reduce the communication cost. To mitigate the non-iid issue, client adaptive federated training (CAFT) is proposed to canonicalize data across clients. The experiments are carried out on 1,150 hours of speech data from multiple domains. Hybrid LSTM acoustic models are trained via federated learning and their performance is compared to traditional centralized acoustic model training. The experimental results demonstrate the effectiveness of the proposed federated acoustic modeling strategy. We also show that CAFT can further improve the performance of the federated acoustic model.

Submitted 8 February, 2021; originally announced February 2021.

Comments: Accepted by ICASSP 2021

arXiv:2009.09406 [pdf, ps, other]

Learning-Based Massive Beamforming

Authors: Siyuan Lu, Shengjie Zhao, Qingjiang Shi

Abstract: Developing resource allocation algorithms with strong real-time performance and high efficiency has been an imperative topic in wireless networks. Conventional optimization-based iterative resource allocation algorithms often suffer from slow convergence, especially for massive multiple-input-multiple-output (MIMO) beamforming problems. This paper studies learning-based efficient massive beamforming methods for multi-user MIMO networks. The considered massive beamforming problem is challenging in two aspects. First, the beamforming matrix to be learned is quite high-dimensional in the case of a massive number of antennas. Second, the objective is often time-varying and the solution space is not fixed due to some communication requirements. All these challenges make learning representation for massive beamforming an extremely difficult task. In this paper, by exploiting the structure of the most popular WMMSE beamforming solution, we propose convolutional massive beamforming neural networks (CMBNN) using both supervised and unsupervised learning schemes with particular design of network structure and input/output. Numerical results demonstrate the efficacy of the proposed CMBNN in terms of running time and system throughput.

Submitted 20 September, 2020; originally announced September 2020.

arXiv:2009.08605 [pdf, other]

Hardware Accelerator for Multi-Head Attention and Position-Wise Feed-Forward in the Transformer

Authors: Siyuan Lu, Meiqi Wang, Shuang Liang, Jun Lin, Zhongfeng Wang

Abstract: Designing hardware accelerators for deep neural networks (DNNs) has been much desired. Nonetheless, most of these existing accelerators are built for either convolutional neural networks (CNNs) or recurrent neural networks (RNNs). Recently, the Transformer model is replacing the RNN in the natural language processing (NLP) area. However, because of intensive matrix computations and complicated data flow being involved, the hardware design for the Transformer model has never been reported. In this paper, we propose the first hardware accelerator for two key components, i.e., the multi-head attention (MHA) ResBlock and the position-wise feed-forward network (FFN) ResBlock, which are the two most complex layers in the Transformer. Firstly, an efficient method is introduced to partition the huge matrices in the Transformer, allowing the two ResBlocks to share most of the hardware resources. Secondly, the computation flow is well designed to ensure the high hardware utilization of the systolic array, which is the biggest module in our design. Thirdly, complicated nonlinear functions are highly optimized to further reduce the hardware complexity and also the latency of the entire system. Our design is coded using hardware description language (HDL) and evaluated on a Xilinx FPGA. Compared with the implementation on GPU with the same setting, the proposed design demonstrates a speed-up of 14.6x in the MHA ResBlock, and 3.4x in the FFN ResBlock, respectively. Therefore, this work lays a good foundation for building efficient hardware accelerators for multiple Transformer networks.

Submitted 17 September, 2020; originally announced September 2020.

Comments: 6 pages, 8 figures. This work has been accepted by IEEE SOCC (System-on-chip Conference) 2020, and presented by Siyuan Lu in SOCC2020. It also received the Best Paper Award in the Methodology Track in this conference

arXiv:2006.06682 [pdf, other]

Power System Disturbance Classification with Online Event-Driven Neuromorphic Computing

Authors: Kaveri Mahapatra, Sen Lu, Abhronil Sengupta, Nilanjan Ray Chaudhuri

Abstract: Accurate online classification of disturbance events in a transmission network is an important part of wide-area monitoring. Although many conventional machine learning techniques are very successful in classifying events, they rely on extracting information from PMU data at control centers and processing them through CPU/GPUs, which are highly inefficient in terms of energy consumption. To solve this challenge without compromising accuracy, this paper presents a novel methodology based on event-driven neuromorphic computing architecture for classification of power system disturbances. A Spiking Neural Network (SNN)-based computing framework is proposed, which exploits sparsity in disturbances and promotes local event driven operation for unsupervised learning and inference from incoming data. Spatio-temporal information of PMU signals is first extracted and encoded into spike trains and classification is achieved with SNN-based supervised and unsupervised learning framework. Moreover, a QR decomposition-based selection technique is proposed to identify signals participating in the low rank subspace of multiple disturbance events. Performance of the proposed method is validated on data collected from a 16-machine, 5-area New England-New York system.

Submitted 15 December, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: 11 pages, 8 figures. Paper has been accepted for publication in IEEE Transactions on Smart Grid on 26th of November, 2020. The link is as follows. https://ieeexplore.ieee.org/document/9290393/authors#authors

arXiv:2005.10053 [pdf, other]

Map Generation from Large Scale Incomplete and Inaccurate Data Labels

Authors: Rui Zhang, Conrad Albrecht, Wei Zhang, Xiaodong Cui, Ulrich Finkler, David Kung, Siyuan Lu

Abstract: Accurately and globally mapping human infrastructure is an important and challenging task with applications in routing, regulation compliance monitoring, natural disaster response management, etc. In this paper we present progress in developing an algorithmic pipeline and distributed compute system that automates the process of map creation using high resolution aerial images. Unlike previous studies, most of which use datasets that are available only in a few cities across the world, we utilize publicly available imagery and map data, both of which cover the contiguous United States (CONUS). We approach the technical challenge of inaccurate and incomplete training data by adopting state-of-the-art convolutional neural network architectures such as the U-Net and the CycleGAN to incrementally generate maps with increasingly more accurate and more complete labels of man-made infrastructure such as roads and houses. Since scaling the mapping task to CONUS calls for parallelization, we then adopted an asynchronous distributed stochastic parallel gradient descent training scheme to distribute the computational workload onto a cluster of GPUs with nearly linear speed-up.

Submitted 20 May, 2020; originally announced May 2020.

Comments: This paper is accepted by KDD 2020

ACM Class: I.2.10

arXiv:2003.03027 [pdf]

Deep Phase Shifter for Quantitative Phase Imaging

Authors: Qinnan Zhang, Shengyu Lu, Jiaosheng Li, Wenjie Li, Dong Li, Xiaoxu Lu, Liyun Zhong, Jindong Tian

Abstract: A single intensity-only holographic interferogram can record the full amplitude and phase information of an optical field. However, current digital holography technologies cannot recover the lossless phase information from a single interferogram. In this paper, we provide an entirely new approach for the full-field quantitative phase imaging technology. We demonstrate that deep learning can be used to replace the entitative phase shifter, and quantitative phase imaging can obtain quantitative phase from a single interferogram in in-line holography. A deep-phase-shift network (DPS-net) is reported, which can be trained with simulation training data. The trained DPS-net can be used to generate multiple interferograms with arbitrary phase shift from a single interferogram as an artificial intelligence phase shifter. The ability and the accuracy of generating arbitrary phase shifts are verified, and the performance of the proposed method is also verified by the experimental interferogram. The results demonstrate that the proposed method can provide a fully digital phase shifter with high accuracy for the technology of dynamic quantitative phase measurement.

Submitted 5 March, 2020; originally announced March 2020.

arXiv:1910.05857 [pdf, ps, other]

Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: A Joint Gradient Estimation and Tracking Approach

Authors: Haoran Sun, Songtao Lu, Mingyi Hong

Abstract: Many modern large-scale machine learning problems benefit from decentralized and stochastic optimization. Recent works have shown that utilizing both decentralized computing and local stochastic gradient estimates can outperform state-of-the-art centralized algorithms, in applications involving highly non-convex problems, such as training deep neural networks. In this work, we propose a decentralized stochastic algorithm to deal with certain smooth non-convex problems where there are $m$ nodes in the system, and each node has a large number of samples (denoted as $n$). Unlike the majority of the existing decentralized learning algorithms for either stochastic or finite-sum problems, our focus is on both reducing the total communication rounds among the nodes and accessing the minimum number of local data samples. In particular, we propose an algorithm named D-GET (decentralized gradient estimation and tracking), which jointly performs decentralized gradient estimation (which estimates the local gradient using a subset of local samples) and gradient tracking (which tracks the global full gradient using local estimates). We show that, to achieve a certain $\epsilon$-stationary solution of the deterministic finite sum problem, the proposed algorithm achieves an $\mathcal{O}(mn^{1/2}\epsilon^{-1})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity. These bounds significantly improve upon the best existing bounds of $\mathcal{O}(mn\epsilon^{-1})$ and $\mathcal{O}(\epsilon^{-1})$, respectively. Similarly, for online problems, the proposed method achieves an $\mathcal{O}(m\epsilon^{-3/2})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity, while the best existing bounds are $\mathcal{O}(m\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-2})$, respectively.

Submitted 13 October, 2019; originally announced October 2019.

Journal ref: Published at the International Conference on Machine Learning (ICML 2020)

arXiv:1908.05418 [pdf, other]

Multimodal Volume-Aware Detection and Segmentation for Brain Metastases Radiosurgery

Authors: Szu-Yeu Hu, Wei-Hung Weng, Shao-Lun Lu, Yueh-Hung Cheng, Furen Xiao, Feng-Ming Hsu, Jen-Tang Lu

Abstract: Stereotactic radiosurgery (SRS), which delivers high doses of irradiation in a single or few shots to small targets, has been a standard of care for brain metastases. While very effective, SRS currently requires manually intensive delineation of tumors. In this work, we present a deep learning approach for automated detection and segmentation of brain metastases using multimodal imaging and ensemble neural networks. In order to address small and multiple brain metastases, we further propose a volume-aware Dice loss which optimizes model performance using the information of lesion size. This work surpasses current benchmark levels and demonstrates a reliable AI-assisted system for SRS treatment planning for multiple brain metastases.

Submitted 15 August, 2019; originally announced August 2019.

Comments: Accepted to 2019 MICCAI AIRT

arXiv:1907.05598 [pdf, other]

Coupled-Projection Residual Network for MRI Super-Resolution

Authors: Chun-Mei Feng, Kai Wang, Shijian Lu, Yong Xu, Heng Kong, Ling Shao

Abstract: Magnetic Resonance Imaging (MRI) has been widely used in clinical application and pathology research by helping doctors make more accurate diagnoses. On the other hand, accurate diagnosis by MRI remains a great challenge as images obtained via present MRI techniques usually have low resolutions. Improving MRI image quality and resolution thus becomes a critically important task. This paper presents an innovative Coupled-Projection Residual Network (CPRN) for MRI super-resolution. The CPRN consists of two complementary sub-networks: a shallow network and a deep network that keep the content consistency while learning high frequency differences between low-resolution and high-resolution images. The shallow sub-network employs coupled-projection for better retaining the MRI image details, where a novel feedback mechanism is introduced to guide the reconstruction of high-resolution images. The deep sub-network learns from the residuals of the high-frequency image information, where multiple residual blocks are cascaded to magnify the MRI images at the last network layer. Finally, the features from the shallow and deep sub-networks are fused for the reconstruction of high-resolution MRI images. For effective fusion of features from the deep and shallow sub-networks, a step-wise connection (CPRN S) is designed as inspired by the human cognitive processes (from simple to complex). Experiments over three public MRI datasets show that our proposed CPRN achieves superior MRI super-resolution performance as compared with the state-of-the-art. Our source code will be publicly available at http://www.yongxu.org/lunwen.html.

Submitted 12 July, 2019; originally announced July 2019.

Comments: Our source code will be publicly available at http://www.yongxu.org/lunwen.html

arXiv:1906.01864 [pdf, other]

OpenEI: An Open Framework for Edge Intelligence

Authors: Xingzhou Zhang, Yifan Wang, Sidi Lu, Liangkai Liu, Lanyu Xu, Weisong Shi

Abstract: In the last five years, edge computing has attracted tremendous attention from industry and academia due to its promise to reduce latency, save bandwidth, improve availability, and protect data privacy to keep data secure. At the same time, we have witnessed the proliferation of AI algorithms and models which accelerate the successful deployment of intelligence mainly in cloud services. These two trends, combined together, have created a new horizon: Edge Intelligence (EI). The development of EI requires much attention from both the computer systems research community and the AI community to meet these demands. However, existing computing techniques used in the cloud are not applicable to edge computing directly due to the diversity of computing sources and the distribution of data sources. We envision that what is missing is a framework that can be rapidly deployed on the edge and enable edge AI capabilities. To address this challenge, in this paper we first present the definition and a systematic review of EI. Then, we introduce an Open Framework for Edge Intelligence (OpenEI), which is a lightweight software platform to equip edges with intelligent processing and data sharing capability. We analyze four fundamental EI techniques which are used to build OpenEI and identify several open problems based on potential research directions. Finally, four typical application scenarios enabled by OpenEI are presented.

Submitted 5 June, 2019; originally announced June 2019.

Comments: 12 pages, 6 figures, ICDCS 2019 conference

arXiv:1905.03175 [pdf, other]

doi 10.1109/ACCESS.2019.2937680

A Hardware-Oriented and Memory-Efficient Method for CTC Decoding

Authors: Siyuan Lu, Jinming Lu, Jun Lin, Zhongfeng Wang

Abstract: The Connectionist Temporal Classification (CTC) has achieved great success in sequence to sequence analysis tasks such as automatic speech recognition (ASR) and scene text recognition (STR). These applications can use the CTC objective function to train the recurrent neural networks (RNNs), and decode the outputs of RNNs during inference. While hardware architectures for RNNs have been studied, hardware-based CTC decoders are desired for high-speed CTC-based inference systems. This paper, for the first time, provides a low-complexity and memory-efficient approach to build a CTC decoder based on the beam search decoding. Firstly, we improve the beam search decoding algorithm to save the storage space. Secondly, we compress a dictionary (reduced from 26.02MB to 1.12MB) and use it as the language model. Meanwhile, searching this dictionary is trivial. Finally, a fixed-point CTC decoder for an English ASR and an STR task using the proposed method is implemented in C++. It is shown that the proposed method has little precision loss compared with its floating-point counterpart. Our experiments demonstrate that the compression ratios of the storage required by the proposed beam search decoding algorithm are 29.49 (ASR) and 17.95 (STR).

Submitted 8 May, 2019; originally announced May 2019.

Comments: 13 pages, 11 figures

Journal ref: IEEE Access, vol. 7, pp. 120681-120694, 2019

Showing 1–50 of 56 results for author: Lu, S