Search | arXiv e-print repository

Activating the Flexibility in Distribution systems via a Unified Quantification Approach

Authors: Yilin Wen, Yi Guo, Zechun Hu, Gabriela Hug

Abstract: Effective quantification of the cost of flexibility and its contribution to the system is essential for activating the flexibility of the numerous distributed energy resources (DERs) in power systems operation. We first propose a unified flexibility cost quantification method for heterogeneous DERs based on the adjustment ranges in both power and accumulated energy consumption. Compared with traditional power range-based approaches, the proposed method can reflect the potential inconvenience caused to different DER users by flexibility activation and directly capture the time-dependent characteristics of DER's flexibility regions. Extending to the aggregator level, we then propose a flexibility cost formulation aligned with the aggregated flexibility model. On this basis, a model for distribution system operators (DSOs) is proposed to coordinate the DER aggregators' participation in the transmission system operation for energy arbitrage and ancillary services provision. This model derives a method for quantifying the contribution of each aggregator's activated flexibility region to the DSO's revenue from the transmission system, which is used to allocate the revenue to each aggregator while ensuring that the DSO is non-profit. Numerical tests on the IEEE 33-node distribution system validate the proposed methods.

Submitted 22 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.01167

doi 10.1109/TPWRS.2024.3415652

Multiple Joint Chance Constraints Approximation for Uncertainty Modeling in Dispatch Problems

Authors: Yilin Wen, Yi Guo, Zechun Hu, Gabriela Hug

Abstract: Uncertainty modeling has become increasingly important in power system decision-making. The widely used tractable uncertainty modeling method, chance constraints with Conditional Value at Risk (CVaR) approximation, can be overconservative and even turn an originally feasible problem into an infeasible one. This paper proposes a new approximation method for multiple joint chance constraints (JCCs) to model the uncertainty in dispatch problems, which solves the conservativeness and potential infeasibility concerns of CVaR. The proposed method is also convenient for controlling the risk levels of different JCCs, which is necessary for power system applications since different resources may be affected by varying degrees of uncertainty or have different importance to the system. We then formulate a data-driven distributionally robust chance-constrained programming model for the power system multiperiod dispatch problem and leverage the proposed approximation method to solve it. In the numerical simulations, two small general examples clearly demonstrate the superiority of the proposed method, and the results of the multiperiod dispatch problem on IEEE test cases verify its practicality.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2312.13654

Free Space Optical Integrated Sensing and Communication Based on DCO-OFDM: Performance Metrics and Resource Allocation

Authors: Yunfeng Wen, Fang Yang, Jian Song, Zhu Han

Abstract: As one of the six usage scenarios of the sixth generation (6G) mobile communication system, integrated sensing and communication (ISAC) has garnered considerable attention, and numerous studies have been conducted on radio-frequency (RF)-ISAC. Benefitting from the communication and sensing capabilities of an optical system, free space optical (FSO)-ISAC becomes a potential complement to RF-ISAC. In this paper, a direct-current-biased optical orthogonal frequency division multiplexing (DCO-OFDM) scheme is proposed for FSO-ISAC. To derive the spectral efficiency for communication and the Fisher information for sensing as performance metrics, we model the clipping noise of DCO-OFDM as additive colored Gaussian noise to obtain the expression of the signal-to-noise ratio. Based on the derived performance metrics, joint power allocation problems are formulated for both communication-centric and sensing-centric scenarios. In addition, the non-convex joint optimization problems are decomposed into sub-problems for DC bias and subcarriers, which can be solved by block coordinate descent algorithms. Furthermore, numerical simulations demonstrate the proposed algorithms and reveal the trade-off between communication and sensing functionalities of the OFDM-based FSO-ISAC system.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 13 pages, 8 figures

arXiv:2312.13640

Optical Integrated Sensing and Communication: Architectures, Potentials and Challenges

Authors: Yunfeng Wen, Fang Yang, Jian Song, Zhu Han

Abstract: Integrated sensing and communication (ISAC) is viewed as a crucial component of future mobile networks and has gained much interest in both academia and industry. Similar to the emergence of radio-frequency (RF) ISAC, the integration of free space optical communication and optical sensing yields optical ISAC (O-ISAC), which is regarded as a powerful complement to its RF counterpart. In this article, we first introduce the generalized system structure of O-ISAC, and then elaborate on three advantages of O-ISAC, i.e., increasing communication rate, enhancing sensing precision, and reducing interference. Next, waveform design and resource allocation of O-ISAC are discussed based on pulsed waveform, constant-modulus waveform, and multi-carrier waveform. Furthermore, we put forward future trends and challenges of O-ISAC, which are expected to provide some valuable directions for future research.

Submitted 10 March, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: 7 pages, 5 figures

arXiv:2311.08667

EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis

Authors: Ge Zhu, Yutong Wen, Marc-André Carbonneau, Zhiyao Duan

Abstract: Audio diffusion models can synthesize a wide variety of sounds. Existing models often operate in the latent domain with cascaded phase recovery modules to reconstruct the waveform. This poses challenges when generating high-fidelity audio. In this paper, we propose EDMSound, a diffusion-based generative model in the spectrogram domain under the framework of elucidated diffusion models (EDM). Combined with an efficient deterministic sampler, we achieved a Fréchet audio distance (FAD) score similar to that of the top-ranked baseline with only 10 steps and reached state-of-the-art performance with 50 steps on the DCASE2023 foley sound generation benchmark. We also revealed a potential concern regarding diffusion-based audio generation models: they tend to generate samples with high perceptual similarity to the training data. Project page: https://agentcooper2002.github.io/EDMSound/

Submitted 18 November, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: Accepted at NeurIPS Workshop: Machine Learning for Audio (Camera Ready)

arXiv:2309.04156

Cross-Utterance Conditioned VAE for Speech Generation

Authors: Yang Li, Cheng Yu, Guangzhi Sun, Weiqin Zu, Zheng Tian, Ying Wen, Wei Pan, Chao Zhang, Jun Wang, Yang Yang, Fanglei Sun

Abstract: Speech synthesis systems powered by neural networks hold promise for multimedia production, but frequently face issues with producing expressive speech and seamless editing. In response, we present the Cross-Utterance Conditioned Variational Autoencoder speech synthesis (CUC-VAE S2) framework to enhance prosody and ensure natural speech generation. This framework leverages the powerful representational capabilities of pre-trained language models and the re-expression abilities of variational autoencoders (VAEs). The core component of the CUC-VAE S2 framework is the cross-utterance CVAE, which extracts acoustic, speaker, and textual features from surrounding sentences to generate context-sensitive prosodic features, more accurately emulating human prosody generation. We further propose two practical algorithms tailored for distinct speech synthesis applications: CUC-VAE TTS for text-to-speech and CUC-VAE SE for speech editing. The CUC-VAE TTS is a direct application of the framework, designed to generate audio with contextual prosody derived from surrounding texts. On the other hand, the CUC-VAE SE algorithm leverages real mel spectrogram sampling conditioned on contextual information, producing audio that closely mirrors real sound and thereby facilitating flexible speech editing based on text such as deletion, insertion, and replacement. Experimental results on the LibriTTS datasets demonstrate that our proposed models significantly enhance speech synthesis and editing, producing more natural and expressive speech.

Submitted 8 September, 2023; originally announced September 2023.

Comments: 13 pages

arXiv:2309.03905

ImageBind-LLM: Multi-modality Instruction Tuning

Authors: Jiaming Han, Renrui Zhang, Wenqi Shao, Peng Gao, Peng Xu, Han Xiao, Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao

Abstract: We present ImageBind-LLM, a multi-modality instruction tuning method of large language models (LLMs) via ImageBind. Whereas existing works mainly focus on language and image instruction tuning, our ImageBind-LLM can respond to multi-modality conditions, including audio, 3D point clouds, video, and their embedding-space arithmetic, using only image-text alignment training. During training, we adopt a learnable bind network to align the embedding space between LLaMA and ImageBind's image encoder. Then, the image features transformed by the bind network are added to word tokens of all layers in LLaMA, which progressively injects visual instructions via an attention-free and zero-initialized gating mechanism. Aided by the joint embedding of ImageBind, the simple image-text training enables our model to exhibit superior multi-modality instruction-following capabilities. During inference, the multi-modality inputs are fed into the corresponding ImageBind encoders, and processed by a proposed visual cache model for further cross-modal embedding enhancement. The training-free cache model retrieves from three million image features extracted by ImageBind, which effectively mitigates the training-inference modality discrepancy. Notably, with our approach, ImageBind-LLM can respond to instructions of diverse modalities and demonstrate significant language generation quality. Code is released at https://github.com/OpenGVLab/LLaMA-Adapter.

Submitted 11 September, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

Comments: Code is available at https://github.com/OpenGVLab/LLaMA-Adapter

arXiv:2308.12749

Block-Level Interference Exploitation Precoding for MU-MISO: An ADMM Approach

Authors: Yiran Wang, Yunsi Wen, Ang Li, Xiaoyan Hu, Christos Masouros

Abstract: We study constructive interference based block-level precoding (CI-BLP) in the downlink of multi-user multiple-input single-output (MU-MISO) systems. Specifically, our aim is to extend the analysis on CI-BLP to the case where the considered number of symbol slots is smaller than that of the users. To this end, we mathematically prove the feasibility of using the pseudo-inverse to obtain the optimal CI-BLP precoding matrix in a closed form. Similar to the case when the number of users is small, we show that a quadratic programming (QP) optimization on simplex can be constructed. We also design a low-complexity algorithm based on the alternating direction method of multipliers (ADMM) framework, which can efficiently solve large-scale QP problems. We further analyze the convergence and complexity of the proposed algorithm. Numerical results validate our analysis and the optimality of the proposed algorithm, and further show that the proposed algorithm offers a flexible performance-complexity tradeoff by limiting the maximum number of iterations, which motivates the use of CI-BLP in practical wireless systems.

Submitted 30 August, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

arXiv:2307.14547

Mitigating Cross-Database Differences for Learning Unified HRTF Representation

Authors: Yutong Wen, You Zhang, Zhiyao Duan

Abstract: Individualized head-related transfer functions (HRTFs) are crucial for accurate sound positioning in virtual auditory displays. As the acoustic measurement of HRTFs is resource-intensive, predicting individualized HRTFs using machine learning models is a promising approach at scale. Training such models requires a unified HRTF representation across multiple databases to utilize their respectively limited samples. However, in addition to differences in the spatial sampling locations, recent studies have shown that, even for the common location, HRTFs across databases manifest consistent differences that make it trivial to tell which databases they come from. This poses a significant challenge for learning a unified HRTF representation across databases. In this work, we first identify the possible causes of these cross-database differences, attributing them to variations in the measurement setup. Then, we propose a novel approach to normalize the frequency responses of HRTFs across databases. We show that HRTFs from different databases cannot be classified by their database after normalization. We further show that these normalized HRTFs can be used to learn a more unified HRTF representation across databases than the prior art. We believe that this normalization approach paves the road to many data-intensive tasks on HRTF modeling.

Submitted 26 July, 2023; originally announced July 2023.

Comments: 5 pages, 4 figures, accepted by IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023

arXiv:2307.13953

The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features

Authors: Liao Qu, Xianwei Zou, Xiang Li, Yandong Wen, Rita Singh, Bhiksha Raj

Abstract: This work unveils the enigmatic link between phonemes and facial features. Traditional studies on voice-face correlations typically involve using a long period of voice input, including generating face images from voices and reconstructing 3D face meshes from voices. However, in situations like voice-based crimes, the available voice evidence may be short and limited. Additionally, from a physiological perspective, each segment of speech (a phoneme) corresponds to different types of airflow and movements in the face. Therefore, it is advantageous to discover the hidden link between phonemes and face attributes. In this paper, we propose an analysis pipeline to help us explore the voice-face relationship in a fine-grained manner, i.e., phonemes vs. facial anthropometric measurements (AM). We build an estimator for each phoneme-AM pair and evaluate the correlation through hypothesis testing. Our results indicate that AMs are more predictable from vowels compared to consonants, particularly with plosives. Additionally, we observe that if a specific AM exhibits more movement during phoneme pronunciation, it is more predictable. Our findings support those in physiology regarding correlation and lay the groundwork for future research on speech-face multimodal learning.

Submitted 26 July, 2023; originally announced July 2023.

Comments: Interspeech 2023

arXiv:2307.13948

Rethinking Voice-Face Correlation: A Geometry View

Authors: Xiang Li, Yandong Wen, Muqiao Yang, **glu Wang, Rita Singh, Bhiksha Raj

Abstract: Previous works on voice-face matching and voice-guided face synthesis demonstrate strong correlations between voice and face, but mainly rely on coarse semantic cues such as gender, age, and emotion. In this paper, we aim to investigate the capability of reconstructing the 3D facial shape from voice from a geometry perspective without any semantic information. We propose a voice-anthropometric measurement (AM)-face paradigm, which identifies predictable facial AMs from the voice and uses them to guide 3D face reconstruction. By leveraging AMs as a proxy to link the voice and face geometry, we can eliminate the influence of unpredictable AMs and make the face geometry tractable. Our approach is evaluated on our proposed dataset with ground-truth 3D face scans and corresponding voice recordings, and we find significant correlations between voice and specific parts of the face geometry, such as the nasal cavity and cranium. Our work offers a new perspective on voice-face correlation and can serve as a good empirical study for anthropometry science.

Submitted 26 July, 2023; originally announced July 2023.

Comments: ACM Multimedia 2023

arXiv:2303.01691

doi 10.1109/TSG.2023.3340157

Improved Inner Approximation for Aggregating Power Flexibility in Active Distribution Networks and its Applications

Authors: Yilin Wen, Zechun Hu, **hua He, Yi Guo

Abstract: Concise and reliable modeling for aggregating power flexibility of distributed energy resources in active distribution networks (ADNs) is a crucial technique for coordinating transmission and distribution networks. Our recent research has successfully derived an explicit expression for the exact aggregation model (EAM) of power flexibility at the substation level under linearized distribution network constraints. The EAM, however, is impractical for decision-making purposes due to its exponential complexity. In this paper, we propose an inner approximation method for aggregating flexibility in ADNs that utilizes the properties of the EAM to improve performance. Specifically, the geometric prototype of the inner approximation model is defined according to a subset of the coefficient vector set of the EAM, which enhances the accuracy. On the other hand, the computation efficiency of the inner approximation is also significantly improved by exploiting the regularity of coefficient vectors in the EAM in the parameter calculation process. The inner approximated flexibility model of ADNs is further incorporated into the security-constrained unit commitment problem as an application. Numerical simulations verify the effectiveness of the proposed method.

Submitted 24 June, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: 10 pages

arXiv:2301.06244

Haptic Transparency and Interaction Force Control for a Lower-Limb Exoskeleton

Authors: Emek Barış Küçüktabak, Yue Wen, Sangjoon J. Kim, Matthew Short, Daniel Ludvig, Levi Hargrove, Eric Perreault, Kevin Lynch, Jose Pons

Abstract: Controlling the interaction forces between a human and an exoskeleton is crucial for providing transparency or adjusting assistance or resistance levels. However, it is an open problem to control the interaction forces of lower-limb exoskeletons designed for unrestricted overground walking. For these types of exoskeletons, it is challenging to implement force/torque sensors at every contact between the user and the exoskeleton for direct force measurement. Moreover, it is important to compensate for the exoskeleton's whole-body gravitational and dynamical forces, especially for heavy lower-limb exoskeletons. Previous works either simplified the dynamic model by treating the legs as independent double pendulums, or they did not close the loop with interaction force feedback. The proposed whole-exoskeleton closed-loop compensation (WECC) method calculates the interaction torques during the complete gait cycle by using whole-body dynamics and joint torque measurements on a hip-knee exoskeleton. Furthermore, it uses a constrained optimization scheme to track desired interaction torques in a closed loop while considering physical and safety constraints. We evaluated the haptic transparency and dynamic interaction torque tracking of WECC control on three subjects. We also compared the performance of WECC with a controller based on a simplified dynamic model and a passive version of the exoskeleton. The WECC controller results in a consistently low absolute interaction torque error during the whole gait cycle for both zero and nonzero desired interaction torques. In contrast, the simplified controller yields poor performance in tracking desired interaction torques during the stance phase.

Submitted 22 January, 2024; v1 submitted 15 January, 2023; originally announced January 2023.

Comments: 19 pages, 13 figures. Accepted for publication in the IEEE Transactions on Robotics (T-RO)

arXiv:2301.02348

High-Speed High-Accuracy Spatial Curve Tracking Using Motion Primitives in Industrial Robots

Authors: Honglu He, Chen-lung Lu, Yunshi Wen, Glenn Saunders, **hai Yang, Jeffrey Schoonover, Agung Julius, John T. Wen

Abstract: Industrial robots are increasingly deployed in applications requiring an end effector tool to closely track a specified path, such as in spraying and welding. Performance and productivity present possibly conflicting objectives: tracking accuracy, path speed, and motion uniformity. Industrial robots are programmed through motion primitives consisting of waypoints connected by pre-defined motion segments, with specified parameters such as path speed and blending zone. The actual executed robot motion depends on the robot joint servo controller and joint motion constraints (velocity, acceleration, etc.) which are largely unknown to the users. Programming a robot to achieve the desired performance today is time-consuming and mostly manual, requiring tuning a large number of coupled parameters in the motion primitives. The performance also depends on the choice of additional parameters: possible redundant degrees of freedom, location of the target curve, and the robot configuration. This paper presents a systematic approach to optimize the robot motion primitives for performance. The approach first selects the static parameters, then the motion primitives, and finally iteratively updates the waypoints to minimize the tracking error. The ultimate performance objective is to maximize the path speed subject to the tracking accuracy and speed uniformity constraints over the entire path. We have demonstrated the effectiveness of this approach in simulation for ABB and FANUC robots for two challenging example curves, and experimentally for an ABB robot. Compared with the baseline using the current industry practice, the optimized performance shows over 200% performance improvement.

Submitted 5 January, 2023; originally announced January 2023.

arXiv:2205.10750

Multi-Agent Feedback Enabled Neural Networks for Intelligent Communications

Authors: Fanglei Sun, Yang Li, Ying Wen, **gchen Hu, Jun Wang, Yang Yang, Kai Li

Abstract: In the intelligent communication field, deep learning (DL) has attracted much attention due to its strong fitting ability and data-driven learning capability. Compared with the typical DL feedforward network structures, an enhancement structure with direct data feedback has been studied and shown to perform better than the feedforward networks. However, because these simple feedback methods lack sufficient analysis and learning ability on the feedback data, they are inadequate for more complicated nonlinear systems, which limits further performance improvement. In this paper, a novel multi-agent feedback enabled neural network (MAFENN) framework is proposed, which gives the framework stronger feedback learning capabilities and more intelligence on feature abstraction, denoising or generation, etc. Furthermore, the MAFENN framework is theoretically formulated into a three-player Feedback Stackelberg game, and the game is proved to converge to the Feedback Stackelberg equilibrium. The design of the MAFENN framework and algorithm is dedicated to enhancing the learning capability of the feedforward DL networks or their variations with the simple data feedback. To verify the MAFENN framework's feasibility in wireless communications, a multi-agent MAFENN based equalizer (MAFENN-E) is developed for wireless fading channels with inter-symbol interference (ISI). Experimental results show that when the quadrature phase-shift keying (QPSK) modulation scheme is adopted, the SER performance of our proposed method outperforms that of the traditional equalizers by about 2 dB in linear channels. In nonlinear channels, the SER performance of our proposed method outperforms that of either traditional or DL based equalizers more significantly, which shows the effectiveness and robustness of our proposal in the complex channel environment.

Submitted 22 May, 2022; originally announced May 2022.

arXiv:2205.04120

Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech

Authors: Yang Li, Cheng Yu, Guangzhi Sun, Hua Jiang, Fanglei Sun, Weiqin Zu, Ying Wen, Yang Yang, Jun Wang

Abstract: Modelling prosody variation is critical for synthesizing natural and expressive speech in end-to-end text-to-speech (TTS) systems. In this paper, a cross-utterance conditional VAE (CUC-VAE) is proposed to estimate a posterior probability distribution of the latent prosody features for each phoneme by conditioning on acoustic features, speaker information, and text features obtained from both past and future sentences. At inference time, instead of the standard Gaussian distribution used by VAE, CUC-VAE allows sampling from an utterance-specific prior distribution conditioned on cross-utterance information, which allows the prosody features generated by the TTS system to be related to the context and is more similar to how humans naturally produce prosody. The performance of CUC-VAE is evaluated via a qualitative listening test for naturalness, intelligibility and quantitative measurements, including word error rates and the standard deviation of prosody attributes. Experimental results on LJ-Speech and LibriTTS data show that the proposed CUC-VAE TTS system improves naturalness and prosody diversity with clear margins.

Submitted 9 May, 2022; originally announced May 2022.

Comments: ACL 2022 camera ready

arXiv:2205.03553 [pdf, other]

doi 10.1016/j.patcog.2023.110205

From heavy rain removal to detail restoration: A faster and better network

Authors: Yuanbo Wen, Tao Gao, Jing Zhang, Kaihao Zhang, Ting Chen

Abstract: The profound accumulation of precipitation during intense rainfall events can markedly degrade the quality of images, leading to the erosion of textural details. Despite the improvements observed in existing learning-based methods specialized for heavy rain removal, it is discerned that a significant proportion of these methods tend to overlook the precise reconstruction of the intricate details. In this work, we introduce a simple dual-stage progressive enhancement network, denoted as DPENet, aiming to achieve effective deraining while preserving the structural accuracy of rain-free images. This approach comprises two key modules, a rain streaks removal network (R$^2$Net) focusing on accurate rain removal, and a details reconstruction network (DRNet) designed to recover the textural details of rain-free images. Firstly, we introduce a dilated dense residual block (DDRB) within R$^2$Net, enabling the aggregation of high-level and low-level features. Secondly, an enhanced residual pixel-wise attention block (ERPAB) is integrated into DRNet to facilitate the incorporation of contextual information. To further enhance the fidelity of our approach, we employ a comprehensive loss function that accentuates both the marginal and regional accuracy of rain-free images. Extensive experiments conducted on publicly available benchmarks demonstrate the noteworthy efficiency and effectiveness of our proposed DPENet. The source code and pre-trained models are currently available at https://github.com/chdwyb/DPENet.

Submitted 18 December, 2023; v1 submitted 7 May, 2022; originally announced May 2022.

Comments: Accepted by Pattern Recognition

arXiv:2204.10151 [pdf, ps, other]

doi 10.1109/PCS.2016.7906400

A Bitstream Feature Based Model for Video Decoding Energy Estimation

Authors: Christian Herglotz, Yongjun Wen, Bowen Dai, Matthias Kränzler, André Kaup

Abstract: In this paper we show that a small amount of bit stream features can be used to accurately estimate the energy consumption of state-of-the-art software and hardware accelerated decoder implementations for four different video codecs. By testing the estimation performance on HEVC, H.264, H.263, and VP9 we show that the proposed model can be used for any hybrid video codec. We test our approach on a high amount of different test sequences to prove the general validity. We show that less than 20 features are sufficient to obtain mean estimation errors that are smaller than 8%. Finally, an example will show the performance trade-offs in terms of rate, distortion, and decoding energy for all tested codecs.

Submitted 21 April, 2022; originally announced April 2022.

Comments: 5 pages, 2 figures, 2016 Picture Coding Symposium (PCS)

arXiv:2111.04963 [pdf, other]

doi 10.1109/TSG.2022.3179998

Aggregated Feasible Region of Heterogeneous Demand-Side Flexible Resources -- Part I: Theoretical Derivation of the Exact Model

Authors: Yilin Wen, Zechun Hu, Shi You, Xiaoyu Duan

Abstract: In the first part of the two-part series, the model to describe the exact aggregated feasible region (AFR) of multiple types of demand-side resources is derived. Based on a discrete-time unified individual model of heterogeneous resources, the calculation of AFR is, in fact, a feasible region projection problem. Therefore, the Fourier-Motzkin Elimination (FME) method is used for derivation. By analyzing the redundancy of all possible constraints in the FME process, the mathematical expression and calculation method for the exact AFR are proposed. The number of constraints is linear in the number of resources and exponential in the number of time intervals. The computational complexity has been dramatically reduced compared with the original FME. However, the number of constraints in the model is still exponential and cannot be simplified any further. Hence, in Part II of this paper, several approximation methods are proposed and analyzed in detail.

Submitted 9 November, 2021; originally announced November 2021.

Comments: 10 pages

Journal ref: IEEE Transactions on Smart Grid, early access, 2022

arXiv:2110.04800 [pdf, other]

Self-Supervised 3D Face Reconstruction via Conditional Estimation

Authors: Yandong Wen, Weiyang Liu, Bhiksha Raj, Rita Singh

Abstract: We present a conditional estimation (CEST) framework to learn 3D facial parameters from 2D single-view images by self-supervised training from videos. CEST is based on the process of analysis by synthesis, where the 3D facial parameters (shape, reflectance, viewpoint, and illumination) are estimated from the face image, and then recombined to reconstruct the 2D face image. In order to learn semantically meaningful 3D facial parameters without explicit access to their labels, CEST couples the estimation of different 3D facial parameters by taking their statistical dependency into account. Specifically, the estimation of any 3D facial parameter is not only conditioned on the given image, but also on the facial parameters that have already been derived. Moreover, the reflectance symmetry and consistency among the video frames are adopted to improve the disentanglement of facial parameters. Together with a novel strategy for incorporating the reflectance symmetry and consistency, CEST can be efficiently trained with in-the-wild video clips. Both qualitative and quantitative experiments demonstrate the effectiveness of CEST.

Submitted 10 October, 2021; originally announced October 2021.

Comments: ICCV 2021 (15 pages)

arXiv:2108.05743 [pdf]

Data-Driven Scheduling of Electric Boiler with Thermal Storage for Providing Power Balancing Service

Authors: Likai Liu, Zechun Hu, Jian Ning, Yilin Wen

Abstract: The rapid development of renewable energy has increased the peak-to-valley difference of the netload, making netload following a new challenge to the power system. Electric boiler with thermal storage (EBTS) occupies a non-negligible part of the load in the winter season in Northern China. EBTS operation optimization can not only save its own energy cost but also reduce the peak shaving and valley filling pressure of the system. To this end, the operation optimization of EBTS for providing the power balancing service is studied in this paper, which mainly includes three parts: First, the joint probability distribution between the predicted and actual temperatures is built by utilizing the Copula theory; Secondly, the actual temperatures are sampled based on the predicted temperatures of the next day, and the scenario set is generated by clustering these samples, where the K-means clustering method is used; Thirdly, the stochastic operation optimization model of EBTS considering the uncertainty of outdoor temperature is constructed. Through the case study, it is found that the proposed method can save the total operation cost of the EBTS compared with the deterministic EBTS operation optimization model.

Submitted 12 August, 2021; originally announced August 2021.

arXiv:2107.01322 [pdf, other]

Physical Layer Security for NOMA-Enabled Multi-Access Edge Computing Wireless Networks

Authors: Yating Wen, Tong-Xing Zheng, Yongxia Tong, Hao-Wen Liu, Xin Chen, Pengcheng Mu, Hui-Ming Wang

Abstract: Multi-access edge computing (MEC) has been regarded as a promising technique for enhancing computation capabilities for wireless networks. In this paper, we study physical layer security in an MEC system where multiple users offload part of their computation tasks to a base station simultaneously based on non-orthogonal multiple access (NOMA), in the presence of a malicious eavesdropper. Secrecy outage probability is adopted to measure the security performance of the computation offloading against eavesdropping attacks. We aim to minimize the sum energy consumption of all the users, subject to constraints in terms of the secrecy offloading rate, the secrecy outage probability, and the decoding order of NOMA. Although the original optimization problem is non-convex and challenging to solve, we put forward an efficient algorithm based on sequential convex approximation and penalty dual decomposition. Numerical results are eventually provided to validate the convergence of the proposed algorithm and its superior energy efficiency with secrecy requirements.

Submitted 2 July, 2021; originally announced July 2021.

Comments: 6 pages, 3 figures, and Accepted to present at IEEE/CIC ICCC 2021

arXiv:2102.01549 [pdf, ps, other]

Medical Datasets Collections for Artificial Intelligence-based Medical Image Analysis

Authors: Yang Wen

Abstract: We collected 32 public datasets, of which 28 for medical imaging and 4 for natural images, to conduct study. The images of these datasets are captured by different cameras, thus vary from each other in modality, frame size and capacity. For data accessibility, we also provide the websites of most datasets and hope this will help the readers reach the datasets.

Submitted 17 February, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

Comments: 6 pages, 1 table

arXiv:2012.06834 [pdf, ps, other]

Deep Reinforcement Learning for Tropical Air Free-Cooled Data Center Control

Authors: Duc Van Le, Rongrong Wang, Yingbo Liu, Rui Tan, Yew-Wah Wong, Yonggang Wen

Abstract: Air free-cooled data centers (DCs) have not existed in the tropical zone due to the unique challenges of year-round high ambient temperature and relative humidity (RH). The increasing availability of servers that can tolerate higher temperatures and RH due to the regulatory bodies' prompts to raise DC temperature setpoints sheds light upon the feasibility of air free-cooled DCs in tropics. However, due to the complex psychrometric dynamics, operating the air free-cooled DC in tropics generally requires adaptive control of supply air condition to maintain the computing performance and reliability of the servers. This paper studies the problem of controlling the supply air temperature and RH in a free-cooled tropical DC below certain thresholds. To achieve the goal, we formulate the control problem as Markov decision processes and apply deep reinforcement learning (DRL) to learn the control policy that minimizes the cooling energy while satisfying the requirements on the supply air temperature and RH. We also develop a constrained DRL solution for performance improvements. Extensive evaluation based on real data traces collected from an air free-cooled testbed and comparisons among the unconstrained and constrained DRL approaches as well as two other baseline approaches show the superior performance of our proposed solutions.

Submitted 12 December, 2020; originally announced December 2020.

Journal ref: ACM Transactions on Sensor Networks, Special Issue on Computational Intelligence in Internet of Things, 2021

arXiv:2010.09776 [pdf, other]

SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving

Authors: Ming Zhou, Jun Luo, Julian Villella, Yaodong Yang, David Rusu, Jiayu Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Aurora Chongxi Huang, Ying Wen, Kimia Hassanzadeh, Daniel Graves, Dong Chen, Zhengbang Zhu, Nhat Nguyen, Mohamed Elsayed, Kun Shao, Sanjeevan Ahilan, Baokuan Zhang, Jiannan Wu, Zhengang Fu, Kasra Rezaee, Peyman Yadmellat, et al. (12 additional authors not shown)

Abstract: Multi-agent interaction is a fundamental aspect of autonomous driving in the real world. Despite more than a decade of research and development, the problem of how to competently interact with diverse road users in diverse scenarios remains largely unsolved. Learning methods have much to offer towards solving this problem. But they require a realistic multi-agent simulator that generates diverse and competent driving interactions. To meet this need, we develop a dedicated simulation platform called SMARTS (Scalable Multi-Agent RL Training School). SMARTS supports the training, accumulation, and use of diverse behavior models of road users. These are in turn used to create increasingly more realistic and diverse interactions that enable deeper and broader research on multi-agent interaction. In this paper, we describe the design goals of SMARTS, explain its basic architecture and its key features, and illustrate its use through concrete multi-agent experiments on interactive scenarios. We open-source the SMARTS platform and the associated benchmark tasks and evaluation metrics to encourage and empower research on multi-agent learning for autonomous driving. Our code is available at https://github.com/huawei-noah/SMARTS.

Submitted 31 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: 20 pages, 11 figures. Paper accepted to CoRL 2020

arXiv:2009.01871 [pdf, other]

doi 10.1007/978-3-030-60548-3_18

Federated Learning for Breast Density Classification: A Real-World Implementation

Authors: Holger R. Roth, Ken Chang, Praveer Singh, Nir Neumark, Wenqi Li, Vikash Gupta, Sharut Gupta, Liangqiong Qu, Alvin Ihsani, Bernardo C. Bizzo, Yuhong Wen, Varun Buch, Meesam Shah, Felipe Kitamura, Matheus Mendonça, Vitor Lavor, Ahmed Harouni, Colin Compas, Jesse Tetreault, Prerna Dogra, Yan Cheng, Selnur Erdal, Richard White, Behrooz Hashemian, Thomas Schultz, et al. (18 additional authors not shown)

Abstract: Building robust deep learning-based models requires large quantities of diverse training data. In this study, we investigate the use of federated learning (FL) to build medical imaging classification models in a real-world collaborative setting. Seven clinical institutions from across the world joined this FL effort to train a model for breast density classification based on Breast Imaging, Reporting & Data System (BI-RADS). We show that despite substantial differences among the datasets from all sites (mammography system, class distribution, and data set size) and without centralizing data, we can successfully train AI models in federation. The results show that models trained using FL perform 6.3% on average better than their counterparts trained on an institute's local data alone. Furthermore, we show a 45.8% relative improvement in the models' generalizability when evaluated on the other participating sites' testing data.

Submitted 20 October, 2020; v1 submitted 3 September, 2020; originally announced September 2020.

Comments: Accepted at the 1st MICCAI Workshop on "Distributed And Collaborative Learning"; add citation to Fig. 1 & 2 and update Fig. 5; fix typo in affiliations

Journal ref: In: Albarqouni S. et al. (eds) Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning. DART 2020, DCL 2020. Lecture Notes in Computer Science, vol 12444. Springer, Cham

arXiv:2007.15778 [pdf, other]

Weakly supervised one-stage vision and language disease detection using large scale pneumonia and pneumothorax studies

Authors: Leo K. Tam, Xiaosong Wang, Evrim Turkbey, Kevin Lu, Yuhong Wen, Daguang Xu

Abstract: Detecting clinically relevant objects in medical images is a challenge despite large datasets due to the lack of detailed labels. To address the label issue, we utilize the scene-level labels with a detection architecture that incorporates natural language information. We present a challenging new set of radiologist paired bounding box and natural language annotations on the publicly available MIMIC-CXR dataset especially focussed on pneumonia and pneumothorax. Along with the dataset, we present a joint vision language weakly supervised transformer layer-selected one-stage dual head detection architecture (LITERATI) alongside strong baseline comparisons with class activation mapping (CAM), gradient CAM, and relevant implementations on the NIH ChestXray-14 and MIMIC-CXR dataset. Borrowing from advances in vision language architectures, the LITERATI method demonstrates joint image and referring expression (objects localized in the image using natural language) input for detection that scales in a purely weakly supervised fashion. The architectural modifications address three obstacles -- implementing a supervised vision and language detection method in a weakly supervised fashion, incorporating clinical referring expression natural language information, and generating high fidelity detections with map probabilities. Nevertheless, the challenging clinical nature of the radiologist annotations including subtle references, multi-instance specifications, and relatively verbose underlying medical reports, ensures the vision language detection task at scale remains stimulating for future investigation.

Submitted 30 July, 2020; originally announced July 2020.

Comments: Accepted at Medical Image Computing and Computer-Assisted Intervention -- MICCAI 2020

arXiv:2007.04517 [pdf, other]

Distributed Energy Trading and Scheduling among Microgrids via Multiagent Reinforcement Learning

Authors: Guanyu Gao, Yonggang Wen, Xiaohu Wu, Ran Wang

Abstract: The development of renewable energy generation empowers microgrids to generate electricity to supply itself and to trade the surplus on energy markets. To minimize the overall cost, a microgrid must determine how to schedule its energy resources and electrical loads and how to trade with others. The control decisions are influenced by various factors, such as energy storage, renewable energy yield, electrical load, and competition from other microgrids. Making the optimal control decision is challenging, due to the complexity of the interconnected microgrids, the uncertainty of renewable energy generation and consumption, and the interplay among microgrids. The previous works mainly adopted the modeling-based approaches for deriving the control decision, yet they relied on the precise information of future system dynamics, which can be hard to obtain in a complex environment. This work provides a new perspective of obtaining the optimal control policy for distributed energy trading and scheduling by directly interacting with the environment, and proposes a multiagent deep reinforcement learning approach for learning the optimal control policy. Each microgrid is modeled as an agent, and different agents learn collaboratively for maximizing their rewards. The agent of each microgrid can make the local scheduling decision without knowing others' information, which can well maintain the autonomy of each microgrid. We evaluate the performances of our proposed method using real-world datasets. The experimental results show that our method can significantly reduce the cost of the microgrids compared with the baseline methods.

Submitted 8 July, 2020; originally announced July 2020.

arXiv:2006.09008 [pdf, other]

Reinforcement Learning Control of Robotic Knee with Human in the Loop by Flexible Policy Iteration

Authors: Xiang Gao, Jennie Si, Yue Wen, Minhan Li, He Huang

Abstract: We are motivated by the real challenges presented in a human-robot system to develop new designs that are efficient at data level and with performance guarantees such as stability and optimality at systems level. Existing approximate/adaptive dynamic programming (ADP) results that consider system performance theoretically are not readily providing practically useful learning control algorithms for this problem; and reinforcement learning (RL) algorithms that address the issue of data efficiency usually do not have performance guarantees for the controlled system. This study fills these important voids by introducing innovative features to the policy iteration algorithm. We introduce flexible policy iteration (FPI), which can flexibly and organically integrate experience replay and supplemental values from prior experience into the RL controller. We show system level performances including convergence of the approximate value function, (sub)optimality of the solution, and stability of the system. We demonstrate the effectiveness of the FPI via realistic simulations of the human-robot system. It is noted that the problem we face in this study may be difficult to address by design methods based on classical control theory as it is nearly impossible to obtain a customized mathematical model of a human-robot system either online or offline. The results we have obtained also indicate the great potential of RL control to solving realistic and challenging problems with high dimensional control inputs.

Submitted 17 January, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

arXiv:2006.06518 [pdf, ps, other]

doi 10.1109/TRO.2021.3078317

Towards Expedited Impedance Tuning of a Robotic Prosthesis for Personalized Gait Assistance by Reinforcement Learning Control

Authors: Minhan Li, Yue Wen, Xiang Gao, Jennie Si, He Helen Huang

Abstract: Personalizing medical devices such as lower limb wearable robots is challenging. While the initial feasibility of automating the process of knee prosthesis control parameter tuning has been demonstrated in a principled way, the next critical issue is to improve tuning efficiency and speed it up for the human user, in clinic settings, while maintaining human safety. We, therefore, propose a policy iteration with constraint embedded (PICE) method as an innovative solution to the problem under the framework of reinforcement learning. Central to PICE is the use of a projected Bellman equation with a constraint of assuring positive semidefiniteness of performance values during policy evaluation. Additionally, we developed both online and offline PICE implementations that provide additional flexibility for the designer to fully utilize measurement data, either from on-policy or off-policy, to further improve PICE tuning efficiency. Our human subject testing showed that the PICE provided effective policies with significantly reduced tuning time. For the first time, we also experimentally evaluated and demonstrated the robustness of the deployed policies by applying them to different tasks and users. Putting it together, our new way of problem solving has been effective as PICE has demonstrated its potential toward truly automating the process of control parameter tuning for robotic knee prosthesis users.

Submitted 5 June, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

Journal ref: IEEE Transactions on Robotics, 2021

arXiv:2001.10681 [pdf, other]

doi 10.1145/3408308.3427982

Kalibre: Knowledge-based Neural Surrogate Model Calibration for Data Center Digital Twins

Authors: Ruihang Wang, Xin Zhou, Linsen Dong, Yonggang Wen, Rui Tan, Li Chen, Guan Wang, Feng Zeng

Abstract: Computational fluid dynamics (CFD) model has been widely used for prototyping data centers. Evolving it to high-fidelity digital twin is desirable for the management and operations of large-scale data centers. Manually calibrating CFD model parameters to achieve twin-class fidelity by specially trained domain expert is tedious and labor-intensive. To reduce manual efforts, existing automatic calibration approaches developed for various computational models apply heuristics to search model configurations within an empirically defined parameter bound. However, in the context of CFD, each search step requires long-lasting CFD model's iterated solving, rendering these approaches impractical with increased model complexity. This paper presents Kalibre, a knowledge-based neural surrogate approach that performs CFD model calibration by iterating four key steps of i) training a neural surrogate model based on CFD-generated data, ii) finding the optimal parameters at the moment through neural surrogate retraining based on sensor-measured data, iii) configuring the found parameters back to the CFD model, and iv) validating the CFD model using sensor-measured data as the ground truth. Thus, the parameter search is offloaded to the neural surrogate which is ultra-faster than CFD model's iterated solving. To speed up the convergence of Kalibre, we integrate prior knowledge of the twinned data center's thermophysics into the neural surrogate design to improve its learning efficiency. With about five hours computation on a 32-core processor, Kalibre achieves mean absolute errors (MAEs) of 0.81°C and 0.75°C in calibrating two CFD models for two production data halls hosting thousands of servers each while requiring fewer CFD solving processes than existing baseline approaches.

Submitted 10 November, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

Comments: Accepted to ACM BuildSys 2020

ACM Class: J.6; I.6.5

arXiv:1912.07383 [pdf, other]

A Survey of Predictive Maintenance: Systems, Purposes and Approaches

Authors: Tianwen Zhu, Yongyi Ran, Xin Zhou, Yonggang Wen

Abstract: This paper highlights the importance of maintenance techniques in the coming industrial revolution, reviews the evolution of maintenance techniques, and presents a comprehensive literature review on the latest advancement of maintenance techniques, i.e., Predictive Maintenance (PdM), with emphasis on system architectures, optimization objectives, and optimization methods. In industry, any outages and unplanned downtime of machines or systems would degrade or interrupt a company's core business, potentially resulting in significant penalties and immeasurable reputation and economic loss. Existing traditional maintenance approaches, such as Reactive Maintenance (RM) and Preventive Maintenance (PM), suffer from high prevention and repair costs, inadequate or inaccurate mathematical degradation processes, and manual feature extraction. The incoming fourth industrial revolution is also demanding a new maintenance paradigm to reduce the maintenance cost and downtime, and increase system availability and reliability. Predictive Maintenance (PdM) is envisioned as the solution. In this survey, we first provide a high-level view of the PdM system architectures including PdM 4.0, Open System Architecture for Condition Based Monitoring (OSA-CBM), and cloud-enhanced PdM system. Then, we review the specific optimization objectives, which mainly comprise cost minimization, availability/reliability maximization, and multi-objective optimization. Furthermore, we present the optimization methods to achieve the aforementioned objectives, which include traditional Machine Learning (ML) based and Deep Learning (DL) based approaches. Finally, we highlight the future research directions that are critical to promote the application of DL techniques in the context of PdM.

Submitted 21 March, 2024; v1 submitted 12 December, 2019; originally announced December 2019.

Comments: 38 pages, 23 figures

arXiv:1911.09401 [pdf, other]

Segmenting Medical MRI via Recurrent Decoding Cell

Authors: Ying Wen, Kai Xie, Lianghua He

Abstract: Encoder-decoder networks are commonly used in medical image segmentation due to their remarkable performance in hierarchical feature fusion. However, the expanding path for feature decoding and spatial recovery does not consider the long-term dependency when fusing feature maps from different layers, and the universal encoder-decoder network does not make full use of the multi-modality information to improve the network robustness, especially for segmenting medical MRI. In this paper, we propose a novel feature fusion unit called Recurrent Decoding Cell (RDC), which leverages convolutional RNNs to memorize the long-term context information from the previous layers in the decoding phase. An encoder-decoder network, named Convolutional Recurrent Decoding Network (CRDN), is also proposed based on RDC for segmenting multi-modality medical MRI. CRDN adopts a CNN backbone to encode image features and decodes them hierarchically through a chain of RDCs to obtain the final high-resolution score map. The evaluation experiments on the BrainWeb, MRBrainS and HVSMR datasets demonstrate that the introduction of RDC effectively improves the segmentation accuracy while reducing the model size, and that the proposed CRDN is robust to image noise and intensity non-uniformity in medical MRI.

Submitted 21 November, 2019; originally announced November 2019.

Comments: 8 pages, 7 figures, AAAI-20

arXiv:1909.11199 [pdf, ps, other]

Security Risk Analysis of the Shorter-Queue Routing Policy for Two Symmetric Servers

Authors: Yu Tang, Yining Wen, Li **

Abstract: In this article, we study the classical shortest queue problem under the influence of malicious attacks, which is relevant to a variety of engineering systems including transportation, manufacturing, and communications. We consider a homogeneous Poisson arrival process of jobs and two parallel exponential servers with symmetric service rates. A system operator routes incoming jobs to the shorter queue; if the queues are equal, the job is routed randomly. A malicious attacker is able to intercept the operator's routing instruction and overwrite it with a randomly generated one. The operator is able to defend individual jobs to ensure correct routing. Both attacking and defending induce technological costs. The attacker's (resp. operator's) decision is the probability of attacking (resp. defending) the routing of each job. We first quantify the queuing cost for given strategy profiles by deriving a theoretical upper bound for the cost. Then, we formulate a non-zero-sum attacker-defender game, characterize the equilibria in multiple regimes, and quantify the security risk. We find that the attacker's best strategy is either to attack all jobs or not to attack, and the defender's strategy is strongly influenced by the arrival rate of jobs. Finally, as a benchmark, we compare the security risks of the feedback-controlled system to a corresponding open-loop system with Bernoulli routing. We show that the shorter-queue policy has a higher (resp. lower) security risk than the Bernoulli policy if the demand is lower (resp. higher) than the service rate of one server.

Submitted 1 October, 2019; v1 submitted 24 September, 2019; originally announced September 2019.

arXiv:1905.10604 [pdf, other]

Reconstructing faces from voices

Authors: Yandong Wen, Rita Singh, Bhiksha Raj

Abstract: Voice profiling aims at inferring various human parameters from their speech, e.g. gender, age, etc. In this paper, we address the challenge posed by a subtask of voice profiling - reconstructing someone's face from their voice. The task is designed to answer the question: given an audio clip spoken by an unseen person, can we picture a face that has as many common elements, or associations as possible with the speaker, in terms of identity? To address this problem, we propose a simple but effective computational framework based on generative adversarial networks (GANs). The network learns to generate faces from voices by matching the identities of generated faces to those of the speakers, on a training set. We evaluate the performance of the network by leveraging a closely related task - cross-modal matching. The results show that our model is able to generate faces that match several biometric characteristics of the speaker, and results in matching accuracies that are much better than chance.

Submitted 31 May, 2019; v1 submitted 25 May, 2019; originally announced May 2019.

arXiv:1901.04693 [pdf, other]

Energy-Efficient Thermal Comfort Control in Smart Buildings via Deep Reinforcement Learning

Authors: Guanyu Gao, Jie Li, Yonggang Wen

Abstract: Heating, Ventilation, and Air Conditioning (HVAC) is extremely energy-consuming, accounting for 40% of total building energy consumption. Therefore, it is crucial to design some energy-efficient building thermal control policies which can reduce the energy consumption of HVAC while maintaining the comfort of the occupants. However, implementing such a policy is challenging, because it involves various influencing factors in a building environment, which are usually hard to model and may be different from case to case. To address this challenge, we propose a deep reinforcement learning based framework for energy optimization and thermal comfort control in smart buildings. We formulate the building thermal control as a cost-minimization problem which jointly considers the energy consumption of HVAC and the thermal comfort of the occupants. To solve the problem, we first adopt a deep neural network based approach for predicting the occupants' thermal comfort, and then adopt Deep Deterministic Policy Gradients (DDPG) for learning the thermal control policy. To evaluate the performance, we implement a building thermal control simulation system and evaluate the performance under various settings. The experiment results show that our method can improve the thermal comfort prediction accuracy, and reduce the energy consumption of HVAC while improving the occupants' thermal comfort.

Submitted 15 January, 2019; originally announced January 2019.

arXiv:1810.12093 [pdf]

80-Channel WDM-MDM Transmission over 50-km Ring-Core Fiber Using a Compact OAM DEMUX and Modular 4x4 MIMO Equalization

Authors: Junwei Zhang, Yuanhui Wen, Heyun Tan, Jie Liu, Lei Shen, Maochun Wang, Jiangbo Zhu, Changjian Guo, Yujie Chen, Zhaohui Li, Siyuan Yu

Abstract: 8-OAM modes each carrying 10 wavelengths with 2.56-Tbit/s aggregated capacity and 10.24-bit/s/Hz spectral efficiency have been transmitted over 50-km specially designed ring-core fiber, using a compact OAM mode sorter and only modular 4x4 MIMO equalization.

Submitted 22 October, 2018; originally announced October 2018.

Comments: 3 pages, 2 figures, conference

arXiv:1709.08275 [pdf]

Compressed Air Energy Storage-Part II: Application to Power System Unit Commitment

Authors: Junpeng Zhan, Yunfeng Wen, Osama Aslam Ansari, C. Y. Chung

Abstract: Unit commitment (UC) is one of the most important power system operation problems. To integrate higher penetration of wind power into power systems, more compressed air energy storage (CAES) plants are being built. Existing cavern models for the CAES used in power system optimization problems are not accurate, which may lead to infeasible solutions, e.g., the air pressure in the cavern is outside its operating range. In this regard, an accurate CAES model is proposed for the UC problem based on the accurate bi-linear cavern model proposed in the first paper of this two-part series. The minimum switch time between the charging and discharging processes of CAES is considered. The whole model, i.e., the UC model with an accurate CAES model, is a large-scale mixed integer bi-linear programming problem. To reduce the complexity of the whole model, three strategies are proposed to reduce the number of bi-linear terms without sacrificing accuracy. McCormick relaxation and piecewise linearization are then used to linearize the whole model. To decrease the solution time, a method to obtain an initial solution of the linearized model is proposed. A modified RTS-79 system is used to verify the effectiveness of the whole model and the solution methodology.

Submitted 25 November, 2022; v1 submitted 24 September, 2017; originally announced September 2017.

Comments: 8 pages, 6 figures

arXiv:1709.05077 [pdf, other]

Transforming Cooling Optimization for Green Data Center via Deep Reinforcement Learning

Authors: Yuanlong Li, Yonggang Wen, Kyle Guan, Dacheng Tao

Abstract: The cooling system plays a critical role in a modern data center (DC). Developing an optimal control policy for a DC cooling system is a challenging task. The prevailing approaches often rely on approximating system models that are built upon the knowledge of mechanical cooling, electrical and thermal management, which is difficult to design and may lead to sub-optimal or unstable performances. In this paper, we propose utilizing the large amount of monitoring data in DC to optimize the control policy. To do so, we cast the cooling control policy design into an energy cost minimization problem with temperature constraints, and tap it into the emerging deep reinforcement learning (DRL) framework. Specifically, we propose an end-to-end cooling control algorithm (CCA) that is based on the actor-critic framework and an off-policy offline version of the deep deterministic policy gradient (DDPG) algorithm. In the proposed CCA, an evaluation network is trained to predict an energy cost counter penalized by the cooling status of the DC room, and a policy network is trained to predict optimized control settings when given the current load and weather information. The proposed algorithm is evaluated on the EnergyPlus simulation platform and on a real data trace collected from the National Super Computing Centre (NSCC) of Singapore. Our results show that the proposed CCA can achieve about 11% cooling cost saving on the simulation platform compared with a manually configured baseline control algorithm. In the trace-based study, we propose a de-underestimation (DUE) validation mechanism as we cannot directly test the algorithm on a real DC. Even though the results with DUE are conservative, we can still achieve about 15% cooling energy saving on the NSCC data trace if we set the inlet temperature threshold at 26.6 degrees Celsius.

Submitted 18 July, 2018; v1 submitted 15 September, 2017; originally announced September 2017.

Showing 1–39 of 39 results for author: Wen, Y