Hybrid-Task Meta-Learning: A Graph Neural Network Approach for Scalable and Transferable Bandwidth Allocation

Xin Hao, Changyang She, Phee Lep Yeoh, Yuhong Liu, Branka Vucetic, and Yonghui Li

Part of this work was presented at the 2023 IEEE International Conference on Communications Workshops (ICC Workshops) [1]. (Corresponding author: Changyang She.)

Abstract

In this paper, we develop a deep learning-based bandwidth allocation policy that is: 1) scalable with the number of users and 2) transferable to different communication scenarios, such as non-stationary wireless channels, different quality-of-service (QoS) requirements, and dynamically available resources. To support scalability, the bandwidth allocation policy is represented by a graph neural network (GNN), with which the number of training parameters does not change with the number of users. To enable the generalization of the GNN, we develop a hybrid-task meta-learning (HML) algorithm that trains the initial parameters of the GNN with different communication scenarios during meta-training. Next, during meta-testing, a few samples are used to fine-tune the GNN with unseen communication scenarios. Simulation results demonstrate that our HML approach can improve the initial performance by $8.79\%$ , and sampling efficiency by $73\%$ , compared with existing benchmarks. After fine-tuning, our near-optimal GNN-based policy can achieve close to the same reward with much lower inference complexity compared to the optimal policy obtained using iterative optimization.

Index Terms:

Bandwidth allocation, graph neural network, meta-learning, quality-of-service.

I Introduction

Throughout the rapid evolution of wireless communication systems, the spectral efficiency, which is the amount of information that can be transmitted over a given bandwidth while maintaining a certain quality of service (QoS) level, still remains one of the most critical performance metrics for future sixth-generation (6G) wireless communications [1, 2]. To maximize spectrum efficiency, low-complexity bandwidth allocation solutions are critical for real-time decision-making within each transmission time interval (TTI) that could be shorter than one millisecond in current fifth-generation (5G) wireless communications. Furthermore, the number of users requesting bandwidth in each TTI is stochastic [3, 4], each user may have different QoS requirements [6, 5, 7], and wireless channels are non-stationary [8, 9], making it difficult to develop a low-complexity bandwidth allocation policy that is scalable with the number of users and can satisfy a diverse range of communication scenarios.

Existing iterative optimization algorithms can obtain optimal bandwidth allocation policies, but their computational complexity is generally too high to be implemented in real time [10, 11, 12]. To reduce the computational complexity, deep learning is a promising approach for 6G communications [14, 13]. The idea is to train a deep neural network that maps the network status to the optimal decision. After training, the deep neural network can be used in communication systems for real-time decision-making, referred to as inference [15]. Although deep learning has much lower inference complexity compared with iterative optimization algorithms, existing deep learning solutions using fully connected neural networks (FNNs) are not scalable to different numbers of users in wireless networks [16]. This is because the number of training parameters of an FNN depends on the dimensions of the input and output, which change with the number of users. Thus, a well-trained FNN is not applicable in wireless networks with stochastic user requests. In contrast to FNNs, the number of training parameters of a graph neural network (GNN) does not change with the number of users [17], making GNNs highly suitable for developing scalable deep learning-based resource allocation solutions for wireless networks [19, 18]. However, improving the generalization ability of GNNs in wireless networks with diverse QoS requirements remains an open problem.

A key 5G application that requires flexible resource allocation solutions is network slicing, where resources from a shared physical infrastructure are partitioned into distinct network slices supporting diverse QoS requirements, such as data rate [21, 20], latency [22, 23], and security [25, 26, 24, 27], in both long and short coding blocklength regimes [30, 28, 29]. To reserve resources for a single slice, the authors of [31] proposed to compute the weights of different slices based on the corresponding QoS requirements and the number of service requests. With this approach, the amount of reserved resources for each slice is stochastic. Meanwhile, since the wireless channels are non-stationary, the reserved resources and the wireless channels in the training stage could be different from those in the testing stage [32, 33]. As such, the mismatch between training data samples and testing data samples remains a crucial bottleneck for implementing efficient learning-based policies in practical wireless networks.

Recent works have proposed to reduce the online training time by transfer learning, which involves offline pre-training and online fine-tuning [10]. This method effectively reuses previously well-trained neural network features and significantly improves the sample efficiency. To further improve the online training efficiency for unseen tasks, meta-learning has been proposed [34, 37, 35, 36]. One of the meta-learning algorithms, model-agnostic meta-learning (MAML), has been applied to solve policy mismatch issues caused by varying user requests and non-stationary wireless channels [39, 38, 8, 9]. While these aforementioned works have highlighted the generalization ability of meta-learning for non-stationary wireless resource allocation, no works have addressed the impact of diverse QoS requirements in different communication scenarios.

In this paper, we put forth a low-complexity bandwidth allocation framework by designing a GNN that is scalable with the number of users and applying meta-learning to generalize the GNN to different communication scenarios. The main contributions are summarized as follows,

• Our proposed GNN is designed to handle six diverse QoS requirements of data rate, latency, and security in each of the long and short coding blocklength regimes. This generalization is achieved by using feature engineering to translate the channel state information (CSI) and customized QoS requirement of individual users into the minimum required bandwidth.

• Based on the extracted feature of minimum required bandwidth, we design a GNN-based bandwidth allocation policy that is scalable to the number of users. To train the GNN, we apply an unsupervised learning method to maximize the sum reward of the users with different QoS requirements in a network-slicing architecture.

• The optimal bandwidth allocation policies are obtained with an iterative optimization algorithm, which provides the performance limit of the GNN-based policy in terms of the sum reward. By analyzing the computational complexity, we show that the GNN has a much lower inference complexity than the optimal iterative optimization algorithm.

• Finally, we develop our generalized hybrid-task meta-learning (HML) algorithm that is transferable to different communication scenarios by using meta-training to train the initial parameters of the GNN. We note that only a few samples are required to fine-tune the parameters of the GNN in meta-testing, which validates that our GNN-based policy initialized by HML can be efficiently transferred to previously unseen communication scenarios. Simulation results show that our GNN-based policy achieves near-optimal performance and that HML significantly outperforms the three considered benchmarks of MAML, MTL transfer (multi-task learning based transfer learning), and random initialization.

In our simulations, the gap between the sum reward achieved by the GNN-based policy and that of the optimal bandwidth allocation policy obtained from the iterative optimization algorithm is less than $6\%$. HML also improves the initial performance by up to $8.79\%$ and the sample efficiency by up to $73\%$ compared with the MAML benchmark. We also show that the performance gains of HML are even higher when compared with the other two benchmarks.

II Related Works

II-A Deep Learning for Resource Allocation in Wireless Communications

Applying deep learning for resource allocation in wireless networks has been widely studied in the existing literature [15, 16]. In [15], the authors showed that learning-based algorithms could obtain near-optimal solutions with low computational complexity in inference. In [16], the authors proposed an FNN-based unsupervised learning algorithm to optimize the bandwidth allocation policy. More recently, because FNNs are not scalable to the number of users, GNNs have been applied to wireless network optimization [19, 18]. In [18], the authors designed a GNN, which is scalable to the number of users in a wireless network, to minimize the sum of the queuing delay violation probability and the packet loss probability. In [19], the authors developed scalable GNN-based learning methods to solve radio resource management problems.

TABLE I: Considered QoS Requirements in Related Works

Reference	Data rate (long)	Data rate (short)	Latency (long)	Latency (short)	Security (long)	Security (short)
[21, 20]	$\checkmark$	-	-	-	-	-
[22, 23]	-	-	$\checkmark$	-	-	-
[24, 25, 26]	-	-	-	-	$\checkmark$	-
[27]	-	-	-	-	-	$\checkmark$
[28]	-	-	-	$\checkmark$	-	$\checkmark$
[30]	$\checkmark$	$\checkmark$	-	-	-	-
[10]	$\checkmark$	$\checkmark$	$\checkmark$	-	-	-

II-B Generalization of Deep Learning Policies in Non-Stationary Wireless Networks

In wireless networks, the user requests, wireless channels, and available resources for each type of service can be non-stationary. Table I summarizes some QoS requirements considered in the related works. For example, data rate, latency, and security have been investigated in [20, 21, 22, 23, 24, 25, 26]. These papers mainly focus on scenarios with long channel coding blocklengths, where the achievable rate of a wireless link can be approximated by the Shannon capacity. In 5G, the coding blocklength can be short, and Shannon capacity is not applicable. As such, the authors of [27, 28] established how to optimize wireless communication systems using the achievable rate in the short blocklength regime [29]. Meanwhile, different services may co-exist in one network, and the authors of [30, 10] considered different QoS requirements in both long and short blocklength regimes. To support diverse QoS requirements in network slicing, the authors of [31] proposed to reserve bandwidth for different slices based on the number of users and the required QoS.

Further considering that the number of requests, the reserved resources, and the wireless channels are dynamic, improving the generalization ability of deep learning policies has attracted significant research interest in recent years. One approach to address this challenge is to carefully initialize the neural network and fine-tune it online. The authors of [10] applied transfer learning to fine-tune the parameters of deep neural networks that are trained offline in dynamic wireless networks. To further improve the sample efficiency in an unseen communication scenario, meta-learning has been adopted in [8, 9, 38, 39], where the hyper-parameters of a deep neural network, such as the initial parameters, are updated according to a set of communication scenarios in meta-training. In [38], meta-learning was applied to optimize computing resource allocation policies in mobile edge computing networks to fit both time-varying wireless channels and different requests of computing tasks. In [39], meta-learning was applied in virtual reality to quickly adapt to user movement patterns that change over time. To improve the training efficiency in non-stationary vehicle networks, the authors of [8] proposed optimizing the beamforming using meta-learning. In [9], the authors combined meta-learning and support vector regression to extract the features for beamforming optimization, further improving training efficiency over non-stationary channels.

III System Model and Problem Formulation

We consider an uplink orthogonal-frequency-division-multiple-access communication system with network slicing, where $U$ users request different types of services from one base station (BS). The BS first reserves bandwidth for each type of service according to the QoS requirement and the number of users. Then, it allocates bandwidth to different users within each slice. The resource reservation for different slices has been extensively studied in the existing literature, so we focus on developing bandwidth allocation policies for individual slices with different numbers of users, non-stationary wireless channels, and dynamic available bandwidth.

III-A Different QoS in Infinite and Short Blocklength Regimes

To investigate the generalization ability of our proposed bandwidth allocation policy, we consider both long and short blocklength regimes with three types of QoS requirements, i.e., data rate, queuing delay, and security. Thus, there are six scenarios in total. We denote the reward of the $u$ -th user by

r_{u}^{\Phi,\xi},\quad\Phi\in\{D,E,S\}\text{ and }\xi\in\{\mathcal{I,F}\},

(1)

where the superscripts $D,E,S$ represent data rate, effective capacity with a queuing delay constraint, and secrecy rate, respectively, whilst the superscripts $\mathcal{I,F}$ represent the infinite (long) and finite (short) blocklength regimes, respectively.

III-A1 Data Rate Requirement

When the blocklength is long, the data rate reward of the $u$ -th user can be expressed as

r_{u}^{D,\mathcal{I}}=\frac{w_{u}}{\ln 2}\ln\left(1+\frac{P_{u}h_{u}}{w_{u}N_{0}}\right),

(2)

where $w_{u}$ is the bandwidth allocated to the $u$ -th user, $P_{u}$ is the transmit power of the $u$ -th user, $N_{0}$ is the single-sided noise spectral density, and $h_{u}=\alpha_{u}g_{u}$ is the channel gain, where ${\alpha}_{u}$ and $g_{u}$ represent the large-scale and small-scale channel gains between the $u$ -th user and the BS, respectively.

When the blocklength is short, decoding errors cannot be neglected. As such, the data rate reward of the $u$ -th user can be approximated by [29]

r_{u}^{D,\mathcal{F}}\approx r_{u}^{D,\mathcal{I}}-\sqrt{\frac{V_{u}}{L_{u}}}\frac{f_{Q}^{-1}(\epsilon_{u})}{\ln 2/w_{u}},

(3)

where $V_{u}=1-{\left(1+\frac{P_{u}h_{u}}{w_{u}N_{0}}\right)^{-2}}$ is the channel dispersion, which measures the stochastic variability of the channel relative to a deterministic channel with the same capacity, $L_{u}=T_{\mathrm{s}}w_{u}$ is the blocklength, and $T_{\mathrm{s}}$ is the transmission duration of each coding block. The function $f_{Q}^{-1}(x)$ is the inverse of the Gaussian Q-function, and $\epsilon_{u}$ is the decoding error probability.
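As an illustration of (2) and (3), the following minimal sketch evaluates both rewards for a single user. The function names and all numerical values are hypothetical choices for this example, and the inverse Q-function is computed from the standard normal quantile function, i.e., $f_{Q}^{-1}(\epsilon)=\Phi^{-1}(1-\epsilon)$.

```python
import math
from statistics import NormalDist


def q_inv(eps: float) -> float:
    """Inverse Gaussian Q-function: Q^{-1}(eps) = Phi^{-1}(1 - eps)."""
    return NormalDist().inv_cdf(1.0 - eps)


def rate_long(w: float, p: float, h: float, n0: float) -> float:
    """Eq. (2): long-blocklength data rate in bit/s for bandwidth w in Hz."""
    return w / math.log(2) * math.log(1.0 + p * h / (w * n0))


def rate_short(w: float, p: float, h: float, n0: float,
               t_s: float, eps: float) -> float:
    """Eq. (3): short-blocklength approximation of the data rate in bit/s."""
    snr = p * h / (w * n0)
    dispersion = 1.0 - (1.0 + snr) ** -2   # V_u
    blocklength = t_s * w                  # L_u = T_s * w_u
    penalty = math.sqrt(dispersion / blocklength) * q_inv(eps) * w / math.log(2)
    return rate_long(w, p, h, n0) - penalty


# Hypothetical numbers: 1 MHz of bandwidth, received SNR of 10,
# 1 ms coding blocks, and a decoding error probability of 1e-5.
r_inf = rate_long(1e6, 1.0, 1.0, 1e-7)
r_fin = rate_short(1e6, 1.0, 1.0, 1e-7, 1e-3, 1e-5)
print(f"long: {r_inf / 1e6:.3f} Mbit/s, short: {r_fin / 1e6:.3f} Mbit/s")
```

Under these assumed values, the short-blocklength reward is strictly smaller than the long-blocklength reward, and the gap shrinks as the tolerated decoding error probability grows.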

III-A2 Latency Requirement

When considering latency constraints due to queueing delays, the effective capacity is applied to characterize the statistical QoS requirement in wireless communications, and is expressed as [28]

r_{u}^{E,\xi}=-\frac{1}{\vartheta_{u}T_{\mathrm{c}}}\ln\left(\mathbb{E}_{g_{u}}\left[\exp\left(-\vartheta_{u}T_{\mathrm{c}}r_{u}^{D,\xi}\right)\right]\right),\quad\xi\in\{\mathcal{I,F}\},

(4)

where $T_{\mathrm{c}}$ is the channel coherence time, $\vartheta_{u}$ is the QoS exponent for queuing delay, $\mathbb{E}[\cdot]$ denotes the expectation, and $r_{u}^{D,\xi}$ is the data rate in (2) or (3). We note that $\vartheta_{u}=\frac{\ln(1/\varepsilon_{u})}{a_{u}{\tau}_{\max}}$ is determined by the maximum tolerable delay bound violation probability, $\varepsilon_{u}$ , the packet arrival rate, $a_{u}$ , and the threshold of queuing delay, $\tau_{\max}$ .
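The expectation in (4) generally has no convenient closed form, so one way to evaluate it is by Monte Carlo sampling. The sketch below does this for the long blocklength regime under the assumption of Rayleigh fading, i.e., $g_{u}$ is drawn from a unit-mean exponential distribution. The fading model, the function names, and the numerical values are assumptions made for illustration only.

```python
import math
import random


def shannon_rate(w: float, p: float, alpha: float, g: float, n0: float) -> float:
    """Eq. (2) with h = alpha * g, in bit/s."""
    return w / math.log(2) * math.log(1.0 + p * alpha * g / (w * n0))


def qos_exponent(eps_delay: float, arrival_rate: float, tau_max: float) -> float:
    """QoS exponent: ln(1 / eps) / (a * tau_max)."""
    return math.log(1.0 / eps_delay) / (arrival_rate * tau_max)


def effective_capacity(w, p, alpha, n0, theta, t_c, n_samples=20000, seed=0):
    """Monte Carlo estimate of eq. (4) in the long blocklength regime.

    Assumption: small-scale gain g ~ Exp(1), i.e., Rayleigh fading.
    """
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        g = rng.expovariate(1.0)
        acc += math.exp(-theta * t_c * shannon_rate(w, p, alpha, g, n0))
    return -math.log(acc / n_samples) / (theta * t_c)


# Hypothetical numbers: 1e-3 violation probability, arrival rate of 1e5,
# 10 ms delay threshold, and 1 ms coherence time.
theta = qos_exponent(1e-3, 1e5, 0.01)
ec = effective_capacity(1e6, 1.0, 1.0, 1e-7, theta, 1e-3)
print(f"effective capacity: {ec / 1e6:.3f} Mbit/s")
```

By Jensen's inequality, the estimate never exceeds the average rate over the same channel samples, which provides a simple sanity check on an implementation.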

III-A3 Security Requirement

To formulate the wireless security requirement, we consider that there is an eavesdropper that attempts to wiretap the information transmitted by each user.

In the long blocklength regime, the secrecy rate of the $u$ -th user can be expressed as [24]

r_{u}^{S,\mathcal{I}}=\left[r_{u}^{D,\mathcal{I}}-r_{u}^{e,\mathcal{I}}\right]^{+},

(5)

where $[x]^{+}=\max\{0,x\}$, and $r_{u}^{e,\mathcal{I}}=\frac{w_{u}}{\ln 2}\ln\left(1+\frac{P_{u}h_{u}^{e}}{w_{u}N_{0}}\right)$ is the data rate of the wiretapped channel from the $u$-th user to the eavesdropper. The channel gain of the wiretapped channel is denoted by $h_{u}^{e}=\alpha_{u}^{e}g_{u}^{e}$, where ${\alpha}_{u}^{e}$ and $g_{u}^{e}$ represent the large-scale and small-scale channel gains between the $u$-th user and the eavesdropper, respectively.

In the short blocklength regime, the achievable secrecy rate of the $u$ -th user can be approximated as [27],

r_{u}^{S,\mathcal{F}}=\begin{cases}r_{u}^{S,\mathcal{I}}-\sqrt{\frac{V_{u}}{L_{u}}}\frac{f_{Q}^{-1}(\epsilon_{u})}{\ln 2/w_{u}}-\sqrt{\frac{V_{u}^{e}}{L_{u}}}\frac{f_{Q}^{-1}(\delta_{u})}{\ln 2/w_{u}},&h_{u}>h_{u}^{e},\\ 0,&h_{u}\leq h_{u}^{e},\end{cases}

(6)

where $V_{u}^{e}=1-{\left(1+\frac{P_{u}h_{u}^{e}}{w_{u}N_{0}}\right)^{-2}}$ is the channel dispersion of the wiretapped channel, and $\delta_{u}$ represents the information leakage, which describes the statistical independence between the transmitted confidential message and the eavesdropper’s observation, and is measured by the total variation distance [27].
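The secrecy rewards in (5) and (6) can be evaluated in the same way as the data rate rewards. The following sketch is a direct transcription of the two expressions. The helper names and the numerical values are hypothetical, and no clipping beyond what (5) and (6) state is applied.

```python
import math
from statistics import NormalDist


def _rate(w, p, h, n0):
    """Shannon rate in bit/s, as in eq. (2)."""
    return w / math.log(2) * math.log(1.0 + p * h / (w * n0))


def _dispersion(w, p, h, n0):
    """Channel dispersion 1 - (1 + SNR)^(-2)."""
    return 1.0 - (1.0 + p * h / (w * n0)) ** -2


def _q_inv(x):
    """Inverse Gaussian Q-function."""
    return NormalDist().inv_cdf(1.0 - x)


def secrecy_long(w, p, h, h_e, n0):
    """Eq. (5): secrecy rate in the long blocklength regime."""
    return max(0.0, _rate(w, p, h, n0) - _rate(w, p, h_e, n0))


def secrecy_short(w, p, h, h_e, n0, t_s, eps, delta):
    """Eq. (6): approximate secrecy rate in the short blocklength regime."""
    if h <= h_e:
        return 0.0
    blocklength = t_s * w
    scale = w / math.log(2)
    loss = scale * (
        math.sqrt(_dispersion(w, p, h, n0) / blocklength) * _q_inv(eps)
        + math.sqrt(_dispersion(w, p, h_e, n0) / blocklength) * _q_inv(delta)
    )
    return secrecy_long(w, p, h, h_e, n0) - loss


# Hypothetical numbers: legitimate SNR of 10, eavesdropper SNR of 1.
s_inf = secrecy_long(1e6, 1.0, 1.0, 0.1, 1e-7)
s_fin = secrecy_short(1e6, 1.0, 1.0, 0.1, 1e-7, 1e-3, 1e-5, 1e-3)
print(f"long: {s_inf / 1e6:.3f} Mbit/s, short: {s_fin / 1e6:.3f} Mbit/s")
```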

III-B Bandwidth Reservation for Different Slices

We assume that there can be multiple bandwidth reservation policies for different slices in network slicing. Given the total bandwidth of the BS, $W_{\max}$ , the bandwidth reserved for the $\tau$ -th slice is given by

W_{\tau}^{\Phi,\xi}=f_{\tau}^{\mathrm{NS}}\left(U_{\tau}^{\Phi,\xi},I_{u_{\tau}}^{\Phi,\xi}\right)\cdot W_{\max},

(7)

where $U_{\tau}^{\Phi,\xi}$ is the number of users in the $\tau$-th slice, $I_{u_{\tau}}^{\Phi,\xi}$ is the QoS class identifier (QCI) of the $u_{\tau}$-th user in the $\tau$-th slice, and $f_{\tau}^{\mathrm{NS}}\left(\cdot,\cdot\right)$ is the network function for bandwidth reservation in network slicing. Since the sum of the bandwidth reserved for all the slices equals the total bandwidth of the BS, we have

\sum_{\tau=1}^{T_{\mathrm{NS}}}f_{\tau}^{\mathrm{NS}}\left(U_{\tau}^{\Phi,\xi},I_{u_{\tau}}^{\Phi,\xi}\right)=1,

(8)

where $T_{\mathrm{NS}}$ is the number of slices. Inspired by [31], the bandwidth reserved for each slice depends on the number of users in this slice and the QCI of these users, e.g.,

f_{\tau}^{\mathrm{NS}}(\cdot)=\frac{\sum_{u\in\mathcal{U}_{\tau}^{\Phi,\xi}}I_{u_{\tau}}^{\Phi,\xi}}{\sum_{\tau=1}^{T_{\mathrm{NS}}}\sum_{u\in\mathcal{U}_{\tau}^{\Phi,\xi}}I_{u_{\tau}}^{\Phi,\xi}}.

(9)
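The example reservation rule in (9) amounts to splitting $W_{\max}$ in proportion to the sum of the QCI values in each slice. A minimal sketch follows, in which the function name and the QCI values are hypothetical.

```python
def reserve_bandwidth(qci_per_slice, w_max):
    """Eqs. (7) and (9): split w_max in proportion to the per-slice QCI sums.

    qci_per_slice[t] lists the QCI values of the users in slice t.
    """
    slice_weights = [sum(qci) for qci in qci_per_slice]
    total_weight = sum(slice_weights)
    if total_weight <= 0:
        raise ValueError("at least one user with a positive QCI is required")
    return [w_max * weight / total_weight for weight in slice_weights]


# Hypothetical example: three slices sharing 20 MHz.
reserved = reserve_bandwidth([[1, 1, 2], [3], [2, 1]], 20e6)
print(reserved)
```

By construction, the reserved fractions sum to one, so the normalization in (8) holds.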

III-C Problem Formulation

To maximize the sum reward subject to the QoS requirements in each slice, we formulate the bandwidth allocation problem as follows,

$\displaystyle\max_{\bm{w}_{\tau}^{\Phi,\xi}}\quad$	$\displaystyle\sum_{u\in\mathcal{U}_{\tau}^{\Phi,\xi}}r_{u}^{\Phi,\xi},$	(10)
$\displaystyle\mathrm{s.t.}\quad$	$\displaystyle\sum\limits_{u\in\mathcal{U}_{\tau}^{\Phi,\xi}}w_{u}^{\Phi,\xi}\leq W_{\tau}^{\Phi,\xi},$	(10a)
	$\displaystyle w_{u}^{\Phi,\xi}\geq 0,$	(10b)
	$\displaystyle r_{u}^{\Phi,\xi}\geq r_{\tau}^{\Phi,\xi},$	(10c)

where $\bm{w}_{\tau}^{\Phi,\xi}=[w_{1}^{\Phi,\xi},w_{2}^{\Phi,\xi},\cdots,w_{U_{\tau}}^{\Phi,\xi}]^{\mathrm{T}}$ is the vector of bandwidth allocated to the users, and $r_{\tau}^{\Phi,\xi}$ is the minimum threshold of the QoS required by the users. Thus, constraint (10c) guarantees the QoS of all the users.

III-D Analysis of Problem Feasibility

Given the available bandwidth constraint in (10a) and the QoS constraint in (10c), problem (10) can be infeasible when some of the users in this slice have weak channels. We denote the minimum bandwidth required to meet constraint (10c) by ${\bm{w}}_{\min}^{\Phi,\xi}=\left[{w}_{1,\min}^{\Phi,\xi},\cdots,{w}_{U_{\tau},\min}^{\Phi,\xi}\right]^{\mathrm{T}}$. If some users experience deep fading, leading to $\sum_{u\in\mathcal{U}_{\tau}^{\Phi,\xi}}{w}_{u,\min}^{\Phi,\xi}>W_{\tau}^{\Phi,\xi}$, then problem (10) is infeasible. In this case, the BS only schedules the users with sufficiently strong channels. Specifically, to maximize the number of scheduled users, we consider that the BS schedules the $K$ users with the smallest bandwidth requirements. Denote the set of scheduled users by $\mathcal{K}_{\tau}^{\Phi,\xi}$. Then, for any $k\in\mathcal{K}_{\tau}^{\Phi,\xi}$ and $u\notin\mathcal{K}_{\tau}^{\Phi,\xi}$, we have ${w}_{k,\min}^{\Phi,\xi}\leq{w}_{u,\min}^{\Phi,\xi}$. After user scheduling, problem (10) can be reformulated as follows,

$\displaystyle\max_{\bm{w}_{\tau}^{\Phi,\xi}}\quad$	$\displaystyle\sum_{k\in\mathcal{K}_{\tau}^{\Phi,\xi}}{r_{k}^{\Phi,\xi}},$	(11)
$\displaystyle\mathrm{s.t.}\quad$	$\displaystyle\sum\limits_{k\in\mathcal{K}_{\tau}^{\Phi,\xi}}{w_{k}^{\Phi,\xi}}\leq W_{\tau}^{\Phi,\xi},$	(11a)
	$\displaystyle w_{k}^{\Phi,\xi}\geq w_{k,\min}^{\Phi,\xi}.$	(11b)

In the following, we investigate how to find the optimal solution to problem (11).
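The feasibility analysis above involves two steps: finding the minimum bandwidth that satisfies constraint (10c) for each user, and scheduling the users with the smallest requirements until the reserved bandwidth is exhausted. The sketch below illustrates both steps for the data rate reward in (2), which is increasing in bandwidth, so that bisection can be used. The bisection search, the function names, and the numbers are assumptions for illustration; the quantity `snr_hz` stands for $P_{u}h_{u}/N_{0}$.

```python
import math


def rate_long(w: float, snr_hz: float) -> float:
    """Eq. (2) in bit/s, with snr_hz = P * h / N0 expressed in Hz."""
    return w * math.log2(1.0 + snr_hz / w)


def min_bandwidth(r_min: float, snr_hz: float, w_hi: float, tol: float = 1e-3) -> float:
    """Smallest bandwidth in (0, w_hi] with rate >= r_min, found by bisection.

    Returns math.inf when even w_hi cannot meet the requirement.
    """
    if rate_long(w_hi, snr_hz) < r_min:
        return math.inf
    lo, hi = 0.0, w_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rate_long(mid, snr_hz) >= r_min:
            hi = mid
        else:
            lo = mid
    return hi


def schedule_users(w_min, w_slice):
    """Schedule users in ascending order of w_min while the slice can fit them."""
    order = sorted(range(len(w_min)), key=lambda u: w_min[u])
    scheduled, used = [], 0.0
    for u in order:
        if used + w_min[u] > w_slice:
            break
        scheduled.append(u)
        used += w_min[u]
    return scheduled


# Hypothetical example: 1 Mbit/s requirement, P * h / N0 = 1e7 Hz.
w_req = min_bandwidth(1e6, 1e7, w_hi=1e7)
print(f"minimum bandwidth: {w_req / 1e3:.1f} kHz")
print(schedule_users([3.0, 1.0, 5.0, 2.0], 6.5))
```

For the other rewards, the same scheduling step applies once the corresponding minimum bandwidths are available.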

IV Hybrid-Task Meta-Learning for GNN-based Scalable Bandwidth Allocation

In this section, we first illustrate how to obtain the optimal bandwidth allocation by using an iterative optimization algorithm. Next, we utilize feature engineering techniques to reformulate the problem, and represent the bandwidth allocation policy by a GNN. To generalize the GNN, the feature of required minimum bandwidth that can be used to represent different QoS requirements is used as the GNN’s input. Then, we develop a meta-learning approach to train the GNN. The goal is to obtain a policy that is scalable to the number of users and can generalize well in diverse communication scenarios with different channel distributions, QoS requirements, and available bandwidth.

Algorithm 1 Iterative Bandwidth Allocation Algorithm

1 Initialize: Bandwidth of a resource block: $\Delta w$
2 Use the user scheduling algorithm to get the minimum required bandwidth for each scheduled user: $w_{k}^{\Phi,\xi}=w_{k,\min}^{\Phi,\xi},\forall k\in\mathcal{K}_{\tau}^{\Phi,\xi}$
3 while $W_{\tau}^{\Phi,\xi}-\sum\nolimits_{k\in\mathcal{K}_{\tau}^{\Phi,\xi}}{w_{k}^{\Phi,\xi}}\geq\Delta w$ do
4     for $k\in\mathcal{K}_{\tau}^{\Phi,\xi}$ do
5         $\Delta r_{k}^{\Phi,\xi}(w_{k}^{\Phi,\xi})=r_{k}^{\Phi,\xi}(w_{k}^{\Phi,\xi}+\Delta w)-r_{k}^{\Phi,\xi}(w_{k}^{\Phi,\xi})$
6     end for
7     Identify the user with the highest $\Delta r_{k}^{\Phi,\xi}(w_{k}^{\Phi,\xi})$ in $\mathcal{K}_{\tau}^{\Phi,\xi}$: $k_{\mathrm{allo}}=\arg\max\limits_{k}{\Delta r_{k}^{\Phi,\xi}}(w_{k}^{\Phi,\xi})$
8     Allocate extra $\Delta w$ bandwidth to the $k_{\mathrm{allo}}$-th user: $w_{k_{\mathrm{allo}}}^{\Phi,\xi}=w_{k_{\mathrm{allo}}}^{\Phi,\xi}+\Delta w$
9 end while
Output: Optimal bandwidth allocation policy: $\bm{w}^{\Phi,\xi,\mathrm{opt}}=\bm{w}^{\Phi,\xi}$

IV-A Optimal Bandwidth Allocation by Iterative Algorithm

Inspired by the optimization algorithm for resource allocation in [10], we propose an iterative optimization algorithm for solving problem (11). We denote the bandwidth of each resource block by $\Delta w$. At the beginning of the iteration, the bandwidth allocated to each user is $w_{k,\min}^{\Phi,\xi}$. In each iteration, we calculate the incremental reward of each user when an extra resource block is allocated to it, denoted by $\Delta{r}_{k}^{\Phi,\xi}(w_{k}^{\Phi,\xi})={r}_{k}^{\Phi,\xi}({w}_{k}^{\Phi,\xi}+\Delta{w})-{r}_{k}^{\Phi,\xi}({w}_{k}^{\Phi,\xi})$. The resource block is then allocated to the user with the highest $\Delta{r}_{k}^{\Phi,\xi}$, and the iteration repeats until the remaining bandwidth is smaller than $\Delta w$. The details of the algorithm can be found in Algorithm 1. The optimality of the algorithm depends on the properties of the problem. If problem (11) is convex, then Algorithm 1 can obtain the optimal solution [10]. Since constraints (11a) and (11b) are linear, to validate whether problem (11) is convex, we only need to validate whether $r_{u}^{\Phi,\xi}$ is concave in bandwidth. In the long blocklength regime, we can prove that the secrecy rate is concave in bandwidth (see the proof in Appendix A). Since Shannon’s capacity is a special case of the secrecy rate when the wiretapped channel is in deep fading, Shannon’s capacity is also concave in bandwidth. In addition, the authors of [41] proved that the effective capacity is concave in bandwidth. Therefore, Algorithm 1 can obtain the optimal solution in the long blocklength regime. In the short blocklength regime, $r_{u}^{\Phi,\xi}$ is not concave over $w_{k}^{\Phi,\xi}\in(0,\infty)$. Nevertheless, based on the results in [42], the optimal bandwidth can be obtained in a region $[0,w_{\rm th}]\subset(0,\infty)$, where $r_{u}^{\Phi,\xi}$ is concave in bandwidth. By searching for the optimal bandwidth in $[0,w_{\rm th}]$, Algorithm 1 can obtain the optimal solution in the short blocklength regime.
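The steps of Algorithm 1 can be sketched in a few lines of code. The version below is a minimal illustration rather than a reference implementation: the function names are hypothetical, the rewards are taken to be the concave data rate in (2) with bandwidth in units of resource blocks, and the remaining bandwidth is tracked explicitly.

```python
import math


def iterative_allocation(w_min, reward_fns, w_slice, delta_w):
    """Greedy allocation following Algorithm 1.

    w_min: minimum bandwidth of each scheduled user.
    reward_fns: one callable per user mapping bandwidth to reward.
    """
    w = list(w_min)                       # step 2: start from the minimum bandwidth
    remaining = w_slice - sum(w)
    while remaining >= delta_w:           # step 3
        # steps 4-6: incremental reward of one extra resource block per user
        gains = [f(wk + delta_w) - f(wk) for f, wk in zip(reward_fns, w)]
        # step 7: user with the highest incremental reward
        k_allo = max(range(len(w)), key=lambda k: gains[k])
        w[k_allo] += delta_w              # step 8
        remaining -= delta_w
    return w


def make_rate(snr_hz):
    """Concave reward w * log2(1 + snr_hz / w), as in eq. (2)."""
    return lambda w: w * math.log2(1.0 + snr_hz / w)


# Hypothetical example: two users, ten resource blocks of unit bandwidth.
rewards = [make_rate(10.0), make_rate(2.0)]
w_opt = iterative_allocation([1.0, 1.0], rewards, w_slice=10.0, delta_w=1.0)
print(w_opt)
```

Each pass of the while loop evaluates every scheduled user, so the number of reward evaluations grows with both the number of users and the number of surplus resource blocks.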

IV-B Feature Engineering and Problem Reformulation

To obtain a policy that can generalize well in different scenarios, we propose to use feature engineering to represent the channels and QoS requirements with more general features. Specifically, we first normalize the bandwidth allocation policy by the bandwidth reserved for this slice. The normalized bandwidth allocated to the $k$-th user, $k\in\mathcal{K}_{\tau}^{\Phi,\xi}$, is given by ${\tilde{w}_{k}^{\Phi,\xi}}\triangleq{w}_{k}^{\Phi,\xi}/W_{\tau}^{\Phi,\xi}$. Then, the normalized minimum bandwidth required by the scheduled users is denoted by $\tilde{\bm{w}}_{\tau,\min}^{\Phi,\xi}=[\tilde{w}_{1,\min}^{\Phi,\xi},\tilde{w}_{2,\min}^{\Phi,\xi},\cdots,\tilde{w}_{K_{\tau},\min}^{\Phi,\xi}]^{\mathrm{T}}$. We define the surplus bandwidth as $w_{\mathrm{S}}^{\Phi,\xi}=W_{\tau}^{\Phi,\xi}-\sum_{k\in\mathcal{K}_{\tau}^{\Phi,\xi}}w_{k,\min}^{\Phi,\xi}$, and further denote the normalized surplus bandwidth by $\tilde{w}_{\mathrm{S}}^{\Phi,\xi}\triangleq w_{\mathrm{S}}^{\Phi,\xi}/W_{\tau}^{\Phi,\xi}$.
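As a small worked example of this normalization, the sketch below computes the normalized minimum bandwidths and the normalized surplus bandwidth for one slice. The function name and the numbers are hypothetical.

```python
def normalize_features(w_min, w_slice):
    """Return the normalized minimum bandwidths and the normalized surplus."""
    w_min_norm = [w / w_slice for w in w_min]
    w_surplus_norm = 1.0 - sum(w_min_norm)
    return w_min_norm, w_surplus_norm


# Hypothetical example: three scheduled users in an 8 MHz slice.
features, surplus = normalize_features([2e6, 1e6, 1e6], 8e6)
print(features, surplus)
```

Both features are dimensionless and lie in $[0,1]$ whenever the scheduled users fit within the slice, regardless of the QoS type or the reserved bandwidth.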

We note that a bandwidth allocation policy maps channels and constraints to the bandwidth allocated to each user. After scheduling and normalization, the features of the channel state information and constraints (11a) and (11b) can be represented by $\tilde{\bm{w}}_{\tau,\min}^{\Phi,\xi}$. Therefore, the bandwidth allocation policy can be reformulated as the mapping from $\tilde{\bm{w}}_{\tau,\min}^{\Phi,\xi}$ and $\tilde{w}_{\rm S}^{\Phi,\xi}$ to $\tilde{\bm{w}}_{\tau}^{\Phi,\xi}$. We denote this function by

\tilde{\bm{w}}_{\tau}^{\Phi,\xi}=\bm{f}^{\rm W}\left(\tilde{\bm{w}}_{\tau,\min}^{\Phi,\xi},\tilde{w}_{\rm S}^{\Phi,\xi}\right),

(12)

where $\bm{f}^{\mathrm{W}}(\tilde{\bm{w}}_{\tau,\min}^{\Phi,\xi},\tilde{w}_{\rm S}^{\Phi,\xi})=\left[f_{1}^{\mathrm{W}}(\tilde{\bm{w}}_{\tau,\min}^{\Phi,\xi},\tilde{w}_{\rm S}^{\Phi,\xi}),\cdots,f_{K}^{\mathrm{W}}(\tilde{\bm{w}}_{\tau,\min}^{\Phi,\xi},\tilde{w}_{\rm S}^{\Phi,\xi})\right]^{\mathrm{T}}$ and $\tilde{w}_{k}^{\Phi,\xi}=f_{k}^{\mathrm{W}}(\tilde{\bm{w}}_{\tau,\min}^{\Phi,\xi},\tilde{w}_{\mathrm{S}}^{\Phi,\xi})$. Given the bandwidth reserved for this slice, the achievable rates of the scheduled users can be expressed as

\begin{split}\bm{r}_{\tau}^{\Phi,\xi}&=\bm{f}^{\Phi,\xi}\left(\tilde{\bm{w}}_{\tau}^{\Phi,\xi}\cdot W_{\tau}^{\Phi,\xi}\right)\\ &=\bm{f}^{\Phi,\xi}\left(\bm{f}^{\mathrm{W}}\left(\tilde{\bm{w}}_{\tau,\min}^{\Phi,\xi},\tilde{w}_{\rm S}^{\Phi,\xi}\right)\cdot W_{\tau}^{\Phi,\xi}\right),\end{split}

(13)

where $\bm{r}_{\tau}^{\Phi,\xi}=\left[r_{1}^{\Phi,\xi},\cdots,r_{K_{\tau}}^{\Phi,\xi}\right]^{\mathrm{T}}$, $\bm{f}^{\Phi,\xi}(\tilde{\bm{w}}_{\tau}^{\Phi,\xi}\cdot W_{\tau}^{\Phi,\xi})=\left[f_{1}^{\Phi,\xi}(\tilde{w}_{1}^{\Phi,\xi}\cdot W_{\tau}^{\Phi,\xi}),\cdots,f_{K_{\tau}}^{\Phi,\xi}(\tilde{w}_{K_{\tau}}^{\Phi,\xi}\cdot W_{\tau}^{\Phi,\xi})\right]^{\mathrm{T}}$, and $r_{k}^{\Phi,\xi}=f_{k}^{\Phi,\xi}\left(f_{k}^{\mathrm{W}}(\tilde{\bm{w}}_{\tau,\min}^{\Phi,\xi},\tilde{w}_{\rm S}^{\Phi,\xi})\cdot W_{\tau}^{\Phi,\xi}\right)$. Then, we can reformulate problem (11) as a functional optimization problem,

$\displaystyle\max_{\bm{f}^{\mathrm{W}}(\cdot)}$	$\displaystyle\sum\limits_{k\in\mathcal{K}_{\tau}^{\Phi,\xi}}f_{k}^{\Phi,\xi}\left(f_{k}^{\mathrm{W}}\left(\tilde{\bm{w}}_{\tau,\min}^{\Phi,\xi},\tilde{w}_{\mathrm{S}}^{\Phi,\xi}\right)\cdot W_{\tau}^{\Phi,\xi}\right),$	(14)
$\displaystyle\mathrm{s.t.}\quad$	$\displaystyle 1-\sum\limits_{k\in\mathcal{K}_{\tau}^{\Phi,\xi}}f_{k}^{\mathrm{W}}\left(\tilde{\bm{w}}_{\tau,\min}^{\Phi,\xi},\tilde{w}_{\rm S}^{\Phi,\xi}\right)\geq 0,$	(14a)
	$\displaystyle f_{k}^{\mathrm{W}}\left(\tilde{\bm{w}}_{\tau,\min}^{\Phi,\xi},\tilde{w}_{\rm S}^{\Phi,\xi}\right)\geq\tilde{w}_{k,\min}^{\Phi,\xi}.$	(14b)

In the rest of this section, we investigate how to find the optimal solution to problem (14).

IV-C Proposed GNN

In this subsection, we propose a GNN-based unsupervised learning algorithm to obtain a scalable bandwidth allocation policy.

Figure 1: GNN-based scalable bandwidth allocation.

Algorithm 2 GNN for Scalable Bandwidth Allocation

1 Initialize batch size, $J$, number of training epochs, $N$, and learning rate, $\beta_{\theta}$
2 Randomly initialize $\theta^{0}$
3 for $n=0,1,\cdots,N-1$ do
4     Message passing: $x_{k}^{n}=f_{\mathrm{FNN}}\left(\tilde{w}_{k,\min}^{\Phi,\xi},\tilde{w}_{\mathrm{S}}^{\Phi,\xi}\Big{|}\theta^{n}\right),\forall k\in\mathcal{K}_{\tau}^{\Phi,\xi}$.
      Aggregation: $\bm{x}^{n}=f_{\mathtt{Concat}}\left(x_{1}^{n},\cdots,x_{K}^{n}\right)=\left[x_{1}^{n},\cdots,x_{K}^{n}\right]^{\mathrm{T}}$ and $\bm{y}^{n}=f_{\mathtt{Softmax}}(\bm{x}^{n})$.
      Readout: $\tilde{\bm{w}}^{n}=f_{\mathtt{Readout}}(\bm{y}^{n})=\bm{y}^{n}\cdot\tilde{w}_{\mathrm{S}}^{\Phi,\xi}+\tilde{\bm{w}}_{\tau,\min}^{\Phi,\xi}$.
      Update the loss function by eq. (15), denoted by $f^{\mathrm{L}}(\theta^{n})$.
      Update parameters of the GNN by SGA: ${\theta}^{n+1}={\theta}^{n}+\beta_{\theta}{\nabla_{\theta}{f^{\mathrm{L}}}(\theta^{n})}$.
5 end for
Return the parameters of the GNN as: $\theta^{\mathrm{opt}}$

IV-C1 Structure of GNN

As shown in Fig. 1, the proposed GNN-based bandwidth allocation algorithm comprises three key steps: message passing, aggregation, and readout.

Message passing

Each scheduled user is a vertex in the GNN. We use a fully connected neural network (FNN) to obtain the embedding of each vertex, denoted by $x_{k},\forall k\in\mathcal{K}_{\tau}^{\Phi,\xi}$. The inputs of each FNN include $\tilde{w}_{k,\min}^{\Phi,\xi}$ and $\tilde{w}_{\mathrm{S}}^{\Phi,\xi}=1-\sum_{k\in\mathcal{K}_{\tau}^{\Phi,\xi}}\tilde{w}_{k,\min}^{\Phi,\xi}$. We use $\theta$ to denote the training parameters of the FNN. In the $n$-th epoch, the message passing function is given by $x_{k}^{n}=f_{\mathrm{FNN}}\left(\tilde{w}_{k,\min}^{\Phi,\xi},\tilde{w}_{\mathrm{S}}^{\Phi,\xi}\Big{|}\theta^{n}\right)$. Since the vertices are homogeneous, the training parameters of all the FNNs are the same.

Aggregation

In the aggregation step, we first aggregate the embeddings of all the scheduled users by using a concatenation function, $f_{\mathtt{Concat}}(\cdot)$ , followed by a Softmax function, $f_{\mathtt{Softmax}}(\cdot)$ , which serves as the activation function in the aggregation. The output after aggregation is denoted by $\bm{y}$ .

Readout

The output of the GNN for each vertex is obtained by a readout function given by $f_{\mathtt{Readout}}(\bm{y})=\bm{y}\cdot\tilde{w}_{\mathrm{S}}^{\Phi,\xi}+\tilde{\bm{w}}_{\tau,\min}^{\Phi,\xi}$. Since $\bm{y}$ is obtained from the $\mathtt{Softmax}$ function, its elements are nonnegative and sum to one. Thus, from the readout function, all the surplus bandwidth is allocated to the users, and constraints (14a) and (14b) are satisfied.
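The three steps above can be sketched as a forward pass that works for any number of scheduled users with the same parameters. In the sketch below, the single hidden layer, its width, the tanh activation, and the random initialization are assumptions made for illustration; only the message passing, aggregation, and readout structure follows the description above.

```python
import math
import random


def init_fnn(hidden: int = 8, seed: int = 0):
    """Randomly initialize a shared FNN with two inputs and a scalar output."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.gauss(0.0, 0.5) for _ in range(2)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.gauss(0.0, 0.5) for _ in range(hidden)],
        "b2": 0.0,
    }


def fnn(theta, w_min_k: float, w_s: float) -> float:
    """Message passing: embedding of one vertex from its two input features."""
    hidden = [math.tanh(row[0] * w_min_k + row[1] * w_s + b)
              for row, b in zip(theta["w1"], theta["b1"])]
    return sum(v * h for v, h in zip(theta["w2"], hidden)) + theta["b2"]


def softmax(x):
    """Numerically stable softmax over the concatenated embeddings."""
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]


def gnn_forward(theta, w_min_norm):
    """Return the normalized bandwidth allocated to each scheduled user."""
    w_s = 1.0 - sum(w_min_norm)                      # normalized surplus
    x = [fnn(theta, wk, w_s) for wk in w_min_norm]   # message passing
    y = softmax(x)                                   # aggregation
    return [yk * w_s + wk for yk, wk in zip(y, w_min_norm)]  # readout


theta = init_fnn()
print(gnn_forward(theta, [0.1, 0.2, 0.3]))              # three users
print(gnn_forward(theta, [0.05, 0.1, 0.15, 0.2, 0.1]))  # five users, same theta
```

Because the same FNN is applied to every vertex, the parameter count depends only on the FNN size, and the readout keeps every allocation at or above its minimum while the allocations sum to one.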

IV-C2 Unsupervised Learning

The learning algorithm is detailed in Algorithm 2. Specifically, in the $n$ -th epoch, we use our GNN to obtain the bandwidth allocation and estimate the expectation of the objective function by using the batch samples according to

f^{\mathrm{L}}(\theta)=\frac{1}{J}\sum\limits_{j=1}^{J}\sum\limits_{k\in\mathcal{K}_{\tau}^{\Phi,\xi}}f_{k}^{\Phi,\xi}\left(\tilde{w}_{j,k}^{\Phi,\xi}\cdot W_{j,\tau}^{\mathrm{NS}}\right),

(15)

where $J$ is the batch size. Then, we use stochastic gradient ascent (SGA) to maximize the estimated expectation of the objective function in (14). As shown in [16], maximizing the expectation of the objective function, where the expectation is taken over channels, is equivalent to maximizing the objective function with given channels. Thus, from Algorithm 2, we can find the bandwidth allocation policy that maximizes the objective function in (14).

IV-C3 Computational Complexity

We compare the computational complexity of our GNN with the iterative algorithm introduced in Section IV-A. In cellular systems, both algorithms will be implemented in each transmission time interval with a duration of less than 1 ms. Thus, we are interested in the inference complexity of our GNN, i.e., the number of operations to be executed to obtain the bandwidth allocation in each transmission time interval.

Inference complexity of our GNN

To compute the embedding of each vertex, we need to compute the output of the FNN in Fig. 1. We denote the number of layers of the FNN by $L_{\mathrm{FNN}}$ and the number of neurons in the $\ell$-th layer by $m_{\mathrm{FNN}}^{\ell}$. Then, the number of multiplications required to compute the output of the $\ell$-th layer is $m_{\mathrm{FNN}}^{\ell}\cdot m_{\mathrm{FNN}}^{\ell+1}$ and the total number of multiplications for computing the embedding is $M_{\mathrm{FNN}}=\sum\nolimits_{\ell=1}^{L_{\mathrm{FNN}}}m_{\mathrm{FNN}}^{\ell}\cdot m_{\mathrm{FNN}}^{\ell+1}$ [10]. After obtaining the embeddings of $K$ users, the number of multiplications required by $f_{\mathtt{Softmax}}(\bm{x})$ and $f_{\mathtt{Readout}}(\bm{y})$ is $2K$. Therefore, the inference complexity of the GNN-based bandwidth allocation policy is

O_{\mathrm{GNN}}=O(K\cdot(M_{\mathrm{FNN}}+2)).

(16)

Complexity of the iterative algorithm

In each iteration of the optimization algorithm, we assign a small portion of the normalized surplus bandwidth, denoted by $\Delta\tilde{w}$ , to a user that can maximize the objective function. The algorithm needs to compute the objective function $K$ times and find the best user. We denote the complexity for computing the objective function by $\Omega$ , then the complexity of the iterative algorithm is given by

O_{\mathrm{iter}}=O\left(K\cdot\frac{w_{\mathrm{S}}}{\Delta w}\cdot\Omega\right),

(17)

where $w_{\mathrm{S}}/{\Delta{w}}$ represents the number of iterations used in the iterative algorithm.
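A minimal sketch of the greedy procedure described above is given below. It is illustrative only: the per-user objective `rate` is a stand-in (a rate of the form $w\log_{2}(1+\zeta/w)$), and the function names, the variable `zeta`, and the numerical values in the usage note are assumptions made for this example.

```python
import numpy as np


def rate(w, zeta):
    """Stand-in per-user objective: w * log2(1 + zeta / w), defined for w > 0."""
    return w * np.log2(1.0 + zeta / w)


def iterative_allocation(w_min, zeta, w_surplus, delta_w):
    """Greedy allocation: each block of size delta_w goes to the user with the largest gain."""
    w = np.array(w_min, dtype=float)
    zeta = np.asarray(zeta, dtype=float)
    n_iter = int(round(w_surplus / delta_w))    # w_S / delta_w iterations
    n_candidates = 0
    for _ in range(n_iter):
        # One candidate evaluation per user, then pick the best user.
        gain = rate(w + delta_w, zeta) - rate(w, zeta)
        n_candidates += w.size
        w[np.argmax(gain)] += delta_w
    return w, n_candidates
```

For instance, `iterative_allocation([0.1, 0.1, 0.1], [5.0, 2.0, 1.0], w_surplus=0.7, delta_w=0.01)` runs 70 iterations with three candidate evaluations each, which mirrors the $K\cdot w_{\mathrm{S}}/\Delta w$ factor in (17).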

Complexity comparison

To obtain the bandwidth allocation in each transmission time interval, the transmitter either uses the forward propagation algorithm to compute the outcome of the GNN or executes the iterative algorithm. From eqs. (16) and (17), we can see that the computational complexities of both our GNN and the iterative algorithm increase linearly with the number of users. Note that $M_{\mathrm{FNN}}$ in eq. (16) is determined only by the layer widths of the FNN and does not depend on the channels. In contrast, the complexity of the iterative algorithm also increases with the ratio of the surplus bandwidth to the size of the resource block, and thus depends on the channels of the users. In addition, the complexity of evaluating the objective function, denoted by $\Omega$ in eq. (17), in each iteration of the optimization algorithm could also be extremely high. Thus, the inference complexity of the GNN is much lower than the complexity of the iterative optimization algorithm.
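The terms inside (16) and (17) can be evaluated with the short helper below. This is a sketch under stated assumptions: the layer widths are those listed in Section V-A, $M_{\mathrm{FNN}}$ is read as the sum of products of adjacent layer widths, and the function names are hypothetical. The value of $\Omega$ depends on the objective function and has to be supplied by the user.

```python
LAYER_SIZES = [2, 32, 64, 64, 32, 1]   # layer widths from Section V-A


def fnn_multiplications(sizes):
    """Multiplications in one forward pass: sum of products of adjacent layer widths."""
    return sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))


def gnn_ops(num_users, sizes=LAYER_SIZES):
    """Term inside the O(.) of eq. (16): K * (M_FNN + 2)."""
    return num_users * (fnn_multiplications(sizes) + 2)


def iterative_ops(num_users, surplus_hz, block_hz, omega):
    """Term inside the O(.) of eq. (17): K * (w_S / delta_w) * Omega."""
    return num_users * (surplus_hz / block_hz) * omega
```

With these layer widths, `fnn_multiplications(LAYER_SIZES)` evaluates to $8288$, so the GNN term is fixed once $K$ is known, whereas the iterative term scales with both the surplus bandwidth and $\Omega$.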

IV-D Hybrid-Task Meta-Learning

To obtain a GNN with strong generalization ability, we propose an HML algorithm that combines multi-task learning and meta-learning.

IV-D1 Task, Sample, and Taskset

To apply the meta-learning framework, we first define tasks, samples, and tasksets in the context of bandwidth allocation problems. A task is a specific bandwidth allocation problem with a unique combination of system parameters, including the number of users, $U$ , the channel model (i.e., path loss model, shadowing, and small-scale channel fading), the QoS requirement, $r_{\tau}^{\Phi,\xi}$ , and the reserved bandwidth, $W_{\tau}^{\Phi,\xi}$ . If any of the above system parameters change, it would result in a different task. For each task, the samples correspond to the wireless channels that have been transformed into the minimum bandwidth requirement by feature engineering, as specified in constraint (14b). There are four tasksets in meta-learning, and a taskset consists of multiple tasks. We will provide their definitions in the sequel.
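To make the notion of a task concrete, the sketch below encodes the system parameters of a task as a small record and samples meta-training tasks from the ranges in Table III. The class and field names are hypothetical, the fading parameters ($s$ and $m$) are omitted for brevity, and sampling the QoS requirement and the reserved bandwidth on an integer grid is an assumption of this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    num_users: int            # number of users, U
    path_loss_exponent: int   # gamma_u
    shadowing_db: int         # shadowing parameter in dB
    fading: str               # "rician", "nakagami", or "rayleigh"
    qos_mbps: int             # QoS requirement, r
    bandwidth_mhz: int        # reserved bandwidth, W


def sample_training_task(rng):
    """Draw one meta-training task from the T^S & T^Q column of Table III."""
    return Task(
        num_users=rng.randint(10, 30),
        path_loss_exponent=rng.choice([2, 3]),
        shadowing_db=rng.choice([3, 4, 5]),
        fading=rng.choice(["rician", "nakagami"]),
        qos_mbps=rng.randint(1, 10),          # integer grid assumed
        bandwidth_mhz=rng.randint(10, 100),   # integer grid assumed
    )


# Meta-testing task with the values in the T^F & T^E column of Table III.
TEST_TASK = Task(num_users=50, path_loss_exponent=4, shadowing_db=8,
                 fading="rayleigh", qos_mbps=10, bandwidth_mhz=100)
```

Since the record is immutable and compared field by field, changing any single system parameter yields a different task, in line with the definition above.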

IV-D2 Support Set and Query Set in Meta-Training

As shown in Fig. 2(a), most meta-learning algorithms, such as MAML, consist of a meta-training stage and a meta-testing stage. In meta-training, there are two tasksets, the support set $\mathcal{T}^{\mathrm{S}}$ and the query set $\mathcal{T}^{\mathrm{Q}}$. The tasks in the two tasksets are the same, but the samples of each task in the two tasksets are different. Specifically, we first set the initial parameters of the GNN to $\phi$, which is randomly initialized at the beginning of meta-training and updated in every iteration of meta-training. Then, we train the parameters of the GNN, $\theta$, by using the tasks and the corresponding samples in the support set, where $\theta$ is initialized with $\phi$. Next, we update the initial parameters $\phi$ by using the tasks and the corresponding samples in the query set. We denote the initial parameters trained in meta-training of MAML by $\phi^{*}$. The details of the MAML algorithm can be found in [37].

IV-D3 Fine-Tuning Set and Evaluation Set in Meta-Testing

To evaluate the generalization ability of the GNN, a different set of tasks that are unseen in the meta-training stage is used in meta-testing. As shown in Fig. 2(a), the tasks in meta-testing are divided into a fine-tuning set and an evaluation set, denoted by $\mathcal{T}^{\mathrm{F}}$ and $\mathcal{T}^{\mathrm{E}}$, respectively. The tasks in $\mathcal{T}^{\mathrm{F}}$ and $\mathcal{T}^{\mathrm{E}}$ are the same, but the samples of each task in these two tasksets are different. For each new task in meta-testing, the samples from the fine-tuning set are used to fine-tune $\theta$, which is initialized by $\phi^{*}$ obtained in meta-training. After fine-tuning, the updated GNN is tested with the samples from the evaluation set. If no sample is used to fine-tune the GNN in the meta-testing stage, we refer to this approach as zero-shot meta-learning. Otherwise, it is known as few-shot meta-learning. The meta-testing algorithm is detailed in Algorithm 3.
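The fine-tuning and evaluation steps described above can be sketched as follows. To keep the example self-contained without an automatic differentiation library, the GNN reward is replaced by a toy concave reward (the negative mean squared distance between the parameters and the samples) whose gradient is available in closed form. This surrogate, the function names, and all numerical values are assumptions for illustration and are not the reward functions studied in this paper.

```python
import numpy as np


def batch_reward(theta, samples):
    """Toy concave reward: negative mean squared distance between theta and the samples."""
    return -np.mean(np.sum((samples - theta) ** 2, axis=1))


def batch_reward_grad(theta, samples):
    """Exact gradient of batch_reward with respect to theta."""
    return -2.0 * (theta - samples.mean(axis=0))


def meta_test(phi_star, finetune_set, eval_set, n_epochs, batch_size, lr, rng):
    """Fine-tune from phi_star on the fine-tuning set, then score on the evaluation set."""
    theta = np.array(phi_star, dtype=float)     # theta^0 = phi^*
    for _ in range(n_epochs):
        idx = rng.choice(len(finetune_set), size=batch_size, replace=False)
        # Gradient ascent step on the batch reward.
        theta = theta + lr * batch_reward_grad(theta, finetune_set[idx])
    return theta, batch_reward(theta, eval_set)
```

Calling `meta_test` with `n_epochs=0` corresponds to the zero-shot case, while any positive number of epochs corresponds to the few-shot case.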

1 Initialize the number of training epochs, $N$, and the learning rate of the target testing task, $\beta_{\theta}$
2 Select the $i$-th task from the fine-tuning set and the evaluation set: $\mathcal{T}_{i}^{\mathrm{F}}\in\mathcal{T}^{\mathrm{F}}$ and $\mathcal{T}_{i}^{\mathrm{E}}\in\mathcal{T}^{\mathrm{E}}$
3 Set the initialization parameters of the GNN as: $\theta^{0}=\phi^{*}$
4 for $n=0,1,\cdots,N-1$ do
5     Randomly select $J$ samples from task $\mathcal{T}_{i}^{\mathrm{F}}$
6     Calculate the loss in the fine-tuning set according to (15), denoted by $f^{\mathrm{L,F}}(\theta^{n})$
7     Update the parameters of the GNN by: $\theta^{n+1}=\theta^{n}+{\beta_{\theta}}\nabla_{\theta}{f}^{\mathrm{L,F}}(\theta^{n})$
8 end for
9 Randomly select $J^{\prime}$ samples from task $\mathcal{T}_{i}^{\mathrm{E}}$
10 Evaluate the fine-tuned policy of the $i$-th task using $\theta^{N}$: $\bm{w}_{i}^{\Phi,\xi}=f_{\mathrm{GNN}}\left(\tilde{\bm{w}}_{\min,i,j^{\prime}}^{\Phi,\xi},\tilde{w}_{\mathrm{S},i,j^{\prime}}^{\Phi,\xi}\big|\theta^{N}\right)$

Algorithm 3 Meta-Testing

1 Randomly initialize the training parameters for all the tasks, $\phi$, and set the number of meta-training epochs, $M$, the learning rate of meta-training, $\beta_{\phi}$, and the learning rate of each task, $\beta_{\theta}$
2 for $m=0,1,\cdots,M-1$ do
3     Select a batch of $I$ tasks from the support set: $\mathcal{T}_{i}^{\mathrm{S}}\in\mathcal{T}^{\mathrm{S}},\quad i\in\{1,2,\cdots,I\}$
4     for $i=1,2,\cdots,I$ do
5         Set the initial parameters of the GNN to $\theta_{i}^{m,0}=\phi^{m}$
6         for $n=0,1,\cdots,N-1$ do
7             Randomly select $J$ samples from task $\mathcal{T}_{i}^{\mathrm{S}}$
8             Calculate the loss function in the support set according to (15), denoted by ${f}^{\mathrm{L,S}}(\theta_{i}^{m,n})$. Update the parameters by: $\theta_{i}^{m,n+1}=\theta_{i}^{m,n}+{\beta_{\theta}}\nabla_{\theta}{f}^{\mathrm{L,S}}(\theta_{i}^{m,n})$
9         end for
10         Select a batch of $I^{\prime}$ tasks from the query set: $\mathcal{T}_{i^{\prime}}^{\mathrm{Q}}\in\mathcal{T}^{\mathrm{Q}},\quad i^{\prime}\in\{1,2,\cdots,I^{\prime}\}$
11         for $i^{\prime}=1,\cdots,I^{\prime}$ do
12             Randomly select $J^{\prime}$ samples from task $\mathcal{T}_{i^{\prime}}^{\mathrm{Q}},\quad j^{\prime}\in\{1,2,\cdots,J^{\prime}\}$
13             Calculate the loss function in the query task: $f_{i}^{\mathrm{L,Q}}\left(\theta_{i}^{m,N}\right)=\frac{1}{I^{\prime}}\frac{1}{J^{\prime}}\sum\limits_{i^{\prime}=1}^{I^{\prime}}\sum\limits_{j^{\prime}=1}^{J^{\prime}}\sum\limits_{k\in\mathcal{K}_{\tau}^{\Phi,\xi}}f_{k}^{\Phi,\xi}\left(\tilde{w}_{i^{\prime},j^{\prime},k}^{\Phi,\xi,m}\cdot W_{j^{\prime},\tau}^{\mathrm{NS}}\right)$
14         end for
15     end for
16     Calculate the loss function in meta-training: ${f}^{\mathrm{L,Meta},m}\left(\phi^{m}\right)=\frac{1}{I}\sum\limits_{i=1}^{I}{f}_{i}^{\mathrm{L,Q}}\left(\theta_{i}^{m,N}\right)$. Update the initial parameters: $\phi^{m+1}=\phi^{m}+\beta_{\phi}\nabla_{\phi}{f}^{\mathrm{L,Meta},m}\left(\phi^{m}\right)$
17 end for
18 Return the optimal initial parameters of the GNN: $\phi^{\mathrm{opt}}=\phi^{M}$

Algorithm 4 Meta-Training of Hybrid-Task Meta-Learning

IV-D4 Meta-Training of Proposed HML Algorithm

Fig. 2(b) illustrates the tasks and tasksets used in the meta-training and meta-testing of the proposed HML algorithm. The difference between MAML and HML lies in the selection of tasks from the query set. In MAML, the tasks selected from the query set are identical to those selected from the support set in each meta-training epoch. To improve the generalization ability, in HML, we select different tasks from the query set to train the initial parameters of the GNN. Specifically, $I^{\prime}$ tasks are randomly selected from the query set to estimate the average loss of the GNN parameterized by $\phi^{m}$ in the $m$ -th epoch of meta-training. The step-by-step algorithm for meta-training of the proposed HML algorithm is described in Algorithm 4, and the meta-testing algorithm of HML is the same as that of MAML in Algorithm 3.
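The loop structure of HML meta-training can be illustrated with the toy sketch below. It is not the GNN training code: each task is represented by a hypothetical "centre" vector, the reward is the negative mean squared distance between the parameters and noisy samples around that centre, and all names and hyperparameters are assumptions. For this quadratic reward, the derivative of the adapted parameters with respect to $\phi$ is a constant multiple of the identity, so the meta-gradient through the inner loop is exact and no automatic differentiation is needed.

```python
import numpy as np


def adapt(phi, centre, n_steps, lr, batch_size, noise, rng):
    """Inner loop: n_steps of gradient ascent on the toy reward of one support task."""
    theta = phi.copy()
    for _ in range(n_steps):
        batch = centre + noise * rng.standard_normal((batch_size, phi.size))
        theta = theta + lr * (-2.0) * (theta - batch.mean(axis=0))
    return theta


def hml_meta_train(centres, phi0, n_meta, n_steps, lr_theta, lr_phi,
                   n_support, n_query, batch_size, noise, rng):
    """Meta-training in which query tasks are drawn independently of the support tasks."""
    phi = np.array(phi0, dtype=float)
    # For this quadratic reward, d theta^N / d phi = (1 - 2 * lr_theta)^n_steps * identity.
    jac = (1.0 - 2.0 * lr_theta) ** n_steps
    for _ in range(n_meta):
        meta_grad = np.zeros_like(phi)
        for i in rng.choice(len(centres), size=n_support, replace=False):
            theta = adapt(phi, centres[i], n_steps, lr_theta, batch_size, noise, rng)
            # HML: query tasks are drawn at random, independently of support task i
            # (MAML would evaluate on task i itself).
            for q in rng.choice(len(centres), size=n_query, replace=False):
                batch = centres[q] + noise * rng.standard_normal((batch_size, phi.size))
                meta_grad += jac * (-2.0) * (theta - batch.mean(axis=0))
        phi = phi + lr_phi * meta_grad / (n_support * n_query)   # ascent on the meta reward
    return phi
```

In this toy setting, the learned initialization drifts toward the average of the task centres, i.e., a starting point from which every task can be reached in a few ascent steps.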

TABLE II: Key Simulation Parameters

Simulation parameters	Values
Transmit power of each user, $P_{u}$	23 dBm
Single-sided noise spectral density, $N_{0}$	-174 dBm/Hz
Channel coherence time, $T_{\mathrm{c}}$	$1$ ms [28]
Duration of one time slot, $T_{\mathrm{s}}$	$0.125$ ms
Decoding error probability, $\epsilon_{u}$	$10^{-5}$ [28]
Information leakage, $\delta_{u}$	$10^{-2}$ [28]
QoS exponent of queuing delay, $\vartheta_{u}$	$10^{-3}$ [28]
Size of bandwidth resource block, $\Delta{w}$	$10$ kHz
Learning rates, $\beta_{\theta}/\beta_{\phi}$	$10^{-4}$
Batch sizes of GNN, $J/J^{\prime}$	32
Batch sizes of meta optimizer, $I,I^{\prime}$	4, 2

TABLE III: System Parameters of Different Tasks

	Parameters	$\mathcal{T}^{\mathrm{S}}$ & $\mathcal{T}^{\mathrm{Q}}$	$\mathcal{T}^{\mathrm{F}}$ & $\mathcal{T}^{\mathrm{E}}$
Network scale	Number of users	$U_{\tau}^{\Phi,\xi}\in\{10,11,\cdots,30\}$	$U_{\tau}^{\Phi,\xi}=50$
Channel models	Path loss: $\alpha_{u}=(d_{u})^{-\gamma_{u}}$	$\gamma_{u}\in\{2,3\}$	$\gamma_{u}=4$
	Shadowing: $p_{u}^{\mathrm{S}}(\psi)=\frac{10/\ln 10}{\sqrt{2\pi}\sigma_{\psi_{\mathrm{dB}}}\psi}\exp\left(-\frac{(10\log_{10}\psi-\mu_{\psi_{\mathrm{dB}}})^{2}}{2\sigma_{\psi_{\mathrm{dB}}}^{2}}\right)$	$\psi_{\mathrm{dB}}\in\{3,4,5\}$	$\psi_{\mathrm{dB}}=8$
	Small-scale channels: $p_{u}^{\mathrm{I}}(z\|s,\sigma)=\frac{z}{\sigma^{2}}\exp\left(-\frac{z^{2}+s^{2}}{2\sigma^{2}}\right)\cdot I_{0}\left(\frac{zs}{\sigma^{2}}\right)$ , $p_{u}^{\mathrm{N}}(z\|m,\sigma)=\frac{2m^{m}z^{2m-1}}{\Gamma(m){(2\sigma^{2})}^{m}}\exp\left(-\frac{mz^{2}}{2\sigma^{2}}\right)$ , $p_{u}^{\mathrm{R}}(z\|\sigma)=\frac{z}{\sigma^{2}}\exp\left(-\frac{z^{2}}{2\sigma^{2}}\right)$	$p_{u}^{\mathrm{I}}(z\|s,\sigma),s\in\{1,\cdots,5\}$ , $p_{u}^{\mathrm{N}}(z\|m,\sigma),m\in\{2,\cdots,6\}$	$p_{u}^{\mathrm{R}}(z\|\sigma)$
QoS	Rewards with different QoS requirements	$\max\limits_{\bm{w}}\sum\limits_{u\in\mathcal{U}_{\tau}^{S,\mathcal{I}}}r_{u}^{S,\mathcal{I}}$	$\max\limits_{\bm{w}}\sum\limits_{u\in\mathcal{U}_{\tau}^{\Phi,\xi}}r_{u}^{\Phi,\xi}$ , $\Phi\in\{D,E,S\}$ , $\xi\in\{\mathcal{I,F}\}$
QoS	Values of QoS constraints (Mbps)	$r_{\tau}^{S,\mathcal{I}}\in\{1,\cdots,10\}$	$r_{\tau}^{\Phi,\xi}=10$
Reserved bandwidth	Constraints on reserved bandwidth (MHz)	$W_{\tau}^{S,\mathcal{I}}\in\{10,\cdots,100\}$	$W_{\tau}^{\Phi,\xi}=100$

V Performance Evaluation

In this section, we evaluate the performance of our GNN-based HML algorithm. The GNN is first initialized by the parameters obtained from meta-training, where all the tasks aim to maximize the sum of the secrecy rate with different numbers of users and channel models. Then, we evaluate the performance of our GNN in unseen tasks with different numbers of users, channel models, objective functions, QoS constraints, and reserved bandwidth.

V-A System Setup

We consider a BS, located at $(0,0)$ m, serving multiple users randomly distributed in a rectangular area, where the coordinates of user $u$, denoted by $(c_{x,u},c_{y,u})$, satisfy $c_{x,u},c_{y,u}\in[-100,100]$. When the QoS requirement is the secrecy rate, an eavesdropper is randomly located in the same rectangular area. The transmitted signal of each user is a complex Gaussian process with zero mean and equal variance, $\sigma^{2}=1$. Channel models include large-scale channels and small-scale channels. Specifically, the large-scale channels depend on path loss and shadowing, whilst the small-scale channels follow Rician, Nakagami, and Rayleigh distributions with the parameters in Table III. The number of neurons in each layer of the GNN is $2/32/64/64/32/1$. Unless otherwise mentioned, the simulation parameters are summarized in Table II, and the parameters of tasksets are defined in Table III.
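For readers who wish to reproduce a similar setup, the sketch below draws one channel realization per user from the models in Table III. It rests on several assumptions that are not stated in the text: the channel gain is taken as the product of path loss, shadowing, and the squared small-scale amplitude; the distance is clipped at 1 m to avoid a singularity at the BS; and the function name and default parameter values are hypothetical.

```python
import numpy as np


def sample_channel_gains(num_users, gamma, sigma_db, fading, rng,
                         mu_db=0.0, sigma=1.0, s=1.0, m=2.0, half_side=100.0):
    """Draw one gain per user: path loss * shadowing * (small-scale amplitude)^2."""
    # Users uniformly distributed in [-half_side, half_side]^2 around the BS at (0, 0).
    xy = rng.uniform(-half_side, half_side, size=(num_users, 2))
    d = np.maximum(np.linalg.norm(xy, axis=1), 1.0)   # clip at 1 m (assumption)
    path_loss = d ** (-gamma)                          # alpha_u = d_u^(-gamma_u)
    # Log-normal shadowing: the dB value is Gaussian with mean mu_db and std sigma_db.
    shadowing = 10.0 ** (rng.normal(mu_db, sigma_db, size=num_users) / 10.0)
    if fading == "rayleigh":
        z = rng.rayleigh(scale=sigma, size=num_users)
    elif fading == "rician":
        # Envelope of a complex Gaussian with line-of-sight amplitude s.
        z = np.hypot(s + sigma * rng.standard_normal(num_users),
                     sigma * rng.standard_normal(num_users))
    elif fading == "nakagami":
        # z^2 ~ Gamma(shape=m, scale=2*sigma^2/m), matching the Nakagami pdf in Table III.
        z = np.sqrt(rng.gamma(shape=m, scale=2.0 * sigma ** 2 / m, size=num_users))
    else:
        raise ValueError(f"unknown fading model: {fading}")
    return path_loss * shadowing * z ** 2
```

For example, `sample_channel_gains(50, gamma=4, sigma_db=8.0, fading="rayleigh", rng=np.random.default_rng(0))` draws gains for a task resembling the meta-testing column of Table III.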

V-B Performance of GNN

Fig. 3 shows the training losses when the number of users increases from $10$ to $50$ . The results show that the unsupervised learning algorithm can converge after a few hundred training epochs for different numbers of users, and the convergence time increases slightly with the number of users.

After the training stage of the unsupervised learning algorithm, we select $1000$ samples from the evaluation set of the same task to evaluate the constraint and reward achieved by the GNN in Fig. 4. The results in Fig. 4(a) show that the secrecy rates of all the scheduled users are equal to or higher than the requirement, $r_{\tau}^{S,\mathcal{I}}=10$ Mbps. The results in Fig. 4(b) show that the sum secrecy rate achieved by the GNN is close to that achieved by the iterative optimization algorithm in Section IV-A (with legend “Optimal”). In other words, the unsupervised learning algorithm can obtain a near-optimal solution.

V-C Meta-Testing Performance of HML

In this subsection, we evaluate the generalization ability of the proposed HML algorithm. The differences between tasks in meta-training and meta-testing are shown in Table III. In meta-testing, we first select an unseen task that is not included in meta-training. In each training epoch of meta-testing, $32$ samples are randomly selected from $\mathcal{T}^{\mathrm{F}}$ to fine-tune the GNN, whilst all the $1000$ testing samples from the same task in $\mathcal{T}^{\mathrm{E}}$ are used to evaluate the performance.

V-C1 Different Wireless Channels and QoS Requirements

In this part, we set $W_{\tau}^{S,\mathcal{I}}=100$ MHz and $r_{\tau}^{S,\mathcal{I}}=10$ Mbps for all types of services. The other parameters follow the rules in $\mathcal{T}^{\mathrm{S}}$ and $\mathcal{T}^{\mathrm{Q}}$ as shown in Table III. We compare the initial performance and sample efficiency of HML with four benchmarks: 1) Optimal, 2) Model-agnostic meta-learning (MAML), 3) Multi-task learning-based transfer learning (MTL Transfer), and 4) Random initialization.

• Optimal: The optimal solution is obtained by the iterative algorithm detailed in Section IV-A, and its optimality has been proved in [10].
• MAML: MAML is one of the most widely used meta-learning algorithms, and its key ideas have been discussed in Section IV-D.
• MTL Transfer: Transfer learning improves the sample efficiency by fine-tuning the parameters of the pre-trained GNN in a task with fewer training samples. With multi-task learning (MTL), the initial performance is much better than random initialization as the GNN is pre-trained in multiple tasks [37, 43]. To execute MTL transfer learning, we only need to replace the initialization in line 2 of Algorithm 2 by the pre-trained parameters.
• Random Initialization: Random initialization is the conventional method that trains the GNN from scratch with a new task.

In Figs. 5-7, the horizontal axis represents the number of training epochs used to fine-tune the GNN, where $32$ samples from $\mathcal{T}^{\mathrm{F}}$ are used in each epoch. The vertical axis represents the sum of the rewards of all the users, averaged over the $1000$ testing samples from $\mathcal{T}^{\mathrm{E}}$. We refer to it as the average sum reward.

In Fig. 5, we consider the average sum of secrecy rates and illustrate the impacts of the number of users, channel models, and coding blocklength on the initial performance and sample efficiency of different methods. The results in Fig. 5 show that HML achieves the best initial average sum secrecy rate and the highest sample efficiency compared with all the benchmarks. In Fig. 5(a), HML can converge in $8$ training epochs, whereas both MAML and MTL transfer learning take more than $30$ epochs to converge. Thus, HML can reduce the convergence time by up to $73\%$. After fine-tuning, the gap between the learning methods and the optimal solution is around $1.45\%$. In Fig. 5(b), the coding blocklength in meta-testing is also different from that in meta-training. As a result, the gap between the initial performance of HML and the optimal solution is $7.93\%$. After fine-tuning, the gap is reduced to $3.74\%$, which is larger than the gap in Fig. 5(a), where the blocklength is the same in meta-training and meta-testing.

Fig. 6 shows the average sum of data rates achieved by different methods. The results indicate that when the reward function and the QoS constraint in meta-testing are different from those in meta-training, the gaps between the initial performance of HML and the optimal solution increase to $13.77\%$ and $14.93\%$ in the long and short blocklength regimes, respectively. After fine-tuning, the gaps between the learning methods and the optimal solution are smaller than those in Fig. 5. This is because Shannon’s capacity and the achievable rate are special cases of the secrecy rate in the long and short blocklength regimes, respectively, when the wiretapped channels are in deep fading. It is easier to learn a good policy when the problem becomes less complicated.

Fig. 7 shows the average sum of effective capacities achieved in the meta-testing stage, where the initial parameters of the GNN are obtained from meta-training, and the GNN is trained with tasks maximizing the sum secrecy rate in the long blocklength regime. In other words, the QoS requirement in meta-testing is a queuing delay requirement, which is quite different from the security requirement in meta-training. By comparing the results in Figs. 5 and 7, we can observe that the gaps between HML and the optimal solution in Fig. 7 are larger than the gaps in Fig. 5. Nevertheless, HML can still converge in around $10$ to $30$ epochs and outperforms the other benchmarks in Fig. 7.

V-C2 Meta-Testing with Different System Parameters

In this part, we focus on secrecy rates in the long blocklength regime in both meta-training and meta-testing, and change the values of $r_{\tau}^{S,\mathcal{I}}$ , $W_{\tau}^{S,\mathcal{I}}$ , and $U_{\tau}^{S,\mathcal{I}}$ to investigate their impacts on the initial performance and sample efficiency of HML in meta-testing.

In Fig. 8, we evaluate the initial performance and sample efficiency with different $r_{\tau}^{S,\mathcal{I}}$ in support sets and query sets in meta-training. Specifically, we set $r_{\tau}^{S,\mathcal{I}}$ to $10$ Mbps and $1$ Mbps in meta-training in Figs. 8(a) and 8(b), respectively. In Fig. 8(c), $r_{\tau}^{S,\mathcal{I}}$ is randomly selected from the set $\{1,\cdots,10\}$ Mbps in meta-training. In meta-testing, we increase $r_{\tau}^{S,\mathcal{I}}$ from $1$ Mbps to $10$ Mbps. The results in Figs. 8(a) and 8(b) indicate that the gaps between zero-shot learning (with $0$ training epochs in meta-testing) and the optimal solution increase with the difference between $r_{\tau}^{S,\mathcal{I}}$ in meta-training and $r_{\tau}^{S,\mathcal{I}}$ in meta-testing. To increase the generalization ability, we can increase the diversity of tasks in meta-training as shown in Fig. 8(c). In this way, our GNN is near-optimal with zero-shot learning.

In Fig. 9, we validate the generalization ability of our GNN with dynamic bandwidth $W_{\tau}^{S,\mathcal{I}}$. In meta-training, $W_{\tau}^{S,\mathcal{I}}$ is randomly selected from the set $\{10,\cdots,100\}$ MHz. In meta-testing, we increase $W_{\tau}^{S,\mathcal{I}}$ from $10$ to $100$ MHz. The results in Fig. 9 show that our GNN is near-optimal with different values of $W_{\tau}^{S,\mathcal{I}}$. In Fig. 10, we further validate the generalization ability of our GNN with different numbers of users. In meta-training, the number of total users is randomly selected, $U_{\tau}^{S,\mathcal{I}}\in\{10,11,\cdots,30\}$. In meta-testing, we increase the number of total users from $5$ to $50$. The results in Fig. 10 show that the proposed HML can obtain a GNN that has strong generalization ability with different numbers of users. The gap between the GNN and the optimal policy increases slightly with $U_{\tau}^{S,\mathcal{I}}$. This is because the scale of the problem increases with $U_{\tau}^{S,\mathcal{I}}$, and it is more difficult to learn the bandwidth allocation policy of a large-scale problem compared with that of a small-scale problem.

VI Conclusion

In this paper, we developed an HML approach to train a GNN-based scalable bandwidth allocation policy that can generalize well in various communication scenarios, including different numbers of users, wireless channels, QoS requirements, and bandwidth. The main idea is to train the initial parameters of the GNN with various tasks in meta-training, and then fine-tune the parameters with a few samples in meta-testing. Simulation results showed that the performance gap between the GNN and the optimal policy obtained by an iterative algorithm is less than $5\%$ in most of the cases. For unseen communication scenarios, the GNN can converge in $10$ to $30$ training epochs, which is much faster than the existing benchmarks. Our approach can be extended beyond bandwidth allocation to problems such as power allocation, precoding, and repetitions. Nevertheless, the feature engineering and the structure of the GNN in other scenarios deserve further investigation.

Appendix A Proof of Concavity for Secrecy Rate in Long Blocklength Regimes

To prove the concavity of the secrecy rate in long blocklength regimes, we only need to prove that the second derivative of the secrecy rate is negative. We first calculate the partial derivative of the secrecy rate of the $k$-th scheduled user as follows,

\begin{split}\frac{\partial r_{k}^{S,\mathcal{I}}(w_{\tau,k}^{D,\mathcal{I}})}{\partial{w_{\tau,k}^{D,\mathcal{I}}}}=&\frac{\partial\left(r_{k}^{D,\mathcal{I}}(w_{\tau,k}^{D,\mathcal{I}})-r_{k}^{e,\mathcal{I}}(w_{\tau,k}^{D,\mathcal{I}})\right)}{\partial w_{\tau,k}^{D,\mathcal{I}}}\\ =&\frac{-\zeta_{k}+\ln\left(1+\frac{\zeta_{k}}{w_{\tau,k}^{D,\mathcal{I}}}\right)(w_{\tau,k}^{D,\mathcal{I}}+\zeta_{k})}{\ln(2)(w_{\tau,k}^{D,\mathcal{I}}+\zeta_{k})}\\ &-\frac{-\zeta_{k}^{e}+\ln\left(1+\frac{\zeta_{k}^{e}}{w_{\tau,k}^{D,\mathcal{I}}}\right)(w_{\tau,k}^{D,\mathcal{I}}+\zeta_{k}^{e})}{\ln(2)(w_{\tau,k}^{D,\mathcal{I}}+\zeta_{k}^{e})}\\ =&\frac{\ln\left(\frac{w_{\tau,k}^{D,\mathcal{I}}+\zeta_{k}}{w_{\tau,k}^{D,\mathcal{I}}+\zeta_{k}^{e}}\right)}{\ln(2)}\\ &+\frac{(\zeta_{k}^{e}-\zeta_{k})w_{\tau,k}^{D,\mathcal{I}}}{\ln(2)(w_{\tau,k}^{D,\mathcal{I}}+\zeta_{k})(w_{\tau,k}^{D,\mathcal{I}}+\zeta_{k}^{e})},\end{split}

(18)

where $\zeta_{k}={P_{k}h_{k}}/{N_{0}}$ and $\zeta_{k}^{e}={P_{k}h_{k}^{e}}/{N_{0}}$. Since the secrecy rate of the user increases with the allocated bandwidth, we have ${\partial r_{k}^{S,\mathcal{I}}(w_{\tau,k}^{D,\mathcal{I}})}/{\partial{w_{\tau,k}^{D,\mathcal{I}}}}>0$. The second derivative of $r_{k}^{S,\mathcal{I}}(w_{\tau,k}^{D,\mathcal{I}})$ can be derived as follows,

\begin{split}\frac{\partial^{2}r_{k}^{S,\mathcal{I}}(w_{\tau,k}^{D,\mathcal{I}})}{\partial{w_{\tau,k}^{D,\mathcal{I}}}^{2}}=&\frac{\partial}{\partial w_{\tau,k}^{D,\mathcal{I}}}\left(\frac{\partial r_{k}^{S,\mathcal{I}}(w_{\tau,k}^{D,\mathcal{I}})}{\partial{w_{\tau,k}^{D,\mathcal{I}}}}\right)\\ =&\dfrac{(\zeta_{k}^{e}-\zeta_{k})\left((\zeta_{k}^{e}+\zeta_{k})w_{\tau,k}^{D,\mathcal{I}}+2\zeta_{k}^{e}\zeta_{k}\right)}{\ln(2)(w_{\tau,k}^{D,\mathcal{I}}+\zeta_{k})^{2}(w_{\tau,k}^{D,\mathcal{I}}+\zeta_{k}^{e})^{2}}.\end{split}

(19)

For any scheduled user, we have $\zeta_{k}>\zeta_{k}^{e}$. Thus, ${\partial^{2}r_{k}^{S,\mathcal{I}}(w_{\tau,k}^{D,\mathcal{I}})}/{\partial{w_{\tau,k}^{D,\mathcal{I}}}^{2}}<0$. Therefore, $r_{k}^{S,\mathcal{I}}(w_{\tau,k}^{D,\mathcal{I}})$ is concave. This completes the proof. $\square$
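The closed form in (19) can be checked numerically against a finite-difference approximation. The sketch below assumes the rate expression $w\log_{2}(1+\zeta/w)$ implied by the derivative in (18); the function names and the test values of $\zeta_{k}$ and $\zeta_{k}^{e}$ are arbitrary choices for illustration.

```python
import numpy as np


def secrecy_rate(w, zeta, zeta_e):
    """Long-blocklength secrecy rate: w*log2(1 + zeta/w) - w*log2(1 + zeta_e/w)."""
    return w * np.log2(1.0 + zeta / w) - w * np.log2(1.0 + zeta_e / w)


def second_derivative(w, zeta, zeta_e):
    """Closed-form second derivative in eq. (19)."""
    num = (zeta_e - zeta) * ((zeta_e + zeta) * w + 2.0 * zeta_e * zeta)
    den = np.log(2.0) * (w + zeta) ** 2 * (w + zeta_e) ** 2
    return num / den


def finite_difference(f, w, h=1e-3):
    """Central finite-difference approximation of the second derivative of f at w."""
    return (f(w + h) - 2.0 * f(w) + f(w - h)) / h ** 2
```

With $\zeta_{k}>\zeta_{k}^{e}$, `second_derivative` returns a negative value that agrees with `finite_difference` applied to `secrecy_rate`, which is consistent with the concavity claimed above.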

References

[1] X. Hao, P. L. Yeoh, Y. Liu, C. She, B. Vucetic, and Y. Li, “Graph neural network-based bandwidth allocation for secure wireless communications,” in Proc. 2023 IEEE Int. Conf. Commun. Workshops (ICC workshops), Rome, Italy, 2023, pp. 332–337.
[2] M. Z. Chowdhury, M. Shahjalal, S. Ahmed, and Y. M. Jang, “6G wireless communication systems: Applications, requirements, technologies, challenges, and research directions,” IEEE Open J. Commun. Soc., vol. 1, pp. 957–975, 2020.
[3] Y. Gu, C. She, Z. Quan, C. Qiu, and X. Xu, “Graph neural networks for distributed power allocation in wireless networks: Aggregation over-the-air,” IEEE Trans. Wireless Commun., Early access.
[4] J. Guo and C. Yang, “Learning power allocation for multi-cell-multi-user systems with heterogeneous graph neural networks,” IEEE Trans. Wireless Commun., vol. 21, no. 2, pp. 884–897, Feb. 2022.
[5] R. D-Mohammady, M. Y. Naderi, and K. R. Chowdhury, “Spectrum allocation and QoS provisioning framework for cognitive radio with heterogeneous service classes,” IEEE Trans. Wireless Commun., vol. 13, no. 7, pp. 3938–3950, Jul. 2014.
[6] B. Han, V. Sciancalepore, X. Costa-Pérez, D. Feng, and H. D. Schotten, “Multiservice-based network slicing orchestration with impatient tenants,” IEEE Trans. Wireless Commun., vol. 19, no. 7, pp. 5010–5024, Jul. 2020.
[7] L. Zanzi, V. Sciancalepore, A. Garcia-Saavedra, H. D. Schotten, and X. Costa-Pérez, “LACO: A latency-driven network slicing orchestration in beyond-5G networks,” IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 667–682, Jan. 2021.
[8] Y. Yuan, G. Zheng, K. -K. Wong, B. Ottersten, and Z. -Q. Luo, “Transfer learning and meta learning-based fast downlink beamforming adaptation,” IEEE Trans. Wireless Commun., vol. 20, no. 3, pp. 1742–1755, Mar. 2021.
[9] J. Zhang, Y. Yuan, G. Zheng, I. Krikidis, and K. -K. Wong, “Embedding model-based fast meta learning for downlink beamforming adaptation,” IEEE Trans. Wireless Commun., vol. 21, no. 1, pp. 149–162, Jan. 2022.
[10] R. Dong, C. She, W. Hardjawana, Y. Li, and B. Vucetic, “Deep learning for radio resource allocation with diverse quality-of-service requirements in 5G,” IEEE Trans. Wireless Commun., vol. 20, no. 4, pp. 2309–2324, Apr. 2021.
[11] H. Lee, J. Park, S. H. Lee, and I. Lee, “Message-passing based user association and bandwidth allocation in HetNets with wireless backhaul,” IEEE Trans. Wireless Commun., vol. 22, no. 1, pp. 704–717, Jan. 2023.
[12] Q. Xu, Z. Su, D. Fang, and Y. Wu, “Hierarchical bandwidth allocation for social community-oriented multicast in space-air-ground integrated networks,” IEEE Trans. Wireless Commun., vol. 22, no. 3, pp. 1915–1930, Mar. 2023.
[13] K. B. Letaief, Y. Shi, J. Lu, and J. Lu, “Edge artificial intelligence for 6G: Vision, enabling technologies, and applications,” IEEE J. Sel. Areas Commun., vol. 40, no. 1, pp. 5–36, Jan. 2022.
[14] C. She, C. Sun, Z. Gu, Y. Li, C. Yang, H. V. Poor, and B. Vucetic, “A tutorial on ultrareliable and low-latency communications in 6G: Integrating domain knowledge into deep learning,” Proc. IEEE, vol. 109, no. 3, pp. 204–246, Mar. 2021.
[15] D. He, C. Liu, H. Wang, and T. Q. S. Quek, “Learning-based wireless powered secure transmission,” IEEE Wireless Commun. Lett., vol. 8, no. 2, pp. 600–603, Apr. 2019.
[16] C. Sun, C. She, and C. Yang, “Unsupervised deep learning for optimizing wireless systems with instantaneous and statistic constraints,” in Ultra-reliable and low-latency communications (URLLC) theory and practice: Advances in 5G and beyond, 1st ed. Hoboken, NJ, USA: John Wiley & Sons, Ltd., 2023, ch. 4, pp. 85–118.
[17] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” in Proc. Int. Conf. Mach. Learn. (ICML), Sydney, Australia, pp. 1263–1272, Apr. 2017.
[18] Y. Liu, C. She, Y. Zhong, W. Hardjawana, F.-C. Zheng, and B. Vucetic, “Interference-limited ultra-reliable and low-latency communications: Graph neural networks or stochastic geometry?,” 2022, arXiv:2207.06918.
[19] Y. Shen, Y. Shi, J. Zhang, and K. B. Letaief, “Graph neural networks for scalable radio resource management: Architecture design and theoretical analysis,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 101–115, Jan. 2021.
[20] J. Guo and C. Yang, “Deep neural networks with data rate model: Learning power allocation efficiently,” IEEE Trans. Commun., vol. 71, no. 3, pp. 1447–1461, Mar. 2023.
[21] C. Guo, L. Liang, and G. Y. Li, “Resource allocation for vehicular communications with low latency and high reliability,” IEEE Trans. Wireless Commun., vol. 18, no. 8, pp. 3887–3902, Aug. 2019.
[22] D. Wu and R. Negi, “Effective capacity: a wireless link model for support of quality of service,” IEEE Trans. Wireless Commun., vol. 2, no. 4, pp. 630–643, Jul. 2003.
[23] J. Tang and X. Zhang, “Quality-of-service driven power and rate adaptation over wireless links,” IEEE Trans. Wireless Commun., vol. 6, no. 8, pp. 3058–3068, Aug. 2007.
[24] W. Yu, A. Chorti, L. Musavian, H. Vincent Poor, and Q. Ni, “Effective secrecy rate for a downlink NOMA network,” IEEE Trans. Wireless Commun., vol. 18, no. 12, pp. 5673–5690, Dec. 2019.
[25] H. Yang, Z. Xiong, J. Zhao, D. Niyato, L. Xiao, and Q. Wu, “Deep reinforcement learning based intelligent reflecting surface for secure wireless communications,” IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 375–388, Jan. 2021.
[26] C. Liu, J. Lee, and T. Q.S. Quek, “Safeguarding UAV communications against full-duplex active eavesdropper,” IEEE Trans. Wireless Commun., vol. 18, no. 6, pp. 2919–2931, Jun. 2019.
[27] H. -M. Wang, Q. Yang, Z. Ding, and H. V. Poor, “Secure short-packet communications for mission-critical IoT applications,” IEEE Trans. Wireless Commun., vol. 18, no. 5, pp. 2565–2578, May 2019.
[28] C. Li, C. She, N. Yang, and T. Q. S. Quek, “Secure transmission rate of short packets with queueing delay requirement,” IEEE Trans. Wireless Commun., vol. 21, no. 1, pp. 203–218, Jan. 2022.
[29] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.
[30] M. Alsenwi, N. H. Tran, M. Bennis, S. R. Pandey, A. K. Bairagi, and C. S. Hong, “Intelligent resource slicing for eMBB and URLLC coexistence in 5G and beyond: A deep reinforcement learning based approach,” IEEE Trans. Wireless Commun., vol. 20, no. 7, pp. 4585–4600, Jul. 2021.
[31] G. Sun, Z. T. Gebrekidan, G. O. Boateng, D. Ayepah-Mensah, and W. Jiang, “Dynamic reservation and deep reinforcement learning based autonomous resource slicing for virtualized radio access networks,” IEEE Access, vol. 7, pp. 45758–45772, 2019.
[32] T. T. Do, T. J. Oechtering, S. M. Kim, M. Skoglund, and G. Peters, “Uplink waveform channel with imperfect channel state information and finite constellation input,” IEEE Trans. Wireless Commun., vol. 16, no. 2, pp. 1107–1119, Feb. 2017.
[33] Y. Lu, P. Cheng, Z. Chen, W. H. Mow, Y. Li, and B. Vucetic, “Deep multi-task learning for cooperative NOMA: System design and principles,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 61–78, Jan. 2021.
[34] M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. de Freitas, “Learning to learn by gradient descent by gradient descent,” in Proc. 30th Conf. Neural Inf. Process. Syst. (NIPS 2016), Barcelona, Spain.
[35] A. Nichol, J. Achiam, and J. Schulman, “On first-order meta-learning algorithms,” 2018, arXiv:1803.02999.
[36] A. Raghu, M. Raghu, S. Bengio, and O. Vinyals, “Rapid learning or feature reuse? Towards understanding the effectiveness of MAML,” in Proc. Int. Conf. Learn. Representations (ICLR), Apr. 2020.
[37] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in Proc. Int. Conf. Mach. Learn. (ICML), Sydney, Australia, 2017, pp. 1126–1135.
[38] L. Huang, L. Zhang, S. Yang, L. P. Qian, and Y. Wu, “Meta-learning based dynamic computation task offloading for mobile edge computing networks,” IEEE Commun. Lett., vol. 25, no. 5, pp. 1568–1572, May 2021.
[39] Y. Wang, M. Chen, Z. Yang, W. Saad, T. Luo, S. Cui, and H. V. Poor, “Meta-reinforcement learning for reliable communication in THz/VLC wireless VR networks,” IEEE Trans. Wireless Commun., vol. 21, no. 9, pp. 7778–7793, Sep. 2022.
[40] A. Goldsmith, Wireless Communications. Cambridge, U.K.: Cambridge Univ. Press, 2005.
[41] C. Xiong, G. Y. Li, Y. Liu, Y. Chen, and S. Xu, “Energy-efficient design for downlink OFDMA with delay-sensitive traffic,” IEEE Trans. Wireless Commun., vol. 12, no. 6, pp. 3085–3095, Jun. 2013.
[42] C. Sun, C. She, C. Yang, T. Q. S. Quek, Y. Li, and B. Vucetic, “Optimizing resource allocation in the short blocklength regime for ultra-reliable and low-latency communications,” IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 402–415, Jan. 2019.
[43] N. Ye, X. Li, H. Yu, L. Zhao, W. Liu, and X. Hou, “DeepNOMA: A unified framework for NOMA using deep multi-task learning,” IEEE Trans. Wireless Commun., vol. 19, no. 4, pp. 2208–2225, Apr. 2020.
[44] S. Boyd and L. Vandenberghe, Convex Optimization. New York, NY, USA: Cambridge Univ. Press, 2004.