The role of data embedding in equivariant quantum convolutional neural networks
Abstract
Geometric deep learning refers to the scenario in which the symmetries of a dataset are used to constrain the parameter space of a neural network and thus improve its trainability and generalization. Recently, this idea has been incorporated into the field of quantum machine learning, giving rise to equivariant quantum neural networks (EQNNs). In this work, we investigate the role of classical-to-quantum embedding in the performance of equivariant quantum convolutional neural networks (EQCNNs) for the classification of images. We discuss the connection between the data embedding method and the resulting representation of a symmetry group, and analyze how changing the representation affects the expressibility of an EQCNN. We numerically compare the classification accuracy of EQCNNs with three different basis-permuted amplitude embeddings to the one obtained from a non-equivariant quantum convolutional neural network (QCNN). Our results show a clear dependence of classification accuracy on the underlying embedding, especially for initial training iterations. The improvement in classification accuracy of an EQCNN over a non-equivariant QCNN may be present or absent depending on the particular embedding and dataset used. We expect these results to help the community better understand the importance of the data embedding choice in the context of geometric quantum machine learning.
I Introduction
Quantum computing holds the promise of surpassing classical supercomputers by achieving polynomial and exponential speed-ups on certain tasks [1, 2]. The recent breakthrough [3] of realizing such a speed-up with state-of-the-art noisy intermediate-scale quantum (NISQ) devices [4] has drawn widespread attention and research interest to this field. One of the most active research lines in recent times is to connect quantum computation with classical machine learning (ML). In this context there are two approaches. The first is to study how classical ML tools can facilitate quantum information processing tasks [5, 6, 7, 8, 9]. The other, more frequently studied direction is to use a quantum system itself to build machine learning models [10, 11], in particular quantum neural networks (QNNs) [12, 13, 14, 15, 16, 17, 18, 19]. The central component of a prototypical QNN is a quantum circuit composed of single-qubit and multi-qubit gates with trainable parameters. Such a circuit is called a parametric quantum circuit (PQC) or variational quantum circuit (VQC) [20, 21, 22, 23, 24]. PQC-based QNNs have been used to design quantum analogs of well-known classical ML networks, e.g., quantum autoencoders [22], quantum convolutional neural networks (QCNNs) [25, 26, 27], quantum generative adversarial networks (QGANs) [28, 29, 30, 31, 32, 33], quantum generative diffusion models [17, 18, 19], etc. It is important to note that the cost function obtained from these networks is optimized using a classical optimization routine. In this sense, quantum machine learning networks are hybrid quantum-classical networks. Despite the potential speed-up and signatures of success of quantum machine learning over classical ML models [34, 35, 36, 37, 38], QNNs have several limitations, the principal one being the barren plateau problem.
The barren plateau problem means that for sufficiently deep quantum neural networks the gradient of the cost function vanishes exponentially with the size of the PQC [39, 40, 41, 42]. On the one hand, an arbitrary PQC should ideally have high expressibility [43] to ensure that the solution of the optimization problem is close enough to the actual solution. On the other hand, PQCs with higher expressibility are more prone to exhibiting barren plateaus [39, 40]. Mitigating barren plateaus is therefore crucial for practical applications of quantum machine learning algorithms.
One way to improve the trainability and generalization of machine learning algorithms is to introduce an inductive bias in the network, i.e. to use prior information about the dataset to build a problem-specific model and constrain the optimization space of the network. In particular, geometric machine learning refers to a scheme in which the known symmetries of the dataset are used to construct a network that respects those symmetries [44]. Such a network still contains a sufficiently good solution, yet explores a smaller parameter space during training. For instance, the convolutional neural network (CNN), which is highly successful for image recognition, is constructed by leveraging the translational symmetry of 2D images. Inspired by this, a number of recent works study the role of symmetry in improving QNN architectures [45, 46, 47, 48, 49, 50, 51]. The results show that a linear map representing the QNN can adapt to a symmetric dataset if its action on the data points commutes with the action of the symmetry group (two operators $A$ and $B$ commute if $AB = BA$, or equivalently if $[A, B] = AB - BA = 0$). These QNNs are called equivariant quantum neural networks (EQNNs), in analogy to equivariant neural networks in classical geometric learning. EQNNs have fewer trainable parameters and reduced expressibility, but they are expected to train faster and generalize better than a general QNN with high expressibility. Indeed, studies show that some classes of symmetry-respecting QNNs are free of the barren plateau problem [52, 53, 54]. Moreover, EQNNs show improved performance over non-equivariant QNNs for pattern recognition and image classification [48, 55, 56, 54].
A set of symmetry operations forms an abstract group . A representation of a group is a map where is the space of invertible linear operators that preserve the group structure of . Let us suppose we are given a classical data point with a certain symmetry . If we encode the data point using an -dimensional quantum state, then the representations of the group elements are unitary matrices. However, more than one unitary representation of the same group is possible. A change of representation is induced by a change in the quantum embedding of the data. In an EQNN, the action of the network on the input state must commute with the representation for all group elements . As different representations have different commutator spaces, the choice of representation decides which PQC and measurements are to be used in the EQNN. This has been discussed in Ref. [57], where the authors conceptualize a network in which the representation of the input state is altered at the intermediate layers by applying a linear transformation on the quantum state. Also, in Refs. [55, 54] the authors encode classical images using an altered-basis amplitude embedding so that the resulting group representation simplifies the construction of the EQNN. However, it is not clear whether the resulting EQNN performs as well as it did before the basis change. In this work, our objective is to attain a better understanding of how a change in the data embedding affects the performance of an EQCNN. We present our theoretical observations about the role of the embedding and the resulting representation in the construction of equivariant convolutional and pooling layers in EQCNNs. For our numerical study, we choose the standard amplitude embedding of images along with other permuted-basis amplitude embeddings, similar to Refs. [55, 56]. We then consider datasets in which the class labels are symmetric with respect to reflection and rotation.
We compare the classification accuracy of a number of EQCNNs with different permuted basis embeddings to a general non-equivariant QCNN for image classification. These results show that the choice of embedding substantially affects the performance of the EQCNN, and can be crucial for obtaining an improvement over non-equivariant QCNNs.
The manuscript is arranged as follows. In Sec. II, we start with an introduction to QCNN and EQCNN, followed by discussing the role of representation in constructing an EQCNN. In Sec. III we describe the reflection symmetry group and its different representations used in this work, alongside presenting the unitary ansatze used in building the EQCNNs and the non-equivariant QCNN. We present our findings in Sec. IV and conclude in Sec. V.
II Equivariant quantum convolutional neural network
II.1 Quantum Convolutional Neural Network (QCNN)
Convolutional neural networks (CNNs) are classical ML models extensively used for image classification, speech recognition etc [58, 59]. They consist of a sequence of convolutional and pooling layers followed by a fully-connected layer at the end. The convolution operation can be visualized as a matrix with trainable weights, known as kernel, traversing along the height and width of the input image. At each position of the kernel, the dot product between it and the block of the input image over which it is placed is calculated to get a feature map of the input image. The trainable weights of the kernel are the same within a convolutional layer. It is common to apply a nonlinear activation function after each convolutional layer. In the pooling layer, the dimension is reduced by aggregating over areas of the feature map by taking the average or the maximum value. After application of a number of convolution and pooling layers, the feature maps are flattened to 1D and used as the input layer of a fully-connected neural network that performs the prediction. The cost function is calculated using the output nodes and the network is trained. One important aspect of CNN is that it can recognize a particular feature of an image irrespective of the translation of that feature in the image plane. Thus, CNNs respect the translational symmetry of images.
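The convolution and pooling operations described above can be illustrated with a minimal NumPy sketch. The function names, the 2×2 kernel values and the 6×6 toy image are illustrative assumptions and are not taken from any particular CNN library. The sketch also checks the translational property mentioned in the text: shifting a feature in the image plane shifts the feature map by the same amount.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and take dot products."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Aggregate non-overlapping size x size blocks by their maximum value."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# The same kernel weights are used at every position of the image.
kernel = np.array([[1.0, 2.0], [3.0, 4.0]])

# Toy image with a single 2x2 feature, and a copy with the feature translated.
image = np.zeros((6, 6))
image[1:3, 1:3] = 1.0
shifted = np.roll(image, shift=(2, 2), axis=(0, 1))

features = conv2d_valid(image, kernel)
features_shifted = conv2d_valid(shifted, kernel)
pooled = max_pool(features)

# Translating the input translates the feature map by the same amount.
print(np.array_equal(features_shifted, np.roll(features, (2, 2), axis=(0, 1))))
```

The nonlinear activation and the fully-connected part are omitted to keep the sketch short.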
In Ref. [25], the authors proposed a quantum analogue of CNN. We show this architecture in Fig. 1. The architecture is inspired from the multiscale entanglement renormalization ansatz (MERA) representation of quantum many-body states [60]. Any quantum neural network has three components– an -qubit quantum register encoding the input quantum state , a PQC acting on , and lastly a measurement on some (or all) of the qubits. In case of QCNN, consists of a series of convolutional and pooling layers. In the convolutional layer, an -qubit () trainable convolutional ansatz is applied on all combinations of neighboring qubits, mimicking the action of a kernel in CNN. Here is a set of trainable parameters which is the same for all in the convolutional layer. In the pooling layer, from each pair of neighboring qubits one qubit is measured in a particular basis, and conditioned on the measurement outcomes a set of parametrized rotations are applied on the other qubit. These actions constitute the pooling ansatz . Similar to convolution layers the parameters are shared within a pooling layer. Note that one can also relax the translational invariance condition by using convolutional and pooling ansatze with different parameters within a layer [61]. Following the pooling layer, all the measured qubits are traced out, thus reducing the effective dimension of the system. In real quantum devices this is synonymous with ignoring the qubits in the subsequent stages after the measurement, and considering only the dynamics on the remaining qubits. The sequence of a convolutional layer followed by a pooling layer is then repeated until a small fraction of qubits are left. In analogy to the fully-connected part of a CNN, one can then apply a general PQC on the remaining qubits at the end and finally measure them to obtain the prediction. The network is then trained using a suitable loss function and optimization algorithm. 
Compared to a generic deep quantum neural network, in this case due to progressive qubit reduction, the structure has a shallower depth of , which is preferable for training a QNN. In [25] the authors demonstrated the utility of QCNNs for topological phase recognition of quantum many-body states as well as for the optimization of error-correcting codes. Later, this QCNN architecture was also studied for classical image classification in [62]. In this case, there is an additional step in which the classical images are encoded as quantum states and used as the input to the network. Two widely used embedding methods are amplitude embedding and qubit embedding: the former encodes each classical value in one of the quantum state amplitudes, while the latter employs one qubit to encode one value. For large images amplitude embedding is advantageous, since the number of qubits needed grows only logarithmically with the number of pixels. For smaller images, qubit embedding is preferred since the information about each pixel is more easily encoded in the single qubits. In [62], the convolution and pooling layers were applied until only a single qubit remained, on which a Pauli- measurement is performed. This structure is suitable for binary classification, since the expectation value of the measurement operator on a qubit can be associated with the two classes to be distinguished. Overall, good classification accuracy can be reached for simple datasets (e.g., MNIST and Fashion MNIST). However, state-of-the-art QCNNs are still too immature to learn the features of general complex image datasets. Thus, in general CNNs can achieve higher classification accuracies than QCNNs.
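The qubit counts of the two embeddings can be made concrete with a short sketch. The helper `amplitude_embed` below is a hypothetical name, and zero-padding to the next power of two is an assumption for images whose pixel count is not a power of two. The sketch only builds the normalized amplitude vector and does not prepare a state on a device.

```python
import numpy as np

def amplitude_embed(image):
    """Flatten row-wise, zero-pad to a power of two, and L2-normalize.

    Returns the amplitude vector and the number of qubits it requires.
    """
    amplitudes = np.asarray(image, dtype=float).ravel()
    # Smallest n with 2**n >= number of pixels (at least one qubit).
    n_qubits = max(1, (amplitudes.size - 1).bit_length())
    padded = np.zeros(2 ** n_qubits)
    padded[:amplitudes.size] = amplitudes
    norm = np.linalg.norm(padded)
    if norm == 0:
        raise ValueError("cannot embed an all-zero image")
    return padded / norm, n_qubits

rng = np.random.default_rng(0)
image = rng.random((16, 16))               # toy 16x16 greyscale image
state, n_amplitude_qubits = amplitude_embed(image)
n_qubit_embedding_qubits = image.size      # one qubit per pixel value

print(n_amplitude_qubits, n_qubit_embedding_qubits)
```

For this toy image, amplitude embedding needs 8 qubits, whereas an embedding with one qubit per pixel would need 256.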
II.2 Equivariant QCNN with label-symmetry
The architecture of CNNs respects the translational symmetry of 2D images. However, there can exist additional symmetries in a dataset. In particular, for classical images one of the most commonly occurring symmetries is label symmetry, in which the class labels of the images remain unchanged under a set of operations. For example, in the MNIST dataset the labels of the digits 1, 8 and 0 are reflection invariant. Let us consider a binary-classification task for a dataset with images and corresponding class labels where . There is a function that maps the images to their labels. The task of the QCNN is to prepare a quantum circuit that closely approximates . A set of operations forms a label symmetry group for if, when is applied to the images, the assigned labels remain unchanged, i.e.,
(1) |
Note that the data points themselves may not be invariant under the group action, i.e., in general. The QCNN used to classify should not predict different labels for inputs related by the symmetry operations. The QCNNs that satisfy this condition are called EQCNNs. Recent works explore EQNN and its improvements over a general non-equivariant QNN for image classification [55, 56]. In particular, we use the result from [55] which shows that a PQC along with a measurement will construct an EQNN if the following condition holds,
(2) |
where is the unitary representation of . It was also proved in [49, 57] that a map and a measurement will form an EQNN if and only if
(3) |
Thus if we define the commutator space of to be the space of all operators that commute with , then and must belong to that commutator space. It is easy to see that Eq. (2) is satisfied when Eq. (3) is true.
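The statement that commutation with the representation implies invariant predictions can be checked numerically. The following sketch uses a hypothetical two-qubit example chosen only for illustration: the representation X⊗X, a circuit generated by X⊗X and Z⊗Z, and the measurement Z⊗Z. It is not one of the circuits used in this work. For contrast, it also includes a circuit and measurement built from Z⊗I, which do not commute with X⊗X.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def expm_hermitian(H, theta):
    """Return exp(-i * theta * H) for a Hermitian H via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    return evecs @ np.diag(np.exp(-1j * theta * evals)) @ evecs.conj().T

def expectation(psi, U, M):
    """<psi| U^dagger M U |psi> for a state vector psi."""
    out = U @ psi
    return float(np.real(np.vdot(out, M @ out)))

R = np.kron(X, X)                    # assumed symmetry representation
M = np.kron(Z, Z)                    # commutes with R
U = expm_hermitian(np.kron(X, X), 0.37) @ expm_hermitian(np.kron(Z, Z), 1.1)

M_bad = np.kron(Z, I2)               # anticommutes with R
U_bad = expm_hermitian(np.kron(Z, I2), 0.37)

rng = np.random.default_rng(0)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)

equivariant_gap = expectation(R @ psi, U, M) - expectation(psi, U, M)
bad_original = expectation(psi, U_bad, M_bad)
bad_transformed = expectation(R @ psi, U_bad, M_bad)

print(abs(equivariant_gap) < 1e-12)  # prediction unchanged by the symmetry
print(bad_original, bad_transformed) # prediction changes sign under the symmetry
```

In the non-commuting case the expectation value flips sign under the symmetry operation, so two inputs carrying the same label would be pushed towards opposite classes.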
In the context of EQCNN, can be obtained by assuring that each convolution and pooling layer is equivariant with respect to the representation in the output of the previous layer. Since in our case these layers are composed of two-qubit local variational ansatze, it is sufficient to make these ansatze equivariant with respect to the two-qubit local symmetry representations . In detail, for the layer,
(4) |
In particular, and are constituted of parameterized single-qubit and two-qubit rotational gates which can be expressed as , where is a Hermitian operator and is called the generator of that gate. In this case, Eq. (4) is satisfied if
(5) |
holds for all the generators of the single and two qubit gates used in the construction of and . Given a symmetry representation , these generators can be found using a variety of methods as discussed in Refs. [49, 48, 57].
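One simple way to search for equivariant generators, sketched below, is to enumerate Pauli strings and keep those that commute with the representation. This brute-force search is an illustrative choice and is not necessarily one of the methods of Refs. [49, 48, 57]. The convention exp(-iθH) for the gate and the example representation X⊗X are assumptions.

```python
import itertools
import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_string(label):
    """Tensor product of single-qubit Paulis, e.g. 'XZ' -> X (x) Z."""
    op = np.eye(1, dtype=complex)
    for ch in label:
        op = np.kron(op, PAULIS[ch])
    return op

def commuting_generators(rep_label):
    """All Pauli strings H with [H, R] = 0 for the representation R."""
    R = pauli_string(rep_label)
    found = []
    for chars in itertools.product("IXYZ", repeat=len(rep_label)):
        label = "".join(chars)
        H = pauli_string(label)
        if np.allclose(H @ R, R @ H):
            found.append(label)
    return found

def gate(label, theta):
    """exp(-i theta H) = cos(theta) I - i sin(theta) H, valid since H @ H = I."""
    H = pauli_string(label)
    return np.cos(theta) * np.eye(H.shape[0]) - 1j * np.sin(theta) * H

generators_xx = commuting_generators("XX")
print(generators_xx)

# A gate generated by a commuting generator is itself equivariant.
R = pauli_string("XX")
G = gate("ZZ", 0.3)
print(np.allclose(G @ R, R @ G))
```

The trivial string `II` always appears in the list; it only contributes a global phase and can be ignored when building an ansatz.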
II.3 Role of data embedding
If an image has symmetries, then the representation of that symmetry group depends on the particular way the image is encoded using qubits. In this work, we will consider amplitude embedding (AE) in which pixel values are encoded as the amplitudes of the basis states of qubits. Let us consider two simple label symmetries– reflection with respect to the vertical axis and rotation by . In standard AE the pixel values of the classical image are encoded row-wise into the amplitudes of the quantum state. Let us consider a small image for which and . For standard AE the representation of reflection group is and that for the rotation group is , where is the qubit identity operator and is the Pauli- operator. However, if one modifies the order in which the pixels are encoded, in a way presented in the table in Fig. 2, then the representation for the reflection symmetry group becomes and that for the rotation symmetry group becomes .
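The standard-AE representations can be verified directly on a toy image. The sketch below assumes a 4×4 image encoded row-wise on four qubits, with the first qubit as the most significant bit of the basis-state index, and takes the rotation to be by 180°, which is the only non-trivial rotation compatible with a two-element group. Under these assumptions, reflection about the vertical axis flips the two column qubits and the rotation flips all four qubits.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])

def kron_all(ops):
    """Tensor product of a list of single-qubit operators."""
    out = np.eye(1)
    for op in ops:
        out = np.kron(out, op)
    return out

rng = np.random.default_rng(0)
image = rng.random((4, 4))                  # 16 pixels -> 4 qubits
norm = np.linalg.norm(image)
state = image.ravel() / norm                # row-wise standard AE

reflect = kron_all([I2, I2, X, X])          # acts on the column qubits only
rotate = kron_all([X, X, X, X])             # acts on row and column qubits

reflected_state = np.fliplr(image).ravel() / norm     # mirror about vertical axis
rotated_state = np.rot90(image, 2).ravel() / norm     # rotate by 180 degrees

print(np.allclose(reflect @ state, reflected_state))
print(np.allclose(rotate @ state, rotated_state))
```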
In a more general scenario, let us suppose is a matrix that acts on the standard amplitude-encoded input state and alters the order in which pixel values are encoded. Thus has to be a permutation matrix which permutes the coefficients of the canonical basis states of , or equivalently permutes the basis states. For input state with standard AE the group representation is , while for input state with basis-permuted AE, it is . If and are the states obtained after applying the corresponding symmetry transformations, then the relation between them can be summarized as that in Fig. 3. From this we can write the following–
(6) |
In other words, and are related by a basis permutation and they are two equivalent representations of the same group. In the same way, the unitary operators in the commutator space of are related by a basis permutation to the unitary operators in the commutator space of .
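The relation between the two representations can be checked with a small numerical sketch. It assumes the convention that the basis-permuted state is obtained by applying a permutation matrix `P` to the standard state, so that the permuted representation is `P R Pᵀ`; the random permutation and the operator `A` are illustrative choices. The sketch also confirms that an operator in the commutator space of `R` is mapped by the same permutation to an operator in the commutator space of the permuted representation.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

rng = np.random.default_rng(1)
dim = 16                                        # 4 qubits

R = np.kron(np.eye(4), np.kron(X, X))           # I (x) I (x) X (x) X

perm = rng.permutation(dim)
P = np.eye(dim)[perm]                           # (P v)[i] = v[perm[i]]
R_prime = P @ R @ P.T                           # representation for permuted AE

psi = rng.random(dim)
psi /= np.linalg.norm(psi)
psi_prime = P @ psi                             # basis-permuted input state

# Transforming then permuting equals permuting then transforming with R'.
same_state = np.allclose(R_prime @ psi_prime, P @ (R @ psi))

# An operator commuting with R, and its image under the basis permutation.
B = rng.random((4, 4))
A = np.kron(B + B.T, np.kron(Z, Z))
A_prime = P @ A @ P.T

print(same_state)
print(np.allclose(A_prime @ R_prime, R_prime @ A_prime))
```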
However, an EQCNN is a special case of a QCNN, and its architecture differs from that of a general QNN in the following aspects.
1. The PQC is composed of -qubit local unitary ansatze which are equivariant with respect to the locally acting components of the full symmetry representation. Whether these locally equivariant ansatze can realize the full set of globally equivariant ansatze depends on the particular group and its representations [63].
2. Due to translational symmetry, in one particular layer of an EQCNN all local ansatze must be the same. However, the local symmetry representations may not be the same for all groups of qubits. Let us suppose that in the layer there are distinct local representations () and the corresponding sets of equivariant generators are . Then the local ansatze in this layer must be generated from the elements which are common to all . Thus, the translation symmetry hinders the use of the full set of equivariant generators with respect to the local symmetries and reduces the expressibility of the network. In other words, the EQCNN is capable of taking advantage of only some of the local symmetries that exist in groups of qubits and not the global or more-than- qubit symmetries.
3. This dependence on local symmetries becomes more significant after each pooling layer, when some of the qubits are traced out. It means that the local ansatze are not applied on all the local groups of qubits.
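The restriction described in the second item can be sketched symbolically. The code below assumes representations that are tensor products of identity and Pauli-X operators, written as strings such as `IIIIXXXX`, two-qubit ansatze acting on neighbouring qubits of a periodic register, and Pauli-string generators. It uses the fact that a Pauli string commutes with such a representation exactly when it has Y or Z on an even number of the qubits where the representation has X. All function names are hypothetical.

```python
import itertools

def commutes(pauli, rep):
    """True if the Pauli string commutes with the I/X representation string."""
    clashes = sum(1 for p, r in zip(pauli, rep) if r == "X" and p in "YZ")
    return clashes % 2 == 0

def local_reps(rep, width=2):
    """Distinct local representations on neighbouring qubits (periodic register)."""
    n = len(rep)
    return sorted({"".join(rep[(i + k) % n] for k in range(width)) for i in range(n)})

def shared_generators(rep, width=2):
    """Pauli generators that are equivariant for every local representation."""
    labels = ["".join(c) for c in itertools.product("IXYZ", repeat=width)]
    reps = local_reps(rep, width)
    return [p for p in labels if all(commutes(p, r) for r in reps)]

half_flipped = "IIIIXXXX"     # X on the second half of the register
all_flipped = "XXXXXXXX"      # X on every qubit

print(local_reps(half_flipped), shared_generators(half_flipped))
print(local_reps(all_flipped), shared_generators(all_flipped))
```

With four distinct local representations, only generators built from I and X survive the intersection, whereas a representation with a single local form keeps all eight commuting Pauli strings. This illustrates how parameter sharing across a layer can reduce the set of usable generators.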
From the discussion above, it is evident that the EQCNN ansatze depend on the local symmetries and on the reduced symmetries in the subsequent layers. Thus, a change in representation is expected to change the expressibility. We note again that the changes in representation and expressibility are accompanied by a basis permutation of the input state itself. Although the set of input states is different for different representations, the correlations between images related by a symmetry operation remain unchanged.
There are a few earlier works where particular basis-permuted embedding was used in order to facilitate construction of the ansatze that satisfy Eq. (3) [55, 54]. In this work, our aim is to compare the non-equivariant QCNN with standard AE to EQCNNs with a number of basis-permuted AEs. In order to do this, we first choose a particular and then we find the corresponding to be applied to the standard amplitude encoded state to get the basis-permuted embedding . Observing that the matrices that are related by a similarity transformation have the same eigenvalues, we can write
(7) |
where and are matrices constituted of eigenvectors of respectively and . Thus, the desired value of is
(8) |
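For representations that are tensor products of identity and Pauli-X operators, a basis-permuting matrix can also be constructed combinatorially. This is an illustrative alternative to the eigenvector-based expression above, not the procedure used in this work. Each such representation is itself a permutation matrix that exchanges pairs of basis states, so it suffices to map the k-th exchanged pair of the source representation onto the k-th exchanged pair of the target. The function names are hypothetical, and the relation `P R Pᵀ` assumes the permuted state is obtained by applying `P` to the standard one.

```python
import numpy as np

def rep_matrix(label):
    """Matrix of an I/X tensor-product representation, e.g. 'IIXX'."""
    ops = {"I": np.eye(2), "X": np.array([[0.0, 1.0], [1.0, 0.0]])}
    out = np.eye(1)
    for ch in label:
        out = np.kron(out, ops[ch])
    return out

def cycles(R):
    """Fixed points and exchanged index pairs of an involutive permutation matrix."""
    partner = np.argmax(R, axis=0)          # column i has its 1 in row partner[i]
    fixed = [i for i, j in enumerate(partner) if i == j]
    swaps = [(i, int(j)) for i, j in enumerate(partner) if i < j]
    return fixed, swaps

def basis_permutation(R, R_target):
    """Permutation matrix P with P @ R @ P.T == R_target."""
    fixed, swaps = cycles(R)
    fixed_t, swaps_t = cycles(R_target)
    if len(fixed) != len(fixed_t) or len(swaps) != len(swaps_t):
        raise ValueError("representations are not related by a basis permutation")
    P = np.zeros_like(R)
    for i, a in zip(fixed, fixed_t):
        P[a, i] = 1.0
    for (i, j), (a, b) in zip(swaps, swaps_t):
        P[a, i] = 1.0                       # send basis state i to a
        P[b, j] = 1.0                       # and its partner j to b
    return P

R = rep_matrix("IIXX")
P_all = basis_permutation(R, rep_matrix("XXXX"))
P_alt = basis_permutation(R, rep_matrix("IXIX"))

print(np.array_equal(P_all @ R @ P_all.T, rep_matrix("XXXX")))
print(np.array_equal(P_alt @ R @ P_alt.T, rep_matrix("IXIX")))
```

The permutation obtained this way is not unique: any other pairing of the exchanged basis states gives an equally valid matrix.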
In the next section we discuss in detail the architecture of our EQCNN and non-equivariant QCNN for different embeddings.
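Dataset | # Qubits | Embedding | U1 | V1 | U2 | V2 | U3 | V3 | U4 | V4 | M
---|---|---|---|---|---|---|---|---|---|---|---
Fashion MNIST | 8 | AE | 1 | 6 | 1 | 6 | 4 | 7 | - | - |
Fashion MNIST | 8 | AE 1 | 1 | 6 | 2 | 6 | 3 | 6 | - | - |
Fashion MNIST | 8 | AE 2 | 1 | 7 | 5 | 9 | 5 | 9 | - | - |
Cifar10 / Blood MNIST | 10 | AE / AE 1 | 1 | 6 | 1 | 7 | 1 | 9 | 4 | 8 |
Cifar10 / Blood MNIST | 10 | AE 1 / AE | 1 | 6 | 2 | 6 | 3 | 6 | 3 | 6 |
Cifar10 / Blood MNIST | 10 | AE 2 | 1 | 7 | 5 | 9 | 5 | 9 | 5 | 9 |
Table 1: Circuits (numbered as in Fig. 4) used as the convolutional ansatze U and pooling ansatze V in each layer, together with the final measurement M, for each dataset and embedding.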
III EQCNN with different data embeddings
We consider the classification of images whose labels remain invariant under a reflection about the vertical axis and under rotation. In both cases, the underlying group has only two elements: the identity operation, which keeps the image unchanged, and the operation corresponding to reflection or rotation. They form the abstract group $\mathbb{Z}_2$. In our experiments, for the reflection-symmetric images we employ the classes 0 (tshirt/top) and 1 (trouser) of the Fashion MNIST dataset, and classes 1 (car) and 2 (bird) of the Cifar10 dataset. For rotationally symmetric images, we perform classification between classes 1 and 6, as well as between classes 4 and 5, of the Blood MNIST dataset. The Fashion MNIST dataset is a collection of 70000 greyscale images of different clothing articles and accessories. Each image has dimension $28 \times 28$ pixels and the dataset has 10 classes in total. The Cifar10 dataset has 60000 RGB images of animals and vehicles divided into 10 classes, each image having dimension $32 \times 32$ pixels. The Blood MNIST dataset is a collection of pixels RGB images of human blood cells divided into 8 classes. Each class denotes a cell type and contains from a few hundred to about a thousand images. We downsize the Fashion MNIST images to $16 \times 16$ pixels and encode them using 8 qubits. For the Cifar10 and Blood MNIST images, we keep the dimension unchanged but transform them from RGB to greyscale, and we use 10 qubits for the embedding. In all cases, we use two-qubit convolutional ansatze and assume periodicity in the qubit register, i.e. the first and the last qubits are nearest neighbours.
For the non-equivariant QCNN we use standard amplitude embedding. For constructing , it is possible to choose from a large number of non-equivariant convolutional and pooling ansatze, which may affect the overall performance of the QCNN. In this work, we build the two-qubit convolutional ansatze by applying a parametrized arbitrary rotation on each qubit and a gate to entangle them, as shown in circuit 3 in Fig. 4, where . As the pooling ansatze we use parametrized controlled and controlled rotations on the target qubit, when the control qubit is in computational basis states and respectively. This is presented as circuit 9 in Fig. 4. In the last step we measure the remaining qubit 1 in basis.
For the EQCNN, we use standard AE along with two different basis-permuted embeddings, namely basis-permuted AE 1 and basis-permuted AE 2.
III.0.1 Standard AE
In standard amplitude embedding of 2D square images using qubits, the representation of the reflection group is
(9) |
where is the -qubit identity operator, and and denote respectively the identity operator and Pauli- operator applied on qubit. The second element of corresponding to reflection operation, leaves the first half qubits unchanged and flips the second half. This is because, in the standard AE of square images, the amplitudes related to a single row span all the bases with fixed values for the first half qubits and all the possible combinations of and for the second half qubits. For a rectangular image, the group representation becomes,
(10) |
Similarly, one can check that the representation of rotation group for standard AE is,
(11) |
III.0.2 Basis-permuted AE 1
Basis-permuted AE 1 swaps the representations of reflection and rotation groups in Eq. (9) and Eq. (11). We choose a basis-permuting matrix such that the image is reflected when is applied on all the qubits. Thus the representation of reflection group becomes,
(12) |
For images with rotational symmetry, we apply to permute the basis vectors so that the representation of the rotation group becomes,
(13) |
III.0.3 Basis-permuted AE 2
We permute the basis vectors in a way such that the representation of both the reflection and rotation group becomes
(14) |
i.e., the identity operator and the Pauli-X operator are applied alternately on the qubits. The matrix , however, is different for the two groups in this case.
Now we describe in detail the construction of the EQCNNs in these three cases. In each case, we try to use the largest possible set of equivariant generators for constructing the equivariant convolutional and pooling ansatze; the resulting choices are summarized in Table 1. However, we also investigate a scenario in which only a subset of the equivariant generators is used. All the convolutional and pooling ansatze used in this work are presented in Fig. 4.
Let us consider the standard AE for reflection group for which the representation is given by Eq. (9). In the first layer of the EQCNN, equivariant convolutional and pooling layers can be built if and commutes with all the possible two-qubit local group representations within which are . To satisfy the commuting condition, we use circuit 1 in Fig. 4(a) for and the circuit 6 in Fig. 4(b) for (note that the control qubit is in the basis). One can check that these are the only ansatze that commutes with all local representations in this layer. In circuit 6, we have used two gates with different parameters to draw an analogy with the other pooling circuits, however one can merge them into a single parametrized gate to be applied when the control qubit is either in or . Afterwards, we trace out the even-indexed qubits leaving the reduced representation for when considering the Cifar10 dataset, and for when considering the Fashion MNIST dataset. The two-qubit local representations for the convolutional ansatze remain unchanged, thus in the second layer we still use circuit 1 for . In the second pooling layer, we will trace out qubits 3 and 7 and apply conditional rotations on qubits 1 and 5. The local representations are for , and for . For the former, both circuit 7 and circuit 8 are equivariant generators, however since we can use only one ansatze in a pooling layer, we choose circuit 7 without loss of generality. For we use circuit 6 as . In the third layer, the reduced representations are for and for . For the former, we use the circuit 1 as . For pooling, we trace out qubit 5 by applying conditional rotation on qubit 1, thus should commute with . Here, we are free to use any pooling ansatze, so we choose to use the same one as in the non-equivariant QCNN, i.e. circuit 9. For , should commute with , thus both circuit 1 and circuit 4 are equivariant ansatze. Since we have already used circuit 1 in previous layers, we use circuit 4 for and the circuit 7 for . 
At this point for we measure the remaining qubit 1 in basis to get the result. For instead, we still have two qubits left with the reduced representation being . We apply a further layer with circuit 4 as . For the pooling layer, this time we choose circuit 8 as . We measure the remaining qubit 1 in basis.
For the rotation group with standard AE, at every layer the reduced representation is a tensor product of Pauli-X matrices. Thus, and must commute with for all layers. The convolutional ansatze that satisfy this condition are circuit 1, circuit 2 and circuit 3 in Fig. 4(a). Therefore, we use circuit 1 for , circuit 2 for , and circuit 3 for and . For the pooling ansatze, we are left with the unique choice of circuit 6 for at every layer. At the end, to obtain the prediction, we measure the remaining qubit 1 in the basis.
For basis-permuted AE 1, we just swap the EQCNN structure for reflection and rotation group discussed above.
For basis-permuted AE 2, in the first layer, must commute with and must commute with , thus we use respectively circuit 1 and circuit 7. In all the subsequent layers, the reduced group representation is simply , being the number of qubits in each layer. Since no equivariance constraint is imposed, we use the circuit 5 for and the circuit 9 for for . In the end, we measure the remaining qubit in the basis to get the result.
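The way the reduced representation changes from layer to layer can be traced with a few lines of code. The sketch assumes that each pooling layer keeps every second remaining qubit, starting from the first, which matches the qubit indices quoted in the text (qubits 1, 3, 5, 7 after the first pooling layer, then 1 and 5, and so on), and that representations are written as strings of `I` and `X`. The function names are hypothetical.

```python
def reduce_after_pooling(rep):
    """Keep the 1st, 3rd, 5th, ... remaining qubit; the others are traced out."""
    return rep[::2]

def layer_schedule(rep):
    """Representation seen at each stage until a single qubit is left."""
    schedule = [rep]
    while len(schedule[-1]) > 1:
        schedule.append(reduce_after_pooling(schedule[-1]))
    return schedule

half_flipped_8 = "I" * 4 + "X" * 4      # X on the second half, 8 qubits
half_flipped_10 = "I" * 5 + "X" * 5     # X on the second half, 10 qubits
alternating_8 = "IX" * 4                # I and X applied alternately

print(layer_schedule(half_flipped_8))
print(layer_schedule(half_flipped_10))
print(layer_schedule(alternating_8))
```

For the alternating representation, every qubit carrying an X is removed by the first pooling layer, so all later layers see only identities and are free of equivariance constraints. The schedules also show that 8 qubits require three convolution-pooling layers and 10 qubits require four.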
Note that one can obtain the same EQCNN for the symmetry representations in Eq. (9) and Eq. (14) by changing the qubits on which the first pooling layer acts. To elaborate, one can choose to trace out qubits in the first pooling layer when the representation is that in Eq. (9). However, we consider a more realistic scenario in which the two-qubit gate connectivity between the qubits is constrained by the underlying architecture of the quantum hardware. In our case, we implicitly assume that the gate connectivity is exactly the one shown in Fig. 1. Therefore, our convolutional and pooling layers act on different sets of qubits for these two representations.
IV Results
We use the PennyLane [64] quantum simulator to implement the networks. In this work we consider only binary classification. For each dataset we choose a batch size , which is the number of randomly sampled images used for one training iteration. As the loss function, we use the mean squared error (MSE), which is defined as
(15) |
where is the expectation value of the measurement operator for the input quantum state in the batch. Since the expectation values of Pauli operators lie in the range , the class labels corresponding to the two classes are mapped to for all the datasets. We use the Nesterov momentum optimizer for training, with a learning rate of 0.01. We train the network for 2000 or 3000 iterations depending on the dataset. This scheme is run for 10 instances in parallel, for each of which the parameters are randomly initialized. We calculate the average test accuracies and standard deviations over these parallel runs after every 10 iterations.
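The loss and the accuracy statistics can be sketched as follows. The sketch assumes that the two classes are labelled +1 and -1 and that a prediction is obtained from the sign of the expectation value; both are illustrative assumptions, as are the function names. It does not include the circuit simulation or the optimizer.

```python
import numpy as np

def mse_loss(expectations, labels):
    """Mean squared error between expectation values and +/-1 class labels."""
    expectations = np.asarray(expectations, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return float(np.mean((labels - expectations) ** 2))

def accuracy(expectations, labels):
    """Fraction of samples whose expectation value has the sign of the label."""
    predictions = np.where(np.asarray(expectations, dtype=float) >= 0, 1, -1)
    return float(np.mean(predictions == np.asarray(labels)))

def mean_and_std(accuracies_per_run):
    """Average and standard deviation over independent runs.

    accuracies_per_run has shape (n_runs, n_checkpoints).
    """
    runs = np.asarray(accuracies_per_run, dtype=float)
    return runs.mean(axis=0), runs.std(axis=0)

batch_expectations = [0.2, -0.7, 0.1]     # toy measurement outcomes
batch_labels = [1, -1, -1]                # toy class labels

loss = mse_loss(batch_expectations, batch_labels)
acc = accuracy(batch_expectations, batch_labels)
mean_acc, std_acc = mean_and_std([[0.5, 0.8], [0.7, 0.8]])
print(loss, acc, mean_acc, std_acc)
```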
First, we discuss the results when the maximum possible set of equivariant generators is used. In the following paragraphs, we use ‘AE 1’ and ‘AE 2’ to denote basis-permuted AE 1 and basis-permuted AE 2, respectively. For the Fashion MNIST dataset, the training is performed for iterations of batches composed of 32 randomly-sampled images. We plot the average test set accuracies and standard deviations for the different equivariant and non-equivariant models with increasing number of iterations in Fig. 5(a). Even though Fashion MNIST is a relatively simple greyscale dataset, the advantage of using an EQCNN over a non-equivariant QCNN is clear from the plots. Standard AE and AE 2 achieve a higher accuracy in a much shorter time. The performance of AE 1 lags behind and is comparable to that of the non-equivariant QCNN. All EQCNNs have a significantly lower standard deviation than the non-equivariant QCNN.
In Fig. 5(b) we present the test set accuracies for Cifar10 images. In this case, the network is trained for iterations with batch size 64. We again observe a very fast convergence and higher accuracy for standard AE and AE 2 for lower number of iterations, as well as very low standard deviation. Compared to that, AE 1 shows a slower convergence, lower accuracy and high standard deviation. Overall, non-equivariant QCNN achieves highest accuracy for longer training iterations.
The results for the Blood MNIST dataset are presented in Fig. 6. We train the network for 2000 iterations with batch size 32 and 64 respectively for classifying between classes and classes . Recall that in this case the representations corresponding to standard AE and AE 1 are swapped. Let us consider the classification of classes 4 and 5. The test accuracies show a behaviour similar to that of the reflection-symmetric datasets, wherein AE 1 and AE 2 perform better than the others at a low number of iterations. For longer training, all equivariant and non-equivariant QCNNs perform equally well. For the classification of classes 1 and 6, in contrast to all other results, AE 1 has significantly lower accuracy at a low number of iterations, while all other embeddings perform equally well.
Let us now see how the above trends in accuracy change when using a subset of the equivariant generators. For this, we replace all instances of circuit 2 and circuit 3 by circuit 1. Thus, for reflection-symmetric images, the EQCNN corresponding to AE 1 is now generated only from and gates. The same applies to the rotationally equivariant EQCNN corresponding to standard AE. We also replace all instances of circuit 7 by circuit 8, i.e. all pooling layers are generated from and gates. The resulting behaviour is presented in Fig. 7 and Fig. 8. For reflection-symmetric images, this improves the accuracy obtained from AE 1, which either surpasses or closely matches that obtained from the non-equivariant QCNN. For both cases of the Blood MNIST dataset, the accuracy of AE 1 decreases significantly.
We note that the non-equivariant convolutional ansatz has six trainable parameters, compared to three trainable parameters in each of the equivariant convolutional ansatze. To compare these two QCNNs with an equal number of trainable parameters, we append circuit 2 to circuit 1 to build a new convolutional ansatz with six trainable parameters, and use it in all the layers of the EQCNN with AE 1 for classification of the Cifar10 dataset. In Fig. 9, we present the resulting behaviour, which shows a higher accuracy for the EQCNN than for the non-equivariant QCNN.
Overall, we observe that the performance of an EQCNN varies depending on the classical-to-quantum embedding. In particular, when the group representation is a tensor product of Pauli-X matrices acting on all the qubits, the EQCNN has a lower accuracy. On the other hand, when the representation is a tensor product of Pauli-X matrices acting on half of the qubits and the identity operator acting on the remaining half, the EQCNNs have a faster convergence and higher accuracy. For the latter, the pooling choice can make a small difference in the initial training regime, i.e. the EQCNNs show slightly different performance when the pooling layers act on different sets of qubits. We note, however, that this behaviour may not appear for every dataset, as evident from Fig. 5.
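The two kinds of representation contrasted above can be made concrete on a toy image. The sketch below is only an illustration under stated assumptions: the image is flattened in row-major order, the first half of the qubits indexes rows and the second half indexes columns, and normalization is omitted because it commutes with permutations. These conventions are assumed here and need not coincide with those used in the experiments, and the basis-permuted embeddings AE 1 and AE 2 are not reproduced. Under these assumptions, a left-right reflection acts as Pauli-X on the column qubits only, while a rotation by 180 degrees acts as Pauli-X on all qubits.

```python
def flatten(image):
    """Row-major flattening: pixel (r, c) goes to index r * width + c."""
    return [pixel for row in image for pixel in row]


def apply_x_on_qubits(vec, n_qubits, targets):
    """Permute amplitudes as a tensor product of Pauli-X on `targets` would.

    Qubit 0 is taken as the most significant bit of the basis-state index
    (an assumed convention for this illustration).
    """
    mask = 0
    for q in targets:
        mask |= 1 << (n_qubits - 1 - q)
    # X on a qubit flips the corresponding bit of the basis-state index.
    return [vec[i ^ mask] for i in range(len(vec))]


# A 4x4 toy image needs 4 qubits: qubits 0-1 index rows, qubits 2-3 index columns.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
n_qubits = 4
vec = flatten(image)

# Left-right reflection of the image.
reflected = [row[::-1] for row in image]
# Rotation by 180 degrees: reverse both the row order and each row.
rotated = [row[::-1] for row in image[::-1]]

# Reflection corresponds to X on the column qubits, identity on the row qubits.
x_on_half = apply_x_on_qubits(vec, n_qubits, targets=[2, 3])
# Rotation by 180 degrees corresponds to X on every qubit.
x_on_all = apply_x_on_qubits(vec, n_qubits, targets=[0, 1, 2, 3])

print(x_on_half == flatten(reflected))
print(x_on_all == flatten(rotated))
```

Permuting the computational basis before embedding changes which qubits these X operators act on, which is why the same symmetry can lead to different local representations and hence to different equivariant gate sets.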
V Conclusion
Quantum machine learning, though still in its infancy, has already proved useful in designing novel ML-based algorithms and has shown some advantages over classical ML. However, a number of open problems remain when it comes to understanding how to ensure sufficiently good performance from a QNN. Equivariant QNNs are promising candidates to improve the training and generalization of quantum machine learning algorithms. A typical application of quantum machine learning is image classification, which is a ubiquitous task in many daily-life scenarios. For this task, it is possible to construct an equivariant QCNN that is compatible with the label symmetry of the images while also respecting the general translational symmetry of 2D images. In this work, we have explored the connections between the classical-to-quantum embedding of images, the resulting representation of a symmetry group, and the structure of the EQCNN respecting that symmetry. We considered datasets of images characterized by reflection and rotation symmetry, and different amplitude embeddings of these images obtained by basis permutation. Our theoretical observations showed that the local representations play a crucial role in determining the equivariant ansatze to be used and hence the expressibility of the EQCNN. Our numerical results support this by showing that the test set classification accuracy varies considerably across embeddings. It will be interesting to explore whether, instead of 2-qubit local ansatze, an n-qubit ansatz with n > 2 can reduce the dependency of the EQCNN on local symmetries. It is also possible to compare amplitude embedding with other kinds of embedding, e.g. qubit embedding and dense qubit embedding [62], to investigate their effect on EQCNN performance. Finally, one can also run the QCNN circuits on real quantum hardware, for example by using the corresponding plugins provided by PennyLane. In this case, it will be interesting to see whether EQCNNs have better noise robustness than non-equivariant QCNNs, owing to the reduced number of parametrized gates used in the former.
Acknowledgements.
This work was supported by the European Commission’s Horizon Europe Framework Programme under the Research and Innovation Action GA n. 101070546–MUQUABIS, by the European Union’s Horizon 2020 research and innovation programme under FET-OPEN GA n. 828946–PATHOS, by the European Defence Agency under the project Q-LAMPS Contract No B PRJ- RT-989, and by the MUR Progetti di Ricerca di Rilevante Interesse Nazionale (PRIN) Bando 2022 - project n. 20227HSE83 – ThAI-MIA funded by the European Union - Next Generation EU. S.M. acknowledges financial support from PNRR MUR project PE0000023-NQSTI. S.D. and S.M. thank Paolo Braccia for useful discussions regarding some of the initial ideas.
References
- Shor [1997] P. W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, SIAM Journal on Computing 26, 1484 (1997), https://doi.org/10.1137/S0097539795293172 .
- Grover [1996] L. K. Grover, A fast quantum mechanical algorithm for database search, in Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, STOC ’96 (Association for Computing Machinery, New York, NY, USA, 1996) pp. 212–219.
- Arute et al. [2019] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. S. L. Brandao, D. A. Buell, B. Burkett, Y. Chen, Z. Chen, B. Chiaro, R. Collins, W. Courtney, A. Dunsworth, E. Farhi, B. Foxen, A. Fowler, C. Gidney, M. Giustina, R. Graff, K. Guerin, S. Habegger, M. P. Harrigan, M. J. Hartmann, A. Ho, M. Hoffmann, T. Huang, T. S. Humble, S. V. Isakov, E. Jeffrey, Z. Jiang, D. Kafri, K. Kechedzhi, J. Kelly, P. V. Klimov, S. Knysh, A. Korotkov, F. Kostritsa, D. Landhuis, M. Lindmark, E. Lucero, D. Lyakh, S. Mandrà, J. R. McClean, M. McEwen, A. Megrant, X. Mi, K. Michielsen, M. Mohseni, J. Mutus, O. Naaman, M. Neeley, C. Neill, M. Y. Niu, E. Ostby, A. Petukhov, J. C. Platt, C. Quintana, E. G. Rieffel, P. Roushan, N. C. Rubin, D. Sank, K. J. Satzinger, V. Smelyanskiy, K. J. Sung, M. D. Trevithick, A. Vainsencher, B. Villalonga, T. White, Z. J. Yao, P. Yeh, A. Zalcman, H. Neven, and J. M. Martinis, Quantum supremacy using a programmable superconducting processor, Nature 574, 505 (2019).
- Preskill [2018] J. Preskill, Quantum Computing in the NISQ era and beyond, Quantum 2, 79 (2018).
- Huang et al. [2022a] H.-Y. Huang, R. Kueng, G. Torlai, V. V. Albert, and J. Preskill, Provably efficient machine learning for quantum many-body problems, Science 377, eabk3333 (2022a), https://www.science.org/doi/pdf/10.1126/science.abk3333 .
- Martina et al. [2022] S. Martina, L. Buffoni, S. Gherardini, and F. Caruso, Learning the noise fingerprint of quantum devices, Quantum Machine Intelligence 4, 8 (2022).
- Martina et al. [2023a] S. Martina, S. Gherardini, and F. Caruso, Machine learning classification of non-Markovian noise disturbing quantum dynamics, Physica Scripta 98, 035104 (2023a).
- Martina et al. [2023b] S. Martina, S. Hernández-Gómez, S. Gherardini, F. Caruso, and N. Fabbri, Deep learning enhanced noise spectroscopy of a spin qubit environment, Machine Learning: Science and Technology 4, 02LT01 (2023b).
- [9] E. Canonici, S. Martina, R. Mengoni, D. Ottaviani, and F. Caruso, Machine learning based noise characterization and correction on neutral atoms NISQ devices, Advanced Quantum Technologies n/a, 2300192, https://onlinelibrary.wiley.com/doi/pdf/10.1002/qute.202300192 .
- Dalla Pozza et al. [2022] N. Dalla Pozza, L. Buffoni, S. Martina, and F. Caruso, Quantum reinforcement learning: the maze problem, Quantum Machine Intelligence 4, 11 (2022).
- Das et al. [2023] S. Das, J. Zhang, S. Martina, D. Suter, and F. Caruso, Quantum pattern recognition on real quantum processing units, Quantum Machine Intelligence 5, 16 (2023).
- Schuld et al. [2015] M. Schuld, I. Sinayskiy, and F. Petruccione, An introduction to quantum machine learning, Contemporary Physics 56, 172 (2015), https://doi.org/10.1080/00107514.2014.964942 .
- Biamonte et al. [2017] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Nature 549, 195 (2017).
- Benedetti et al. [2019] M. Benedetti, E. Lloyd, S. Sack, and M. Fiorentini, Parameterized quantum circuits as machine learning models, Quantum Science and Technology 4, 043001 (2019).
- Perdomo-Ortiz et al. [2018] A. Perdomo-Ortiz, M. Benedetti, J. Realpe-Gómez, and R. Biswas, Opportunities and challenges for quantum-assisted machine learning in near-term quantum computers, Quantum Science and Technology 3, 030502 (2018).
- Schuld and Killoran [2022] M. Schuld and N. Killoran, Is quantum advantage the right goal for quantum machine learning?, PRX Quantum 3, 030101 (2022).
- Parigi et al. [2023] M. Parigi, S. Martina, and F. Caruso, Quantum-noise-driven generative diffusion models (2023), arXiv:2308.12013 [quant-ph] .
- Zhang et al. [2023] B. Zhang, P. Xu, X. Chen, and Q. Zhuang, Generative quantum machine learning via denoising diffusion probabilistic models (2023), arXiv:2310.05866 [quant-ph] .
- Cacioppo et al. [2023] A. Cacioppo, L. Colantonio, S. Bordoni, and S. Giagu, Quantum diffusion models (2023), arXiv:2311.15444 [quant-ph] .
- Peruzzo et al. [2014] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’Brien, A variational eigenvalue solver on a photonic quantum processor, Nature Communications 5, 4213 (2014).
- McClean et al. [2016] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, The theory of variational hybrid quantum-classical algorithms, New Journal of Physics 18, 023023 (2016).
- Romero et al. [2017] J. Romero, J. P. Olson, and A. Aspuru-Guzik, Quantum autoencoders for efficient compression of quantum data, Quantum Science and Technology 2, 045001 (2017).
- Mitarai et al. [2018] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, Quantum circuit learning, Phys. Rev. A 98, 032309 (2018).
- Schuld et al. [2020] M. Schuld, A. Bocharov, K. M. Svore, and N. Wiebe, Circuit-centric quantum classifiers, Phys. Rev. A 101, 032308 (2020).
- Cong et al. [2019] I. Cong, S. Choi, and M. D. Lukin, Quantum convolutional neural networks, Nature Physics 15, 1273 (2019).
- Li et al. [2020] Y. Li, R.-G. Zhou, R. Xu, J. Luo, and W. Hu, A quantum deep convolutional neural network for image recognition, Quantum Science and Technology 5, 044003 (2020).
- Henderson et al. [2020] M. Henderson, S. Shakya, S. Pradhan, and T. Cook, Quanvolutional neural networks: powering image recognition with quantum circuits, Quantum Machine Intelligence 2, 2 (2020).
- Braccia et al. [2021] P. Braccia, F. Caruso, and L. Banchi, How to enhance quantum generative adversarial learning of noisy information, New Journal of Physics 23, 053024 (2021).
- Braccia et al. [2022] P. Braccia, L. Banchi, and F. Caruso, Quantum noise sensing by generating fake noise, Phys. Rev. Appl. 17, 024002 (2022).
- Rudolph et al. [2022] M. S. Rudolph, N. B. Toussaint, A. Katabarwa, S. Johri, B. Peropadre, and A. Perdomo-Ortiz, Generation of high-resolution handwritten digits with an ion-trap quantum computer (2022), arXiv:2012.03924 [quant-ph] .
- Boyle and Nikandish [2023] A. O. Boyle and R. Nikandish, A hybrid quantum-classical generative adversarial network for near-term quantum processors (2023), arXiv:2307.03269 [quant-ph] .
- Tsang et al. [2023] S. L. Tsang, M. T. West, S. M. Erfani, and M. Usman, Hybrid quantum-classical generative adversarial network for high resolution image generation (2023), arXiv:2212.11614 [quant-ph] .
- Zhou et al. [2023] N.-R. Zhou, T.-F. Zhang, X.-W. Xie, and J.-Y. Wu, Hybrid quantum–classical generative adversarial networks for image generation via learning discrete distribution, Signal Processing: Image Communication 110, 116891 (2023).
- Rebentrost et al. [2014] P. Rebentrost, M. Mohseni, and S. Lloyd, Quantum support vector machine for big data classification, Phys. Rev. Lett. 113, 130503 (2014).
- Liu et al. [2021] Y. Liu, S. Arunachalam, and K. Temme, A rigorous and robust quantum speed-up in supervised machine learning, Nature Physics 17, 1013 (2021).
- Abbas et al. [2021] A. Abbas, D. Sutter, C. Zoufal, A. Lucchi, A. Figalli, and S. Woerner, The power of quantum neural networks, Nature Computational Science 1, 403 (2021).
- Huang et al. [2022b] H.-Y. Huang, M. Broughton, J. Cotler, S. Chen, J. Li, M. Mohseni, H. Neven, R. Babbush, R. Kueng, J. Preskill, and J. R. McClean, Quantum advantage in learning from experiments, Science 376, 1182 (2022b), https://www.science.org/doi/pdf/10.1126/science.abn7293 .
- Caro et al. [2022] M. C. Caro, H.-Y. Huang, M. Cerezo, K. Sharma, A. Sornborger, L. Cincio, and P. J. Coles, Generalization in quantum machine learning from few training data, Nature Communications 13, 4919 (2022).
- McClean et al. [2018] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Barren plateaus in quantum neural network training landscapes, Nature Communications 9, 4812 (2018).
- Holmes et al. [2022] Z. Holmes, K. Sharma, M. Cerezo, and P. J. Coles, Connecting ansatz expressibility to gradient magnitudes and barren plateaus, PRX Quantum 3, 010313 (2022).
- Cerezo et al. [2021] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, Cost function dependent barren plateaus in shallow parametrized quantum circuits, Nature Communications 12, 1791 (2021).
- Wang et al. [2021] S. Wang, E. Fontana, M. Cerezo, K. Sharma, A. Sone, L. Cincio, and P. J. Coles, Noise-induced barren plateaus in variational quantum algorithms, Nature Communications 12, 6961 (2021).
- Sim et al. [2019] S. Sim, P. D. Johnson, and A. Aspuru-Guzik, Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms, Advanced Quantum Technologies 2, 1900070 (2019), https://onlinelibrary.wiley.com/doi/pdf/10.1002/qute.201900070 .
- Bronstein et al. [2021] M. M. Bronstein, J. Bruna, T. Cohen, and P. Veličković, Geometric deep learning: Grids, groups, graphs, geodesics, and gauges (2021), arXiv:2104.13478 [cs.LG] .
- Larocca et al. [2022] M. Larocca, F. Sauvage, F. M. Sbahi, G. Verdon, P. J. Coles, and M. Cerezo, Group-invariant quantum machine learning, PRX Quantum 3, 030341 (2022).
- Mernyei et al. [2022] P. Mernyei, K. Meichanetzidis, and İ. İ. Ceylan, Equivariant quantum graph circuits (2022), arXiv:2112.05261 [cs.LG] .
- Skolik et al. [2023] A. Skolik, M. Cattelan, S. Yarkoni, T. Bäck, and V. Dunjko, Equivariant quantum circuits for learning on weighted graphs (2023), arXiv:2205.06109 [quant-ph] .
- Meyer et al. [2023] J. J. Meyer, M. Mularski, E. Gil-Fuster, A. A. Mele, F. Arzani, A. Wilms, and J. Eisert, Exploiting symmetry in variational quantum machine learning, PRX Quantum 4, 010328 (2023).
- Nguyen et al. [2022] Q. T. Nguyen, L. Schatzki, P. Braccia, M. Ragone, P. J. Coles, F. Sauvage, M. Larocca, and M. Cerezo, Theory for equivariant quantum neural networks (2022), arXiv:2210.08566 [quant-ph] .
- Zheng et al. [2023] H. Zheng, Z. Li, J. Liu, S. Strelchuk, and R. Kondor, Speeding up learning quantum states through group equivariant convolutional quantum ansätze, PRX Quantum 4, 020327 (2023).
- East et al. [2023] R. D. P. East, G. Alonso-Linaje, and C.-Y. Park, All you need is spin: SU(2) equivariant variational quantum circuits based on spin networks (2023), arXiv:2309.07250 [quant-ph] .
- Pesah et al. [2021] A. Pesah, M. Cerezo, S. Wang, T. Volkoff, A. T. Sornborger, and P. J. Coles, Absence of barren plateaus in quantum convolutional neural networks, Phys. Rev. X 11, 041011 (2021).
- Schatzki et al. [2022] L. Schatzki, M. Larocca, Q. T. Nguyen, F. Sauvage, and M. Cerezo, Theoretical guarantees for permutation-equivariant quantum neural networks (2022), arXiv:2210.09974 [quant-ph] .
- West et al. [2023a] M. T. West, J. Heredge, M. Sevior, and M. Usman, Provably trainable rotationally equivariant quantum machine learning (2023a), arXiv:2311.05873 [quant-ph] .
- West et al. [2023b] M. T. West, M. Sevior, and M. Usman, Reflection equivariant quantum neural networks for enhanced image classification, Machine Learning: Science and Technology 4, 035027 (2023b).
- Chang et al. [2023] S. Y. Chang, M. Grossi, B. L. Saux, and S. Vallecorsa, Approximately equivariant quantum neural network for group symmetries in images (2023), arXiv:2310.02323 [quant-ph] .
- Ragone et al. [2023] M. Ragone, P. Braccia, Q. T. Nguyen, L. Schatzki, P. J. Coles, F. Sauvage, M. Larocca, and M. Cerezo, Representation theory for geometric quantum machine learning (2023), arXiv:2210.07980 [quant-ph] .
- LeCun et al. [2015] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature 521, 436 (2015).
- Schmidhuber [2015] J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks 61, 85 (2015).
- Vidal [2008] G. Vidal, Class of quantum many-body states that can be efficiently simulated, Phys. Rev. Lett. 101, 110501 (2008).
- Liu et al. [2023] Y.-J. Liu, A. Smith, M. Knap, and F. Pollmann, Model-independent learning of quantum phases of matter with quantum convolutional neural networks, Phys. Rev. Lett. 130, 220603 (2023).
- Hur et al. [2022] T. Hur, L. Kim, and D. K. Park, Quantum convolutional neural network for classical data classification, Quantum Machine Intelligence 4, 3 (2022).
- Marvian [2022] I. Marvian, Restrictions on realizable unitary operations imposed by symmetry and locality, Nature Physics 18, 283 (2022).
- Bergholm et al. [2022] V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, S. Ahmed, V. Ajith, M. S. Alam, G. Alonso-Linaje, B. AkashNarayanan, A. Asadi, J. M. Arrazola, U. Azad, S. Banning, C. Blank, T. R. Bromley, B. A. Cordier, J. Ceroni, A. Delgado, O. D. Matteo, A. Dusko, T. Garg, D. Guala, A. Hayes, R. Hill, A. Ijaz, T. Isacsson, D. Ittah, S. Jahangiri, P. Jain, E. Jiang, A. Khandelwal, K. Kottmann, R. A. Lang, C. Lee, T. Loke, A. Lowe, K. McKiernan, J. J. Meyer, J. A. Montañez-Barrera, R. Moyard, Z. Niu, L. J. O’Riordan, S. Oud, A. Panigrahi, C.-Y. Park, D. Polatajko, N. Quesada, C. Roberts, N. Sá, I. Schoch, B. Shi, S. Shu, S. Sim, A. Singh, I. Strandberg, J. Soni, A. Száva, S. Thabet, R. A. Vargas-Hernández, T. Vincent, N. Vitucci, M. Weber, D. Wierichs, R. Wiersema, M. Willmann, V. Wong, S. Zhang, and N. Killoran, Pennylane: Automatic differentiation of hybrid quantum-classical computations (2022), arXiv:1811.04968 [quant-ph] .