Deep Learning for Tuning Optical Beamforming Networks


The paper is organized as follows. In Section 2, we derive the mathematical model of the ORR. In Section 3, we describe how to exploit the special structure of OBFNs in a deep neural network representation. In Section 4, simulation results of tuning OBFNs using deep learning are presented. Finally, conclusions are drawn in Section 5.

Mathematical Model of the Optical Ring Resonator
A simple one-input one-output single-stage ORR is illustrated in Figure 2 (Left). It consists of a ring-shaped and a straight waveguide. The power coupling coefficient κ takes a value between 0 and 1; the ring-shaped waveguide has a round-trip length and a corresponding round-trip period, and the heater on top of the ring introduces an extra phase shift ϕ. The Z-transform representation of the ORR is illustrated in Figure 2 (Right). Denoting the signals at the right and left sides of the ring, one can derive the input-output relations of the ring, including a coefficient that accounts for the power loss. Evaluating the transfer function on the unit circle and substituting the round-trip period, we obtain the frequency response of an ORR. Equation (4) is the same as equation (2.18) in [2] and equation (2.52) in [20].

Figure 3. Group delay response of (a) a single ORR, (b) a cascade of multiple ORRs.

Figure 3 (a) shows that when the group delay increases, the width of the delay curve decreases. This is because the area under the group delay curve represents the phase shift of the ORR, which is constant over one free spectral range (FSR) [21]. This observation reveals the trade-off between the delay value and the bandwidth: a single ORR cannot cover a large bandwidth and provide a high desired delay at the same time. A cascade of multiple ORRs solves this problem, as illustrated in Figure 3 (b). The frequency response of a multi-stage cascade of ORRs, in normalized angular frequency, is the product of the frequency responses of the individual single-stage ORRs.
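The delay-bandwidth trade-off described above can be reproduced numerically. The sketch below assumes a lossless all-pass form for the single-ring response with power coupling coefficient κ and heater phase shift ϕ; the exact expression and sign conventions of equation (4) may differ, and `orr_response` and the sampling grid are illustrative:

```python
import numpy as np

def orr_response(omega, kappa, phi):
    # Assumed lossless all-pass ORR response in normalized angular frequency:
    # H = (c - z) / (1 - c*z), with c = sqrt(1 - kappa), z = exp(-j(omega - phi)).
    c = np.sqrt(1.0 - kappa)
    z = np.exp(-1j * (omega - phi))
    return (c - z) / (1.0 - c * z)

def group_delay(omega, H):
    # Group delay is minus the derivative of the unwrapped phase.
    return -np.gradient(np.unwrap(np.angle(H)), omega)

omega = np.linspace(-np.pi, np.pi, 2001)   # one free spectral range
H1 = orr_response(omega, kappa=0.5, phi=0.0)
# Cascade: the overall response is the product of the stage responses.
H_cascade = H1 * orr_response(omega, kappa=0.7, phi=0.5)
tau = group_delay(omega, H_cascade)
```

Under this assumed form, the area under a single ring's delay curve over one FSR stays fixed at 2π (the total phase shift), so lowering κ raises the delay peak while narrowing it, matching the trade-off in Figure 3.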

Deep Neural Network Representation of Optical Beamforming Networks

Feed-forward Neural Network System
The OBFN structure should be low-cost and scalable, use a minimum number of ORRs, and cover a wide bandwidth. Zhuang [2] found experimentally that, for the same antenna specifications and bandwidth, the asymmetrical binary-tree-structured OBFN, illustrated in Figure 1 with its neural network representation in Figure 4, is scalable and requires the fewest ORRs.

Generating Training Examples
Training examples for a neural network consist of input vectors and their respective desired outputs. From Figure 4, the input to the neural network is the signal received by each antenna element, and the signal received by the reference path is the desired output. In this paper, it is assumed that the signal comes from a satellite, which is very far away, without any aberration from the atmosphere. Therefore, noise and signals from other directions are omitted, and the input signals arriving at the antenna elements are parallel to each other. The inputs shown in Figure 4 (a) are in the time domain. Since we will use the frequency response of the ORR, it is convenient to transform the inputs into the frequency domain using the Fourier transform. From equation 6, the input is frequency dependent. Since we want to obtain the desired group delay response over a certain frequency range, an array of signals over a set of frequencies is needed: given a frequency range, the input array for each path is defined over that set of frequencies. For a lossless system, i.e., with no power loss and a flat gain response, the desired output for each path of the network is defined by the ideal delayed version of the input. The lossy case, in which the magnitude (gain) response of the ORR must also be considered, will not be covered in this paper.
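A minimal sketch of constructing one frequency-domain training pair, assuming a flat unit input spectrum and an ideal pure-delay desired response; the band, grid, and `training_pair` helper are illustrative, not the paper's exact construction:

```python
import numpy as np

def training_pair(f_grid, tau_desired):
    """Frequency-domain training example for one OBFN path (illustrative).

    Assumes a flat unit input spectrum; the desired lossless output is the
    input multiplied by the ideal pure-delay response exp(-j*2*pi*f*tau).
    """
    x = np.ones_like(f_grid, dtype=complex)             # input array over the band
    d = x * np.exp(-2j * np.pi * f_grid * tau_desired)  # desired delayed output
    return x, d

f = np.linspace(1.0e9, 3.0e9, 256)           # example frequency band [Hz]
x, d = training_pair(f, tau_desired=0.2e-9)  # 0.2 ns desired delay
```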

Weight Matrices
Consider a binary-tree OBFN represented by a neural network; the numbers of neurons and layers depend on the size of the OBFN and on the number of frequencies of interest. The input vectors are propagated through the layers using weight matrices, with the identity function as the activation function.
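With identity activation, the layer-by-layer propagation reduces to a chain of complex matrix products. A hypothetical two-layer sketch follows, where each weight matrix holds one scalar ORR response at a single frequency plus identity (pass-through) entries; the block layout is illustrative and does not reproduce equations 11 and 12 exactly:

```python
import numpy as np

def layer_matrix(h_orr, n_pass):
    # Diagonal weight matrix at one frequency: an ORR response h_orr on
    # one path, identity (pass-through) on the remaining n_pass paths.
    return np.diag(np.concatenate(([h_orr], np.ones(n_pass, dtype=complex))))

def forward(x, weight_matrices):
    # Identity activation: each layer is a plain complex matrix multiply.
    a = x
    for W in weight_matrices:
        a = W @ a
    return a

h1 = 0.8 * np.exp(1j * 0.3)    # example ORR responses at one frequency
h2 = 0.9 * np.exp(-1j * 0.1)
W1 = layer_matrix(h1, n_pass=1)  # 2x2: ORR on path 0, pass-through on path 1
W2 = layer_matrix(h2, n_pass=1)
x = np.ones(2, dtype=complex)
y = forward(x, [W1, W2])
```

Path 0 accumulates the product of the ORR responses it traverses, while the pass-through path is unchanged, mirroring how signals move through the binary-tree OBFN.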
Note that the structure of the weight matrices depends on the configuration of the neural network, because different configurations lead to different locations of the frequency responses inside the weight matrices. However, determining the weight matrices from a given neural network configuration is straightforward. Consider the neural network representation of an OBFN shown in Figure 5; the respective weight matrices are composed of zero blocks, identity blocks, and the frequency response matrices of the individual ORRs. Note that we should carefully handle coupled weights, where several weights represent the same frequency response of a certain ORR, as can be observed in equations 11 and 12. Stochastic gradient projection [22] is used because the constraints on the parameters make plain gradient descent inapplicable; it is an appropriate choice because of its simplicity, which results in fast computation. The parameters of the ORRs are updated via the gradient projection method as follows:

Non-linear Optimization
where the learning rate is sufficiently small and the projection matrix enforces the constraints. The gradients of the cost function with respect to κ and ϕ are formulated in equation (16), in terms of the partial derivatives of the cost function with respect to the individual elements of the weight matrices.
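One update of the gradient projection method can be sketched as follows, assuming the projection simply clips κ back into its feasible interval [0, 1] and leaves the phase ϕ unconstrained; the learning rate and gradient values are placeholders:

```python
import numpy as np

def gradient_projection_step(kappa, phi, grad_kappa, grad_phi, lr=1e-2):
    # Gradient step followed by projection onto the feasible set:
    # kappa must stay in [0, 1]; phi is an unconstrained phase.
    kappa_new = np.clip(kappa - lr * grad_kappa, 0.0, 1.0)
    phi_new = phi - lr * grad_phi
    return kappa_new, phi_new

# Example: a large gradient would push kappa below 0; the projection clips it.
k, p = gradient_projection_step(kappa=0.05, phi=0.1,
                                grad_kappa=20.0, grad_phi=-5.0, lr=1e-2)
```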

Backpropagation Algorithm
The non-linear optimization process using stochastic gradient projection requires the gradient of the cost function with respect to all parameters in equation 16. The backpropagation algorithm is the most efficient way to find this gradient, by applying the chain rule in reverse order.
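Backpropagated gradients are easy to get wrong, so a standard sanity check is to compare them against central finite differences of the cost. A generic sketch with an illustrative stand-in cost (the real cost would come from the OBFN forward pass):

```python
import numpy as np

def finite_diff_grad(f, params, eps=1e-6):
    # Central finite differences of a scalar cost, one parameter at a time.
    g = np.zeros_like(params)
    for i in range(len(params)):
        e = np.zeros_like(params)
        e[i] = eps
        g[i] = (f(params + e) - f(params - e)) / (2 * eps)
    return g

# Illustrative stand-in for the squared group-delay error as a function of
# (kappa, phi); its analytic gradient is known, so the check is exact.
cost = lambda p: (p[0] - 0.3) ** 2 + np.sin(p[1]) ** 2
p0 = np.array([0.7, 0.5])
analytic = np.array([2 * (p0[0] - 0.3), 2 * np.sin(p0[1]) * np.cos(p0[1])])
numeric = finite_diff_grad(cost, p0)
```

In practice the same check is run with the backpropagated gradients in place of `analytic`.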

Gradient for the last layer
From equation 13, we can determine the partial derivative of the cost function with respect to each element of the output of the network. Using the identity activation function (equation 10) and the chain rule, the partial derivative of the cost function with respect to each weight of the last layer follows.

Gradient for the remaining layers
Let the layer index range over the remaining (hidden) layers. The partial derivative of the cost function with respect to the activation of each neuron follows from the chain rule, and from it the partial derivative of the cost function with respect to each weight (equation 20). Since we should carefully handle coupled weights, i.e., several weights that represent the same frequency response, equation 16 should be modified so that the gradients with respect to κ and ϕ are summed over all weight entries that share the same ORR.
The remaining terms in equation (21) are the derivatives of the weights (i.e., the frequency response of the ORR) with respect to the parameters κ and ϕ, which follow from the frequency response in equation 4. This completes the formula for the gradient of the cost function with respect to all parameters κ and ϕ, which enables the implementation of the gradient projection method.
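The parameter derivatives can be validated numerically. The sketch below uses an assumed lossless all-pass form of the ORR response (conventions may differ from equation 4) and central differences in κ and ϕ:

```python
import numpy as np

def orr_response(omega, kappa, phi):
    # Assumed lossless all-pass ORR response (sign conventions may differ).
    c = np.sqrt(1.0 - kappa)
    z = np.exp(-1j * (omega - phi))
    return (c - z) / (1.0 - c * z)

def dH_dkappa(omega, kappa, phi, eps=1e-7):
    # Central finite difference of the response with respect to kappa.
    return (orr_response(omega, kappa + eps, phi)
            - orr_response(omega, kappa - eps, phi)) / (2 * eps)

def dH_dphi(omega, kappa, phi, eps=1e-7):
    # Central finite difference of the response with respect to phi.
    return (orr_response(omega, kappa, phi + eps)
            - orr_response(omega, kappa, phi - eps)) / (2 * eps)
```

Because this assumed response depends on ω and ϕ only through the difference ω − ϕ, the ϕ-derivative must equal minus the ω-derivative, which gives a quick consistency check on any analytic expression.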

Simulation

Simulation Setup
The nominal parameters of the OBFN setup simulated in this project are similar to the ones used in [14]. Table 1 shows all required parameters and their values. The desired delays and training examples are given, and the initial guesses of all parameters κ and ϕ are similar to [14]. The normalized squared group delay error, averaged over the antenna elements (AEs), compares the deviation between the actual and desired delay responses of each path to the magnitude of the desired delay response; it is essential because it shows how large the error is relative to the desired delay.

Figure 5 shows the group delay response and test error of a simulated OBFN with desired delays [0 0.1 0.2 0.3] ns. Table 2 shows the optimum values of κ and ϕ, and the initial and final normalized squared group delay errors. These optimum parameters are similar to the results found in [14].

Figure 6 shows the group delay error for different delay values, with the desired delays scaled by a positive real factor. Interestingly, the error increases as the delays become larger. This is expected because of the trade-off mentioned in Section 2: when the delay becomes larger, a few ORRs cannot provide enough delay response, which results in a larger error. Figure 7 shows the group delay responses of the OBFN for two scaled versions of the desired delays. We can observe that as the desired group delay increases, the ripple of the delay response increases as well, which is why the error illustrated in Figure 6 grows with the desired delay.

The deep learning algorithm aims to exploit the special structure of the OBFN system so that it can tune large-scale OBFN setups. Figure 8 shows the group delay responses of larger OBFN setups. We can observe that the deep learning algorithm can indeed be used to tune larger OBFN setups.
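The normalized squared group-delay error can be computed as below; this is an illustrative formulation (squared deviation over the band divided by the squared desired delay) and may differ in detail from the paper's exact definition:

```python
import numpy as np

def normalized_delay_error(tau_actual, tau_desired):
    # Squared delay error over the frequency band, normalized by the
    # squared desired delay so errors are comparable across delay values.
    num = np.sum((tau_actual - tau_desired) ** 2)
    den = np.sum(tau_desired ** 2)
    return num / den

tau_d = np.full(100, 0.3)  # desired flat 0.3 ns delay over the band
tau_a = tau_d + 0.01 * np.sin(np.linspace(0, 2 * np.pi, 100))  # small ripple
err = normalized_delay_error(tau_a, tau_d)
```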

Conclusion
Optical Beamforming Networks (OBFNs) are used to control Phased Array Antennas (PAAs) so that planes can communicate with satellites. Tuning OBFNs is a highly non-linear and complex problem, and an existing solution based on non-linear programming is limited to small-scale OBFN setups. A deep learning algorithm that exploits the special structure of OBFNs is proposed to tune large-scale OBFN setups. The special structure of an OBFN can be represented by a deep neural network whose weight matrices are composed of the frequency responses of the Optical Ring Resonators (ORRs) in the respective layers. Given a certain OBFN structure, the deep learning algorithm works well in finding the optimum ORR parameters, even for large OBFN setups, for any given desired delays. Another important point is that the deep learning approach is data driven: it uses measurable signals as training examples. This is desirable because real measured data can be used, which is essential for online tuning in future development.