Approximate computing for low power neural networks

This paper investigates the possibility of reducing power consumption in Neural Networks using approximate computing techniques. The authors compare a traditional fixed-point neuron with an approximated neuron composed of approximate multipliers and adders. Experiments show that, in the proposed case study (a wine classifier), the approximated neuron saves up to 43% of the area, reduces power consumption by 35%, and improves the maximum clock frequency by 20%.


Introduction
Machine Learning (ML) plays an important role in several fields, such as health, computer vision, communications, and energy management [1][2][3][4][5][6][7][8][9][10]. Interest in Machine Learning has increased in the last few years, thanks to the availability of increasing computational power and the introduction of new technologies [11][12][13][14][15][16][17][18][19][20][21]. There is also a growing trend toward the use of ML in embedded systems, for example in Automotive, Security and Surveillance, Smart Home, Health Care, and IoT applications. For embedded systems, power consumption represents a crucial aspect [22][23][24][25]. In fact, these systems are often used under operating conditions where power supply cannot be provided by the electrical grid. In this scenario, reducing the power consumption is one of the most important design goals in order to guarantee a long service life. There are three power dissipation components in CMOS digital circuits [26]:
- Switching power
- Short-circuit power
- Static power

Among these contributions, switching power represents the main one and it is defined in (1):

P_sw = α · C · f · V_dd²    (1)

where α is the switching activity, C is the switching capacitance, f is the clock frequency, and V_dd is the supply voltage.
The second contribution, the short-circuit power, is related to the short-circuit currents flowing through the MOS transistors of a gate at each switching event. It is strongly dependent on the parameters present in (1) (switching activity, clock frequency, and supply voltage). Finally, the static power depends on the leakage currents and is related to the circuit design, the technology, and the supply voltage. ML is characterized by parallel computation and, consequently, by a large circuit area. Area impacts negatively on power consumption: as the area increases, all three power dissipation components increase as well. The many techniques available to reduce power consumption in digital circuits can be divided into two main categories: technological solutions and design solutions. The former are based on the use of low power digital libraries, materials, or devices. The latter consist of design techniques applied both at the layout level (for example, power gating) and at RTL.
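As a quick illustration of (1), the sketch below shows how the dynamic switching power scales linearly with the switched capacitance, which is exactly the component that an area reduction targets. The numerical values are illustrative only and are not taken from the paper.

```python
# Dynamic switching power of a CMOS circuit, per Eq. (1):
#   P_sw = alpha * C * f * Vdd^2
# All parameter values below are illustrative placeholders.

def switching_power(alpha, c_farads, f_hz, vdd_volts):
    """Return the dynamic switching power in watts."""
    return alpha * c_farads * f_hz * vdd_volts ** 2

# Halving the switched capacitance (e.g., by shrinking the circuit area)
# halves the switching power, all other parameters being equal.
p_full = switching_power(alpha=0.2, c_farads=1e-9,   f_hz=500e6, vdd_volts=1.0)
p_half = switching_power(alpha=0.2, c_farads=0.5e-9, f_hz=500e6, vdd_volts=1.0)
assert abs(p_half / p_full - 0.5) < 1e-12
```

The quadratic dependence on V_dd is also why supply-voltage scaling is so effective, but it is the capacitance term that approximate operators reduce by shrinking area.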

This paper is focused on the use of Approximate Computing (AC) for power consumption reduction. AC is a broad spectrum of techniques that relax the accuracy of computation in order to improve speed, energy, and/or another metric of interest. AC exploits the fact that several important applications do not need to produce precise results to be useful. Research interest in approximate computing has been growing in recent years, motivated by its potential for reducing power consumption. In this paper, we analyze the possibility of reducing power consumption in embedded digital Neural Networks (NN) by using approximate algebraic operators in artificial neurons. Figure 1 shows the block diagram of an artificial neuron. Such a neuron is characterized by (2):

Low power Artificial Neuron Model
y = f( Σ_i (w_i · x_i) + b )    (2)

where x_i and y represent, respectively, the inputs and the output of the neuron, w_i the weights, b the bias, and f the activation function (typically the sigmoid function).
Hardware implementation of digital neurons requires three main blocks:
- Multipliers: used for the multiplications between the inputs and the weights
- Adders: used to sum the weighted inputs and the bias
- ROM: used to implement the non-linear activation function

In several applications, the accuracy of the NN does not depend on the activation function topology. In such cases, the sigmoid function can be replaced by simpler functions such as satlins, with a consequent reduction in complexity. In these cases, ROMs can be replaced by simple multiplexers and comparators, as shown in Figure 2. In light of all these simplifications, the most complex blocks in terms of area and power consumption are the multipliers and the adders. For this reason, this paper focuses on reducing power consumption by replacing multipliers and adders with approximate operators [27].
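As a minimal behavioral illustration (a sketch, not the paper's RTL), the neuron of (2) with the sigmoid replaced by satlins reduces the activation to a clamp into [-1, 1], which in hardware needs only two comparators and a multiplexer instead of a ROM:

```python
# Behavioral sketch of the simplified neuron: weighted sum plus a
# satlins activation (a symmetric saturating linear function).

def satlins(x):
    """Symmetric saturating linear activation: clamp x to [-1, 1]."""
    return max(-1.0, min(1.0, x))

def neuron(inputs, weights, bias):
    """y = satlins(sum_i(w_i * x_i) + b), as in Eq. (2)."""
    acc = sum(w * x for w, x in zip(weights, inputs)) + bias
    return satlins(acc)

# Usage: a 3-input neuron whose weighted sum (1.6) saturates to 1.0.
y = neuron([0.5, 1.0, -0.25], [2.0, 1.5, 4.0], bias=0.1)
assert y == 1.0
```

The clamp is exactly the comparator-plus-multiplexer structure of Figure 2: compare against the two saturation thresholds and select the accumulator or a constant.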

Approximate Operators
As introduced in the previous sections, in all the cases in which the sigmoid activation function can be replaced by a simple satlins, the power consumption of the artificial neuron depends essentially on the multipliers and the adders. In the following, we provide a description of the multipliers and adders used in our experiments. The literature offers several approximate multiplier architectures. For our experiments, we use a modified version of the compression-based multiplier proposed in [28]. This multiplier works in three steps: partial product generation, partial product reduction, and finally the sum of the reduced partial products using a CMA, as shown in Figure 3. More details about the architecture and the design of this multiplier are provided in [28]. In order to further reduce the area, and consequently the power consumption, we replace the CMA with a sloppy adder [29]. This adder has also been used to realize the adder tree for the sum of the weighted inputs and the bias, as shown in Figure 4.
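The exact sloppy-adder architecture is detailed in [29]; one common realization of the idea (the paper's version may differ) computes the upper bits exactly and approximates the k least significant bits with a bitwise OR, removing the carry chain in the lower part:

```python
# Illustrative sketch of a "sloppy" adder: exact upper part, OR-based
# lower part with no carry propagated out of the sloppy bits. This is a
# common approximate-adder style, assumed here for illustration only.

def sloppy_add(a, b, width=14, sloppy_bits=5):
    """Approximate unsigned addition of two integers."""
    lo_mask = (1 << sloppy_bits) - 1
    lo = (a | b) & lo_mask                     # approximate lower bits (no carries)
    hi = ((a >> sloppy_bits) + (b >> sloppy_bits)) << sloppy_bits  # exact upper bits
    return (hi | lo) & ((1 << width) - 1)

# Since a + b == (a | b) + (a & b), the absolute error is bounded by
# (a & b) restricted to the sloppy bits, i.e. it is below 2**sloppy_bits.
assert sloppy_add(3, 5) == 7      # exact sum is 8
assert sloppy_add(32, 64) == 96   # multiples of 2**5 are added exactly
```

Dropping the lower carry chain is what shortens the critical path and shrinks the area, at the cost of a small, bounded error that the NN's accuracy margin absorbs.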

Case Study: a Wine Classifier
In order to verify the performance of the proposed low power artificial neuron, it was tested in a NN. The case study is the wine classifier available in the MATLAB Neural Networks toolbox. The design and the training have been carried out in MATLAB. The NN is composed of two layers, a hidden layer and an output layer, shown in Figure 5. The hidden layer is composed of 10 neurons with 13 inputs each, while the output layer is composed of 3 neurons connected to the 10 outputs of the first layer. The inputs are wine features: Alcohol, Malic acid, Ash, Alkalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanidins, Color intensity, Hue, OD280/OD315 of diluted wines, and Proline. The outputs are three different classes of wine.
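The two-layer 13-10-3 topology described above can be sketched behaviorally as follows; the weights, biases, and feature values are random placeholders, not the trained MATLAB network:

```python
import random

random.seed(0)

def satlins(x):
    """Symmetric saturating linear activation: clamp x to [-1, 1]."""
    return max(-1.0, min(1.0, x))

def layer(inputs, weights, biases):
    """One fully connected layer with a satlins activation per neuron."""
    return [satlins(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Placeholder parameters for the 13-10-3 topology of Figure 5.
w_hidden = [[random.uniform(-1, 1) for _ in range(13)] for _ in range(10)]
b_hidden = [random.uniform(-1, 1) for _ in range(10)]
w_out    = [[random.uniform(-1, 1) for _ in range(10)] for _ in range(3)]
b_out    = [random.uniform(-1, 1) for _ in range(3)]

features = [0.1] * 13   # placeholder 13-element wine feature vector
scores = layer(layer(features, w_hidden, b_hidden), w_out, b_out)
assert len(scores) == 3  # one score per wine class
```

The class with the highest output score would be taken as the predicted wine class.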
Experiments have been performed to estimate the power consumption reduction obtained by the approximated neuron model with respect to a traditional fixed-point neuron. The experiments followed this flow:
1. We fixed the performance, in terms of classification accuracy, that the implemented hardware NN must achieve (we set the minimum accuracy to 95%).

2. We designed the fixed-point architecture that meets this accuracy requirement.
3. We designed the approximated architecture that meets the same accuracy requirement.
4. The architectures obtained in points 2 and 3 were coded in VHDL and synthesized using Synopsys.
5. The two architectures were characterized in terms of area and speed.

To determine the number of bits required by the multipliers and the adders in the fixed-point architecture and in the approximated architecture, both classifiers have been coded and simulated in MATLAB. The two models were developed using a tensorial representation, as in (3), in which multiplications and additions have been replaced with fixed-point operations and approximate operators, respectively, for the fixed-point and the approximated models.
\[
\begin{bmatrix} y_1 \\ \vdots \\ y_{10} \end{bmatrix}
=
\begin{bmatrix} w_{1,1} & \cdots & w_{1,13} \\ \vdots & \ddots & \vdots \\ w_{10,1} & \cdots & w_{10,13} \end{bmatrix}
\begin{bmatrix} i_1 \\ \vdots \\ i_{13} \end{bmatrix}
\quad (3)
\]

Simulations show that the fixed-point model meets the accuracy specification with the following parameters:
- 6-bit input, 12-bit output multipliers
- 14-bit output adders

The approximated classifier meets the accuracy specification with the following parameters:
- 6-bit input, 11-bit output approximate multipliers
- 14-bit output adders with 5 sloppy bits

For both models, the inputs are 6 bits wide.
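The bit-width exploration described above can be sketched as follows; the quantization helper and the sample values below are placeholders for illustration, not the paper's MATLAB model:

```python
# Sketch of a fixed-point bit-width simulation: quantize inputs and
# weights to a grid with a given number of fractional bits, evaluate the
# weighted sum, and compare against the full-precision result.

def quantize(x, frac_bits):
    """Round a real value to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def quantized_dot(inputs, weights, bias, frac_bits):
    """Weighted sum computed on quantized operands."""
    acc = sum(quantize(w, frac_bits) * quantize(x, frac_bits)
              for w, x in zip(weights, inputs))
    return acc + quantize(bias, frac_bits)

# Placeholder operands: with 5 fractional bits the error stays small,
# which is the kind of check used to pick the bit widths listed above.
inputs  = [0.40, -0.12, 0.73]
weights = [0.91,  0.05, -0.33]
exact   = sum(w * x for w, x in zip(weights, inputs)) + 0.1
approx  = quantized_dot(inputs, weights, 0.1, frac_bits=5)
assert abs(exact - approx) < 0.1
```

In an actual exploration this comparison would be run over the whole dataset, shrinking the bit widths until the 95% accuracy floor is reached.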

Experimental Results
After the fixed-point analysis discussed in the previous section, we identified the most complex neuron (in terms of number of inputs) in the classifier. This neuron has been coded in VHDL in two different versions: the traditional fixed-point one and the approximated one. For both models, we use the satlins activation function because, as discussed in the previous sections, we focused our analysis on the algebraic operators. Both models were subsequently synthesized using Synopsys Design Compiler and the STM 90 nm standard cell library. Synthesis results are reported in Table 1 in terms of area, power consumption, and maximum frequency.
The results show that the approximated neuron outperforms the fixed-point model in area, power, and maximum frequency. The area of the fixed-point neuron is 42280 µm², against 23971 µm² for the approximated one. This advantage in area occupation translates into lower power consumption and a higher clock frequency. The area and power consumption estimations have been performed with a clock constraint equal to the maximum clock frequency achievable by the fixed-point neuron (0.63 GHz).
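The quoted 43% area saving can be checked directly from the two reported areas:

```python
# Sanity check of the reported area saving using the figures from Table 1:
# 1 - 23971/42280 ~= 0.433, i.e. about 43%.
area_fixed  = 42280.0   # um^2, fixed-point neuron
area_approx = 23971.0   # um^2, approximated neuron

saving = 1.0 - area_approx / area_fixed
assert round(saving * 100) == 43
```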

Conclusions
In this paper, we investigated the possibility of realizing low power Neural Networks using approximate algebraic operators. Experiments were performed comparing a traditional fixed-point neuron with an approximated neuron composed of a compression-based multiplier and a sloppy adder. In both neurons, the satlins activation function was used. The experiments show that, in the proposed case study (a wine classifier), the approximated neuron saves up to 43% of the area, reduces power consumption by 35%, and improves the maximum clock frequency by 20%. These results confirm that the use of approximate operators is a valid solution for reducing power consumption and area occupation in ML systems.