Performance Evaluation of Centralized Reconfigurable Transmitting Power Scheme in Wireless Network-on-chip

Network-on-chip (NoC) is an on-chip communication network that allows parallel communication among all cores to improve inter-core performance. Wireless NoC (WiNoC) introduces long-range, high-bandwidth radio frequency (RF) interconnects that can reduce the multi-hop communication over the planar metal interconnects of conventional NoC platforms. In WiNoC, the RF transceivers, and particularly their transmitters, account for a significant share of the total communication energy. This paper evaluates the energy and latency performance of a closed loop power management mechanism that reconfigures the transmitting power in WiNoC based on the number of erroneously received packets. The scheme achieves significant energy savings with limited performance degradation and an insignificant impact on throughput.


Introduction
Chip multiprocessors (CMPs) have recently gained significant attention in biomedical, multimedia, aviation, next-generation military systems and several other applications [1]. These applications can require quintillions of operations per second. Traditional NoC does not scale to this growing demand for data communication because of several limitations, such as unicast-only communication, high hop counts and limited bandwidth, which affect both performance and power consumption. Wireless network-on-chip (WiNoC) helps in designing low-power, high-bandwidth massive multicore chips [2][3][4]. WiNoC introduces long-range, high-bandwidth radio frequency (RF) interconnects that can reduce the multi-hop communication over the planar metal interconnects of conventional NoC platforms. The WiNoC communication infrastructure consists of a wireless backbone added on top of a conventional NoC architecture. This structure requires new hardware resources such as transceivers and antennas. Unfortunately, these new structures introduce significant area overhead, and several architectures have been proposed to trade off communication capability against area overhead [5].
On average, WiNoC architectures improve performance and energy savings by about 20% and 30%, respectively, over fully wired NoC links [5,6]. Power and energy management continue to play an important role in on-chip communication, including in WiNoC, because adding wireless link structures increases total power consumption. Several energy-efficient transceiver designs have recently been proposed to reduce the energy overhead introduced by WiNoC [7,8]. The RF transmitter front-end has been shown to account for about 50% [9] and about 74% [10] of WiNoC transceiver power consumption for network sizes of 128, 256 and 512 cores. Current WiNoC designs operate every transmitter at maximum transmitting power regardless of the physical location of the receiver antenna [11,12].
However, there have been some efforts to reduce energy consumption at the wireless link level, such as hibernating unused receivers [7] and antenna beamforming [13]. Dynamic voltage/frequency scaling (DVFS) for WiNoC, which varies the voltage and/or frequency of routers to save dynamic power, has been explored in [14,15]. Centralised and distributed power gating schemes, which cut off the power supply of inactive devices, have been explored in [1,16]. Other low power techniques for WiNoC components, such as voltage-frequency islands [14], dynamic voltage scaling and static power reduction [17], have also been proposed. Mineo et al. [18] proposed an offline configuration of transmitting power. However, this approach requires an extensive characterization phase based on either time-consuming field solver simulations or direct measurements in the real context, and it depends on a robust and accurate field solver for radiating fields in CMOS substrates. For these reasons, Rusli et al. [19] proposed dynamic power reconfiguration for fixed locations, which can improve energy dissipation while satisfying system performance constraints. In that work, the transmitting power is adjusted based on bit error rate (BER) reports from the receiver radio hubs.
Building on that potential, this paper presents experiments with an erroneous packet detection module at the receivers. We present a packet error formulation and the proposed closed loop power management mechanism, and evaluate the technique on two WiNoC architectures: WCube and iWise. The rest of this paper is organized as follows. Section 2 describes the centralised power management scheme employed on the WiNoC architectures. The experimental setup is presented in Section 3, and results and analysis are described in Section 4. Section 5 concludes the work.

Reconfigurable Transmitting Power
In this section, the reconfigurable transmitting power algorithm is introduced in the first subsection, and the second subsection discusses the architecture of the proposed power manager for WiNoC architectures.

Closed Loop Reconfigurable Power Scheme
The main idea of the proposed architecture is to make transceiver power reconfigurable based on the number of erroneous packets. Note that this approach differs from the previous work in [19], which reconfigures power based on bit error rate (BER), whereas this work uses the number of erroneous packets. Many systems are already equipped with a packet error detection module and use packet error counts in their fault models. In this work, the report of erroneous packets is used to reconfigure the transmitting power. Error detection is based on incomplete or erroneous packets detected by the receiver module.
The scheme is presented in Algorithm 1 and can be described as follows. A source radio hub i transmits packets to a destination radio hub j, starting at the maximum power step. Radio hub j counts the packets it receives from transmitter i using a packet counter module, PC[T], whose initial value is set by the predefined reconfiguration period (RP). PC[T] is decremented each time j receives a packet from i, and the packet_error signal is triggered once PC[T] has counted down completely. If the power manager determines that the transmitting power of the pair <i,j> is oversupplied, the code word for that pair's entry in the lookup table is decremented. WiNoC operation is stalled during the reconfiguration state (RS), allowing the power manager to reconfigure the table. The RP and RS periods are predefined: after the PC[T] count finishes, RS begins so that the transceiver power can be reconfigured. The packet error threshold, threshold_error, is also set before operation begins. The maximum allowable WiNoC stall period (which equals the power manager execution period) can be determined experimentally from the relationship between the RS period and performance degradation.
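The counting step described above can be sketched in Python as a behavioural model (not the hardware design). It assumes that each received packet arrives already flagged as erroneous or correct by the receiver's error detection, and that PC[T] is reloaded with RP after every report; the class name `PacketCounter` and its method are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PacketCounter:
    """Hypothetical behavioural model of the receiver-side PC[T] counter for one <i,j> pair."""
    rp: int                                      # reconfiguration period, in packets
    remaining: int = field(init=False)           # current PC[T] value
    errors: int = field(init=False, default=0)   # erroneous packets seen in this RP

    def __post_init__(self) -> None:
        self.remaining = self.rp                 # PC[T] starts at RP

    def on_packet(self, erroneous: bool) -> Optional[int]:
        """Count one packet from transmitter i.

        Returns the error count (the packet_error report) when PC[T] reaches
        zero, otherwise None.
        """
        if erroneous:
            self.errors += 1
        self.remaining -= 1                      # PC[T] is decremented per packet
        if self.remaining > 0:
            return None
        report = self.errors
        self.remaining = self.rp                 # assumption: reload for the next RP
        self.errors = 0
        return report


pc = PacketCounter(rp=4)
flags = [False, True, False, True, False, False, False, False]
reports = [pc.on_packet(e) for e in flags]
print(reports)  # [None, None, None, 2, None, None, None, 0]
```

With RP = 4, a report is produced after every fourth packet, carrying the number of erroneous packets seen in that window.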
As soon as the platform enters the RS period, each radio hub concurrently reports its current erroneous packet counts to the central power manager. The manager receives these reports concurrently for each communicating pair <i,j> and compares each packet error count with the allowable threshold. If packet_error(i,j) is greater than the user-defined threshold_error, the power manager increments the transmitting power step needed for reliable communication between the pair; otherwise, the transmitting power step is decremented. When RP begins again, a power amplifier drives the actual wireless transmission power at the updated step.
Algorithm 1: Closed loop transmitting power reconfiguration
Input: i, packet_error, threshold_error
while true do
    if packet_error(i,j) > threshold_error then
        SendCmdPLInc(i,j)
        PC[i] ← RP_counter
    else
        SendCmdPLDec(i,j)
    end if
end while
The power management strategy can be summarised as follows. Let PS(i,j) be the power step used by radio hub i to communicate with radio hub j. If the number of erroneous packets detected exceeds the threshold, the power manager increases PS(i,j) by one step; if it is below the threshold, PS(i,j) is reduced by one step. This is done simply by updating the code word stored in the j-th entry of the look-up table in the variable gain controller (VGA) module of radio hub i. The VGA contains the dynamic look-up table for each <i,j> communication pair and a power gain amplifier that drives the actual transmitting power according to the power step. The look-up tables are initialised with the highest transmitting power step. Figure 1 illustrates the power savings achieved with a 3-step power reconfiguration and the variation in power in both situations.
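A minimal Python sketch of the manager-side update in Algorithm 1, assuming seven power steps indexed 0 (lowest) to 6 (highest) and one report per communicating pair per RS phase. Clamping the step at both ends of the range is an assumption not stated in the algorithm, and the function names `init_table` and `reconfigure` are hypothetical stand-ins for the effect of SendCmdPLInc and SendCmdPLDec.

```python
NUM_STEPS = 7              # assumption: seven power steps
MAX_STEP = NUM_STEPS - 1   # index of the highest transmitting power step


def init_table(num_hubs: int) -> list[list[int]]:
    """table[i][j] is the power step hub i uses to reach hub j.

    Every entry starts at the highest step, as the text specifies.
    """
    return [[MAX_STEP] * num_hubs for _ in range(num_hubs)]


def reconfigure(table: list[list[int]],
                reports: dict[tuple[int, int], int],
                threshold_error: int) -> None:
    """Apply one RS phase; reports maps (i, j) -> packet_error(i, j)."""
    for (i, j), packet_error in reports.items():
        if packet_error > threshold_error:
            # Effect of SendCmdPLInc(i, j): raise the step, capped at MAX_STEP.
            table[i][j] = min(table[i][j] + 1, MAX_STEP)
        else:
            # Effect of SendCmdPLDec(i, j): lower the step, floored at 0.
            table[i][j] = max(table[i][j] - 1, 0)


table = init_table(4)
for _ in range(2):  # two consecutive RS phases with the same reports
    reconfigure(table, {(0, 1): 0, (0, 2): 5}, threshold_error=2)
print(table[0][1], table[0][2])  # 4 6
```

Repeated RS phases walk an error-free pair down from the highest step, while a pair whose error count exceeds the threshold is pushed back up, so each entry tends to hover around the lowest step that keeps its errors at or below threshold_error.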

Architecture of Closed Loop Reconfigurable Power
The centralized power management scheme allows a centralized power manager to reconfigure the transmitting power based on the erroneous packet reports of the WiNoC transceivers. This section presents the architecture of the centralized power management scheme, its adaptive mechanism and the components that support its operation. Figure 2 shows the block diagram of two communicating radio hubs on the WiNoC platform. Each receiver terminal maintains the statistical packet error report collected during period RP, which is needed to regulate the transmitting power of its source pair. Error detection in transceivers is in fact common practice in physical wireless communication systems to maintain reliability [20][21][22]. The power manager's input port receives a packet, CONTROL_OUT, which is a 3-tuple <addr_rx, addr_tx, packet_error>, where addr_rx is the address of the receiver radio hub reporting the errors, addr_tx is the address of the transmitting radio hub whose packets were in error, and packet_error is the estimated number of erroneous packets detected by the receiver. The power manager's output signal, control_in, is fed exclusively to each transmitting radio hub to be regulated, addr_tx, which updates its transmitting power step in the VGA controller lookup table via the command signal, cmd. The power step is a 3-bit code word associated with seven steps that drive the power amplifier (PA) to the actual transmission power. Based on the destination in the packet header, the <source,destination> table entry drives the power amplifier to the associated power level. The cmd operation is a 3-bit code word that indicates one of three actions toward optimal transmitting power: increase, decrease or no change. The operation of the VGA controller is explained in detail in [18] and [19].
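To make the 3-tuple concrete, the sketch below packs CONTROL_OUT into a single word. The paper does not specify field widths or command encodings, so all of these are assumptions: 4-bit hub addresses (enough for the 16 radio hubs used in the experiments), a saturating 8-bit packet_error field, and arbitrary 3-bit values for the increase, decrease and no-change commands. The names `pack_control_out`, `unpack_control_out` and `Cmd` are hypothetical.

```python
from enum import IntEnum

ADDR_BITS = 4   # assumption: 16 radio hubs -> 4-bit addresses
ERR_BITS = 8    # assumption: packet_error saturates at 255


class Cmd(IntEnum):
    """Hypothetical 3-bit cmd encodings (not given in the paper)."""
    NO_CHANGE = 0b000
    INCREASE = 0b001
    DECREASE = 0b010


def pack_control_out(addr_rx: int, addr_tx: int, packet_error: int) -> int:
    """Pack <addr_rx, addr_tx, packet_error> into one integer word."""
    if not (0 <= addr_rx < 1 << ADDR_BITS and 0 <= addr_tx < 1 << ADDR_BITS):
        raise ValueError("address out of range")
    err = min(packet_error, (1 << ERR_BITS) - 1)  # saturate instead of overflowing
    return (addr_rx << (ADDR_BITS + ERR_BITS)) | (addr_tx << ERR_BITS) | err


def unpack_control_out(word: int) -> tuple[int, int, int]:
    """Inverse of pack_control_out."""
    addr_mask = (1 << ADDR_BITS) - 1
    err_mask = (1 << ERR_BITS) - 1
    return ((word >> (ADDR_BITS + ERR_BITS)) & addr_mask,
            (word >> ERR_BITS) & addr_mask,
            word & err_mask)


word = pack_control_out(addr_rx=3, addr_tx=7, packet_error=12)
print(unpack_control_out(word))  # (3, 7, 12)
```

In this model the manager would unpack each report, compare the error count with threshold_error, and send Cmd.INCREASE or Cmd.DECREASE to the transmitting hub as its cmd signal.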

Experimental Setup
In the experiments, the network size is 16 radio hubs. The number of look-up table entries in the VGA controller grows proportionally with the network size. The attenuation map is obtained from the Ansoft High Frequency Structural Simulator (HFSS) using a zigzag antenna model. The transmitting power steps are equally spaced between 8 µW (-21 dBm) and 794 µW (-1 dBm). Expressed as energy per bit, the power steps range from 0.42 pJ/bit to 1.4 pJ/bit.
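The dBm and microwatt figures above are related by P[mW] = 10^(P[dBm]/10). The sketch below checks the two endpoints and lists seven steps between them. Spacing the steps evenly in dB is an assumption, since the text does not say whether the steps are equally spaced in dB or in linear power; seven steps follows the VGA controller description. The helper names are hypothetical.

```python
P_MIN_DBM = -21.0  # lowest step, from the text
P_MAX_DBM = -1.0   # highest step, from the text
NUM_STEPS = 7      # assumption: seven steps, as in the VGA controller description


def dbm_to_uw(p_dbm: float) -> float:
    """Convert dBm to microwatts: P[mW] = 10**(P[dBm] / 10)."""
    return 1000.0 * 10 ** (p_dbm / 10.0)


def power_steps_dbm(p_min: float = P_MIN_DBM, p_max: float = P_MAX_DBM,
                    n: int = NUM_STEPS) -> list[float]:
    """n steps spaced evenly in dB from p_min to p_max (assumed spacing)."""
    delta = (p_max - p_min) / (n - 1)
    return [p_min + k * delta for k in range(n)]


for k, p in enumerate(power_steps_dbm()):
    print(f"step {k}: {p:6.2f} dBm = {dbm_to_uw(p):6.1f} uW")
```

The endpoints come out at about 7.9 µW and 794.3 µW, matching the rounded values in the text. Under the assumed spacing, each step is about 3.33 dB, roughly a factor of 2.15 in power.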
In this work, a packet is defined as a unit of information that a PE transmits to another PE via the on-chip interconnect, in this case the NoC. In general, a packet is broken down into a variable number of flits; in this work, each packet is 32 flits of 32 bits. The packet injection rate (PIR) is expressed in flits/cycle/core. Three injection rate categories are defined: low, medium and high. The low injection rate is the per-node packet injection rate that results in low average packet latency (in cycles). The medium injection rate covers the range in which the average packet latency increases linearly from the low-latency level up to a threshold latency, beyond which it increases exponentially towards saturation. Once the traffic saturates, network congestion occurs and high packet latency is observed; injection rates in this region are categorized as high PIR. Initially, the packet injection rate is kept at a medium rate for simplicity, since no stringent requirements are imposed on the network.
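As a quick worked example of these definitions, a 32-flit packet of 32-bit flits carries 1024 bits; combining that with the per-bit energy range quoted earlier gives the energy per packet at the lowest and highest power steps. This sketch assumes every bit of the packet is sent over the wireless link at a single step, and ignores retransmissions and receiver energy; `packet_energy_pj` is a hypothetical helper.

```python
FLITS_PER_PACKET = 32
BITS_PER_FLIT = 32
PACKET_BITS = FLITS_PER_PACKET * BITS_PER_FLIT  # 1024 bits per packet


def packet_energy_pj(energy_per_bit_pj: float, bits: int = PACKET_BITS) -> float:
    """Energy to send one packet at a given per-bit energy, in pJ."""
    return energy_per_bit_pj * bits


for e_bit in (0.42, 1.4):  # lowest and highest step, pJ/bit
    print(f"{e_bit} pJ/bit -> {packet_energy_pj(e_bit):.2f} pJ per packet")
```

This gives 430.08 pJ and 1433.60 pJ per packet, so under these assumptions the highest step costs roughly 3.3 times as much per packet as the lowest.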
Energy saving in this work refers to the difference in energy dissipation between the WiNoC architectures implementing the reconfigurable power management scheme and the baseline architectures without it. The total energy dissipation of the whole interconnect network (lower and upper subnet NoC layers) is analysed to assess the energy savings when the proposed scheme is implemented on WiNoC, and energy savings are expressed as a percentage of the baseline energy (in joules, J). The energy saving analysis is carried out for several reconfiguration periods (RP), expressed in packets, assuming that one packet is transmitted per RP cycle. Latency is measured from the time the message's header flit is first injected into the network until the header flit arrives at the destination core. It is expressed in clock cycles, and the analysis compares the two WiNoC architectures with the proposed scheme against the corresponding baseline architectures without the power management scheme.

Results and Analysis
This section discusses the effects of varying RP, RS and the packet injection rate (PIR) under the power management scheme. For verification, the application-based SPLASH-2 and PARSEC benchmark suites are used. These suites, which are widely used to evaluate multiprocessors and their designs, simulate multi-core workloads from areas such as high performance computing, finance, cloud computing and gaming. The results and performance are then analysed.

Effects of Reconfiguration Periods
The energy dissipation and latency metrics are observed as RP is varied over 1000, 2000 and 4000 packets, assuming that one packet is transmitted per cycle. This allows error occurrences to be detected as soon as the number of transferred messages reaches these values. RS is set to 16 clock cycles and the PIR is low. The RS value of 16 cycles was chosen because, when RS was varied, it gave minimal performance degradation relative to the baseline of both WiNoC architectures.
As shown in Figure 3 (a), the WCube architecture with the proposed reconfigurable power scheme dissipates between 1.71×10^8 J and 1.74×10^8 J for RP values from 1000 to 4000, compared with 2.8×10^8 J for the baseline WCube architecture. As shown in Figure 3 (b), the iWise architecture with the proposed scheme dissipates between 1.44×10^8 J and 1.49×10^8 J over the same RP range, compared with 2.7×10^8 J for its baseline design. Overall, the proposed design introduces a latency overhead of about 1 clock cycle on WiNoC, as shown in Figures 3 (c) and 3 (d), because the WiNoC is stalled while the power manager is in operation.
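The percentage savings can be recomputed from these totals with the relative-saving definition used in this work, saving = (E_baseline - E_proposed) / E_baseline × 100. The sketch below applies it to the Figure 3 totals quoted in the text; because it is a ratio, the result does not depend on the energy unit. The latency helper uses the same form for the percentage delay increase. Both function names are hypothetical.

```python
def energy_saving_pct(e_baseline: float, e_proposed: float) -> float:
    """Energy saving relative to the baseline, in percent."""
    return 100.0 * (e_baseline - e_proposed) / e_baseline


def latency_overhead_pct(l_baseline: float, l_proposed: float) -> float:
    """Latency increase relative to the baseline, in percent."""
    return 100.0 * (l_proposed - l_baseline) / l_baseline


# Figure 3 totals quoted in the text (x10^8 J); ratios are unit-independent.
wcube = [energy_saving_pct(2.8, e) for e in (1.71, 1.74)]
iwise = [energy_saving_pct(2.7, e) for e in (1.44, 1.49)]
print([round(s, 1) for s in wcube])  # [38.9, 37.9]
print([round(s, 1) for s in iwise])  # [46.7, 44.8]
```

From these totals, the savings come out at roughly 38 to 39% for WCube and 45 to 47% for iWise over the RP range.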
When the proposed scheme is implemented on these architectures, significant energy savings are obtained across the reconfiguration periods and benchmark suites. As the reconfiguration period increases, the energy savings decrease because the power manager becomes less responsive. On average, energy savings of about 40% are observed between RP 1000 and RP 4000, owing to the frequent invocation of the power management strategy: the more frequently packet errors are reported, the higher the probability and frequency of changing the transmitting power step. During RS, the WiNoC is stalled, which causes some performance degradation; this stall is needed to determine the transmitting power of each transmitter for reliable communication. Figures 4 (a) and 4 (b) show the energy savings for WCube and iWise, and Figures 4 (c) and 4 (d) show the percentage delay increase for different reconfiguration periods. For iWise, the latency at RP 2000 is higher than at RP 1000 and RP 4000. This result is unpredictable because of the stochastic nature of the individual applications in the benchmark suites. At RP 4000, the latency is significantly lower than at the other RP settings. The average latency overhead across seven applications from the benchmark suites is about 2%, and the average performance degradation for the iWise architecture is about 3%. Although the individual benchmarks show irregular energy savings and latency patterns as RP varies, their mean values follow a consistent pattern. Implementing the proposed design on WCube results in more than 40% energy savings, while about 44% energy savings are observed for iWise. iWise achieves greater energy savings at all RP settings for the following reason. Even though iWise and WCube have the same number of radio hubs, intra-cluster radio hub communication in iWise occurs four times more frequently than inter-cluster communication in WCube, owing to the iWise architectural design. iWise allows multichannel transmission in each cycle using a frequency division multiplexing scheme. Each set uses four different carrier frequencies (3 dynamic and 1 static) at each transmission time, so the transmitting power is regulated four times more often than in WCube. WCube employs a single channel per communication, so power regulation is invoked less frequently. This suggests that the more multichannel communication is introduced at the radio hub level, the more frequently the power management scheme is invoked, resulting in a more energy-efficient WiNoC platform. The RP of the power manager must therefore be tuned to balance performance against low power, since the choice of RP affects both metrics. As the figures show, power management becomes less effective and less responsive as RP increases.

Effects of Reconfiguration States
The previous analysis assumed a reconfiguration time (RS) of 16 clock cycles, and the energy and delay figures depend on this parameter. In this experiment, RS is varied over 1, 4, 8, 16, 24 and 32 clock cycles to determine which RS periods give minimal and which give high latency overheads. Since varying RS within this range is assumed to have an insignificant effect on energy savings, only the latency metric is observed. RP is set to 2000 clock cycles to represent the median energy savings, the PIR is low and the centralised power management scheme is used. As shown in Figure 5, the percentage delay increases as the reconfiguration state (penalty) increases. When the reconfiguration time is up to 16 cycles, the impact on delay is less than 3%; once the reconfiguration time exceeds this threshold, the delay increases very quickly. On the iWise architecture, when the reconfiguration phase exceeds 16 clock cycles, system latency roughly doubles. If the power manager's control command output is implemented as a serial communication channel, at most 16 radio hubs can be assigned to one power manager module while keeping performance degradation at about 3%. This assumes that one RS cycle carries one power manager control command to one radio hub. If RS is increased to 24 cycles or more, power management becomes less effective in terms of latency, as communication latency rises sharply with RS.
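The serial-channel constraint above amounts to simple arithmetic: if each RS cycle carries one control command to one radio hub, a manager can update at most RS radio hubs per reconfiguration phase. The sketch below assumes exactly one cycle per command and no other overhead; the helper names and the 64-hub case are hypothetical, included only to illustrate how the centralized scheme would scale to larger networks.

```python
import math


def hubs_per_manager(rs_cycles: int, cycles_per_command: int = 1) -> int:
    """Radio hubs one manager can command during a single RS over a serial channel."""
    return rs_cycles // cycles_per_command


def managers_needed(num_hubs: int, rs_cycles: int, cycles_per_command: int = 1) -> int:
    """Power manager modules required so every hub is updated within one RS."""
    return math.ceil(num_hubs / hubs_per_manager(rs_cycles, cycles_per_command))


print(hubs_per_manager(16))      # 16, the limit stated in the text
print(managers_needed(16, 16))   # 1
print(managers_needed(64, 16))   # 4 (hypothetical larger network)
```

Keeping RS at 16 cycles for a larger network would therefore mean either several power manager modules or a wider (parallel) command channel.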

Effects of Packet Injection Rates
In this experiment, PIR is varied to observe its effect on energy savings and latency. Three PIR categories are used: low, medium and high, in units of flits/cycle/node. Packets are broken down into smaller units, called flits, for on-chip communication through buffers. The PIR categories are explored when applying the proposed scheme on WiNoC to observe how they affect energy savings and performance. In the low injection rate case, the latency starts at about 4 cycles with a PIR of 0.00001 flits/cycle/node. The medium injection rate of 0.0001 flits/cycle/node gives an average packet latency of about 30 cycles. A packet latency of 100 cycles or more is defined as high, which occurs at a PIR of 0.0005 flits/cycle/node. Although only a single injection rate point is evaluated per category, it is taken as representative of the latency trend of that category. RP is set to 2000 clock cycles to represent the median energy savings, RS is set to 16 clock cycles to keep the delay percentage low, and the centralised power management scheme is used.
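To give a feel for these rates, the sketch below converts each PIR into the mean number of cycles between packet injections at one node, assuming the 32-flit packets used in this work and a constant average injection rate. The helper `cycles_between_packets` is hypothetical.

```python
FLITS_PER_PACKET = 32  # packet length used in this work


def cycles_between_packets(pir_flits_per_cycle_per_node: float,
                           flits_per_packet: int = FLITS_PER_PACKET) -> float:
    """Mean cycles between packet injections at one node for a given PIR."""
    return flits_per_packet / pir_flits_per_cycle_per_node


for name, pir in [("low", 0.00001), ("medium", 0.0001), ("high", 0.0005)]:
    print(f"{name:>6}: {pir} flits/cycle/node -> one packet every "
          f"{cycles_between_packets(pir):,.0f} cycles per node")
```

This gives about 3.2 million, 320,000 and 64,000 cycles between packets at each node for the low, medium and high categories, respectively.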
Figure 6 (a) shows that the energy dissipated by the proposed design on WCube is around 1.7×10^8 J for all three injection rate categories, compared with a baseline energy dissipation between 2.6×10^8 J and 3×10^8 J. For the same injection rate categories, Figure 6 (b) shows an energy dissipation between 1.5×10^8 J and 1.8×10^8 J for the proposed design on iWise, compared with a baseline between 2.5×10^8 J and 2.9×10^8 J. Both figures show clear energy savings in both WiNoC architectures.
In terms of latency, both architectures with the proposed reconfigurable power scheme deviate very little from their baseline latency in all three injection rate categories, so only a small performance degradation can be expected. As shown in Figures 6 (c) and 6 (d), the latency of both architectures saturates when the PIR reaches the 'take off' point at around 0.0003 flits/cycle/node. Beyond this point, the WiNoCs operate in the high-latency region, where latency increases exponentially with PIR, causing heavy packet retransmission and affecting the reliability requirement. Below this point, latency increases linearly with PIR and both WiNoCs are considered to operate well. As shown in Figure 7, the energy savings of both WiNoC architectures decrease as the injection rate increases. As more flits are injected into the network, error detection at the transceivers becomes more representative, which in turn triggers transmission power management. However, network congestion makes packet latency rise exponentially, so fewer packets arrive successfully than are injected into the network. The effectiveness of the power reconfiguration scheme is thus reduced by the saturated network, which leads to deadlocks and cyclic redundancy. Despite this, the scheme achieves about 40% energy savings.
Packet latency increases as the PIR increases. For both low and medium PIR, the latency increase does not exceed 4%, which is low because network traffic remains in a controllable regime. Even at high PIR, the latency increase of both architectures stays below 6%. Energy savings per flit were also investigated.
The average energy reduction per flit transmission is consistently greater than 40% for all injection rate conditions. Energy per flit is measured in pJ. The upward trend in per-flit energy savings from low to high PIR is due to the decreasing number of flits received at the destination nodes as traffic congestion intensifies. Although fewer flits are received successfully, the total flit energy of the benchmarks does not change significantly from one PIR category to another, which yields the increasing average energy reduction observed for the SPLASH-2 and PARSEC benchmark suites. This trend may differ for other network traffic, because different core communication patterns may result in different energy savings trends.

Conclusion
This paper has shown that WiNoC transmitting power can be reconfigured while maintaining the reliability of the WiNoC system. The effects of varying the reconfiguration period, reconfiguration state and packet injection rate on WiNoC performance and energy savings have been observed. On average, 40% energy savings are achieved on both WiNoC architectures when the reconfiguration period is varied. An RS of 16 cycles is the most suitable reconfiguration state setting, giving minimum performance degradation. All packet injection rate categories (low, medium and high) yield energy savings of about 40% with at most around 6% performance degradation. This shows that the proposed scheme can provide a low power, energy-efficient WiNoC. Future work could investigate the suitability of the centralized scheme for larger networks of 256, 512 and 1,024 cores to evaluate its performance and limitations. Run-time error detection, the control network and other parameters that affect performance may also be included in future work, and many other parameters can be considered under power constraints.

Figure 1. Energy savings achieved by implementing the reconfigurable transmitting power scheme

Figure 2. Proposed source radio hub to a destination radio hub power manager module (a) top level view (b) internal architecture

ISSN: 1693-6930
Figure 3. The energy dissipation for (a) WCube (b) iWise and latency impacts for (c) WCube (d) iWise between the baseline and the proposed centralised reconfigurable transmitting power design. Different RP values are used and verified using SPLASH-2 and PARSEC.

Figure 4. Performance analysis in terms of energy savings for (a) WCube (b) iWise and latency impacts for (c) WCube (d) iWise on WiNoC architectures verified with SPLASH-2 and PARSEC
ISSN: 1693-6930  Performance Evaluation of Centralized Reconfigurable Transmitting Power... (M. S. Rusli)
On iWise, communication latency rises from about 4% to more than 8% when the reconfiguration phase is increased to 24 and 32 clock cycles, respectively, whereas the architecture stays under 3% performance degradation for reconfiguration phases of 16 clock cycles and below.

Figure 5. The latency overheads impact on WiNoC architectures implementing the proposed design with varied RS and verified using SPLASH-2 and PARSEC benchmark suites: (a) WCube (b) iWise

Figure 6. Performance comparison in terms of energy dissipation for (a) WCube (b) iWise and latency impacts for (c) WCube (d) iWise between the baseline WCube and iWise architectures for three different PIR categories

Figure 7. Performance comparison in terms of energy dissipation for (a) WCube (b) iWise and latency impacts for (c) WCube (d) iWise between the baseline WCube and iWise architectures for three different PIR categories.