Availability Analysis of Predictive Hybrid M-Out-of-N Systems

In an m-out-of-n system, if m out of n modules agree, the system can report a consensus; otherwise, the system fails. In a predictive hybrid system, on the other hand, if there is no agreement, a history record of previous successful result(s) is used to predict the output. Markov modeling is used to analyze the availability of the predictive hybrid redundancy system. From the Markov model of the system in steady state, the availability is derived and compared with that of the m-out-of-n system. The simulation results show that the availability of the predictive hybrid system is higher than that of the m-out-of-n system, especially for large m.


Introduction
Redundancy is a well-known technique for enhancing the fault tolerance of highly reliable and highly available control systems. Redundancy of hardware modules is perhaps the most widely applicable form of redundancy in control systems and is applied in three forms: passive (static), active (dynamic) and hybrid. The voter is the main element in passive redundancy; it masks the effect of a fault from the output of the system. Active redundancy does not try to hide a failure but detects the fault(s) and locates the faulty elements. In the hybrid approach, the system masks faults while mechanisms for fault detection, fault location and fault recovery reconfigure the system when a fault occurs [5]. The m-out-of-n system, which has been widely applied in engineering systems, utilizes static redundancy on n parallel or series redundant modules and functions when at least m of the n modules in the system work properly [3] [4].
Predictive hybrid redundancy [6] is a hybrid redundancy architecture. This architecture has only been discussed for triple modular redundancy (TMR), where three parallel modules are used. In fact, predictive hybrid redundancy as in [6] is the hardware implementation of hybrid voters incorporating smoothing and prediction. It was originally presented for X-by-Wire systems; however, the number of sensors and actuators in real X-by-Wire systems is normally more than three. The Predictive Hybrid m-out-of-n system (PHmn) [22] was applied to n redundant modules rather than three modules as in [6]. For real-time applications including X-by-Wire systems (e.g., Brake-by-Wire, Steering-by-Wire), sensors and actuators must be available all the time. They must function as soon as sensory data is available; otherwise, the result of this unavailability is catastrophic. Various approaches have been introduced in the literature to compute the availability of m-out-of-n systems by considering different techniques and different failure scenarios [12]-[17], [21]. The novelty of this work is to use a Markov process for modeling the availability of the PHmn system and to derive the related equations by applying the steady state condition. Moreover, the effects of failure rate and repair rate on the availability of the system are taken into account. To the best of our knowledge, availability analysis of the PHmn system has not been introduced and analyzed in the literature. In this study, steady state availability and MTTR are utilized for estimation of MTTF and MTBF, whereas no closed-form solution has been reported in the literature for estimation of MTTF and MTBF when a prediction phase is considered.
The rest of this paper is organized as follows: in Section 2, the system description and assumptions are presented, and the availability of PHmn systems is obtained by mathematical and probabilistic methods. The availability of PHmn systems and m-out-of-n systems is compared in Section 4. Finally, the conclusion and future work are discussed.

The following notation is used:
λ_i: failure rate of the system when there are i failed components.
L_i(s): Laplace transform of P_i(t).
µ_i: repair rate of the system when there are i failed components.
P_i(t): probability that there are i failed components in the system at time t; 0 ≤ i ≤ F.

ISSN: 1693-6930 TELKOMNIKA Vol. 12, No. 2, June 2014: 437-446

Predictive Hybrid m-out-of-n system (PHmn)
A control system is assumed with n redundant modules which work in parallel. This architecture is well known as N-Modular Redundancy (NMR). Every module generates its output independently, and the output of one module does not influence another. The modules are repairable at any time; however, only one module may fail or be repaired in a time unit. A decision-making module known as the voter performs voting on the outputs of the redundant modules. In the PHmn system, a two-phase voter is utilized. In the first phase, majority voting [11] is applied to the voter's inputs. The voter generates a first-phase decision if and only if at least a majority of the inputs, i.e., $m = \lceil (n+1)/2 \rceil$, agree or almost agree (within a threshold). A second phase incorporating prediction [7]-[9] or smoothing [1] is used for possible decision making when the first phase does not reach consensus. Finding an appropriate voter output in the second phase is based on calculations on the voter's history record. The control system inevitably fails when the second phase does not produce a result. The structure of the PHmn system is shown in Figure 1.
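The two-phase voting described above can be sketched as follows. This is only an illustration under stated assumptions, not the voter of [22]: the function names (`majority_quorum`, `first_phase`, `predict`, `phmn_vote`), the default threshold of 0.05, and the linear-extrapolation predictor are all hypothetical, since the text does not specify which prediction or smoothing method from [7]-[9] or [1] is used.

```python
from statistics import mean


def majority_quorum(n):
    """Smallest majority of n modules, i.e. ceil((n + 1) / 2)."""
    return (n + 2) // 2


def first_phase(outputs, m, threshold):
    """Return the agreed value if at least m outputs lie within
    `threshold` of a common candidate output, otherwise None."""
    best = None
    for candidate in outputs:
        group = [x for x in outputs if abs(x - candidate) <= threshold]
        if len(group) >= m and (best is None or len(group) > len(best)):
            best = group
    return mean(best) if best is not None else None


def predict(history):
    """Assumed predictor (not from the source): linear extrapolation
    from the last two successful results."""
    if not history:
        return None
    if len(history) == 1:
        return history[-1]
    return 2 * history[-1] - history[-2]


def phmn_vote(outputs, history, m=None, threshold=0.05):
    """Run one voting cycle; returns (value, mode)."""
    if m is None:
        m = majority_quorum(len(outputs))
    result = first_phase(outputs, m, threshold)
    if result is not None:
        history.append(result)  # only first-phase successes are recorded
        return result, "operating"
    result = predict(history)   # second phase: use the history record
    if result is not None:
        return result, "prediction"
    return None, "failed"       # neither phase produced a result
```

The three return modes mirror the operating, prediction and failed modes used in the availability model; the history is updated only on first-phase success so that predictions are based on previous successful results.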

Availability Modeling
Availability is defined as the probability that a system functions correctly and is available at a given instant of time [5]. The availability analysis follows a discussion similar to the reliability analysis in [22]. Combinatorial modeling and Markov modeling are two known techniques for modeling the availability of a system; Markov modeling is used in this study for the following reasons. Markov models are very robust [23]; many systems cannot simply be modeled by combinatorial methods because they concentrate on probabilistic techniques for calculating availability; and modeling the repair process is not easy with combinatorial modeling [5]. For the availability analysis of the PHmn system, the Markov model in Figure 2 is presented.

ISSN: 1693-6930 Availability Analysis of Predictive Hybrid M-Out-of-N System (Abbas Karimi)

Figure 2. Markov availability model for PHmn system
Based on the system description in Section 2, the PHmn system works in one of three modes: operating, prediction and failed. When the system works correctly and the voter makes its decision in the first phase of execution, the system is in operating mode [19]. Recall that there may be some faulty modules in the system; however, their number is less than the majority. In other words, at least (n-m+1) non-faulty modules function correctly. The system states in operating mode are labeled from 0 to (n-m) as in Fig. 2. The system switches from an operating state i to the next operating state (i+1) with an exponential failure rate of λ_i.
Possible transitions for state i are i → i + 1, with a departure rate of λ_i, and i → i − 1, with a departure rate of µ_i = iµ. The system switches from operating state (n-m) to the prediction state (Pr) with departure rate λ_Pr, and from the prediction state back to the previous operating state with departure rate µ_Pr.
A failed module may be repaired with an exponential repair rate. For simplicity, all values of λ and µ in operating states are considered to be the same, and two transitions are not allowed simultaneously. When (n-m+1) modules have failed or are in the repair queue, the system migrates to prediction mode. This mode has one state, which is labeled Pr. P_i(t) and P_Pr(t) denote the probability that the system is in state i and in state Pr at time t, respectively. Based upon the relations of state i with its neighbors, the state equations (1), (2) and (3) are obtained. The initial conditions are P_0(0)=1 and P_i(0)=0 for i≠0. Taking the Laplace transform of equations (1), (2) and (3) yields the matrix form [c]P(s)=P(0) [22].
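As a numerical companion to the state equations, the sketch below integrates the forward equations dP/dt = P(t)Q of a linear chain 0, 1, ..., n-m, Pr, F in the time domain, starting from P_0(0) = 1. It is a minimal sketch under assumptions, not the paper's derivation: the helper names `build_generator` and `transient_probabilities` are hypothetical, forward Euler is chosen only for simplicity, equal rates are used for all failure transitions, and the failed state F is assumed to be repairable back to Pr with rate µ, which the text does not state explicitly.

```python
def build_generator(fail_rates, repair_rates):
    """Generator matrix Q of a linear chain 0 <-> 1 <-> ... <-> K.

    fail_rates[k] is the rate k -> k+1; repair_rates[k] is the rate k+1 -> k.
    """
    size = len(fail_rates) + 1
    q = [[0.0] * size for _ in range(size)]
    for k, (lam, mu) in enumerate(zip(fail_rates, repair_rates)):
        q[k][k + 1] = lam
        q[k + 1][k] = mu
    for k in range(size):
        q[k][k] = -sum(q[k])  # diagonal makes each row of Q sum to zero
    return q


def transient_probabilities(q, t_end, dt=1e-3):
    """Integrate dP/dt = P(t) Q by forward Euler, starting from P_0(0) = 1."""
    size = len(q)
    p = [1.0] + [0.0] * (size - 1)
    for _ in range(int(round(t_end / dt))):
        dp = [sum(p[i] * q[i][j] for i in range(size)) for j in range(size)]
        p = [p[j] + dt * dp[j] for j in range(size)]
    return p


# Hypothetical example: n = 5, m = 3, so operating states 0..2, then Pr, then F.
lam, mu = 0.5, 0.5
fail_rates = [lam, lam, lam, lam]        # 0->1, 1->2, 2->Pr, Pr->F (assumed equal)
repair_rates = [1 * mu, 2 * mu, mu, mu]  # mu_i = i*mu; Pr->2 and F->Pr assumed mu
p = transient_probabilities(build_generator(fail_rates, repair_rates), t_end=10.0)
availability = 1.0 - p[-1]               # one minus the probability of state F
```

Numerical integration sidesteps the inverse Laplace transform entirely, at the cost of giving values only for the chosen rates rather than a closed form.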
The system is unavailable in state F, and the unavailability of the system is denoted by P_F(t); therefore, the system availability is given by Equation (4):
$A(t) = 1 - P_F(t)$  (4)
Solving the equations for P_i(t) by taking the inverse Laplace transform of P_i(s) is too complicated. Therefore, the steady state condition is used for availability [24]. Steady state availability is denoted by A_s in Equation (5).
$A_s = \lim_{t \to \infty} P(\text{system is working at time } t)$  (5)
By defining Γ_Pi (i = 1, …, n−m) and Γ_Pr, and based on Equation (6), the steady state availability of the PHmn system is obtained (Equation 12).
Mean Time To Failure (MTTF) is defined as the average uptime of the system [18]. In this study, steady state availability and Mean Time To Repair (MTTR) are utilized for estimation of MTTF. MTTR is the average time that the system spends repairing a faulty module. Equation (13) shows the relation between MTTF, MTTR and the steady state availability (Equation 12).
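For a linear chain of states, steady state probabilities follow directly from the balance between neighboring states, which gives a quick way to evaluate a steady state availability numerically. The sketch below is an illustration under assumptions and does not reproduce Equation (12): the function names are hypothetical, λ is taken as equal in all operating states, µ_i = iµ is used for repairs between operating states, and the failed state F is assumed to be repairable with rate `mu_f`, which the text does not state explicitly. The m-out-of-n model here is simply the same chain without the Pr state.

```python
def steady_state(fail_rates, repair_rates):
    """Steady state probabilities of a linear chain from the balance
    equations pi[k] * fail_rates[k] = pi[k + 1] * repair_rates[k]."""
    weights = [1.0]
    for lam, mu in zip(fail_rates, repair_rates):
        weights.append(weights[-1] * lam / mu)
    total = sum(weights)
    return [w / total for w in weights]


def availability_phmn(n, m, lam, mu, lam_pr, mu_pr, lam_f, mu_f):
    """Chain 0..(n-m), Pr, F; the system is unavailable only in F."""
    k = n - m
    fail = [lam] * k + [lam_pr, lam_f]
    repair = [(i + 1) * mu for i in range(k)] + [mu_pr, mu_f]
    return 1.0 - steady_state(fail, repair)[-1]


def availability_m_out_of_n(n, m, lam, mu, lam_f, mu_f):
    """Chain 0..(n-m), F; no prediction state."""
    k = n - m
    fail = [lam] * k + [lam_f]
    repair = [(i + 1) * mu for i in range(k)] + [mu_f]
    return 1.0 - steady_state(fail, repair)[-1]
```

With n = 5, m = 3 and every rate set to 0.5, this sketch gives about 0.857 for the PHmn chain and about 0.833 for the m-out-of-n chain. These values come from the assumed rates above and are not the simulation results reported in this paper.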
Mean Time Between Failures (MTBF) is defined as the average operating time of the system between consecutive failures (excluding the time duration of a system in failed state) [21].MTBF is estimated according to Steady State Availability and MTBR, as seen in Equation (15).
Each module has MTBR=
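The estimation of MTTF from steady state availability and MTTR can be sketched with the textbook relation A_s = MTTF / (MTTF + MTTR), solved for MTTF. This is an assumption: Equation (13) is not reproduced here, so the sketch may not match its exact form, and the function name `mttf_from_availability` is hypothetical.

```python
def mttf_from_availability(a_s, mttr):
    """Solve A_s = MTTF / (MTTF + MTTR) for MTTF.

    Assumes the standard two-state uptime/downtime relation;
    a_s must lie in [0, 1) and mttr must be positive.
    """
    if not 0.0 <= a_s < 1.0:
        raise ValueError("steady state availability must be in [0, 1)")
    if mttr <= 0.0:
        raise ValueError("MTTR must be positive")
    return a_s * mttr / (1.0 - a_s)
```

The relation diverges as A_s approaches 1, which is why the function rejects A_s = 1: a system that is always available has no finite MTTF under this model.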

Experimental Results
In this section, the availability of the PHmn system is compared to that of the m-out-of-n system as in [20]. For this purpose, MATLAB is utilized. Simulations are repeated for different values of n, m, λ and µ. Once the system switches from state i to state j with failure rate λ, it may be repaired and returned to the previous working state with repair rate µ, except for the fail state. The larger the failure rate, the more susceptible the system is to failure, because the probability of failure is then larger than the probability of repair. Likewise, a state with a larger repair rate is more likely to be repaired than to move to the next state, which is closer to the fail state.
The simulation results are discussed in two subsections: one based on variation of m, λ and µ, and one based on variation of λ and µ.

The effect of m, λ and µ.
Three scenarios for repair rate and failure rate are considered: λ < µ, i.e., the failure rate is smaller than the repair rate; λ = µ, i.e., the failure rate is equal to the repair rate; and λ > µ, i.e., the failure rate is larger than the repair rate. The simulation results are also analyzed based on the hardness (n/4 < m ≤ n/2+1) and softness (m ≤ n/4) of an agreement [22]. For instance, in a 128-module system, if agreement is achieved in the first phase of voting, m is a value in the range 1 < m ≤ 65. This range is divided into soft agreement (1 < m ≤ 33) and hard agreement (33 < m ≤ 65).

• Experiment 1. λ<µ
The simulation results show that the availability of both the m-out-of-n and PHmn systems is 1, i.e., they are highly available when λ<µ. The 100% availability is due to the low failure rate (and consequently small probability of failure) and high repair rate (and consequently high probability of repair and system restoration), which lead to long-term operation and a highly available system.

• Experiment 2. λ=µ
In this scenario, λ and µ are set to the same value, i.e., 0.5. For λ=µ=0.5, the availability of PHmn improved by 0.64% overall and by 1.29% for hard agreements, as shown in Fig. 3. As seen in Fig. 3, the improvement for soft agreements is negligible and the availability is close to 1 for small m, because modules easily agree when m is small in comparison with n. However, the availability of PHmn is higher than that of the m-out-of-n system as m tends to higher values.

• Experiment 3. λ>µ
The availability results for PHmn and m-out-of-n are displayed in Figures 4-6, where λ is respectively 0.9, 0.7, and 0.6 and µ is respectively 0.1, 0.3 and 0.4. When the failure rate is higher than the repair rate, the availability is generally expected to degrade. This phenomenon is seen in Figures 4-6.
The availability improvement for soft agreements is negligible, as seen in Figures 5-6; however, PHmn improved the overall availability by 5.43% (Figure 5) and 3.03% (Figure 6), and the availability for hard agreements by 13.82% (Figure 5) and 6.49% (Figure 6). When the failure rate is much higher than the repair rate, i.e., λ=0.9 and µ=0.1, the availability of both systems degrades significantly compared to Figures 5-6. However, the availability of PHmn improves by 35.6% for hard agreement.
As m=65 is the boundary of soft and hard agreement, the effect of λ and µ on the availability of a 65-out-of-128 system is discussed in this section. The results are presented for two scenarios: fixed repair rate and varied failure rate (0 < λ ≤ 1) as in Figs. 9-11; and fixed failure rate and varied repair rate (0 < µ ≤ 1) as in Figs. 12-14. Theoretically, for higher failure rates the probability of failure increases and the availability of the systems decreases (Figures 9-11), while for a fixed failure rate the availability increases with the repair rate. The simulation results confirm the theoretical expectations, as seen in Figs. 9-14. The mean availability of the PHmn system vs. the m-out-of-n system for different values of λ and µ is given in Table 1. It shows 2.17%, 14.26% and 34.84% improvement in the average availability of PHmn when λ is respectively 0.1, 0.5, and 0.9. Other values of λ were also investigated, and PHmn showed increased availability for them as well.
As another conclusion from Table 1 and Figures 9-14, the availability of both systems is close to 1 for low failure rate and high repair rate. Although small µ and large λ yield the worst availability of the system, large λ has a more negative effect than small µ. The availability reached the maximum of 1 when µ=0.1 and 0 < λ < 1, whereas it did not reach 1 for different values of µ. The average availability in Figure 9 is 0.9893 for the PHmn system, while it is 0.9685 for the m-out-of-n system. These scenarios are highlighted in Table 1. Generally, in a large-scale highly available application, the most important assumption is the use of highly available modules, because once they fail, their repair and restoration to a fully operational state is unlikely to occur.

Conclusion
A PHmn system is an extension of triple predictive redundancy to large-scale control systems and comprises n redundant hardware modules. If m out of n modules are in agreement, the system produces an output; otherwise, a history record of previous successful results is used to predict the result of the current cycle. In order to investigate the availability of the PHmn system, a Markov availability model has been presented. The availability of the PHmn system has then been derived in the steady-state condition and simulated for different scenarios of repair rate µ, failure rate λ, and m (the consensus quorum), which are the parameters affecting the system's availability according to the derived availability equation. The results showed that the PHmn system has higher overall availability than the m-out-of-n system. In the simulated scenarios, the PHmn system is the better choice, especially for large-scale control systems. The exception is when m is very small, where the traditional system may be favored for cost reasons. In future work, other parameters influencing system dependability will be investigated.

Figure 1. Structure of the PHmn System [22]

repair is allowed at any time, t, MTTR equals to and

When the system fails, (n - m + 2) modules are waiting for repair. Therefore, MTBR of the system is

Table 1. Mean availability of PHmn vs. m-out-of-n system