Data Partition and Communication in a Parallel Heuristic Model Based on the Clonal Selection Algorithm

This research experiments with a parallel, population-based heuristic algorithm inspired by clonal selection, the Clonal Selection Algorithm (CSA). A coarse-grained parallelism model is applied to improve execution time, and inter-process communication overhead is addressed by adjusting the communication frequency and the size of the data communicated. The experiments cover six parallel computing models representing all considered partition and communication schemes, using instances of an NP-hard problem, the Traveling Salesman Problem (TSP). The algorithm is implemented with the MPJ Express message-passing library and run in a cluster computing environment. Results show that the best parallelism model partitions the initial population data at the beginning of communication and at the end of each generation. The communication frequency can be reduced to once per 1% of the population size generated. Using four datasets from TSPLIB, the experiments show that controlling the communication frequency improved the best cost from 44.16% to 87.01% for berlin52.tsp, from 9.61% to 53.43% for kroA100.tsp, and from 12.22% to 17.18% for tsp225.tsp. With eight processors, controlling the communication frequency reduced execution time by 93.07%, 91.60%, 89.60%, and 74.74% for burma14.tsp, berlin52.tsp, kroA100.tsp, and tsp225.tsp, respectively. We conclude that the communication frequency strongly affects, and improves, both execution time and best cost.


Introduction
The Clonal Selection Algorithm (CSA) is a population-based heuristic search algorithm. It has been used to solve combinatorial problems [1], [2], from the classical Traveling Salesman Problem (TSP) [2], [3] to specialized optimization problems in Iterative Learning Control (ILC) [4]. CSA is part of the Artificial Immune System (AIS) family, a bio-inspired computing approach for solving complex problems [5], [6]. Like other population-based approaches, it requires a significant amount of computation time, and many studies address this by adopting a parallel computation paradigm. Among the initiators, Watkins [7] was not specific to CSA and applied parallelism to pattern recognition problems. Hongbing et al. [8] applied CSA parallelism to protein structure prediction using Open MPI. Dabrowski and Kubale [9] used parallel CSA for the graph coloring problem.
In this research, parallel computing models are developed to exploit the parallelism available in clonal selection and CSA. In addition to the characteristics of the immune system during clonal selection, the models follow the principles and concepts of parallel computation design, taking into account partitioning, communication, agglomeration, and mapping [10]. Based on the communication principle, there are two groups of computation models: the master-slave model, in which one processor acts as a communication controller and the others act as slave processors governed by the master, and the multi-communication (coarse-grained) model, in which all processors communicate with each other without a centralized control processor [10], [11]. For a fixed population, the multi-communication model shows better computation speed. However, the relationship between computation speed and the CSA parameters, i.e., population size, number of selections, and the amount of data communicated, has yet to be shown.

Population initialization
A set of randomly generated tours. There are (n-1)! possible tours that may be generated; this population is a sample of the set of all tours. The number of tours generated is given by the specified population size.
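The initialization step above can be sketched as follows. This is a minimal illustration (class and method names are ours, not from the paper's implementation), generating each tour as a random permutation of city indices via a Fisher-Yates shuffle:

```java
import java.util.Random;

public class PopulationInit {
    // One random tour: a permutation of city indices 0..n-1,
    // produced with a Fisher-Yates shuffle.
    static int[] randomTour(int n, Random rnd) {
        int[] tour = new int[n];
        for (int i = 0; i < n; i++) tour[i] = i;
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = tour[i]; tour[i] = tour[j]; tour[j] = tmp;
        }
        return tour;
    }

    // Initial population: popSize random tours over n cities.
    static int[][] initPopulation(int popSize, int n, Random rnd) {
        int[][] pop = new int[popSize][];
        for (int p = 0; p < popSize; p++) pop[p] = randomTour(n, rnd);
        return pop;
    }
}
```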

Affinity evaluation
Affinity evaluation checks each generated tour and computes the cost required to form it. Selection (affinity maturation): affinity expresses how close the cost of a tour is to the optimal/best cost; the closer the cost, the higher the affinity, and the more likely the tour is to be selected.
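As a sketch, the cost of a closed tour over a distance matrix can be computed as below (names are illustrative); affinity is then inversely related to this cost:

```java
public class Affinity {
    // Cost of a closed tour: sum of distances between consecutive cities,
    // including the edge from the last city back to the first.
    static double tourCost(int[] tour, double[][] dist) {
        double cost = 0.0;
        for (int i = 0; i < tour.length; i++) {
            cost += dist[tour[i]][tour[(i + 1) % tour.length]];
        }
        return cost;
    }
}
```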

Cloning
Cloning copies each selected tour; the number of copies depends on the clone factor.

Hypermutation
Each cloned tour is mutated according to the hypermutation probability (mutate factor).

Receptor editing
After mutation, the best tours obtained replace the worst tours in the initial population. The number of worst tours replaced depends on a random replacement size d.
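A minimal sketch of the cloning and hypermutation steps. The clone-count rule shown is the common CLONALG proportional rule and the mutation is a simple swap move; the paper's exact clone and mutate factors may differ:

```java
import java.util.Random;

public class CloneMutate {
    // Number of clones for a tour at selection rank (0 = best), following
    // the common CLONALG rule: round(beta * N / (rank + 1)).
    static int cloneCount(double beta, int popSize, int rank) {
        return (int) Math.round(beta * popSize / (rank + 1));
    }

    // Hypermutation: apply k random swap (2-exchange) moves to a copy
    // of the tour, leaving the original tour unchanged.
    static int[] hypermutate(int[] tour, int k, Random rnd) {
        int[] m = tour.clone();
        for (int s = 0; s < k; s++) {
            int i = rnd.nextInt(m.length), j = rnd.nextInt(m.length);
            int tmp = m[i]; m[i] = m[j]; m[j] = tmp;
        }
        return m;
    }
}
```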

Stop condition
The clonal selection process is repeated until a stop condition is met. The stopping criterion can be the number of generations, the number of populations (tours) evaluated, or a target best cost.

Processing element communications
Each processing element exchanges the best tours it has produced with the other processing elements.
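The exchange step can be sketched as below. In the actual program this is done over the cluster with MPJ Express message passing; here we simulate it locally with arrays (a hypothetical simplification): each processing element contributes its current best cost, and every element adopts the globally best tour.

```java
public class BestExchange {
    // Given each processing element's best cost, return the index of the
    // globally best element.
    static int globalBest(double[] bestCosts) {
        int best = 0;
        for (int p = 1; p < bestCosts.length; p++) {
            if (bestCosts[p] < bestCosts[best]) best = p;
        }
        return best;
    }

    // Every element replaces its local best tour with a copy of the
    // global best tour (standing in for the message-passing broadcast).
    static void adoptGlobalBest(int[][] localBestTours, int bestIndex) {
        int[] winner = localBestTours[bestIndex].clone();
        for (int p = 0; p < localBestTours.length; p++) {
            localBestTours[p] = winner.clone();
        }
    }
}
```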
The research method is as follows. The problem is first immune-engineered, i.e., a representation and an affinity maturation scheme are defined. The parallel clonal selection algorithm, called the clonal selection inspired parallel algorithm (CSI-PA), has several parameters to be set: the population size (N), the number of selections (n), the number of generations (g), and the number of nodes (non) of the TSP instance. The algorithm is executed in parallel environments, e.g., multicore and cluster computers, on several processing elements. Each execution yields a solution, i.e., the best tour with its best cost, and the execution time. The algorithm is implemented in the single-program-multiple-data model, using the Message Passing Interface (MPI) standard and the MPJ Express library in the Java programming language. The algorithm is executed with several variants of data partition and communication frequency.
Experiments were conducted on a multicore and cluster environment with a head node and 16 compute nodes. Eight compute nodes were used in these experiments, each with 16 x 2.90 GHz CPUs and 895.465 GB of storage in a RAID5 configuration. The head node has 32 x 2.90 GHz CPUs, 126.13 GB of memory, an 895.465 GB local disk, and runs Linux 2. Some parameter values were fixed, such as the initial population size, the number of selections, the clone factor, and the mutate factor. The initial population partition was done as shown in Figure 2, and the experiment overview is summarized in Figure 3. This gives six models for the experiments. We performed several executions of each experiment and report the average result in the Result and Discussion section. We then examine the effect of communication frequency on execution time and on the best cost obtained.
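The communication-frequency control can be sketched as a simple generation-interval check (an illustrative helper, not the paper's actual code; we read the paper's most aggressive setting as an interval of 1% of the total count generated):

```java
public class CommFrequency {
    // Derive a communication interval from a percentage of a total count,
    // e.g., 1% of 100,000 generations -> communicate every 1,000 generations.
    static int intervalFromPercent(int total, double percent) {
        return Math.max(1, (int) Math.round(total * percent / 100.0));
    }

    // Communicate only when the current generation hits the interval.
    static boolean shouldCommunicate(int generation, int interval) {
        return generation > 0 && generation % interval == 0;
    }
}
```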

Result and Discussion
Based on the above scenario, experiments were conducted on the cluster environment with four datasets: burma14.tsp, berlin52.tsp, kroA100.tsp, and tsp225.tsp. Results were logged from the main processor (process 0). For the six models, we observed the effects of the number of generations and the communication frequency on the best cost and the execution time, running 100,000 generations. There are two sets of results: the first shows the effect of partitioning, and the second the effect of communication frequency on execution time and best cost. Details are presented in the following sections.

Result I
The first experiment observed the six models in terms of the number of generations, best cost, and execution time. Each model attains its best cost differently for each dataset and number of processing elements. For the burma14 dataset (14 nodes), the best cost was obtained by several models with 2 and 4 processing elements; their best cost of 3323 equals the best-known cost for the burma14.tsp dataset from TSPLIB. Increasing the number of nodes gives different results: model 1 attains a better best cost for berlin52, kroA100, and tsp225 with 2 processing elements, close to model 2 with 4 processing elements. Table 3 shows that the number of processing elements has no direct impact on the best cost for any dataset, because the best cost depends more on the cloning and hypermutation mechanisms, which produce random tours. Table 4 shows the experiment results for execution time. Using more processing elements requires more execution time, due to communication overhead between processing elements, except for model 1, where 8 processing elements give a better execution time than 4. On average, model 1 has a better execution time than the other models for all datasets.

Result II
In this experiment, we reduced the frequency of communication between processors. The goal is to obtain the shortest possible execution time without reducing the quality of the final result, i.e., the best cost. Table 5 summarizes the best cost after controlling the communication frequency, and Table 6 the best execution times. After controlling the communication frequency, the execution times were 12.822 ms (M4; np2), 36.604 ms (M4; np2), 89.993 ms (M4; np8), and 607.228 ms (M4; np8) for burma14.tsp, berlin52.tsp, kroA100.tsp, and tsp225.tsp, respectively. Compared to Table 4, the execution time reductions are 93.07%, 91.60%, 89.60%, and 74.74%, respectively. The average execution times show that model 1 attained the best execution time. Controlling the communication frequency greatly affects, and improves, both execution time and best cost.

Result III
This section compares the results from Section 4.2 with an approach by another researcher. Since the other studies use different cases and different parallel programming environments, we re-created the algorithm and applied it to the same case, the TSP, under some assumptions. Since model 1, with a single population and partitioning, shows the best results, we chose it and compared it with the algorithm from [8]. Figure 4 describes the parallel computing model from Hongbing, using ring communication; model 1 from Section 4.2, using mesh communication, is shown in Figure 5. Figure 6 shows the best cost comparison and Figure 7 the execution time comparison for all datasets with 2, 4, and 8 processing elements. In terms of best cost, the results differ somewhat per dataset, but overall our approach attains a better best cost than the compared approach. In terms of execution time, our approach shows a significant improvement over the compared approach. We conclude that our approach leads to better results.

Conclusion
The experimental results show that all models produce best costs relatively close to the best-known cost for the burma14 dataset; the other datasets need more generations to reach the best-known result. Before and after controlling the communication frequency, some models obtained 100% of the known best cost, e.g., model 2 with np=4 (M2; np4), M4; np8, M5; np4, M6; np2 and np4, M1; np4 and np8, M2; np2, M3; np8, and M4; np4 and np8. We conclude that, among the six models, the best model for best cost is M1, i.e., a single population with partitioning of the initial population and its best population, and the best model for execution time is M4, i.e., a single population with partitioning at the end of each generation. For the average execution time, model 1 attained both the best cost and the best execution time. These conditions hold best when the communication frequency is controlled. Compared with the other researcher's approach, our results show a significant improvement in execution time. We conclude that our approach leads to better results.

Figure 4. Single-population with ring communication

Figure 6. Best cost comparison for all datasets with several numbers of processing elements

Figure 7. Execution time comparison for all datasets with several numbers of processing elements

Table 3. Best cost for all datasets
Table 3 shows the results of the six experiments in terms of best cost; the execution time experiments are summarized in Table 4.

Table 4. Execution time for all datasets

Table 5. Best cost for all datasets after controlling the communication frequency

Table 6. Execution time for all datasets after controlling the communication frequency

The execution time differs for each model and increases with the number of generations and the number of processors used. The number of processing elements affects the execution time but does not affect the best cost. The communication frequency greatly affects, and improves, both execution time and best cost; it can be reduced to once per 1% of the population size generated. Using four datasets from TSPLIB, the experiments show that controlling the communication frequency improved the best cost from 44.16% to 87.01% for berlin52.tsp, from 9.61% to 53.43% for kroA100.tsp, and from 12.22% to 17.18% for tsp225.tsp. With eight processors, controlling the communication frequency reduced execution time by 93.07%, 91.60%, 89.60%, and 74.74% for burma14.tsp, berlin52.tsp, kroA100.tsp, and tsp225.tsp, respectively.