Comparison of Knuth Morris Pratt and Boyer Moore algorithms for a web-based dictionary of computer terms



I. Introduction
A dictionary of computer terms is a collection of vocabulary, with meanings and functions, that guides human interaction with computers. Such dictionaries are currently widely distributed as printed books, but these are inefficient to use: users have to search for a term manually, page by page, to find its meaning. Printed books are also inflexible and are not updated with the latest terms.
Users can also search for computer terms online, using web applications such as https://kbbi.kata.web.id and http://intisari.grid.id, or search engines. However, the term is sometimes not found, or the meaning returned differs from its meaning as a computer term. Searching for the vocabulary and meanings of computer terms in this way is therefore not very effective. The search method is still conventional, and the applications cover many categories of words that are not computer-specific, so the search process takes too long.
Previous research on string matching in document content compared the Brute Force, Knuth-Morris-Pratt (KMP), Boyer Moore (BM), and Rabin-Karp algorithms and found that the KMP algorithm is faster than the others [1]. Likewise, an application of search algorithms to .txt and .docx documents showed that the KMP algorithm has the best performance [2]. Research on full permutation pattern matching with the BM, Horspool, Aho-Corasick, and KMP algorithms concluded that the KMP algorithm performs fastest [3]. However, a comparison of the KMP, Naïve, and BM algorithms on various text sizes concluded that the BM algorithm has the best and most efficient performance for large texts [4]. A dictionary word search using the Zhu-Takaoka and KMP algorithms found that the Zhu-Takaoka algorithm is faster than the KMP algorithm [5]. The Zhu-Takaoka algorithm starts matching from the end of the pattern aligned with the text, whereas the KMP algorithm starts from the beginning of the pattern. The BM algorithm shares the Zhu-Takaoka search concept of starting from the end of the pattern. An application of the BM algorithm in an electronic term dictionary achieved fairly good speed [6].
Based on this previous research, each string matching algorithm performs best under different conditions and in different dictionary applications. This research builds a web application to compare the KMP and BM algorithms for a dictionary of computer terms. The initial hypothesis is that the KMP algorithm is faster than the BM algorithm. The comparison uses the Exponential Comparison Method (ECM), a decision support method used to determine the priority order of decision alternatives based on multiple criteria. ECM is quite useful in assessing performance [7,8]. The algorithm chosen for the dictionary of computer terms is the one with the smallest total score. The criteria are word search time and memory usage during the search process.

Abstract
Computer students need a dictionary of computer terms to deepen their understanding of lectures. In developing a dictionary application for computer terms, the fastest and most memory-efficient algorithm should be chosen. The algorithms compared are the Knuth Morris Pratt (KMP) and Boyer Moore (BM) algorithms. Based on previous research, the KMP algorithm performs better than other string matching algorithms. However, other studies have concluded that the BM algorithm performs better. In addition, the Zhu-Takaoka algorithm is more efficient than the KMP algorithm in dictionary development, and the BM algorithm has the same search concept as the Zhu-Takaoka algorithm. This study determines the fastest and most efficient algorithm using the Exponential Comparison Method (ECM), with search time and memory usage during the search process as criteria. In the comparison, the search time result is 37.9% for the BM algorithm and 62.1% for the KMP algorithm, and the search memory usage result is 50.6% for the KMP algorithm and 49.4% for the BM algorithm. The total ECM score shows that the BM algorithm is 0.55% better than the KMP algorithm.

II. Method
The research comparing the KMP and Boyer Moore algorithms began with observation and interviews. At this stage, users were asked how they look up computer terms in a dictionary and what obstacles they encounter. The author also gathered facts directly by trying the processes that users carry out when searching for computer terms.
The next stage is the literature study, which aims to find documentation related to dictionary applications and string matching algorithms, including previous research comparing the KMP and BM algorithms. The concept of string matching and the two algorithms are explained below.
String matching is a technique for finding patterns of characters or strings [9]. A string matching algorithm searches for all occurrences of a short string, called the pattern [0 ... n-1], in a longer string, called the text [0 ... m-1] [10]. String searching can thus be formulated with two terms: the pattern (a string of n characters) and the text (a string of m characters).
The pattern (with n < m) is searched for in the text. In string matching algorithms, the text is assumed to be in memory, so when searching for a string in a file, the contents of the file are first read and stored in memory [11]. If the pattern appears more than once in the text, the search outputs the location of the first occurrence found. There are many string matching algorithms, but this study compares only the KMP algorithm and the Boyer Moore algorithm.
The KMP algorithm is based on the concept of avoiding unnecessary iterations: it skips comparisons that cannot produce a match between the pattern (the word sought) and the text. For example, consider finding the position m at which the word k[] occurs in the sentence K[]. The algorithm looks for matching characters at successive positions, starting from index m in the string being searched, that is, K[m]. If the index m reaches the end of the string and no match has been found, the search fails. At each location m, the algorithm first checks the first character of the word being searched for, that is, whether K[m] = k[0]. When this matches, the algorithm tests the remaining characters of the word in sequence, using the word index i: it takes the character k[i] and checks whether K[m+i] = k[i]. When all characters of k match in sequence starting at location m, the search string has been found. The KMP algorithm has been implemented in desktop-based search engine applications [12].
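The matching process described above can be sketched in code. The following is a minimal illustration in Python (chosen here for brevity; the application itself is written in PHP), assuming the standard KMP prefix table, which is what allows the algorithm to skip unnecessary iterations. The function names `build_prefix_table` and `kmp_search` are hypothetical and are not taken from the application.

```python
def build_prefix_table(word):
    """For each prefix of word, store the length of its longest
    proper prefix that is also a suffix of that prefix."""
    table = [0] * len(word)
    length = 0  # length of the currently matched prefix
    for i in range(1, len(word)):
        # Fall back to shorter prefixes until one can be extended.
        while length > 0 and word[i] != word[length]:
            length = table[length - 1]
        if word[i] == word[length]:
            length += 1
        table[i] = length
    return table


def kmp_search(sentence, word):
    """Return the index of the first occurrence of word in sentence,
    or -1 if word does not occur (assumed convention)."""
    if not word:
        return 0
    table = build_prefix_table(word)
    i = 0  # number of characters of word matched so far
    for pos, ch in enumerate(sentence):
        # On a mismatch, reuse the prefix table instead of
        # restarting the comparison from the beginning of word.
        while i > 0 and ch != word[i]:
            i = table[i - 1]
        if ch == word[i]:
            i += 1
        if i == len(word):
            return pos - i + 1  # start position of the match
    return -1
```

As in the text, only the location of the first occurrence is reported. The sentence is scanned once from left to right, and its index never moves backwards.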
The stages of the KMP algorithm in string matching are as follows:
1. The KMP algorithm starts matching the pattern at the beginning of the text.
2. The algorithm matches each character of the pattern against the corresponding character of the text, from left to right, until one of the following occurs:
a. A character of the pattern and a character of the text do not match (mismatch).
b. All characters of the pattern match, in which case the algorithm reports the position.
c. The algorithm shifts the pattern according to ...
The Boyer Moore (BM) algorithm matches strings from right to left: it scans the characters of the pattern from right to left [13]. The algorithm uses two shift functions, the good-suffix shift and the bad-character shift, to decide the next step when a text character and a pattern character do not match [14].
The working principle of Boyer Moore's algorithm is as follows:
1. The preBmBc and preBmGs procedures are run for initialization.
2. The BmBc and BmGs tables produced by these procedures are used in the string search process.
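The two steps above can be illustrated with a short sketch. It is written in Python rather than the application's PHP, and it assumes the widely used textbook formulation of the bad-character and good-suffix tables. The names `pre_bm_bc` and `pre_bm_gs` mirror the preBmBc and preBmGs procedures named in the text. The helper `suffixes`, its simple quadratic-time implementation, and the return value of -1 for "not found" are assumptions made for this illustration.

```python
def pre_bm_bc(pattern):
    """Bad-character table: distance from the rightmost occurrence of each
    character (ignoring the last position) to the end of the pattern."""
    m = len(pattern)
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def suffixes(pattern):
    """suff[i] = length of the longest common suffix of pattern[:i + 1]
    and the whole pattern (simple quadratic version, for clarity)."""
    m = len(pattern)
    suff = [0] * m
    for i in range(m):
        k = 0
        while k <= i and pattern[i - k] == pattern[m - 1 - k]:
            k += 1
        suff[i] = k
    return suff


def pre_bm_gs(pattern):
    """Good-suffix table: shift to apply after a mismatch at position i."""
    m = len(pattern)
    suff = suffixes(pattern)
    gs = [m] * m
    j = 0
    # Case 1: a prefix of the pattern matches a suffix of the pattern.
    for i in range(m - 1, -1, -1):
        if suff[i] == i + 1:
            while j < m - 1 - i:
                if gs[j] == m:
                    gs[j] = m - 1 - i
                j += 1
    # Case 2: the matched suffix reoccurs elsewhere inside the pattern.
    for i in range(m - 1):
        gs[m - 1 - suff[i]] = m - 1 - i
    return gs


def bm_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    bc, gs = pre_bm_bc(pattern), pre_bm_gs(pattern)
    j = 0  # current alignment of the pattern in the text
    while j <= n - m:
        i = m - 1
        # Scan the pattern from right to left.
        while i >= 0 and pattern[i] == text[i + j]:
            i -= 1
        if i < 0:
            return j
        # Take the larger of the good-suffix and bad-character shifts.
        j += max(gs[i], bc.get(text[i + j], m) - m + 1 + i)
    return -1
```

Characters that do not occur in the pattern fall back to a shift of the full pattern length, which is where the algorithm saves time on long texts.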
The pseudocode of the BM algorithm is divided into a pre-search phase and a search phase.
The next stage is the design and implementation of the application. The web-based application is built using PHP, HTML, CSS, JavaScript, and a MySQL database. Class diagrams and sequence diagrams are made to illustrate the system.
The final stage is testing, which compares the scores of the KMP and Boyer Moore algorithms by measuring speed and memory usage based on ECM. ECM is used to determine the priority order of alternative decisions with multiple criteria, and thus to identify the best algorithm. The steps for calculating and comparing the search process of the KMP and BM algorithms are as follows:
1. Determine alternatives
To analyze the speed comparison between the KMP and BM algorithms in searching, it is necessary to determine which algorithm will be used as the string search algorithm in the dictionary application of computer terms.
2. Determine criteria
To compare the two alternatives, the next step is to determine the criteria for analyzing the process and how it works. The criteria are listed in Table 1.

Table 1. Criteria
Memory usage: the level of memory usage, which affects the algorithm's speed.
Time needed (weight 50%, 0.5): the time needed in the search process.
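The ECM scoring step can be sketched as follows, assuming the commonly used ECM formulation in which the total score of an alternative is the sum, over all criteria, of its criterion value raised to the power of the criterion weight. The paper's own normalization is not shown in this section, so this is an illustration only. The function name `ecm_total` is hypothetical, the weight of 0.5 for memory usage is an assumption (only the weight for time is given), and the input values are the percentages quoted in the abstract, so the totals are not expected to reproduce the reported 0.55% difference.

```python
def ecm_total(values, weights):
    """Total ECM score of one alternative: sum of value ** weight
    over all criteria (assumed formulation)."""
    if len(values) != len(weights):
        raise ValueError("one weight is required per criterion")
    return sum(v ** w for v, w in zip(values, weights))


# Criteria order: (time needed, memory usage).
# 0.5 for time follows Table 1; 0.5 for memory is an assumption.
weights = (0.5, 0.5)

# Illustrative inputs: the percentages quoted in the abstract.
alternatives = {
    "KMP": (62.1, 50.6),
    "BM": (37.9, 49.4),
}

totals = {name: ecm_total(vals, weights) for name, vals in alternatives.items()}

# Both criteria are costs, so the smallest total score is preferred.
best = min(totals, key=totals.get)
```

Because time and memory are both quantities to be minimized, the alternative with the smallest total is selected, which matches the selection rule stated in the introduction.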

A. The Need for a Dictionary of Computer Terms
The author interviewed a sample of 50 students of the Informatics Engineering study program to determine the need for a dictionary of computer terms. The discussion covered three main points: the importance of a glossary of computer terms, how computer terms are accessed, and how difficult they are to obtain. Fig. 1 shows that all respondents stated that they needed access to a dictionary of computer terms to support the learning process: 86% strongly agreed and 14% agreed. Access to a dictionary of computer terms takes three forms: printed books, search engines, and KBBI or other web applications. Fig. 1 also shows that, across the three forms of access, 88% of respondents experienced difficulties that were quite severe, while 12% said access was comfortable. After the interviews, the researchers observed the level of difficulty in accessing a dictionary of computer terms. Accessing a glossary of computer terms through books has many limitations, namely the medium itself and the search speed. Online tools have problems with search results and search time because not all application providers offer keyword choices. Therefore, a dictionary application of computer terms with string matching is indispensable for easy and fast term searches.

B. Application design of algorithm comparison
The application is web-based, so users can access it using a browser. It is built with PHP, HTML, CSS, JavaScript, and a MySQL database. The application compares the KMP and BM algorithms by searching for computer terms and reports the memory usage and access time of each search. Fig. 2 shows a sequence diagram of the application being built. The class diagram illustrates the interaction between classes in the application. The design has three classes: users, algorithms, and computer search terms. Fig. 3 shows the design of the application for comparing the KMP and BM algorithms. The following functions implement the KMP and Boyer Moore algorithms in the PHP programming language.