An Improved Entity Similarity Measurement Method

To facilitate the integration of learning resources categorized under different ontology representations, ontology mapping techniques can be applied. Although many algorithms and systems have been proposed for ontology mapping, they lack an automatic weighting strategy for class features that would automate the mapping process. This paper proposes a novel method for computing feature weights. Through semantic analysis of features, similarity calculation models for the different entity types and a weight calculation model are defined. The results show that the method makes the ontology mapping process more automatic while retaining satisfactory accuracy, thereby improving ontology mapping effectiveness.


Introduction
As one of the best instruments for knowledge representation and the basis of Semantic Web technologies, ontology is mainly described with RDF (Resource Description Framework) and OWL (Web Ontology Language), both released by the W3C, alongside CycL, DOGMA, F-Logic and similar languages developed and used by other organizations. Currently, domain ontologies are applied in many fields, such as artificial intelligence, software engineering, library science and the Semantic Web [1], [2]. Resources represented by different ontologies in different fields can be integrated and classified via ontology mapping. As the key factor in ontology mapping, entity similarity measurement can generally be divided into three kinds of methods according to their basis: terminology, structure and semantics. The mapping process can likewise be classified into three types: manual, semi-automatic and automatic [3], [4].
Influenced by factors such as classification scheme, representation language and background knowledge, ontologies of the same field may differ considerably. Therefore, when studying ontology mapping, the features (i.e. relations) between entities matter as well as the matching of their classes. Generally, an ontology mapping system follows one of two strategies: single-strategy or multi-strategy [5], [6]. When a multi-strategy approach is adopted, the different similarity measurements must be properly combined into a single one. At present, the weights used in this combination are mostly assigned on the basis of expert experience or experiments, a method that remains time-consuming and unstable when applied to Web resources represented by different ontologies [7], [8].
Ontology mapping is a process in which the entities of a source ontology (classes and features) are mapped to and represented by a target ontology. The similarity measurement covers not only the entity itself but also the other entities related to it through relational features. This paper proposes a concept of "universality" among classes in an ontology representation: if a feature has high universality, it discriminates poorly between classes and their similarities remain indistinguishable. Hence, the larger a feature's universality, the smaller its weight. The details are explained in the following sections.

Semantic Features
An ontology possesses many feature types, such as tags, annotations, attributes, relations (parent class and subclass) and examples; a feature that distinguishes an entity from the others is said to have "uniqueness" [9], [10]. As a hypothesis, if a class has a unique feature in one ontology and a class with the same feature exists in another ontology, the two classes are considered equivalent. For example, human beings are easily distinguished within a group of animals by the feature "thought", since they are the only species able to think. Conversely, individuals who share the same features are hard to distinguish within a crowd.
This paper defines two semantic forms, corresponding to the two kinds of feature types whose related semantics differ. For instance, the related semantics of string-typed features such as "tag" and "annotation" may be a set of synonyms, while the related semantics of a relational feature may be the set of classes connected through that relation. Let value(c, f) denote the value of feature f for class c, and sem(f, c) the associated semantic value of feature f for class c. The formula relating feature f to classes $c_1$ and $c_2$ can then be defined as follows:

Here, sem(f, $c_1$) and sem(f, $c_2$) are respectively the synonyms of value($c_1$, f) and value($c_2$, f). The similarity between $c_1$ and $c_2$ on a binding property f can also be defined as follows:

For relational features such as "parent class", "subclass" and "example", the value can be regarded as the collection of classes reached through that feature. The relational feature similarity of $c_1$ and $c_2$ can thus be defined as follows:

The feature similarity values obtained from the above calculations can be used to calculate the weight of a feature. Let O be the ontology, C the set of entities belonging to O, and F the set of features of C, which includes "tags", "annotations", "parent class", "binds", "relations", "examples" and the like. With n the number of classes in C, and $c_i$, $c_j$ classes of C, the weight of feature f can be defined as follows:
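The relationship between universality and weight described above can be illustrated with a minimal Python sketch. It is not the paper's formula: it assumes that the universality of a feature is the mean pairwise similarity of that feature over all class pairs in C, that set overlap (Jaccard) stands in for the feature similarity, and that weights are the normalized complements of universality. The names `set_similarity`, `universality` and `feature_weights` are hypothetical.

```python
from itertools import combinations


def set_similarity(a, b):
    """Jaccard overlap of two value sets (assumed stand-in for feature similarity)."""
    a, b = set(a), set(b)
    if not (a | b):
        return 0.0  # no values on either side: treated as no evidence
    return len(a & b) / len(a | b)


def universality(classes, feature):
    """Assumed definition: mean similarity of one feature over all class pairs."""
    pairs = list(combinations(classes, 2))
    if not pairs:
        return 0.0
    total = sum(
        set_similarity(ci.get(feature, ()), cj.get(feature, ()))
        for ci, cj in pairs
    )
    return total / len(pairs)


def feature_weights(classes, features):
    """The higher a feature's universality, the lower its normalized weight."""
    raw = {f: 1.0 - universality(classes, f) for f in features}
    norm = sum(raw.values())
    if norm == 0:
        # Every feature is fully universal: fall back to equal weights.
        return {f: 1.0 / len(features) for f in features}
    return {f: value / norm for f, value in raw.items()}


# Hypothetical toy ontology: every class shares the same parent class,
# so "superclass" cannot discriminate between classes, while "label" can.
classes = [
    {"label": {"book", "volume"}, "superclass": {"Thing"}},
    {"label": {"author", "writer"}, "superclass": {"Thing"}},
    {"label": {"journal"}, "superclass": {"Thing"}},
]
weights = feature_weights(classes, ["label", "superclass"])
print(weights)
```

In this toy case the shared parent class has universality 1 and therefore receives weight 0, while the labels, which differ for every class, receive the full weight.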

Similarity Measurement
Once the weights of the entity features represented by the two ontologies have been worked out, the similarity of different classes can be calculated by integrating the weighted feature similarities, since the similarity measurements of classes and features influence each other during the process [11], [12]. Because a class is described by a set of features, the similarity of features should be taken into account when measuring class similarity. An iterative algorithm is adopted in this paper for ontology mapping. Definitions: for the original ontologies, $c_1$ and $c_2$ denote the class collections, and $F_1$ and $F_2$ the feature collections. An entity $e^1_i$ described by a set of features can be defined as follows:

The related Formula 7 is as follows:

Here, the value of $SIM_k$ depends on the type of its arguments:
(1) If X and Y are not of the same type, then $SIM_k(X, Y) = 0$.
(2) If X and Y are of the same type, such as "character type" or "numeric type", and X = Y, then $SIM_k(X, Y) = 1$; otherwise:
(3) If X and Y are both entity sets, then:

Taking the ontology representation of synonyms shown in Figure 1, the similarity measurement formula for "Book", an entity in the source ontology, and its counterpart in the target ontology is as follows:

As the above algorithm shows, if ontologies $O_1$ and $O_2$ contain n and m entities respectively, the time complexity of the algorithm is O(n×m).
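The type rules (1) to (3) above can be sketched in Python as follows. The formulas for the "otherwise" branch of rule (2) and for rule (3) are not available in this text, so the sketch substitutes stand-ins that are assumptions, not the paper's definitions: a `difflib` string ratio for unequal strings, a relative difference for unequal numbers, and Jaccard overlap for entity sets. The function name `sim_value` is hypothetical.

```python
from difflib import SequenceMatcher

NUMERIC = (int, float)
SETS = (set, frozenset)


def sim_value(x, y):
    """Type-dependent similarity of two feature values, in [0, 1]."""
    # Rule (3): both values are entity sets (Jaccard overlap is an assumption).
    if isinstance(x, SETS) and isinstance(y, SETS):
        union = x | y
        return len(x & y) / len(union) if union else 0.0
    # Rule (2), character type: 1 when equal, otherwise an assumed string ratio.
    if isinstance(x, str) and isinstance(y, str):
        if x == y:
            return 1.0
        return SequenceMatcher(None, x, y).ratio()
    # Rule (2), numeric type: 1 when equal, otherwise an assumed relative gap.
    if isinstance(x, NUMERIC) and isinstance(y, NUMERIC):
        if x == y:
            return 1.0
        return max(0.0, 1.0 - abs(x - y) / max(abs(x), abs(y)))
    # Rule (1): values of different types are not comparable.
    return 0.0


print(sim_value("Book", 3))                # different types
print(sim_value("Book", "Book"))           # equal strings
print(sim_value("Book", "Books"))          # unequal strings
print(sim_value({"a", "b"}, {"b", "c"}))   # entity sets
```

In an iterative setting, the members of two entity sets would be compared using the similarities from the previous iteration rather than by exact membership; the sketch uses exact membership only to stay short.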

Experimental Results and Analysis
The test data of this paper is the OAEI 2009 corpus (http://oaei.ontologymatching.org/), and performance is evaluated by precision, recall and F-measure [13]. The respective calculation formulas are defined as follows:

The test data includes 33 identified classes, 24 relations, 44 attributes, 56 examples and 20 examples with no attribute. The experiment also compares the proposed method (AFW) with Lily, MapPSO and TaxoMap. As displayed in Figure 2, owing to the adoption of automatic feature weight calculation, the matching efficiency and the three performance measures are improved significantly.
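For reference, precision, recall and F-measure are usually computed from the set of mappings a system finds and a reference alignment, as in the short Python sketch below. The function name `evaluate` and the toy correspondences are hypothetical and are not taken from the OAEI data or from the paper's results.

```python
def evaluate(found, reference):
    """Return (precision, recall, f_measure) of found mappings against a reference."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical correspondences (source entity, target entity).
found = {("Book", "Volume"), ("Author", "Writer"), ("Paper", "Journal")}
reference = {
    ("Book", "Volume"),
    ("Author", "Writer"),
    ("Publisher", "Press"),
    ("Page", "Sheet"),
}
precision, recall, f_measure = evaluate(found, reference)
print(precision, recall, f_measure)
```

Here two of the three found mappings are correct and two of the four reference mappings are recovered, so precision is 2/3, recall is 1/2 and the F-measure is their harmonic mean, 4/7.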

Conclusion
By emphasizing the importance of weighting features and analyzing feature semantics, this paper has designed a computing model for entity feature weights and calculated the weighted similarity across various relations. Owing to the iterative method and the automatic feature weight calculation, ontology mapping efficiency improved in the experiments. The method also shows better precision, recall and F-measure than the other systems compared. Future work will give priority to improving the robustness and adaptability of the algorithm.

$e^1_i$ and $e^2_j$ denote entities, and the classes and features likewise belong to their ontologies $O_1$ and $O_2$. To simplify the expression of the aforementioned algorithm, the related variables are defined as follows: the similarity obtained after applying the iterative algorithm k times is denoted by $Sim_k$.


ISSN: 1693-6930 TELKOMNIKA Vol. 12, No. 4, December 2014: 1017-1022

Here, Similarity On Superclass, Similarity On Label and Similarity On Subclass are obtained from the corresponding feature similarity ($Sim_k$) and feature weight (W) of the super-class, label and sub-class features when running the similarity measurement.
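How the per-feature similarities (label, super-class, sub-class) and their weights might be combined, and why comparing every source entity with every target entity costs O(n×m), can be sketched as follows. This is a single-pass illustration under stated assumptions, not the paper's iterative algorithm: Jaccard overlap stands in for the feature similarity, the weights are given by hand, and the acceptance `threshold` is a hypothetical parameter.

```python
def jaccard(a, b):
    """Set overlap, used here as an assumed stand-in for feature similarity."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def entity_similarity(e1, e2, weights):
    """Weighted average of the per-feature similarities of two entities."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = sum(
        w * jaccard(e1.get(f, ()), e2.get(f, ())) for f, w in weights.items()
    )
    return score / total


def map_entities(source, target, weights, threshold=0.5):
    """Pick the best target for each source entity: n x m comparisons."""
    mapping = {}
    for s_name, s in source.items():        # n source entities
        best_name, best_score = None, threshold
        for t_name, t in target.items():    # m target entities
            score = entity_similarity(s, t, weights)
            if score >= best_score:
                best_name, best_score = t_name, score
        if best_name is not None:
            mapping[s_name] = (best_name, best_score)
    return mapping


# Hypothetical weights and toy ontologies.
weights = {"label": 0.5, "superclass": 0.25, "subclass": 0.25}
source = {
    "Book": {
        "label": {"book", "volume"},
        "superclass": {"Publication"},
        "subclass": {"Novel"},
    },
}
target = {
    "Volume": {
        "label": {"volume", "book"},
        "superclass": {"Publication"},
        "subclass": {"Novel", "Textbook"},
    },
    "Person": {"label": {"person"}, "superclass": {"Agent"}, "subclass": set()},
}
mapping = map_entities(source, target, weights)
print(mapping)
```

With these toy values, "Book" maps to "Volume" with a score of 0.5·1 + 0.25·1 + 0.25·0.5 = 0.875, while "Person" scores 0 and is rejected.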

Figure 1. The Ontology Representation of a Sample
Figure 2. Comparison on Experimental