Revisiting the challenges and surveys in text similarity matching and detection methods

Authors

  • Alva Hendi Muhammad, Universitas Amikom Yogyakarta
  • Kusrini Kusrini, Universitas Amikom Yogyakarta
  • Irwan Oyong, Universitas Amikom Yogyakarta

Keywords:

Text similarity, Similarity detection, Document similarity, Text matching, Natural language processing

Abstract

The massive amount of information available on the internet has revolutionized the field of natural language processing. One persistent challenge is estimating the similarity between texts, which remains an open research problem even though many methods have been proposed over the years. This paper surveys and traces the primary studies in the field of text similarity, aiming to give a broad overview of existing issues, applications, and methods. It identifies four issues and several applications of text similarity matching, and classifies current studies into intrinsic, extrinsic, and hybrid approaches. It then groups the methods into five categories: lexical similarity, syntactic similarity, semantic similarity, structural similarity, and hybrid. Finally, it analyzes and discusses method improvements, current limitations, and open challenges to guide future research.

Published

2022-09-30

Section

Computational Intelligence