WCLOUDVIZ: Word Cloud Visualization of Indonesian News Articles Classification based on Latent Dirichlet Allocation

Retno Kusumaningrum, Satriyo Adhy, Suryono Suryono


Latent Dirichlet Allocation (LDA) is a widely used approach for extracting the hidden topics in a collection of documents, where each topic is a multinomial probability distribution over terms obtained by soft clustering of words according to their co-occurrence in documents. Several visualizations of LDA output have been developed, such as matrix designs, text-based designs, tree designs, parallel coordinates, and force-directed graphs. Furthermore, given a set of documents representing each class (category), LDA can be used for classification by comparing the topic proportion of each class with the topic proportion of a test document using the Kullback-Leibler Divergence (KLD). The purpose of this study is therefore to develop a system for visualizing the output of LDA when it is used for classification. The visualization system consists of two parts: a bar chart and a dependent word cloud. The bar chart shows the trend of each category, while the word cloud shows the words that represent the selected category. The resulting visualization, called WCloudViz, presents the classification results in a clear, understandable, and shareable form.
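The KLD-based classification step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the topic proportions (one probability per LDA topic) have already been inferred for each class and for the test document, and it assumes the divergence is measured as D_KL(test || class) with a small smoothing constant `eps` to avoid division by zero. The function names, the category labels, and the numbers in the usage example are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i).

    Assumption: eps is added to both terms so that a zero entry in q
    does not cause a division by zero; terms with p_i == 0 contribute 0.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same number of topics")
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def classify(test_theta, class_thetas):
    """Assign the test document to the class with the smallest KLD.

    test_theta: topic proportion of the test document (hypothetical input).
    class_thetas: dict mapping a class label to that class's topic proportion.
    Returns the predicted label and the KLD score of every class.
    """
    scores = {label: kl_divergence(test_theta, theta) for label, theta in class_thetas.items()}
    return min(scores, key=scores.get), scores


# Hypothetical three-topic example with two categories.
class_thetas = {"sport": [0.7, 0.2, 0.1], "economy": [0.1, 0.3, 0.6]}
label, scores = classify([0.6, 0.3, 0.1], class_thetas)
print(label)  # sport
```

Choosing the minimum divergence means the test document is labelled with the category whose topic mixture it most closely resembles; the per-class scores could also feed a bar chart of category trends.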


Latent Dirichlet Allocation; Topic Modeling; News Articles Classification; Data Visualization; Word Cloud

DOI: http://dx.doi.org/10.12928/telkomnika.v16i3.8194


