Integrated Social Media Knowledge Capture in Medical Domain of Indonesia

Social media platforms, which account for one of the largest shares of today's Internet traffic, disseminate a vast volume of information, including medical information. A knowledge management system (KMS) approach is applied to capture, maintain, and manage the tacit or explicit knowledge available within social media platforms, an organization's databases, knowledge bases, or document repositories. By adding Indonesian Natural Language Processing (InaNLP), machine learning, and data mining approaches, our research proposes a framework that is theoretically designed to improve on previous research on social media knowledge capture models and to enhance the accuracy and reliability of the knowledge retrieved compared to the previous knowledge capture model. The system is mainly aimed at medical practitioners, to give quick suggestions about diseases based on an early diagnosis that has already been made. At the current stage of this research, our main concern is the pre-processing phase of the framework implementation and knowledge presentation, in order to maximize the information value for knowledge users and to reduce language issues in texts such as ambiguity, inconsistency, and the use of slang vocabulary. In line with this goal, we have designed an algorithm to extract features from the dataset.


Introduction
Healthcare systems in developing countries, Indonesia in particular, currently struggle to accommodate the health issues that occur within the country. Although internet access is growing in big cities, people still have to make an effort to find a reliable source of medical knowledge, which in this context means drugs and the medical actions that should be taken. Knowledge becomes useful when it is maintained and accessible to those who need it to cure disease. Since the medical sector has become increasingly important, knowledge creation and diffusion are necessary to provide better health services and quick-response first aid actions.
Social media distributes highly diverse information through the internet, including text, images, and video. The vast amount of media content disseminated through social media contains useful information that can be used to gain knowledge [1]. By using NLP and machine learning approaches, it becomes possible to extract specific knowledge to support the credibility of the designed KMS.
Previous research on a Knowledge Capture System for the Medical Domain showed that the prototype could predict the relevant knowledge retrieved by the user with only 38.89% accuracy using a training dataset of 62 records. This result was due to two limitations: a lack of collected data for creating the predictive model, and manual preprocessing of the collected data by the knowledge capture model [2]. In line with this research's goal, we have proposed a framework that could improve the accuracy and reliability of the previous knowledge capture model in the model creation, preprocessing, and knowledge sharing phases [3]. Since our framework is designed mainly on the basis of a literature review, we set goals for the development and evaluation of the framework to be carried out in our work.
We develop a Web-based KMS according to this framework in order to evaluate whether the framework is actually able to improve on the results of the previous study. The pre-processing phase of the model implementation and knowledge presentation is our main concern at the current stage of this research. We have designed an algorithm to extract features from the dataset in order to maximize the information value for knowledge users and to reduce language issues in texts such as ambiguity, inconsistency, and the use of slang vocabulary. In the future, information technologies will be ubiquitously applied in healthcare and medical organizations in order to meet medical requirements with proper health services, suggestions, or decision making. In addition, the use of information technology (IT) in the medical domain enables holistic management of medical knowledge and its secure exchange between healthcare or medical providers and the recipients who benefit from it [4]. Integrating these approaches creates a greater range of possibilities for improving medical services, especially in Indonesia.

Literature Review

Perspective on Knowledge, Knowledge Management and Knowledge Capture
For many years, knowledge has been produced and used by humans to improve their quality of life. However, the usefulness of knowledge is relative to human understanding, insight, and how the knowledge can be used in practice [5]. Since knowledge delivers value, people exchange knowledge in many ways, for example through an in-depth conversation between two people that is documented either formally or informally [6].
According to Ackoff's DIKW (Data, Information, Knowledge, Wisdom) hierarchy [7], knowledge is generated when a body of information is organized in a particular way and its parts are connected to each other in order to deliver a message or fact. This knowledge generation may be carried out by an individual or a group [8]. In the context of knowledge management frameworks, the four steps of Wiig's KM cycle, shown in Figure 1, represent how knowledge is created and used by individuals or groups [5,8]. This also relates to the process by which knowledge is captured. Greenes Consulting [9] describes a 10-step guide to knowledge capture that helps codify the knowledge process. Associating this 10-step guide with the KM cycle makes it possible to enhance and measure the KM process for structuring and managing knowledge, and supports a better understanding of knowledge management.

Natural Language Processing for Indonesian Language
Nowadays, NLP applications depend on various pre-processing steps, such as chunking, tokenization, stemming, lemmatization, POS tagging, sentence boundary identification, and Named Entity Recognition (NER) [13]. However, the NLP methods used for text mining are limited in how they handle ambiguity and inconsistency within sentences.
The Indonesian language (Bahasa Indonesia) has its own characteristics due to regional influences and the historical development of its vocabulary. Bahasa Indonesia has a lenient sentence structure, particularly in daily conversation, in both text and speech. Even when the word order within a phrase is changed or some words are missing, the sentence can still be roughly understood [14]. An architecture has been developed to address this problem. The InaNLP system architecture, shown in Figure 2, was developed with the sentence structure of Bahasa Indonesia in mind. InaNLP contains a number of NLP modules, each of which can be flexibly applied and added as a dependency by the user [14].

Machine Learning and Data Mining
Machine learning and data mining have become important research areas as the database industry's need to extract valuable knowledge from data sources grows constantly. Extracting implicit and potentially useful data is part of the knowledge discovery process, which has become a generic goal for industry [15]. To serve this goal, various machine learning algorithms have been developed, such as decision trees, rule-based learners, Bayesian networks, nearest-neighbour methods, and nonlinear regression [16]. Combining data analysis needs with modeling tools, several data mining frameworks have been developed, for example CRISP-DM (Cross-Industry Standard Process for Data Mining) [17]. This framework is generic and can be used to carry out data mining projects in many industry sectors.

Social Media in Healthcare & Medical Sector
Recently, online doctor communities have formed on social networks, in which healthcare and medical professionals exchange information on subjects such as clinical diagnosis, bodily symptoms, treatment options, experience with medical evidence, and their opinions about medical treatments [18]. Moreover, 29% of patients look for health information through social media and view others' experiences with a particular disease [19]. As a result, patients are more independent and better prepared when they are equipped with medical information, which gives this information a greater influence on patients' decision making [20,21].

Existing Work: Social Media Knowledge Capture Model. Case Study: Medical Knowledge
Cestyakara [2] proposed a knowledge capture model, shown in Figure 3, for optimizing the knowledge management process. This model implements KMS and data mining concepts. The activities in this framework are derived from the 10-step Guide to Knowledge Capture [9]. The knowledge source of that research is medical information within forum discussions on social media, retrieved through the Facebook Graph API. In the pre-processing step, the texts gathered from social media are processed manually to extract the features and build a training corpus. Using WEKA [22] as a tool for classification learning, the model produces predicted categories based on 11 medical features (e.g. tips, treatment, disease, symptom, etc.).
Feedback on the predicted knowledge and its effectiveness are tested through expert judgment and online forum/fanpage discussion. This model leaves opportunities to improve the accuracy of the existing model and to automate the pre-processing of the retrieved texts. Our research continues the improvement of the KMS model by adopting relevant approaches.

Integrated Social Media Knowledge Capture
With respect to the methodology and approach used in this research, a framework is proposed to cover the optimization of the previous model. Our goal with the proposed

Integrated Knowledge Capture Model
We evaluate each phase of Cestyakara's model [2] to understand it and to determine how far the model can elicit useful medical knowledge. This previous research proves that a social media knowledge capture model can be implemented using existing technologies. The implementation of this model shows that some improvements are required: the information pre-processing step should shift from a manual process to an automated one using NLP technology, and a larger dataset should be prepared in order to achieve a more accurate trained model.
Based on the evaluation of the previous model, we establish a model that is theoretically able to achieve a better result. Since our proposed model accommodates end-to-end social media functionality and improved text processing, several approaches are adopted to meet the research objective. Following Wiig's KM Cycle [4], the proposed model follows the sequence of knowledge management phases and associates the 10-step guide to knowledge capture with each section [19].

Integrated Social Media Knowledge Capture Framework
Since this model is a theoretical design, we combined the concepts explained in the Literature Review to obtain an end-to-end knowledge capture framework. The framework is implemented in four steps, as shown in Figure 4. First, in the Build Knowledge phase, the source of knowledge is prepared through health and medical forums. Social network posts, comments, and tweets contain meaningful knowledge that can be captured from natural conversations. Specific subjects of conversation emerge naturally from the content shared by people who describe their experience of a health condition, whether it is a symptom, a disease, or a health problem. After identifying the content of a conversation, it is possible to determine whether the captured information is structured or unstructured. In the Hold Knowledge phase, the information captured from social media sources is filtered and transformed to obtain the meaningful information needed to generate a better-quality training model. The retrieved information is collected as raw data, which must go through a pre-processing step. The collected raw data is saved in a DBMS (Database Management System) to ensure that it is stored and managed properly.
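As a minimal sketch of the Hold Knowledge phase, the snippet below stores captured posts as raw records in a relational database before any pre-processing takes place. SQLite stands in for the DBMS here, and the table name `raw_posts`, its columns, and the sample posts are illustrative assumptions rather than the schema used in this work.

```python
import sqlite3

# Hypothetical schema: one row per captured post, stored unmodified.
# The `preprocessed` flag marks rows already handled by the NLP module.
SCHEMA = """
CREATE TABLE IF NOT EXISTS raw_posts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source TEXT NOT NULL,
    post_text TEXT NOT NULL,
    preprocessed INTEGER NOT NULL DEFAULT 0
)
"""

def open_store(path=":memory:"):
    """Open (or create) the raw-data store."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def hold_raw_posts(conn, posts):
    """Insert (source, text) pairs, filtering out empty texts."""
    rows = [(src, txt.strip()) for src, txt in posts if txt and txt.strip()]
    conn.executemany(
        "INSERT INTO raw_posts (source, post_text) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)

def pending_posts(conn):
    """Return (id, text) for posts that still await pre-processing."""
    cur = conn.execute(
        "SELECT id, post_text FROM raw_posts WHERE preprocessed = 0 ORDER BY id"
    )
    return cur.fetchall()

# Made-up sample posts; the second one is blank and is filtered out.
conn = open_store()
stored = hold_raw_posts(conn, [
    ("facebook", "sy demam 3 hari, obat apa ya dok?"),
    ("twitter", "   "),
    ("twitter", "batuk pilek sdh seminggu"),
])
pending = pending_posts(conn)
```

Keeping the raw text untouched and tracking pre-processing with a flag is one possible design; it lets the later Pool Knowledge steps be re-run without capturing the posts again.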
In the Pool Knowledge phase, as mentioned in the NLP section, the inconsistent language used in the retrieved texts needs to be formalized and pre-processed in the NLP module to extract the medical features it contains [14]. During this phase, the InaNLP framework modules are activated to pre-process the gathered information into a new pre-processed text dataset. The system model relies on a lexical dictionary, a word formalizer, a set of NER, word tokenization, and POS tagging rules, and a training dataset in Bahasa Indonesia in order to find patterns in the collected information that determine the medical features. The sentences are formalized first, before they are processed by the tokenizer and stemmer. The sentence splitter handles posts retrieved in paragraph form by converting them into a list of sentences. Each word and punctuation mark is tokenized, and the stemmer transforms each tokenized word into its root word. After that, the POS tagger determines the Indonesian part-of-speech tag for each word as additional information. The features are information characteristics determined by default within the sentence. With a Medical Named Entity List prepared as a corpus, the model assigns each instance of the dataset a binary record with one value for every determined feature. We prepare a rule set that determines the binary values for each instance based on which gazetteer matches a particular parsed word. Our model adopts a multi-label classifier to handle the context of the medical terms used, or the named entities matched, in each instance when constructing the dataset of binary records [23]. This is because Agastya's model assigned a single label to each sentence, although a sentence may contain many categories that could be classified and be beneficial in other respects [3]. For example, a patient can state his disease, action, body part, and medicine in a single sentence. Since such a sentence contains multiple pieces of information, it should be handled with multiple labels to maximize the information captured. In a multi-label classifier, every feature in the set can be assigned as a label [23][24][25]. Therefore, the rule set must have a method for deciding beforehand which tags should be marked as labels (predictive labels). Training on a multi-label dataset produces a predictive model that is used to predict labels for subsequently captured information and to evaluate the result. This predictive model is stored in file storage. After that, the resulting structured information is stored in a database management system to be accessed in the Use Knowledge phase.
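The pre-processing chain described above (sentence splitting, formalization, tokenization, stemming, and gazetteer matching into a binary record) can be sketched as follows. This is a plain-Python stand-in and does not call InaNLP; the slang dictionary, the crude suffix-stripping stemmer, the gazetteer entries, and the feature names are all toy assumptions, and POS tagging is omitted.

```python
import re

# Toy resources; every entry here is an illustrative assumption.
SLANG = {"sy": "saya", "sdh": "sudah", "gak": "tidak", "dok": "dokter"}
GAZETTEER = {
    "disease": {"demam", "flu", "batuk"},
    "medicine": {"parasetamol"},
    "body_part": {"kepala", "perut"},
    "action": {"minum", "istirahat"},
}
FEATURES = sorted(GAZETTEER)  # fixed feature order for the binary record

def split_sentences(text):
    """Split a paragraph-style post into a list of sentences."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def formalize(sentence):
    """Replace known slang words with their formal forms."""
    return re.sub(
        r"\w+", lambda m: SLANG.get(m.group(0).lower(), m.group(0)), sentence
    )

def tokenize(sentence):
    """Lower-case and split into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())

def stem(token):
    """Crude stand-in stemmer: strip a few common Indonesian suffixes."""
    for suffix in ("nya", "lah", "kah"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def binary_record(sentence):
    """Flag each feature whose gazetteer matches a root word."""
    roots = {stem(t) for t in tokenize(formalize(sentence))}
    return {f: int(bool(roots & GAZETTEER[f])) for f in FEATURES}

# Made-up post with two informal sentences.
post = "Sy demam dan sakit kepalanya parah. Sdh minum parasetamol."
records = [binary_record(s) for s in split_sentences(post)]
```

Each sentence yields one multi-label instance, so the first sentence is flagged for both `disease` and `body_part` instead of being reduced to a single category.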
In the Use Knowledge phase, the accuracy and retrieval of the structured information are tested by developing a Web 2.0 platform. The platform is used to evaluate how well the captured medical knowledge is conveyed to patients or medical practitioners through information tag search, and to ensure that the system's requirements accommodate patients' expectations [26]. Furthermore, knowledge is evaluated not only through the predictive phase but also by asking for expert judgment.
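One way to picture the information tag search of the Use Knowledge phase is an inverted index from tags to structured records. The sketch below is only an assumption about how such a lookup could work; the record layout, the tag names, and the AND semantics of the query are hypothetical and are not taken from the platform itself.

```python
from collections import defaultdict

def build_tag_index(records):
    """Map each lower-cased tag to the ids of the records that carry it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for tag in rec["tags"]:
            index[tag.lower()].add(rec_id)
    return index

def search_by_tags(index, tags):
    """Return sorted ids of records carrying every requested tag."""
    wanted = [t.lower() for t in tags]
    if not wanted:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in wanted))
    return sorted(hits)

# Hypothetical structured records produced by the earlier phases.
knowledge = {
    1: {"text": "demam, minum parasetamol", "tags": ["disease", "medicine"]},
    2: {"text": "flu sudah seminggu", "tags": ["disease"]},
    3: {"text": "pusing, perlu istirahat", "tags": ["symptom", "treatment"]},
}
tag_index = build_tag_index(knowledge)
disease_hits = search_by_tags(tag_index, ["disease"])
narrow_hits = search_by_tags(tag_index, ["Disease", "medicine"])
```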

Model Comparison
Since this framework is designed by combining various perspectives and existing models, it is important to examine in which aspects the framework excels or falls short. Therefore, we compare our framework to the previous model [2] and the CRISP-DM model [17], as shown in Table 1.
According to Table 1, some steps are nearly the same in Agastya's model and in our framework. Since our objective is to improve the cycle, our model does not include the "Determine Record Schedule" activity. We define the Build Knowledge step as one that needs to be conducted only if the source of knowledge does not yet exist. Thus, if the knowledge exists and can be retrieved from a social media source, we can collect as much data as required to build a dataset that improves model prediction accuracy, with or without a schedule.
In our model, capturing information from social media and storing it in a DBMS are distinct activities in separate KM Cycle steps, in order to distinguish how we retrieve the data from how we hold it in a form that can be processed later. In contrast, Agastya's previous model stores the data and the predictive model several steps after the phase in which the knowledge is obtained. We break the pre-processing activity of Agastya's model into more detailed steps. Since our model automates the pre-processing step using the InaNLP module, constructing the dataset for predictive model training requires additional steps on the back-end side. As explained in the previous section, our model considers the context of medical features to minimize the loss of information caused by using a single-label predictive model. Thus, we apply the multi-label approach in our model [23,25,27].
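The information loss that motivates the multi-label approach can be illustrated with a small calculation. The sketch below compares how many flagged features survive when each instance is collapsed to one label versus when all flagged labels are kept; the feature names, the priority order used for the single-label case, and the two sample records are assumptions made for illustration only.

```python
def to_single_label(record, priority):
    """Collapse a binary record to the first flagged label in priority order."""
    for label in priority:
        if record.get(label):
            return label
    return None

def to_multi_label(record):
    """Keep every flagged label of a binary record."""
    return {label for label, flag in record.items() if flag}

def retained_fraction(records, priority):
    """Share of flagged labels that survive single-label collapsing."""
    total = sum(len(to_multi_label(r)) for r in records)
    kept = sum(1 for r in records if to_single_label(r, priority) is not None)
    return kept / total if total else 1.0

# Hypothetical instances: the first mentions four kinds of information.
PRIORITY = ["disease", "medicine", "action", "body_part"]
sample = [
    {"disease": 1, "action": 1, "body_part": 1, "medicine": 1},
    {"disease": 1, "action": 0, "body_part": 0, "medicine": 0},
]
fraction = retained_fraction(sample, PRIORITY)
```

In this made-up sample, five labels are flagged in total and only two remain after collapsing, so three of the five pieces of information would be lost under a single-label scheme.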
We review the CRISP-DM framework to understand the data mining process and enhance our machine learning steps. Our understanding of CRISP-DM is mapped in Table 1. Since CRISP-DM is designed for the industrial sector and for providing analytical reports, its purpose differs from that of our model. Our model is intended to provide knowledge related to medical topics and relevant solutions for particular medical issues. From this review, we found that the CRISP-DM approach does not apply an evaluation step at the end of the cycle, whereas our model requires this step in the last phase of the cycle to gain feedback from users and experts after they make use of the knowledge provided by a system that adopts our knowledge capture model [17].
The detailed implementation of the framework on the Indonesian medical knowledge model is shown in Figure 5. The system captures information from social media such as Facebook and Twitter, which represent the social media used in the healthcare and medical domain both as an information exchange medium and as a repository of discussion history, and stores the result in a CSV file. These steps implement Build Knowledge and Hold Knowledge in the KM Cycle. As the representation of Pool Knowledge in the KM Cycle, the pre-processing phase of the model implementation and knowledge presentation is our main concern at the current stage of this research. We use InaNLP to reduce language issues in texts such as ambiguity, inconsistency, slang vocabulary, and informal sentences. An example of pre-processing an informal sentence that contains medical information using InaNLP is shown in Figure 6. We have designed an algorithm, shown in Figure 7, to extract features from the pre-processed dataset in order to maximize the information value for knowledge users.
The algorithm scans the file for certain attribute patterns in each social media information record. If a record contains a word that exists in the word list of a medical NER, the system writes a flag in the related attribute of that medical NER. For example, one medical NER is Disease, which has features/attributes such as definition, medicine, prevention, and indication. After feature extraction, the system checks whether the multi-label pattern matches one of the existing rules. The rules identify the classification of the social media information record and store it as processed knowledge, complete with the related medical NER and its features/attributes. As the representation of Use Knowledge, the last phase in the KM Cycle, our future work is the implementation of a Web-based KMS for disseminating the processed knowledge.
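The scan-flag-classify flow described above can be sketched roughly as follows. This is not the algorithm of Figure 7; it is a simplified illustration in which the CSV columns (`id`, `text`), the medical NER word lists, and the rule table that maps flag patterns to a class are all hypothetical.

```python
import csv
import io

# Hypothetical medical NER word lists (the real lists are not given here).
MEDICAL_NER = {
    "disease": {"demam", "flu"},
    "medicine": {"parasetamol"},
    "symptom": {"pusing", "mual"},
}

# Hypothetical rules: the first rule whose required flags are all set wins.
RULES = [
    ({"disease", "medicine"}, "treatment"),
    ({"disease", "symptom"}, "indication"),
    ({"disease"}, "definition"),
]

def extract_flags(text):
    """Write a 0/1 flag per medical NER whose word list matches the record."""
    words = set(text.lower().split())
    return {ner: int(bool(words & terms)) for ner, terms in MEDICAL_NER.items()}

def classify(flags):
    """Match the multi-label flag pattern against the rule table."""
    active = {ner for ner, flag in flags.items() if flag}
    for required, label in RULES:
        if required <= active:
            return label
    return "unclassified"

def process_csv(csv_text):
    """Scan every pre-processed record and attach its flags and class."""
    processed = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        flags = extract_flags(row["text"])
        processed.append({"id": row["id"], **flags, "class": classify(flags)})
    return processed

# Made-up pre-processed records in CSV form.
SAMPLE_CSV = (
    "id,text\n"
    "1,demam minum parasetamol\n"
    "2,flu dan pusing\n"
    "3,cuaca panas hari ini\n"
)
knowledge_rows = process_csv(SAMPLE_CSV)
```

Ordering the rules from most to least specific is a design choice of this sketch, so that a record mentioning both a disease and a medicine is not captured by the more general disease-only rule.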

Conclusion
Regarding healthcare and medical sector issues in Indonesia, many healthcare organizations are striving to increase their reliance on integrated knowledge management systems so that they can maximize the use of medical knowledge and capture it for taking medical actions. This also benefits organizations by increasing their medical service quality and time efficiency.
In this research, we suggest an integrated approach that helps healthcare or medical organizations establish an end-to-end knowledge management system and enables stakeholders and medical practitioners to use and share knowledge effectively. We have created a framework for integrated knowledge capture and implemented it to build, hold, and pool knowledge. Although this framework theoretically improves several aspects of the previous research, this still has to be proven by implementing the framework in a prototype that can produce objective results. Our future work is to leverage the knowledge generated from this research by building a Web-based KMS for knowledge dissemination.


ISSN: 1693-6930 TELKOMNIKA Vol. 16, No. 4, August 2018: 1846-1856

framework is to enable patients to capture hierarchical, useful medical knowledge, since the model acts as an assistive medium for the patient.

Figure 5. Integrated Knowledge Capture Framework Implementation on Indonesian Medical Knowledge Model

Figure 6. Preprocessing step of an informal sentence that contains medical information using InaNLP

Table 1. Knowledge Model Comparison

Integrated Knowledge Capture Framework Implementation on Indonesian Medical Knowledge Model
We implement the framework in order to evaluate whether it is actually able to improve on the results of the previous study. The implementation approaches for each activity of the Integrated Knowledge Capture Framework are shown in Table 2.