Data Cleaning Service for Data Warehouse: An Experimental Comparative Study on Local Data

Arif Bramantoro

Abstract


A data warehouse is a collection of data drawn from various data sources. Data in a data warehouse are prone to a range of complications and irregularities. Data cleaning is a non-trivial activity needed to ensure data quality. A data cleaning service identifies errors, removes them, and thereby improves the quality of the data. One of the common methods is duplicate elimination. This research focuses on duplicate elimination services applied to local data. It first surveys data quality, focusing on quality problems, cleaning methodology, and the stages and services involved within a data warehouse environment. It then compares the services through experiments on local data covering different cases: different spellings with different pronunciations, misspellings, name abbreviations, honorific prefixes, common nicknames, split names, and exact matches. All services are evaluated against the proposed quality of service metrics, such as performance, capability to process a given number of records, platform support, data heterogeneity, and price, so that in the future these services can be relied on to handle big data in a data warehouse.
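
To make the duplicate elimination cases above concrete, the following is a minimal sketch and not the method of any service evaluated in the study. It assumes names are compared after simple normalization, and every identifier in it (HONORIFICS, NICKNAMES, normalize, is_duplicate, deduplicate), the tiny lookup tables, and the 0.85 similarity threshold are hypothetical choices made for illustration only.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lookup tables for illustration; a real service would need
# much larger, locale-specific lists.
HONORIFICS = {"mr", "mrs", "ms", "dr", "prof", "sir"}
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation and honorific prefixes, expand nicknames."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    tokens = [t for t in tokens if t not in HONORIFICS]
    return " ".join(NICKNAMES.get(t, t) for t in tokens)


def is_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two names as duplicates if their normalized forms are similar.

    The threshold is an assumed value, not one taken from the study.
    """
    na, nb = normalize(a), normalize(b)
    if na == nb:  # exact match after normalization
        return True
    # Compare with spaces removed so that split names still line up,
    # then allow small spelling differences via a similarity ratio.
    ratio = SequenceMatcher(None, na.replace(" ", ""), nb.replace(" ", "")).ratio()
    return ratio >= threshold


def deduplicate(names):
    """Keep the first occurrence of each group of duplicate names."""
    kept = []
    for name in names:
        if not any(is_duplicate(name, k) for k in kept):
            kept.append(name)
    return kept
```

Calling deduplicate on a list of names returns the list with later near-duplicates dropped. The pairwise comparison is quadratic in the number of records, so a sketch like this would not scale to warehouse-sized data without blocking or indexing, which is one reason record capacity and performance appear among the evaluation metrics.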

Keywords


data cleaning service; data warehouse; data quality; local data


DOI: http://dx.doi.org/10.12928/telkomnika.v16i2.7669


Copyright (c) 2019 Universitas Ahmad Dahlan

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

TELKOMNIKA Telecommunication, Computing, Electronics and Control
ISSN: 1693-6930, e-ISSN: 2302-9293
Universitas Ahmad Dahlan, 4th Campus, 9th Floor, LPPI Room
Jl. Ringroad Selatan, Kragilan, Tamanan, Banguntapan, Bantul, Yogyakarta, Indonesia 55191
Phone: +62 (274) 563515, 511830, 379418, 371120 ext. 4902, Fax: +62 274 564604
