Analysis of diabetes mellitus gene expression data using two-phase biclustering method

Authors

  • Rahmat Al Kafi, Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Indonesia
  • Alhadi Bustamam, Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Indonesia
  • Wibowo Mangunwardoyo, Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Indonesia

DOI:

https://doi.org/10.26555/konvergensi.v0i0.22111

Keywords:

Clustering, Singular Value Decomposition, K-Means, Silhouette

Abstract

This research aims to find biclusters in Type 2 Diabetes Mellitus gene expression data, whose samples come from obese and lean people, using a two-phase biclustering method. The first phase uses Singular Value Decomposition to decompose the gene expression data matrix into gene-based and condition-based (sample-based) matrices. The second phase uses K-means to cluster the gene-based and condition-based matrices, forming several clusters from each matrix. In addition, the silhouette method is applied to determine the optimal number of clusters and to assess the quality of the grouping results. In the experiments, the Type 2 Diabetes Mellitus dataset with 668 selected genes produced six optimal biclusters. These arise from 2 clusters on the gene-based matrix and 3 clusters on the sample-based matrix, with silhouette values of 0.7361615 and 0.7050163, respectively.
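
The second phase and the silhouette-based choice of cluster count can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the Singular Value Decomposition step has already produced low-dimensional rows for genes and for samples, and it uses small made-up points in their place. The function names (`kmeans`, `mean_silhouette`, `best_k`), the farthest-first initialization, and the candidate range for k are all assumptions chosen for illustration.

```python
import math
from itertools import product


def kmeans(points, k, iters=50):
    """Lloyd's k-means with a deterministic farthest-first start (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the old center if the cluster is empty
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b) over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: s(i) is taken as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def best_k(points, candidates):
    """Return the candidate k with the highest mean silhouette, and that value."""
    scored = {k: mean_silhouette(points, kmeans(points, k)) for k in candidates}
    k = max(scored, key=scored.get)
    return k, scored[k]


# Made-up stand-ins for rows of the gene-based and sample-based matrices.
gene_points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
sample_points = [(0.0, 0.0), (0.1, 0.2), (4.0, 0.0), (4.1, 0.2), (2.0, 4.0), (2.1, 4.2)]

gene_k, gene_sil = best_k(gene_points, range(2, 5))
sample_k, sample_sil = best_k(sample_points, range(2, 5))

# Each (gene cluster, sample cluster) pair defines one bicluster.
biclusters = list(product(range(gene_k), range(sample_k)))
print(gene_k, sample_k, len(biclusters))
```

On this toy data the sketch selects 2 gene clusters and 3 sample clusters, so pairing them gives 2 × 3 = 6 biclusters, the same arithmetic behind the six biclusters reported in the abstract. The silhouette values here come from the made-up points and are unrelated to the reported 0.7361615 and 0.7050163.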

Published

2021-10-18

Section

Articles