Homology modeling and mutation prediction of ACE2 from COVID-19

SARS-CoV-2 has caused a worldwide pandemic. The virus binds to the angiotensin-converting enzyme 2 (ACE2) receptor, which is found on epithelial cells such as those in the lungs, to produce the pathology of COVID-19. Analyzing the characteristics of ACE2 is essential for understanding the development of the disease and for studying potential new drugs. The analysis was carried out with computer simulations, which speed up protein analysis by drawing on artificial intelligence, databases, and big data. Homology modeling builds a structural model of a protein from homologous members of its protein family, so that both the model and the alignment of the modeled sequences are established. This research aims to determine the possibility of mutations in ACE2 by performing mutation prediction. The results show reliable homology modeling, with GA341, MPQS, z-DOPE, and TSVMod NO35 scores of 1, 1.28252, -0.47, and 0.793, respectively. Moreover, Gene Ontology (GO) analysis indicates that ACE2 has a molecular transport function in cells, while no mutations were found in ACE2 when analyzed using SIFT and PROVEAN.


MATERIALS AND METHODS
Materials
The research material was the sequence of human angiotensin-converting enzyme 2 (ACE2) with the UniProt® database code (ID Q695T7). It is full-length human ACE2 in the presence of the neutral amino acid transporter B0AT1. The software used was UCSF Chimera 1.15®, which provides force-field tools for molecular dynamics analysis, such as Parameters for Solvation Energy (PARSE), AMBER, SWANSON, CHARMM, PEOEPB, and TYL06. In addition, the following web servers were used: NCBI®, BLAST, MolProbity®, Clustal Omega®, Cofactor®, Sorting Intolerant From Tolerant (SIFT®), Protein Variation Effect Analyzer (PROVEAN), Modeller®, and Swiss-Model®.
These tools were chosen because they are open access and user-friendly. Modeller has a script file so that parameters can be adjusted.

Homology modeling using Swiss-Model
The ACE2 enzyme sequence was prepared by searching for the Homo sapiens (human) ACE2 enzyme sequence in the UniProt® database. The UniProt data were then submitted as input, and a model was built. Validation was then carried out using MolProbity®, and the model was analyzed in 3D using UCSF Chimera.
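As a minimal sketch of the sequence-preparation step, the following Python snippet parses a FASTA record of the kind that can be downloaded from UniProt. The function name `parse_fasta` and the toy record are illustrative assumptions and are not part of the original workflow, which used the web interfaces directly.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical toy record; a real record would be downloaded from UniProt.
example = """>toy|P00000|EXAMPLE
MSSSSWLLLS
LVAVTAAQST
"""

sequences = parse_fasta(example)
for header, seq in sequences.items():
    print(header, len(seq))
```

The parsed sequence string can then be pasted into a modeling server or passed to a downstream script.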

Homology modeling using Modeller
Building enzyme models by homology modeling begins with a template search using the NCBI® web server (www.ncbi.nlm.nih.gov/). An alignment search was carried out with the Basic Local Alignment Search Tool (BLAST) to find templates similar to the Homo sapiens ACE2 sequence. The FASTA sequence was submitted to blastp to determine sequence similarity. Five similar templates were obtained, and their FASTA sequences were downloaded. Multiple sequence alignment was then carried out using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/); the alignment was downloaded and analyzed using Modeller (https://modbase.compbio.ucsf.edu/modweb/), and the 3D structure was analyzed using UCSF Chimera.
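The similarity between a target and a candidate template can be illustrated with a short Python sketch that computes percent identity from two rows of an alignment. This is an assumption-laden illustration, not the calculation BLAST or Clustal Omega performs internally: the function name `percent_identity` is hypothetical, gaps are written as "-", and identity is taken over the columns where both sequences have a residue (other definitions use a different denominator).

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # ignore gapped columns (one possible convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments, for illustration only.
target = "MKT-AYIA"
template = "MKTQAYLA"
identity = percent_identity(target, template)
print(f"{identity:.1f}% identity")
```

A value computed this way can be compared with the identity threshold used when selecting templates.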

Prediction analysis of ligand binding site, gene ontology and enzyme commission using Cofactor
Protein data from the Protein Data Bank with code 6M17 were uploaded to https://zhanglab.ccmb.med.umich.edu/COFACTOR/, and the ligand-binding site and gene ontology were then analyzed.

Prediction of Mutations from ACE2
Mutation prediction for the G8790A variant was analyzed using the Sorting Intolerant From Tolerant (SIFT) application (https://sift.bii.a-star.edu.sg/) and the PROVEAN application (http://provean.jcvi.org/seq_submit.php). Sequences in FASTA format were submitted to SIFT Sequence and PROVEAN.
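Scores returned by the two predictors can be interpreted with a small Python sketch such as the one below. The cutoffs are assumptions based on the commonly cited defaults (a SIFT score of 0.05 or lower is read as affecting protein function, and a PROVEAN score of -2.5 or lower as deleterious); they are not stated in this study and should be checked against each tool's documentation. The variant names and scores are hypothetical.

```python
# Assumed default cutoffs; confirm against the SIFT and PROVEAN documentation.
SIFT_CUTOFF = 0.05
PROVEAN_CUTOFF = -2.5


def classify_sift(score, cutoff=SIFT_CUTOFF):
    """Low SIFT scores suggest the substitution affects protein function."""
    return "affects function" if score <= cutoff else "tolerated"


def classify_provean(score, cutoff=PROVEAN_CUTOFF):
    """Strongly negative PROVEAN scores suggest a deleterious substitution."""
    return "deleterious" if score <= cutoff else "neutral"


# Hypothetical scores for illustration only; not results from this study.
hypothetical_scores = {
    "variant_1": {"sift": 0.01, "provean": -3.2},
    "variant_2": {"sift": 0.40, "provean": -0.8},
}

calls = {
    name: (classify_sift(s["sift"]), classify_provean(s["provean"]))
    for name, s in hypothetical_scores.items()
}
for name, (sift_call, provean_call) in calls.items():
    print(name, sift_call, provean_call)
```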

RESULTS AND DISCUSSION
Homology modeling using Swiss-Model
The ACE2 sequence was set up by entering the UniProt ID code (Q695T7). The amino acid sequence obtained was used to find the template for building the ACE2 model. The ACE2 sequence can be seen in Figure 1. Homology modeling generates a previously unknown protein structure by "fitting" its sequence (target) onto a known structure (template), given a certain degree of sequence homology (at least 30%) between the target and the template (Sensoy et al., 2017). It is accurate for building structural models of proteins (Skariyachan and Garka, 2018). The 3D structure of ACE2 was predicted with the Swiss-Model web server to obtain accurate modeling. A homology modeling study involves several steps: (1) identifying the target sequence using BLAST, (2) performing sequence alignment, (3) correcting the alignment, (4) identifying the backbone, (5) loop modeling, (6) side-chain modeling using rotamer data, (7) optimizing the model using energy minimization, and (8) validating the stereochemistry of the model using the Ramachandran plot (Gromiha et al., 2018).
The templates identified by the Swiss-Model analysis were PDB 6M17 and 6M18. These two proteins are specific structures for identifying SARS-CoV-2 using the human ACE2 standard (Yan et al., 2020), with 100 percent similarity. ACE2 modeling with the Swiss-Model® web server was carried out in automated mode.
Sequence alignment is intended to determine the similarity between the target amino acid sequence and the template amino acid sequence. Sequence alignment between the ACE2 target and the templates (PDB ID 6M17 and 6M18) was carried out with the Swiss-Model® web server. The resulting alignments can be seen in Figure 2 and Figure 3.
The ACE2 enzyme was modeled with Swiss-Model by submitting the input data to obtain a 3D protein structure. The results are obtained automatically by transferring atomic coordinates from the template on the basis of the target-template alignment. SWISS-MODEL relies on the OpenStructure computational structural biology framework (Biasini et al., 2013) and the ProMod3 modelling engine (Waterhouse et al., 2018). The predicted Swiss-Model® ACE2 3D models were assessed using the QMEAN scoring function. QMEAN (Qualitative Model Energy Analysis) is a scoring function that describes the geometric structure of proteins (Benkert et al., 2008). The QMEAN scores obtained from the modeling were -4.50 for the 6M17 model and -4.92 for the 6M18 model. A QMEAN score ≤ −4.0 indicates a low-quality model, so both models are of low quality.
Model Evaluation and Optimization
The evaluation of the ACE2 model obtained from homology modeling covered the stereochemical and spatial properties of the model. The stereochemical properties were evaluated using the Swiss-Model web server by analyzing the Ramachandran plot and the MolProbity® results to obtain the clashscore and MolProbity score of the model. The Ramachandran plot is a two-dimensional graph of the amino acid residues in the enzyme structure, with the angle φ (phi) on the x-axis and ψ (psi) on the y-axis, divided into four quadrants; the angles on each axis range from −180° to +180° (Choudhuri, 2014). The plot visualizes the allowed and forbidden regions in the plane of the dihedral angles. The quality of the homology model is poor if many residues fall in forbidden regions (Wiltgen, 2018). The Ramachandran plot results can be seen in Figure 4.
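The φ and ψ values plotted in a Ramachandran plot are dihedral angles defined by four consecutive backbone atoms (C, N, Cα, C for φ and N, Cα, C, N for ψ). The Python sketch below shows one standard way to compute a dihedral angle from four 3D points, returning a value between −180° and +180°. The function names and the toy coordinates are assumptions for illustration; they are not taken from the ACE2 models.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle (degrees) defined by four points, about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not real atoms): a 90 degree twist and a planar trans case.
angle_90 = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                            (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
angle_trans = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                               (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
print(angle_90, angle_trans)
```

Applying such a function to the backbone atoms of each residue gives the (φ, ψ) pairs that validation tools place on the plot.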
The evaluation of contacts between atoms is based on the clashscore. The evaluation results can be seen in Table 1. The clashscore is the number of steric overlaps per 1000 atoms. The best clashscore value is <100, so the models meet the standard value for homology modeling. The MolProbity score combines the clashscore, the percentage of residues in forbidden regions, and the percentage of bad rotamers, and reflects the crystallographic resolution at which such values would be expected. Judged by the MolProbity score, the models meet the standard because it is <84. Neither model meets the standard value for residues in favored regions (>98%), and the outlier value of both models falls outside the standard criterion of <0.05%. The low quality of the Swiss-Model results is due to steric hindrance between the Cβ side-chain atoms and the main-chain atoms.
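The definition of the clashscore as overlaps per 1000 atoms can be written as a one-line calculation. The sketch below is only an illustration of that arithmetic; the function name and the counts are hypothetical, and the detection of steric overlaps itself is done by MolProbity and is not reproduced here.

```python
def clashscore(n_overlaps, n_atoms):
    """Number of steric overlaps normalized per 1000 atoms."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    return 1000.0 * n_overlaps / n_atoms


# Hypothetical counts for illustration only; not values from Table 1.
example_score = clashscore(n_overlaps=25, n_atoms=5000)
print(example_score)
```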

Homology modeling using Modeller
The template search was carried out with the NCBI® BLAST web server to find the protein with an experimentally determined 3D structure that has the highest sequence similarity to the target amino acid sequence. BLAST (Basic Local Alignment Search Tool) is one of the NCBI tools commonly used to find similar sequences; it looks for templates with the highest percentage of identity. The results of the BLAST sequence analysis can be seen in Table 2; five sequences with a similarity >43% were selected, followed by multiple sequence alignment using Clustal Omega. BLAST reports the percentage of similarity as the percent identity, alongside the Query Cover of the submitted sequence. A good template has a sequence similarity >30%. The three-dimensional (3D) models can be seen in Figure 5. Three 3D models were obtained: Modeller produced one model based on Protein Data Bank code 6M17, and Swiss-Model produced two 3D models, based on Protein Data Bank codes 6M17 and 6M18.

Figure 5. 3D homology models of ACE2
The sequence threading results and the model evaluation values from Modeller can be seen in Table 3. The model evaluation included the E-value, GA341, MPQS, z-DOPE, and TSVMod NO35. The results obtained meet the criteria (reliable). The E-value expresses how well the target matches the template relative to what would be expected by chance in the database searched; the standard value is <0.0001. GA341 is a model reliability score derived from statistical potentials, with a standard value of ≥0.7. The MPQS is a composite score comprising the template sequence identity, the z-DOPE value, and GA341. z-DOPE describes the quality of the fold of the model; values <0 meet the standard (Webb and Sali, 2016). TSVMod NO35 is the estimated native overlap (at 3.5 Å), with a standard value of ≥40%. On the basis of this evaluation, the homology model above fulfills the standard criteria.
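The comparison of the reported scores with these criteria can be sketched in Python as follows. The thresholds are the ones quoted in the text, and the scores are those reported in the abstract; treating the TSVMod NO35 value of 0.793 as a fraction (79.3%) is an assumption, and MPQS and the E-value are left out because no MPQS threshold or E-value figure is given here. The names `CRITERIA` and `evaluate` are illustrative.

```python
import operator

# Thresholds as quoted in the text: (comparison, threshold value).
CRITERIA = {
    "GA341": (operator.ge, 0.7),
    "z-DOPE": (operator.lt, 0.0),
    "TSVMod NO35": (operator.ge, 0.40),  # assumes NO35 is given as a fraction
}

# Scores reported in the abstract of this study.
scores = {"GA341": 1.0, "z-DOPE": -0.47, "TSVMod NO35": 0.793}


def evaluate(scores, criteria=CRITERIA):
    """Return a dict mapping each score name to True if it meets its criterion."""
    return {
        name: compare(scores[name], threshold)
        for name, (compare, threshold) in criteria.items()
        if name in scores
    }


results = evaluate(scores)
for name, passed in results.items():
    print(name, "meets criterion" if passed else "fails criterion")
```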
The ACE2 variants exhibit a binding affinity for the SARS-CoV-2 spike protein similar to that observed in the structure of the wild-type ACE2 and SARS-CoV-2 spike protein complex. However, the ACE2 alleles rs73635825 (S19P) and rs143936283 (E329G) show marked variations in their intermolecular interactions with viral spike proteins (Hussain et al., 2020). It has the strongest binding interaction. The earliest isolates of SARS-CoV-2 were surprisingly well adapted to human ACE2, potentially explaining its rapid transmission (Piplani et al., 2020). Twenty residues interact with proteins and can bind to ACE2, including five residues (Val445, Thr478, Gly485, Phe490, and Ser494). The interaction between ACE2 and the tertiary structure of the protein differs from that between ACE2 and the RBD protein monomer (Sakkiah et al., 2021).

Analysis of ligand binding site prediction, gene ontology (GO) using Cofactor
The predicted gene ontology and predicted binding site from Cofactor can be seen in Table 4. Cofactor is an application that can predict the structure and function of proteins. It can annotate proteins with Gene Ontology terms, Enzyme Commission numbers, and ligand-binding sites derived from analogous and homologous templates (Zhang et al., 2017). Gene ontology (GO) is a framework and concept for describing the function of genes in all organisms, and it facilitates the computational interpretation of biological systems. A GO term is a unique code used to classify gene ontologies (Dessimoz and Škunca, 2017). In terms of gene ontology, ACE2 has transporter activity as its biological and molecular function. ACE2 is involved in the movement of molecules and ions into or out of the cell. Knowing the ligand and binding site is essential in protein analysis because this is where a drug will later attach. This study predicts the ligand and binding sites, the results of which can be seen in Table 5.
The Cs score is a value in the range 0-1; the closer the value is to 1, the more similar the physicochemical properties of the protein are to those in the database. The potential ligand is leucine, with predicted binding-site residue numbers 49, 51, 52, 53, 125, 129, 277, 278, 280, 283, 431, and 435. Cofactor predicted the ligand and binding site by collecting binding-site data and mapping it onto the query. The ligands originating from the template are placed in the query structure using the query matrix and the template binding site. The ligand pose is then computed with a Monte Carlo calculation that samples the translation and rotation of the atoms. The final ligands are obtained by clustering the ligands superposed onto the query with a cutoff of 8 Å (Zhang et al., 2017).

Prediction of Mutations from ACE2
Mutation prediction using SIFT showed that ACE2 had no mutations. The mutation chosen for comparison was G8790A, a mutation in the ACE2 gene reported in hypertensive patients (Li, 2012). SIFT predicts amino acid substitutions that affect protein function on the basis of sequence homology and the physical properties of amino acids (Sim et al., 2012). Amino acid positions 1 to 634 were covered. Mutation analysis was also carried out using the PROVEAN