Correlation of model quality between predicted proteins and their templates

Muhamed Adilović, Altijana Hromić-Jahjefendić

Abstract


Protein structure prediction benefits many areas of science and industry. Template-based modeling, which relies on solved structures from the Protein Data Bank, is the most reliable and most widely used approach. A key step is template selection, a challenging task that deserves special attention because choosing an appropriate template can lead to a more accurate prediction. This study examines, on a large scale, the relationships between predicted proteins taken from the SWISS-MODEL Repository and their templates. Features of the predicted proteins, including protein length, sequence identity, and sequence coverage, are taken into account. Quality assessment scores of the predicted proteins and their templates are compared and analyzed. Overall, the quality assessment scores of predicted proteins show a moderate positive correlation with their sequence identity to the templates. Moreover, template quality appears to carry over to the predicted proteins to a certain degree: templates with higher quality scores, on average, yield predicted proteins with higher quality scores.
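The kind of comparison described above can be illustrated with a minimal sketch. The abstract does not state which correlation coefficient was used, so the Pearson coefficient here is an assumption, and the function name `pearson`, the variable names, and all numeric values are hypothetical and invented purely for illustration; they are not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values, invented for illustration only (not from the study).
sequence_identity = [25.0, 40.0, 55.0, 70.0, 90.0]  # percent identity to the template
model_quality = [0.45, 0.60, 0.55, 0.75, 0.80]      # quality score of each predicted model
template_quality = [0.70, 0.80, 0.75, 0.90, 0.85]   # quality score of each template

r_identity = pearson(sequence_identity, model_quality)
r_template = pearson(template_quality, model_quality)
print(f"sequence identity vs. model quality: r = {r_identity:.2f}")
print(f"template quality vs. model quality:  r = {r_template:.2f}")
```

A positive `r` in either comparison would be consistent with the trend the abstract reports, although a real analysis would use the full set of model and template scores rather than a handful of values.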

Keywords


protein structure prediction; protein quality assessment; template-prediction correlation; Protein Data Bank; Swiss-model





DOI: http://dx.doi.org/10.21533/pen.v10i1.2018



Copyright (c) 2022 Muhamed Adilović, Altijana Hromić-Jahjefendić

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN: 2303-4521

Digital Object Identifier DOI: 10.21533/pen
