The Gram-negative endophytic bacterium Gluconacetobacter diazotrophicus is found in plants such as sugarcane, pineapple, coffee and sweet potato. It fixes atmospheric nitrogen, produces plant growth-promoting hormones and bacteriocins, and helps solubilize zinc compounds [1,2]. A total of 3778 protein sequences were predicted in the G. diazotrophicus PAL5 genome [3] [RefSeq: NC_010125]. We investigated these 3778 proteins to compare the predictions made by conventional annotation with those made by structural annotation. Conventional annotation is based only on the primary sequences of proteins, whereas structural annotation draws on information from their three-dimensional structures [4,5].
Protein structural properties were analyzed on a large scale with MHOLline [6,7] and the ASAProt workflow [8]. MHOLline combines a specific set of programs for comparative modeling. ASAProt is a computational workflow for structural annotation projects.
MHOLline built the 3D models and evaluated them by Ramachandran plot, RMSD and BATS classification. In addition, the structural domains and superfamilies of the proteins were analyzed with fastSCOP and 3DBlast. In parallel, the primary sequences of the proteins were analyzed with conventional resources such as COG, SMART and UniProt.
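As an illustration of the RMSD criterion mentioned above, the following minimal sketch computes the root-mean-square deviation between two equally sized sets of atomic coordinates. It assumes the two structures have already been superposed and that atoms are listed in matching order. The function name `rmsd` and the toy coordinates are hypothetical and are not taken from MHOLline, whose internal procedure is not described here.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) tuples.

    Assumption: both structures are already superposed and the atoms are
    paired by position in the lists. No alignment is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")

    squared_sum = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms.
        squared_sum += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2

    return math.sqrt(squared_sum / len(coords_a))


# Toy example with two atoms, each displaced by 1 unit.
model = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
template = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(rmsd(model, template))
```

A lower value indicates that the model backbone lies closer to the template. In practice the superposition step that precedes this calculation matters as much as the formula itself.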
We constructed 3D models with excellent stereochemical quality for 1390 proteins. Conventional and structural annotation could be compared for 1225 sequences, and both approaches predicted similar functions for 58% of them. Furthermore, of 96 sequences annotated as hypothetical proteins by conventional annotation, we inferred a function for 53 by structural annotation. We conclude that structural annotation should be used to complement and improve conventional annotation in genomics projects.
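The counts reported above can be put on a common scale with a short calculation. The sketch below uses only the figures stated in the text. The derived percentages are simple ratios computed here for orientation and are not values reported by the authors. The variable and function names are hypothetical.

```python
# Counts stated in the text.
TOTAL_PROTEINS = 3778             # predicted protein sequences in the PAL5 genome
MODELED = 1390                    # proteins with a 3D model of excellent quality
COMPARED = 1225                   # sequences with both annotation types available
HYPOTHETICAL = 96                 # hypothetical proteins by conventional annotation
HYPOTHETICAL_WITH_FUNCTION = 53   # of those, function inferred structurally


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


coverage = {
    "modeled_of_genome": percent(MODELED, TOTAL_PROTEINS),
    "compared_of_genome": percent(COMPARED, TOTAL_PROTEINS),
    "hypothetical_resolved": percent(HYPOTHETICAL_WITH_FUNCTION, HYPOTHETICAL),
}

for name, value in coverage.items():
    print(f"{name}: {value}%")
```

Under these figures, roughly 37% of the predicted proteome received a model, about 32% could be compared across both annotation approaches, and about 55% of the hypothetical proteins considered gained a functional inference.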