Gene annotation for gorilla was generated by projecting genes from the human reference genome and by aligning proteins from three major sources (in descending order of their contribution to the final gene set): Ensembl human translations, mammalian/vertebrate proteins, and gorilla-specific proteins.
Projection of human genes to gorilla began with the alignment of the gorilla genome to the human reference genome (GRCh37 assembly, the latest at the time) using BLASTz. These alignments were used to project human Ensembl gene structures (Ensembl version 56) to the corresponding locations in gorilla. About 60% of human protein-coding genes were projected onto the gorilla genome. Small insertions or deletions that disrupt the reading frame of a projected transcript are corrected by inserting "frame-shift" introns into the gene structure. For some human exons and parts of exons, the corresponding gorilla sequence is missing from the assembly. In most of these cases, the missing exon is omitted from the gorilla gene model. In a small number of cases, however, where BLASTz has aligned the human sequence to a gap in the gorilla sequence, the exon is placed in the gap, resulting in a run of X's of the correct length in the translation.
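The two corrections described above can be illustrated with a small sketch. This is not the Ensembl projection code. The function names (`translate`, `split_around_insertion`, `spliced`), the toy sequences, and the 0-based, end-exclusive coordinates are assumptions made for illustration, and only the insertion case is shown. The sketch demonstrates how excluding an inserted base as a tiny "frame-shift" intron restores the reading frame, and how an exon placed in an assembly gap (a run of N bases) translates to a run of X's of the matching length.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence; codons with non-ACGT bases (e.g. N) become X."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def split_around_insertion(exon, insertion):
    """Split an exon (start, end) around an inserted span (start, end).

    Hypothetical helper: the inserted bases are left out of both resulting
    exons, so they behave like a very short "frame-shift" intron.
    """
    exon_start, exon_end = exon
    ins_start, ins_end = insertion
    if not (exon_start < ins_start < ins_end < exon_end):
        raise ValueError("insertion must lie strictly inside the exon")
    return [(exon_start, ins_start), (ins_end, exon_end)]


def spliced(genome, exons):
    """Concatenate the exon sequences in order."""
    return "".join(genome[start:end] for start, end in exons)


# Toy example: the human CDS ATG GCT GGT encodes M-A-G.
# The toy gorilla sequence carries one extra base (A) at position 5.
gorilla = "ATGGCATGGT"
uncorrected = translate(gorilla)
exons = split_around_insertion((0, len(gorilla)), (5, 6))
corrected = translate(spliced(gorilla, exons))

# An exon placed in an assembly gap: six N bases give two X residues.
gapped = translate("ATG" + "N" * 6 + "GGT")

print(uncorrected, corrected, gapped)
```

In this toy case the uncorrected translation drifts out of frame after the inserted base, while the split model recovers the human-like peptide. Real projections work on genome-scale alignments and also handle deletions and strand, which this sketch leaves out.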
Ensembl human translations were also aligned to the gorilla genome using Exonerate. The alignment of mammalian/vertebrate proteins and gorilla-specific proteins followed the procedures of the standard Ensembl genebuild pipeline, using Genewise.
The gene-building procedure on the gorGor3 assembly identified 20803 protein-coding genes and 1553 pseudogenes.
Additional manual annotation of this genome can be found in Vega.