The Variant Effect Predictor tool, which appears as an option when you click on Manage your Data, allows you to upload a set of variation data and predict the effect of the variants.
Note that the input and output formats are completely different.
Data must be supplied in a simple tab-separated format containing five required columns (chromosome, start, end, allele as reference/variant, and strand), for example:
1 881907 881906 -/C +
5 140532 140532 T/C +
12 1017956 1017956 T/A +
2 946507 946507 G/C +
14 19584687 19584687 C/T -
19 66520 66520 G/A +
8 150029 150029 A/T +
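As a rough illustration of the five-column input format above, the following sketch reads one line into a record. The names `VariantRecord` and `parse_input_line` are hypothetical and are not part of the VEP itself. The sketch also assumes that splitting on any whitespace is acceptable, which covers tab-separated input.

```python
from typing import NamedTuple


class VariantRecord(NamedTuple):
    """One input line (hypothetical helper type, not part of the VEP)."""
    chromosome: str
    start: int
    end: int
    allele: str   # reference/variant, e.g. "T/C" or "-/C"
    strand: str   # "+" or "-"


def parse_input_line(line: str) -> VariantRecord:
    """Split one input line into its five required columns."""
    # Assumption: splitting on any whitespace is fine, since no column
    # contains spaces; strictly, the format is tab-separated.
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
    chromosome, start, end, allele, strand = fields
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    if "/" not in allele:
        raise ValueError(f"allele must look like REF/ALT, got {allele!r}")
    return VariantRecord(chromosome, int(start), int(end), allele, strand)


if __name__ == "__main__":
    print(parse_input_line("14\t19584687\t19584687\tC/T\t-"))
```

Rejecting lines that do not have exactly five fields reflects the statement that all columns are required.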
An insertion is indicated by start coordinate = end coordinate + 1. For example, an insertion of 'C' between nucleotides 12600 and 12601 on the forward strand of chromosome 8 is indicated as follows:
8 12601 12600 -/C +
A deletion is indicated by the exact nucleotide coordinates. For example, a three base pair deletion of nucleotides 12600, 12601, and 12602 of the reverse strand of chromosome 8 will be:
8 12600 12602 CGT/- -
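The coordinate conventions for insertions and deletions described above can be sketched as a small check. The function name `classify_variant` and the returned labels are illustrative assumptions and do not come from the VEP.

```python
def classify_variant(start: int, end: int, allele: str) -> str:
    """Label a variant using the coordinate conventions described above.

    Hypothetical helper: the labels are illustrative only.
    """
    reference, _, variant = allele.partition("/")
    # Insertion: start coordinate = end coordinate + 1, reference is '-'.
    if start == end + 1 and reference == "-":
        return "insertion"
    # Deletion: exact coordinates of the deleted bases, variant is '-'.
    if variant == "-" and end >= start:
        if len(reference) != end - start + 1:
            raise ValueError("deleted sequence length does not match coordinates")
        return "deletion"
    return "substitution"


if __name__ == "__main__":
    print(classify_variant(12601, 12600, "-/C"))    # the insertion example
    print(classify_variant(12600, 12602, "CGT/-"))  # the deletion example
```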
The popular VCF (version 4.0) and pileup formats are also supported as input.
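The tool predicts the consequence of each variation, the amino acid position and change (if the variation falls within a protein) and the identifiers of known variations that occur at this position. Each output line has 14 columns: uploaded variation, location, allele, gene, feature, feature type, consequence, position in cDNA, position in CDS, position in protein, amino acid change, codon change, co-located variation and extra.
Empty values are denoted by '-'.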
11_224088_C/A 11:224088 A ENSG00000142082 ENST00000525319 Transcript NON_SYNONYMOUS_CODING 742 716 239 T/N aCc/aAc - SIFT=deleterious(0);PolyPhen=unknown(0)
11_224088_C/A 11:224088 A ENSG00000142082 ENST00000534381 Transcript 5_PRIME_UTR - - - - - - -
11_224088_C/A 11:224088 A ENSG00000142082 ENST00000529055 Transcript DOWNSTREAM - - - - - - -
11_224585_G/A 11:224585 A ENSG00000142082 ENST00000529937 Transcript INTRONIC,NMD_TRANSCRIPT - - - - - - HGVSc=ENST00000529937.1:c.136-346G>A
22_16084370_G/A 22:16084370 A - ENSR00000615113 RegulatoryFeature REGULATORY_REGION - - - - - - -
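To show how output lines like these might be consumed, here is a minimal sketch that splits a line into 14 fields, maps '-' to `None`, and unpacks the key=value pairs of the last column. The names in `OUTPUT_COLUMNS` are assumptions chosen to match the example rows, and `parse_output_line` and `parse_extra` are hypothetical helpers.

```python
# Assumed field names, chosen to match the 14 fields in the example rows.
OUTPUT_COLUMNS = [
    "uploaded_variation", "location", "allele", "gene", "feature",
    "feature_type", "consequence", "cdna_position", "cds_position",
    "protein_position", "amino_acids", "codons", "existing_variation",
    "extra",
]


def parse_extra(extra: str) -> dict:
    """Turn 'KEY=value;KEY=value' into a dict; '-' means no extra data."""
    if extra == "-":
        return {}
    pairs = {}
    for item in extra.split(";"):
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs


def parse_output_line(line: str) -> dict:
    """Parse one non-header output line into a dict keyed by column name."""
    # Limit the number of splits so the last column is kept whole.
    fields = line.strip().split(None, len(OUTPUT_COLUMNS) - 1)
    if len(fields) != len(OUTPUT_COLUMNS):
        raise ValueError(f"expected {len(OUTPUT_COLUMNS)} columns: {line!r}")
    # Empty values are denoted by '-'; map them to None.
    row = {name: (None if value == "-" else value)
           for name, value in zip(OUTPUT_COLUMNS, fields)}
    row["extra"] = parse_extra(fields[-1])
    return row


if __name__ == "__main__":
    example = ("11_224088_C/A 11:224088 A ENSG00000142082 ENST00000525319 "
               "Transcript NON_SYNONYMOUS_CODING 742 716 239 T/N aCc/aAc - "
               "SIFT=deleterious(0);PolyPhen=unknown(0)")
    print(parse_output_line(example)["extra"])
```

Header lines beginning with `##` should be skipped before calling a parser like this one.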
The standalone VEP script also adds a header to the output file. The header records which databases the script connected to and provides a key describing the key/value pairs used in the extra column.
## ENSEMBL VARIANT EFFECT PREDICTOR v2.1
## Output produced at 2011-06-16 16:09:38
## Connected to homo_sapiens_core_63_37 on ensembldb.ensembl.org
## Using API version 63, DB version 63
## Extra column keys:
## HGNC : HGNC gene identifier
## ENSP : Ensembl protein identifer
## HGVSc : HGVS coding sequence name
## HGVSp : HGVS protein sequence name
## SIFT : SIFT prediction
## PolyPhen : PolyPhen prediction
## Condel : Condel SIFT/PolyPhen consensus prediction
## MATRIX : The source and identifier of a transcription factor binding profile aligned at this position
## HIGH_INF_POS : A flag indicating if the variant falls in a high information position of a transcription factor binding profile
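The key in this header could be read programmatically along the following lines. This is a sketch only: the function name `parse_extra_keys` is hypothetical, and it assumes every key line follows the `## KEY : description` layout shown in the example header.

```python
def parse_extra_keys(header_lines):
    """Collect the 'Extra column keys' section of a header into a dict."""
    keys = {}
    in_key_section = False
    for line in header_lines:
        if not line.startswith("##"):
            break  # the header has ended and data lines follow
        text = line.lstrip("#").strip()
        if text.startswith("Extra column keys"):
            in_key_section = True
            continue
        # Assumption: key lines look like 'KEY : description'.
        if in_key_section and " : " in text:
            key, _, description = text.partition(" : ")
            keys[key.strip()] = description.strip()
    return keys


if __name__ == "__main__":
    header = [
        "## ENSEMBL VARIANT EFFECT PREDICTOR v2.1",
        "## Extra column keys:",
        "## SIFT : SIFT prediction",
        "## PolyPhen : PolyPhen prediction",
    ]
    print(parse_extra_keys(header))
```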