The Variant Annotation System (VAS) is an online system for exploring millions of genetic variants in a browser. It provides various types of information for both coding and non-coding variants and their flanking regions, including the latest whole-genome transcription factor binding, open chromatin and gene expression data from the ENCODE consortium.

For citation: Eric Dun Ho, Qin Cao, Sau Dan Lee and Kevin Y Yip. VAS: a convenient web portal for efficient integration of genomic features with millions of genetic variants. BMC Genomics 2014, 15:886.

VAS provides:

  • Batch annotation of millions of variants in one single job.
  • A large variety of datasets for exploring many types of information around the variant sites.
  • Inclusion of a flexible window of flanking regions for looking up information around each input variant.
  • Linking up with UCSC Genome Browser for further examining specific variants.

Step 1) Variant positions

You can choose either Text input or File input to enter a list of genetic variant positions.

  • Text input: Please enter one variant per line, in the format "<chr><space><position>".
  • File input: Please upload a file from your computer in VCF format. Currently we have a maximum file size limit of 300MB.
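
For long lists, it may help to check the text input before pasting it. The following Python sketch validates lines against the "<chr><space><position>" format and derives such lines from VCF records. The function names are hypothetical, and the check only enforces one space and a positive integer position; the chromosome naming that VAS accepts is not specified here, so any non-blank token is allowed.

```python
import re

# Assumed pattern: a chromosome token, one space, a positive integer position.
# The exact chromosome naming that VAS accepts is not specified in this guide.
LINE_PATTERN = re.compile(r"^(\S+) ([1-9][0-9]*)$")


def parse_variant_line(line):
    """Return (chromosome, position) for a valid line, or None otherwise."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def vcf_to_text_input(vcf_lines):
    """Convert VCF records to "<chr> <position>" lines.

    Relies only on the standard VCF layout: tab-separated columns with
    CHROM first and POS second; lines starting with '#' are headers.
    """
    output = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        output.append(f"{fields[0]} {fields[1]}")
    return output
```

Lines for which `parse_variant_line` returns None can be reviewed by hand before submission.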

Step 2) Select datasets

VAS will annotate your genetic variants according to the datasets you select in this step.
After you click the "Click to select datasets" button, a sub-window is shown:

Fig 1. The select datasets sub-window, with “Chromatin State Classification” clicked

Please choose a category first by clicking its tab title. All datasets in this category will be listed at the bottom of the page.

You can use the search box to quickly list only a subset of the datasets. The dropdown menu contains all the attributes of the datasets in the current category, namely Cell Type, Region Type and Display Name in this case (Fig. 1). Select one of these attributes to search by. The text box next to the menu lets you enter the attribute value to look up. It also dynamically lists all attribute values that partially match your input, so that you can select a value from the list directly (Fig. 2). After clicking "Search", you will get a list of matching datasets. At any time, you can click the "Show all" button to clear the current search criteria and list all datasets in this category again.

Fig 2. The search box can list all attribute values that partially match your input

To select a dataset for annotating your genetic variants, simply check the check box next to its entry on the list of datasets. You can also specify a window of flanking regions to be considered when annotating each variant. By default, a window size of 1 is selected, which corresponds to a point search (i.e., considering only the variant position but not any flanking regions). The number of datasets chosen is shown in the lower-right corner (Fig. 3).
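
As a rough illustration of how a flanking window relates to a point search, the Python sketch below computes the interval covered by a window of a given size. It assumes the window is centered on the variant and that coordinates are 1-based and inclusive. This guide does not state how VAS places the window, so `window_interval` and its centering rule are hypothetical.

```python
def window_interval(position, window_size=1):
    """Return the inclusive (start, end) interval around a variant position.

    Assumption: the window is centered on the variant; for even sizes the
    extra base is placed downstream. VAS may place the window differently.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    upstream = (window_size - 1) // 2
    downstream = window_size - 1 - upstream
    start = max(1, position - upstream)  # clamp at the first base
    return start, position + downstream
```

With the default window size of 1, the interval collapses to the variant position itself, which matches the point search described above.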

Fig 3. Selecting datasets and the selection status

If you want to select multiple datasets in the same category with the same flanking window size, you can do so easily using the "Batch selection" function. After finishing your selection of datasets in a category, you can go on to choose datasets in other categories, as long as the total number of datasets selected does not exceed 20.
After all the datasets you want to include are selected, click the "Done" button to finish the dataset selection step.
The final data selection status will be displayed on the main page. You can view the list of selected datasets by clicking it (Fig. 4). If you want to modify the selection, just click the "Click to select datasets" button again to return to the data selection step.
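
If you plan a selection in advance, the 20-dataset limit and the per-dataset window size can be tracked with a small structure such as the Python sketch below. The class, its methods and the dataset names in the usage note are hypothetical and are not part of VAS; only the limit of 20 comes from the text above.

```python
from dataclasses import dataclass, field

MAX_DATASETS = 20  # limit stated in this guide


@dataclass
class DatasetSelection:
    """Hypothetical container mapping (category, dataset) to a window size."""
    windows: dict = field(default_factory=dict)

    def select(self, category, name, window_size=1):
        key = (category, name)
        # Re-selecting an existing dataset only updates its window size.
        if key not in self.windows and len(self.windows) >= MAX_DATASETS:
            raise ValueError(f"at most {MAX_DATASETS} datasets can be selected")
        self.windows[key] = window_size

    def batch_select(self, category, names, window_size=1):
        """Select several datasets of one category with a shared window size."""
        for name in names:
            self.select(category, name, window_size)

    def count(self):
        return len(self.windows)
```

For example, `DatasetSelection().batch_select("Protein Binding", ["TF_A", "TF_B"], window_size=101)` would record two placeholder datasets that share one window size.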

Fig 4. Data selection status shown on the main page after the data selection step

Step 3) Input user information

Please input your email address if you want to be notified by email when the results are ready for viewing.

VAS contains various types of genome-wide data from different sources. We classify these large datasets into categories, which give a quick overview of the different aspects of genetic variants that users can explore.

Category                       | Source
Chromatin State Segmentation   | ChromHMM track from ENCODE at UCSC Genome Browser, PMID: 22373907
Chromatin State Classification | Yip et al., Genome Biology 2012, 13:R48 [data], PMID: 22950945
DNA Methylation                | Whole-genome Reduced-Representation Bisulfite Sequencing signal from Roadmap Epigenomics, PMID: 20944595
RNA-seq                        | RNA-seq from ENCODE (Total long RNA, Whole-cell compartment, with 100bp binning), PMID: 22955616
Histone Modification, signal   | Histone modification ChIP-seq signals from ENCODE Analysis Hub (with 100bp binning), PMID: 22955616
Open Chromatin, peaks          | DNaseI hypersensitive sites from ENCODE, PMID: 22955616
Open Chromatin, signal         | DNase-seq and FAIRE-seq signals from ENCODE Analysis Hub (with 100bp binning), PMID: 22955616
Protein Binding                | Transcription factor binding ChIP-seq peaks (called by PeakSeq) from ENCODE Analysis Hub, PMID: 22955616
Single Nucleotide Polymorphism | dbSNP (build 135 and 138), PMID: 11125122
Evolutionary Conservation      | Conserved elements in the conservation track from UCSC Genome Browser, PMID: 19858363, 16024819
Conserved TFBS                 | Human-Mouse-Rat conserved transcription factor binding sites from UCSC Genome Browser, PMID: 12520026
CpG Island                     | CpG island track from UCSC Genome Browser, PMID: 24270787
Sequence Uniqueness            | Mappability and uniqueness of the human reference genome from UCSC Genome Browser, PMID: 22276185
Simple Repeat                  | Simple tandem repeats by Tandem Repeats Finder (TRF) from UCSC Genome Browser, PMID: 9862982
GWAS Catalog                   | From NHGRI GWAS Catalog, PMID: 24316577
GENCODE                        | GENCODE Gene Set V14, V19 from The GENCODE Project, PMID: 22955987
HGMD                           | Human Gene Mutation Database (HGMD) Public Version from Biobase, PMID: 19348700
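
Several of the signal datasets above are provided with 100bp binning. As a sketch of what that implies for a lookup, the Python snippet below maps a position to the bin that contains it. It assumes 1-based positions and bins that start at position 1 (1-100, 101-200, and so on). The actual bin boundaries used by VAS are not documented here, and `bin_for_position` is a hypothetical helper.

```python
BIN_SIZE = 100  # bin width in base pairs, as stated for the signal datasets


def bin_for_position(position, bin_size=BIN_SIZE):
    """Return (bin_start, bin_end), 1-based inclusive, for a position.

    Assumption: bins are aligned to position 1; VAS may align them differently.
    """
    if position < 1:
        raise ValueError("position must be 1-based and positive")
    index = (position - 1) // bin_size
    bin_start = index * bin_size + 1
    return bin_start, bin_start + bin_size - 1
```

Under these assumptions, all variants that fall in the same 100bp bin would receive the same signal value from a binned dataset.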