The Variant Annotation System (VAS) is an online tool for exploring millions of genetic variants in a browser. It provides many types of information on both coding and non-coding variants and their flanking regions, including the latest whole-genome transcription factor binding, open chromatin and gene expression data from the ENCODE consortium.
For citation: VAS: a convenient web portal for efficient integration of genomic features with millions of genetic variants by Eric Dun Ho, Qin Cao, Sau Dan Lee and Kevin Y Yip
BMC Genomics 2014, 15:886 (Full text)
- Batch annotation of millions of variants in one single job.
- A large variety of datasets for exploring many types of information around the variant sites.
- Inclusion of a flexible window of flanking regions for looking up information around each input variant.
- Linking up with UCSC Genome Browser for further examining specific variants.
Step 1) Variant positions
You can choose either Text input or File input to enter a list of genetic variant positions.
- Text input: Please enter one variant per line, in the format "<chr><space><position>".
- File input: Please upload a file in VCF format from your computer. The maximum file size is currently 300 MB.
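If you have a VCF file but prefer the text input, the conversion can be sketched as below. This is a minimal illustration, not part of VAS: the function name `vcf_to_text_input` is hypothetical, the chromosome name is passed through exactly as it appears in the VCF (whether VAS expects a "chr" prefix is not stated here), and reading "300MB" as 300 × 1024 × 1024 bytes is an assumption.

```python
# Assumption: "300MB" is interpreted as 300 * 1024 * 1024 bytes; the site may count differently.
MAX_UPLOAD_BYTES = 300 * 1024 * 1024


def vcf_to_text_input(vcf_lines):
    """Yield '<chr> <position>' lines from the CHROM and POS columns of VCF records.

    Hypothetical helper for illustration only; it is not provided by VAS.
    """
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and VCF header/meta lines, which start with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A VCF record has CHROM in column 1 and POS in column 2.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        yield f"{fields[0]} {fields[1]}"


example_vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t12345\t.\tA\tG\t.\t.\t.\n",
    "chr2\t67890\t.\tC\tT\t.\t.\t.\n",
]
text_input = list(vcf_to_text_input(example_vcf))
print("\n".join(text_input))
```

Each printed line has one variant in the "<chr><space><position>" format described above and can be pasted into the text input box.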
Step 2) Select datasets
VAS will annotate your genetic variants according to the datasets you select in this step.
After you click the "Click to select datasets" button, a sub-window will be shown.
Please choose a category first by clicking its tab title. All datasets in this category will be listed at the bottom of the page.
You can use the search box to quickly list a subset of the datasets. The dropdown menu contains all the attributes of the datasets in the current category, namely Cell Type, Region Type and Display Name in this case (Fig. 1). Select the attribute you want to search by, then enter the attribute value in the text box next to it. The text box dynamically lists all attribute values that partially match your input, so you can select a value from the list directly (Fig. 2). After clicking "Search", you will get a list of matching datasets. At any time, you can click the "Show all" button to clear the current search criteria and relist all datasets in this category.
To select a dataset for annotating your genetic variants, simply check the check box next to its entry on the list of datasets. You can also specify a window of flanking regions to be considered when annotating each variant. By default, a window size of 1 is selected, which corresponds to a point search (i.e., considering only the variant position but not any flanking regions). The number of datasets chosen is shown in the lower-right corner (Fig. 3).
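To make the window size concrete, the sketch below shows one way a window could map to a genomic interval. Only the behaviour for a window size of 1 (a point search) comes from the description above. The rest is assumed for illustration: that the window is centred on the variant, that coordinates are 1-based and inclusive, that an even window places its extra base on the right, and that the interval is clipped at position 1. The function name `flanking_interval` is hypothetical, and VAS may define the window differently.

```python
def flanking_interval(position, window_size=1):
    """Return a (start, end) interval of `window_size` bases around `position`.

    Assumptions (not confirmed by VAS): the window is centred on the variant,
    coordinates are 1-based and inclusive, and the start is clipped at 1.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    left = (window_size - 1) // 2       # bases before the variant
    right = window_size - 1 - left      # bases after the variant
    return (max(1, position - left), position + right)


print(flanking_interval(1000))        # window size 1: point search at the variant itself
print(flanking_interval(1000, 101))   # 50 bases on each side under the centring assumption
```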
If you want to select multiple datasets in the same category with the same flanking window size, you can do so easily using the "Batch selection" function. After finishing your selection of datasets in a category, you can go on to choose datasets in other categories, as long as the total number of datasets selected does not exceed 20.
After you have selected all the datasets you want to include, click the "Done" button to finish the dataset selection step.
The final data selection status will be displayed in the main page. You can view the list of selected datasets by clicking it (Fig. 4). If you want to modify the selection, just click the "Click to select datasets" button again to return to the data selection step.
Step 3) Input user information
Please input your email address if you want to be notified by email when the results are ready for viewing.
VAS contains various types of genome-wide datasets from different data sources. We classify these large datasets into categories, which give a quick overview of the different aspects of genetic variants that users can explore.
|Chromatin State Segmentation||ChromHMM track from ENCODE at UCSC Genome Browser, PMID: 22373907|
|Chromatin State Classification||Yip et al., Genome Biology 2012, 13:R48 [data], PMID: 22950945|
|DNA Methylation||Whole-genome Reduced-Representation Bisulfite Sequencing signal from Roadmap Epigenomics, PMID: 20944595|
|RNA-seq||RNA-seq from ENCODE (Total long RNA, Whole-cell compartment, with 100bp binning), PMID: 22955616|
|Histone Modification, signal||Histone modification ChIP-seq signals from ENCODE Analysis Hub (with 100bp binning), PMID: 22955616|
|Open Chromatin, peaks||DNaseI hypersensitive sites from ENCODE, PMID: 22955616|
|Open Chromatin, signal||DNase-seq and FAIRE-seq signals from ENCODE Analysis Hub (with 100bp binning), PMID: 22955616|
|Protein Binding||Transcription factor binding ChIP-seq peaks (called by PeakSeq) from ENCODE Analysis Hub, PMID: 22955616|
|Single Nucleotide Polymorphism||dbSNP (build 135 and 138), PMID: 11125122|
|Evolutionary Conservation||Conserved elements in the conservation track from UCSC Genome Browser, PMID: 19858363, 16024819|
|Conserved TFBS||Human-Mouse-Rat conserved transcription factor binding sites from UCSC Genome Browser, PMID: 12520026|
|CpG Island||CpG island track from UCSC Genome Browser, PMID: 24270787|
|Sequence Uniqueness||Mappability and uniqueness of the human reference genome from UCSC Genome Browser, PMID: 22276185|
|Simple Repeat||Simple tandem repeats by Tandem Repeats Finder (TRF) from UCSC Genome Browser, PMID: 9862982|
|GWAS Catalog||From NHGRI GWAS Catalog, PMID: 24316577|
|GENCODE||GENCODE Gene Set V14, V19 from The GENCODE Project, PMID: 22955987|
|HGMD||Human Gene Mutation Database (HGMD) Public Version from Biobase, PMID: 19348700|