Supplementary data for paper:
Whole-genome bisulfite sequencing of multiple individuals reveals the quantitative roles of DNA methylation in transcriptional regulation
Shaoke Lou 1,2, Hao Qin 2, Jing-Woei Li 2, Zhibo Gao 3, Xin Liu 3, Landon L. Chan 1,2,
Vincent K. L. Lam 4,5,6, Heung-Man Lee 4,5,6, Wing-Yee So 4,5,6, Ying Wang 4,5,6, Si
Lok 6, Jun Wang 3, Ronald C. W. Ma 4,5,6, Stephen Kwok-Wing Tsui 7,8,9, Juliana C. N.
Chan 4,5,6, Ting-Fung Chan 2,8,9, and Kevin Y. Yip 1,8,9
1 Department of Computer Science and Engineering,
2 School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
3 Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, China
4 Department of Medicine and Therapeutics,
5 Hong Kong Institute of Diabetes and Obesity,
6 Li Ka Shing Institute of Health Sciences,
7 School of Biomedical Sciences,
8 Hong Kong Bioinformatics Centre,
9 CUHK-BGI Innovation Institute of Trans-omics,
The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
Two directories are in this folder.
raw/-- the annotation file and raw data used to extract methylation and expression features
trio.hg19.cout.nchrM.sort.bed: the methylome data of a family trio. The data are lifted over from hg18 to hg19
and sorted by chromosome and start position.
The format is: chr, start (0-based), end, methylation, where methylation is
represented by a string that matches the regular expression
(\d+)_(0|1)_(0|1)_(0|1)_(0|1). Group 1 of the regular expression is the
position (1-based) in the hg18 genome; groups 2, 3 and 4 are the methylation
status of the trio father, mother and daughter, where '1' means
methylated and '0' means unmethylated; the last group (5) represents the strand,
where '1' is the plus strand and '0' is the minus strand.
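The format above can be read with a short Python sketch such as the one below. It is only an illustration, not the code used for the paper. It assumes the four columns are separated by whitespace and takes the final group of the regular expression as the strand. The names parse_methylation_line and MethylationRecord, and the sample line, are hypothetical.

```python
import re
from typing import NamedTuple

# Regular expression quoted from the format description:
# hg18 position, father, mother, daughter, strand.
METHYLATION_RE = re.compile(r"(\d+)_(0|1)_(0|1)_(0|1)_(0|1)")


class MethylationRecord(NamedTuple):
    chrom: str
    start: int      # 0-based start (hg19)
    end: int
    hg18_pos: int   # 1-based position on hg18
    father: bool    # True means methylated
    mother: bool
    daughter: bool
    strand: str     # "+" or "-"


def parse_methylation_line(line: str) -> MethylationRecord:
    """Parse one line of the BED file (assumed whitespace-separated)."""
    chrom, start, end, meth = line.split()[:4]
    match = METHYLATION_RE.fullmatch(meth)
    if match is None:
        raise ValueError(f"unexpected methylation field: {meth!r}")
    pos, father, mother, daughter, strand = match.groups()
    return MethylationRecord(
        chrom, int(start), int(end), int(pos),
        father == "1", mother == "1", daughter == "1",
        "+" if strand == "1" else "-",
    )


# Hypothetical example line, not taken from the data file.
record = parse_methylation_line("chr1\t10468\t10469\t468_1_0_1_1")
print(record)
```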
(f|m|d).sort.sam.20120616t1.exp: expression levels from the RNA-Seq data of the trio family. The format is: id,
gene_id, gene_name, chromosome, gene_length, reads_count, rpkm. 'f' means
father, 'm' means mother, and 'd' means daughter.
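A minimal Python sketch for loading one of these expression files is given below. The column names come from the format description. The tab delimiter, the skipping of non-numeric rows (for example a possible header line) and the function name read_expression are assumptions, and the sample row is made up for illustration.

```python
import csv
import io

# Column names as listed in the format description.
EXPRESSION_COLUMNS = [
    "id", "gene_id", "gene_name", "chromosome",
    "gene_length", "reads_count", "rpkm",
]

# File names expanded from the (f|m|d) pattern.
SAMPLE_FILES = {
    "father": "f.sort.sam.20120616t1.exp",
    "mother": "m.sort.sam.20120616t1.exp",
    "daughter": "d.sort.sam.20120616t1.exp",
}


def read_expression(handle, delimiter="\t"):
    """Yield one dict per gene. The tab delimiter is an assumption."""
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) != len(EXPRESSION_COLUMNS):
            continue  # skip blank or malformed lines
        record = dict(zip(EXPRESSION_COLUMNS, row))
        try:
            for key in ("gene_length", "reads_count", "rpkm"):
                record[key] = float(record[key])
        except ValueError:
            continue  # skip rows with non-numeric values, e.g. a header
        yield record


# Hypothetical row, not taken from the data files.
example = io.StringIO("1\tENSG00000000001.1\tGENE1\tchr1\t1000\t50\t12.5\n")
rows = list(read_expression(example))
print(rows[0]["gene_name"], rows[0]["rpkm"])
```

For a real file, the handle would be opened with something like open(SAMPLE_FILES["father"], newline="").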
gencode_v7.level12.sort.ucsc: annotation file in UCSC BED format, which includes the level 1 and 2 genes
from the Gencode version 7 annotation file. The composite gene structure is
built as described in Methods.
processed/-- .arff files for Weka.
merge(M|MG|ML|MGL).121024.5sample.arff: the merged dataset, which includes: trio-father (fam_id:0), trio-mother (fam_id:1),
trio-daughter (fam_id:2), H1 (fam_id:3), IMR90 (fam_id:4). 'M' means mCG,
'MG' means mCG/CG, 'ML' means mCG/Length, 'MGL' means mCG/CG/length
trio(M|MG|ML|MGL).newclass.121024.arff: trio dataset includes: trio-father (fam_id:0), trio-mother (fam_id:1)
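
The file name patterns and fam_id codes listed for the processed/ directory can be written out as the Python lookup tables below. This is a sketch that assumes (M|MG|ML|MGL) is read as a choice of one suffix per file. The variable names are hypothetical.

```python
# Feature-set suffixes and their meaning, as listed in this README.
FEATURE_SETS = {
    "M": "mCG",
    "MG": "mCG/CG",
    "ML": "mCG/Length",
    "MGL": "mCG/CG/length",
}

# fam_id codes of the merged five-sample dataset.
FAM_ID = {
    0: "trio-father",
    1: "trio-mother",
    2: "trio-daughter",
    3: "H1",
    4: "IMR90",
}

# Assumption: each pattern expands to one file per feature-set suffix.
merged_files = [f"merge{key}.121024.5sample.arff" for key in FEATURE_SETS]
trio_files = [f"trio{key}.newclass.121024.arff" for key in FEATURE_SETS]

for name in merged_files + trio_files:
    print(name)
```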