DNA methylation is an epigenetic regulator of gene expression with important functions in development and in diseases such as cancer. The modified cytosines 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) are routinely detected by sequencing Illumina libraries prepared either with bisulfite conversion or with an enzyme-based workflow called EM-seq™. However, neither method can distinguish 5mC from 5hmC. At the same time, there is growing interest in mapping 5hmC because of its role in regulating gene expression in embryonic stem cells and in a variety of neuronal cell types. Existing methods that discriminate 5mC from 5hmC, such as oxBS-seq and TAB-seq, rely on bisulfite conversion and therefore cause greater DNA damage. Here we describe a fully enzymatic method that detects only 5hmC.
5hmC libraries are generated by ligating adaptors onto sheared DNA. The 5hmCs are then glucosylated, which protects them from deamination by APOBEC, whereas unprotected cytosines and 5mCs are deaminated to uracil and thymine, respectively. After amplification, the libraries are sequenced on Illumina instruments: 5hmC is read as cytosine, while 5mC and unmodified cytosine are both read as thymine. Because EM-seq detects 5mC and 5hmC together, subtracting the 5hmC data from EM-seq data for the same DNA enables the precise localization of individual 5mCs and 5hmCs.
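The readout logic and the subtraction step can be sketched in Python. This is a minimal illustration, not the authors' analysis pipeline: it assumes every enzymatic reaction goes to completion and that per-site counts of C calls and total reads are already available for both libraries. `READOUT`, `SiteCounts` and `decompose_site` are hypothetical names.

```python
from dataclasses import dataclass

# Base read at a reference cytosine in each workflow, ASSUMING every reaction
# goes to completion. Uracil from deaminated C is read as T after amplification.
READOUT = {
    # EM-seq: 5mC and 5hmC are both protected; unmodified C is deaminated.
    "EM-seq": {"C": "T", "5mC": "C", "5hmC": "C"},
    # 5hmC-only workflow: glucosylated 5hmC is protected from APOBEC;
    # C is deaminated to U and 5mC to T.
    "5hmC": {"C": "T", "5mC": "T", "5hmC": "C"},
}


@dataclass
class SiteCounts:
    """Hypothetical per-site pileup: reads calling C and total reads at one cytosine."""
    c_reads: int
    total_reads: int

    def fraction_c(self) -> float:
        # Fraction of reads that kept the C, i.e. were protected from deamination.
        if self.total_reads == 0:
            raise ValueError("site has no coverage")
        return self.c_reads / self.total_reads


def decompose_site(emseq: SiteCounts, hmc: SiteCounts) -> dict:
    """Split the EM-seq signal (5mC + 5hmC) into 5mC and 5hmC by subtraction.

    The 5mC estimate is clamped at 0 because, at low coverage, sampling noise
    can make the 5hmC fraction exceed the EM-seq fraction.
    """
    modified = emseq.fraction_c()  # 5mC + 5hmC
    hmc_frac = hmc.fraction_c()    # 5hmC only
    mc_frac = max(modified - hmc_frac, 0.0)
    return {
        "5mC": mc_frac,
        "5hmC": hmc_frac,
        "unmodified": 1.0 - mc_frac - hmc_frac,
    }


if __name__ == "__main__":
    # Illustrative counts only: 16/20 EM-seq reads and 4/20 5hmC reads call C.
    site = decompose_site(SiteCounts(16, 20), SiteCounts(4, 20))
    print({k: round(v, 2) for k, v in site.items()})
```

With the illustrative counts above, the site decomposes into roughly 60% 5mC, 20% 5hmC and 20% unmodified cytosine. Clamping the 5mC estimate at zero is one simple way to handle sites where the 5hmC fraction exceeds the EM-seq fraction; a fuller analysis might model both counts jointly instead.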
5hmC data were generated from 0.1 ng to 200 ng of DNA isolated from adult human brain (Biochain). The global 5hmC level in the CpG context was approximately 20% and was highly consistent across input amounts. 5hmC levels were also profiled over a 10-day differentiation time course of mouse E14 embryonic stem cells, during which they dropped from 3% to 0.7%, as measured by both LC-MS quantification and Illumina sequencing. The 5hmC libraries had characteristics similar to those of EM-seq libraries, with the expected insert sizes, low duplication rates and minimal GC bias. T4147 phage DNA, in which every cytosine is 5hmC, was included as an internal control, and 97–99% of its cytosines were correctly identified. This gives high confidence in the detection of 5hmC with this method.
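Both percentages above, the global CpG 5hmC level and the fraction of control cytosines correctly identified, are ratios of C calls to total calls at reference cytosines. The sketch below assumes one common way of computing such a level, pooling counts across sites; `pooled_c_fraction` and the toy counts are hypothetical and illustrative, not measurements from this study.

```python
from typing import Iterable, Tuple


def pooled_c_fraction(sites: Iterable[Tuple[int, int]]) -> float:
    """Fraction of all base calls at reference cytosines that are C.

    `sites` holds hypothetical (c_reads, t_reads) pairs, one per cytosine.
    On CpG sites of a 5hmC library this estimates the global 5hmC level;
    on the T4147 control (every cytosine is 5hmC) it is the fraction of
    cytosines correctly identified as 5hmC.
    """
    c_total = 0
    n_total = 0
    for c_reads, t_reads in sites:
        c_total += c_reads
        n_total += c_reads + t_reads
    if n_total == 0:
        raise ValueError("no coverage at the supplied sites")
    return c_total / n_total


if __name__ == "__main__":
    # Illustrative counts only, not measurements from this study.
    cpg_sites = [(2, 8), (1, 4), (3, 12)]
    control_sites = [(98, 2), (49, 1)]
    print(f"global CpG 5hmC level: {pooled_c_fraction(cpg_sites):.1%}")
    print(f"control cytosines called C: {pooled_c_fraction(control_sites):.1%}")
```

Pooling weights each site by its coverage; averaging per-site fractions instead would weight every site equally, and the two estimates can differ when coverage is uneven.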