Pathology - Posters D
Alterations in methylation patterns during tumorigenesis were recognized decades ago, and the identification of informative methylation patterns remains the focus of many cancer studies. Because the platform is well suited to screening large cohorts, Illumina Infinium arrays have been the most popular method for identifying genome-wide methylation profiles at CpG-site resolution. Our goal was to assemble an integrated database containing all available methylation and clinical data for publicly available samples and to analyze methylation changes contributing to altered gene expression in colorectal cancer. To make the database easy to explore, we also aimed to create a web-based interactive platform.
Studies with publicly available raw intensity data files were systematically screened using the GEO Platform browser. After filtering by dataset origin, methylation data from colorectal cancer studies were acquired along with clinical and demographic data. To identify differentially methylated regions, gene-level analysis was performed. The Pearson correlation between methylation and gene expression in colorectal cancer and normal tissues was computed using Illumina HumanMethylation450K and RNA-seq data from the GDC (Genomic Data Commons) database. The web application was developed with the shiny R package.
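The correlation step can be illustrated with a minimal sketch. It assumes that, for a single gene, one region-level methylation value (for example a mean beta value) and one expression value are available per sample. The function name `pearson_r` and the toy numbers are hypothetical and do not come from the study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data for one gene: per-sample methylation (beta values)
# and expression (log2 scale). Values are invented for illustration only.
methylation = [0.10, 0.25, 0.40, 0.60, 0.85]
expression = [9.1, 8.4, 7.0, 5.2, 3.9]

r = pearson_r(methylation, expression)
print(f"r = {r:.3f}")
```

In practice the same calculation would be repeated per gene and per region, separately for tumor and normal samples.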
As a result, a database containing data from 2295 adenocarcinoma, adenoma and normal tissue samples was established. When tumor and normal tissues were compared, promoter and first exon regions exhibited the highest fold change. The top 20 genes by AUC value included multiple genes previously associated with malignancies: POFUT1, DMBT1 and TDG in the TSS1500; PSMD11, LYPD5 and MIR16 in the TSS200; CPNE5, SLC9A1 and WEE1 in the 5’UTR; NAT8L, MERTK and A1BG in the first exon; SOD3, C11orf52 and RAPH1 in the gene body; and QPCTL, MEIS2 and CISH in the 3’UTR. Among the genes with differentially methylated regions, those with a strong negative correlation between methylation and expression (r < -0.7) were overrepresented in the promoter and first exon regions.
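How genes might be ranked by AUC can be sketched as follows. The abstract does not state how the AUC was computed, so this sketch assumes the rank-based (Mann-Whitney) definition: the probability that a randomly chosen tumor sample has a higher methylation value than a randomly chosen normal sample. The gene labels and values are hypothetical placeholders, and scoring by distance from 0.5 (so that hypomethylation also counts) is an assumption of this example.

```python
def auc(tumor_values, normal_values):
    """Probability that a random tumor value exceeds a random normal value.

    Ties count as 0.5 (Mann-Whitney formulation of the ROC AUC).
    """
    wins = 0.0
    for t in tumor_values:
        for n in normal_values:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor_values) * len(normal_values))


# Hypothetical methylation values: gene -> (tumor samples, normal samples)
toy_data = {
    "GENE_A": ([0.8, 0.7, 0.9], [0.2, 0.3, 0.1]),      # hypermethylated in tumor
    "GENE_B": ([0.5, 0.4, 0.6], [0.5, 0.45, 0.55]),    # no separation
    "GENE_C": ([0.1, 0.2, 0.15], [0.7, 0.8, 0.9]),     # hypomethylated in tumor
}

# Score each gene by how well it separates the groups in either direction
scores = {}
for gene, (tumor, normal) in toy_data.items():
    a = auc(tumor, normal)
    scores[gene] = max(a, 1.0 - a)

# Highest-scoring genes first
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for gene, score in ranked:
    print(gene, round(score, 2))
```

The pairwise loop is quadratic in sample count; for cohorts of thousands of samples a rank-based implementation would be the more practical choice.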
We assembled a sizeable database containing colorectal samples with genome-wide methylation data and established a pipeline for data processing. The presented database and web platform will be a useful starting point for biomarker discovery.