Poster Presentation 35th Lorne Cancer Conference 2023

Agreement of DNA methylation-based classifications of breast tumours obtained using different clustering strategies (#365)

Elaheh Zareanshahraki (1), Shuai Li (1, 2), Ee Ming Wong (1), Enes Makalic (2), Roger Milne (1, 2, 3), Graham Giles (1, 2, 3), Catriona McLean (4), Melissa Southey (1, 3, 5), Pierre-Antoine Dugué (1, 2, 3)
  1. Precision Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, Victoria, Australia
  2. Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Victoria, Australia
  3. Cancer Epidemiology Division, Cancer Council Victoria, Melbourne, Victoria, Australia
  4. Anatomical Pathology, Alfred Health, The Alfred Hospital, Melbourne, Victoria, Australia
  5. Department of Clinical Pathology, Melbourne Medical School, The University of Melbourne, Parkville, Victoria, Australia

ABSTRACT

Background & aim: Many studies have applied clustering methods to genome-wide DNA methylation data from tumours to define disease subtypes. In breast cancer, there appears to be little consensus across these studies, which complicates the interpretation of the resulting clusters and their potential clinical use. We assessed the agreement between methylation-based subtypes obtained using common clustering methods and analytical strategies.

Methods: We used genome-wide DNA methylation data (HM450K assay) measured in 409 breast tumours diagnosed in women from the Melbourne Collaborative Cohort Study. K-means clustering, hierarchical clustering and partitioning around medoids (PAM) were applied to the full dataset and to subsets restricted to the most variable CpGs. The adjusted Rand index (ARI) was used to assess the agreement between methylation-based subtypes obtained with different clustering methods, numbers of clusters and numbers of CpGs analysed.
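The two non-clustering steps described in the Methods, restricting the data to the most variable CpGs and scoring agreement between two sets of cluster labels with the ARI, can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' pipeline: the function names are hypothetical, tumour profiles are assumed to be lists of beta values in a fixed CpG order, and ranking CpGs by variance is an assumption (the abstract does not say which variability measure was used).

```python
from collections import Counter
from math import comb
from statistics import pvariance


def most_variable_cpgs(beta, n_top):
    """Hypothetical helper: indices of the n_top CpGs with the largest variance.

    beta is a list of tumour profiles; each profile is a list of beta values
    with the same CpG order. Ranking by variance is an assumption.
    """
    n_cpgs = len(beta[0])
    variances = [pvariance([sample[j] for sample in beta]) for j in range(n_cpgs)]
    ranked = sorted(range(n_cpgs), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:n_top])  # keep the original CpG order


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same tumours."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same tumours")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two tumours are needed")
    # Contingency table: tumours in cluster i of A and cluster j of B.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # expected index under chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Only when both partitions are trivial (one cluster, or all singletons).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

Because the ARI ignores the cluster labels themselves, `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` returns `1.0`, while the crossed partition `[0, 1, 0, 1]` scores close to -0.5, i.e. worse than chance. In practice the labels would come from K-means, hierarchical clustering or PAM fitted to the selected CpGs.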

Results: Using the full dataset and k=2 clusters, agreement was moderate between K-means and PAM (ARI=0.53) and poor between K-means and hierarchical clustering (ARI=0.27) and between hierarchical clustering and PAM (ARI=0.32). Increasing the number of clusters resulted in poorer agreement (e.g. k=3, K-means vs. hierarchical clustering, ARI=0.13). Restricting the analysis to the most variable CpGs generally improved agreement compared with using the full dataset (e.g. 2,000 CpGs, K-means vs. hierarchical clustering, ARI=0.77). Across combinations of numbers of clusters (k=2 to 5) and CpGs (500 to 50,000), the ARI typically ranged between 0.3 and 0.7. When a single method was applied with different numbers of CpGs, agreement was generally good to excellent for k=2 clusters (e.g. K-means, 2,000 vs. 10,000 CpGs, ARI=0.92) and moderate to poor for k>2 (e.g. k=4, hierarchical clustering, 2,000 vs. 10,000 CpGs, ARI=0.16).
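To help interpret the ARI values above, the standard (Hubert and Arabie) form of the index is given below; the abstract does not state which implementation was used, so this is offered as the usual definition rather than the authors' exact computation. Here $n_{ij}$ is the number of tumours assigned to cluster $i$ by one strategy and cluster $j$ by the other, $a_i$ and $b_j$ are the row and column totals of that contingency table, and $n$ is the number of tumours (409 in this study).

```latex
\mathrm{ARI} =
\frac{\sum_{ij}\binom{n_{ij}}{2}
      - \left[\sum_i \binom{a_i}{2}\sum_j \binom{b_j}{2}\right] \Big/ \binom{n}{2}}
     {\tfrac{1}{2}\left[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right]
      - \left[\sum_i \binom{a_i}{2}\sum_j \binom{b_j}{2}\right] \Big/ \binom{n}{2}}
```

The index equals 1 for identical partitions and has an expected value of 0 when cluster labels are assigned at random, so values such as 0.13 or 0.16 indicate agreement only slightly better than chance.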

Conclusion: Although agreement between methylation-based breast cancer subtypes generated by common clustering methods was good for k=2 clusters when the most variable CpGs were used, the analytical strategy has a major impact on the clusters identified, which may have implications for their interpretation and clinical usefulness.