3.8 Sample Distance Heatmap

Euclidean distances between VST-transformed samples reveal how similar samples are to each other globally. Replicates from the same condition should cluster together and show small (dark) distances. A sample that is more similar to the opposite condition than to its own replicates is a red flag for a labelling error or a failed experiment.

A variance-stabilising transformation (VST) is applied here because raw and normalised counts are heteroscedastic: variance grows with the mean, so a handful of high-count genes would otherwise dominate the Euclidean distances. VST removes this mean–variance dependence, making distances meaningful across the full expression range (Anders & Huber, 2010).
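To illustrate the principle only, the sketch below simulates Poisson counts at several means and applies a square-root transform, the textbook variance-stabilising transform for Poisson data. This is not the DESeq2 VST, which is derived from a fitted negative-binomial dispersion trend; the means, sample size, and seed are arbitrary choices made for the illustration.

```r
# Toy illustration of variance stabilisation (NOT the DESeq2 VST).
# Assumption: Poisson counts, for which sqrt() is the classic stabiliser.
set.seed(1)

means <- c(10, 100, 1000, 10000)

# One column of 5000 simulated counts per mean
sim <- sapply(means, function(m) rpois(5000, lambda = m))

raw_sd  <- apply(sim, 2, sd)        # SD on the raw count scale
sqrt_sd <- apply(sqrt(sim), 2, sd)  # SD after the square-root transform

print(round(rbind(mean = means, raw_sd = raw_sd, sqrt_sd = sqrt_sd), 2))
```

On the raw scale the standard deviation grows with the mean, whereas after the transform it stays close to 0.5 at every mean. The DESeq2 VST aims for the same flat relationship, but for overdispersed count data.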

vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

With blind = TRUE, the VST is computed without using the experimental design (dispersions are re-estimated with an intercept-only design, ~ 1). This is recommended for QC and exploratory analysis, where you want an unbiased view of sample similarity.

Use blind = FALSE only when the transformation is feeding into a model that already accounts for the design.

# dist() measures distances between rows, so transpose to put samples in rows
sampleDists      <- dist(t(assay(vsd)))
sampleDistMatrix <- as.matrix(sampleDists)

rownames(sampleDistMatrix) <- paste(vsd$sample, vsd$condition, sep = " | ")
colnames(sampleDistMatrix) <- rownames(sampleDistMatrix)
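The transpose in the distance step matters because dist() compares rows, while expression matrices store genes in rows and samples in columns. A minimal sketch with an invented 3-gene, 4-sample matrix (the names and values are hypothetical) shows the difference:

```r
# Toy matrix: 3 genes (rows) x 4 samples (columns); all values are invented
expr <- matrix(
  c(1, 1, 5, 5,
    2, 2, 6, 6,
    3, 4, 7, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3),
                  c("ctrl_1", "ctrl_2", "trt_1", "trt_2"))
)

sample_d <- as.matrix(dist(t(expr)))  # sample-to-sample: 4 x 4
gene_d   <- as.matrix(dist(expr))     # gene-to-gene: 3 x 3 (not what we want here)

print(dim(sample_d))
print(dim(gene_d))
print(round(sample_d, 2))
```

In the toy data the two control replicates sit close together and far from the treated samples, which is the pattern expected from well-behaved replicates.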

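library(RColorBrewer)  # brewer.pal()
library(pheatmap)      # pheatmap()

# Reversed palette: dark blue = small distance (similar samples)
colours <- colorRampPalette(rev(brewer.pal(9, "Blues")))(255)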

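pheatmap(
  sampleDistMatrix,
  # Cluster on the sample distances themselves; otherwise pheatmap
  # recomputes distances between the rows of the distance matrix
  clustering_distance_rows = sampleDists,
  clustering_distance_cols = sampleDists,
  color        = colours,
  border_color = NA,
  main         = "Sample-to-sample distances (VST), E. coli MG1655",
  fontsize     = 10
)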
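The red-flag check described at the start of this section can also be done numerically rather than by eye. The sketch below is one possible heuristic, not part of DESeq2 or pheatmap: it flags a sample when its mean distance to the other condition is smaller than its mean distance to its own replicates. The toy matrix, the sample names, and the deliberately mislabelled ctrl_3 are all invented, and the rule assumes exactly two conditions with at least two replicates each.

```r
# Toy VST-like matrix: 4 genes x 6 samples; all values are invented
expr <- cbind(
  ctrl_1 = c(5.0, 5.1, 9.0, 9.2),
  ctrl_2 = c(5.2, 4.9, 9.1, 9.0),
  ctrl_3 = c(8.9, 9.1, 5.1, 5.0),  # labelled control, but looks treated
  trt_1  = c(9.0, 9.2, 5.0, 5.1),
  trt_2  = c(9.1, 9.0, 5.2, 4.9),
  trt_3  = c(8.8, 9.1, 4.9, 5.2)
)
condition <- c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt")

d <- as.matrix(dist(t(expr)))

# Hypothetical rule: closer (on average) to the other condition than to
# the sample's own replicates
flag <- sapply(seq_along(condition), function(i) {
  own    <- condition == condition[i]
  own[i] <- FALSE                      # exclude the sample itself
  other  <- condition != condition[i]
  mean(d[i, other]) < mean(d[i, own])
})
names(flag) <- colnames(expr)

flagged <- names(flag)[flag]
print(flagged)
```

With real data, the same rule could be applied to sampleDistMatrix and vsd$condition. A flagged sample is a prompt to check the metadata and the wet-lab record, not proof of a labelling error.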