3.8 Sample Distance Heatmap
Euclidean distances between VST-transformed samples show how similar the samples are to each other across the whole transcriptome. Replicates from the same condition should cluster together and appear as dark cells, because the palette is reversed so that small distances are dark. A sample that is more similar to the opposite condition than to its own replicates is a red flag for a labelling error or a failed experiment.
A variance-stabilising transformation (VST) is applied here because raw and normalised counts are heteroscedastic: variance grows with the mean, so high-count genes have much larger absolute variance and would dominate a Euclidean distance.
VST largely removes this mean–variance dependence, so that genes across the full expression range contribute on a comparable scale to the distances (Anders & Huber, 2010).
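The mean–variance dependence described above can be illustrated with a minimal base-R sketch on simulated data. Everything here is an assumption made for illustration: the counts are drawn from a negative binomial with a hypothetical dispersion of 0.1, and log2(x + 1) is used as a simple stand-in transformation, not the VST that DESeq2 computes.

```r
# Simulated counts only; nothing here comes from the E. coli dataset.
set.seed(1)
means     <- c(10, 100, 1000, 10000)  # hypothetical per-gene mean counts
n_samples <- 200
alpha     <- 0.1                      # assumed dispersion; rnbinom uses size = 1 / alpha

# One column of simulated counts per hypothetical gene
counts <- sapply(means, function(mu) rnbinom(n_samples, mu = mu, size = 1 / alpha))
colnames(counts) <- paste0("mean_", means)

# Standard deviation per gene, before and after a log-type transformation
sd_raw <- apply(counts, 2, sd)
sd_log <- apply(log2(counts + 1), 2, sd)

print(round(rbind(sd_raw, sd_log), 2))
```

On the raw scale the standard deviation rises steeply with the mean, whereas after the transformation it is of similar size for all four genes. A proper VST goes further than this stand-in by also handling the low-count range, where a plain log transformation inflates noise.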
With blind = TRUE, the VST is computed without using the experimental design. This is recommended for QC and exploratory analysis, where you want a view of sample similarity that is not influenced by the known group labels.
Use blind = FALSE only when the transformation is feeding into a model that already accounts for the design.
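library(DESeq2)        # assay() is available via SummarizedExperiment
library(pheatmap)
library(RColorBrewer)
# vsd is the VST-transformed object computed earlier (blind = TRUE).
# dist() works on rows, so transpose to get sample-to-sample distances.
sampleDists <- dist(t(assay(vsd)))
sampleDistMatrix <- as.matrix(sampleDists)
rownames(sampleDistMatrix) <- paste(vsd$sample, vsd$condition, sep = " | ")
colnames(sampleDistMatrix) <- rownames(sampleDistMatrix)
# Reversed palette: dark blue = small distance (similar samples)
colours <- colorRampPalette(rev(brewer.pal(9, "Blues")))(255)
pheatmap(
  sampleDistMatrix,
  # Cluster on the precomputed distances; otherwise pheatmap would compute
  # distances between the rows of the distance matrix itself.
  clustering_distance_rows = sampleDists,
  clustering_distance_cols = sampleDists,
  color = colours,
  border_color = NA,
  main = "Sample-to-sample distances (VST), E. coli MG1655",
  fontsize = 10
)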
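The red-flag criterion from the start of this section (a sample closer to the opposite condition than to its own replicates) can also be checked numerically instead of by eye. The following is a minimal sketch, not part of the original analysis: `flag_suspect_samples` is a hypothetical helper, and the toy data are simulated with one deliberately mislabelled sample.

```r
# Hypothetical helper: compare each sample's mean distance to its own
# condition with its mean distance to all other conditions.
flag_suspect_samples <- function(dist_matrix, condition) {
  d <- as.matrix(dist_matrix)
  condition <- as.character(condition)
  stopifnot(nrow(d) == length(condition))
  own <- other <- numeric(nrow(d))
  for (i in seq_len(nrow(d))) {
    same <- condition == condition[i]
    same[i] <- FALSE                                  # exclude the sample itself
    own[i]   <- mean(d[i, same])                      # NaN if there are no replicates
    other[i] <- mean(d[i, condition != condition[i]])
  }
  data.frame(sample = rownames(d), condition = condition,
             mean_dist_own = own, mean_dist_other = other,
             suspect = own > other, row.names = NULL)
}

# Toy data (simulated): 6 samples, 2 conditions, 50 features each
set.seed(42)
condition <- rep(c("A", "B"), each = 3)
centre <- c(A = 0, B = 5)
expr <- sapply(condition, function(cn) rnorm(50, mean = centre[[cn]]))
colnames(expr) <- paste0(condition, rep(1:3, 2))
expr[, "B3"] <- rnorm(50, mean = 0)   # "B3" actually behaves like condition A

result <- flag_suspect_samples(dist(t(expr)), condition)
print(result)
```

Assuming the objects from this section, the same check would be called as `flag_suspect_samples(sampleDistMatrix, vsd$condition)`. A flagged sample is only a prompt to inspect the metadata and library quality, not proof of a labelling error.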