4.5 Creating the DESeqDataSet
The DESeqDataSet (DDS) is DESeq2’s core data container. It holds the raw count matrix, sample metadata, and the experimental design formula. The design formula tells DESeq2 which variable to test — here ~ condition compares treatment vs control.
📘 Note (prokaryotes): DESeq2’s negative binomial model is organism-agnostic. It works the same way for E. coli data as for mouse or human data. ✅
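As a rough illustration of the inputs described above, the sketch below builds a toy count matrix and sample table in base R and checks two things before any DESeq2 call: that the counts are non-negative integers (raw, not normalised) and that the matrix columns line up with the sample table rows. The names toy_counts and toy_samples and all values are hypothetical, and the sketch assumes the sample table carries sample IDs as row names.

```r
# Toy inputs (hypothetical names and values, for illustration only)
toy_counts <- matrix(
  c(10L, 12L, 30L, 28L,
     0L,  1L,  5L,  7L,
    50L, 48L, 51L, 47L),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("s1", "s2", "s3", "s4"))
)
toy_samples <- data.frame(
  group = c("control", "control", "treatment", "treatment"),
  row.names = c("s1", "s2", "s3", "s4")
)

# Raw counts should be non-negative integers, not normalised values
counts_ok <- is.integer(toy_counts) && all(toy_counts >= 0)

# Columns of the count matrix should match rows of the sample table, in the same order
aligned <- identical(colnames(toy_counts), rownames(toy_samples))

counts_ok
aligned
```

The same two checks could be run on count_genes and samples_info before building the real object, provided samples_info uses sample IDs as row names.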
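library(DESeq2)

# Put "control" first so it becomes the reference level
samples_info$condition <- factor(samples_info$group,
                                 levels = c("control", "treatment"))

# Combine counts, sample metadata, and the design formula into one object
dds <- DESeqDataSetFromMatrix(
  countData = count_genes,
  colData = samples_info,
  design = ~ condition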
)
📘 Note: The design formula uses R’s formula syntax, where the tilde (~, pronounced “TIL-duh”)
means “is modelled by”. So ~ condition reads as: “gene expression is modelled by
condition”.
In a general linear model, the tilde separates the response variable (left side) from the predictors (right side). For example:
gene_expression ~ condition: model expression as a function of condition
y ~ x1 + x2: model y as a function of two predictors
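The two-sided form can be tried out in base R. The sketch below uses a hypothetical toy data frame and an ordinary linear model fitted with lm(), which is not the negative binomial model DESeq2 fits; it is only meant to show how a formula names a response and its predictors.

```r
# Toy data (hypothetical values, for illustration only)
toy <- data.frame(
  gene_expression = c(5.1, 4.9, 7.8, 8.2),
  condition = factor(c("control", "control", "treatment", "treatment"),
                     levels = c("control", "treatment"))
)

# A formula is an ordinary R object
f <- gene_expression ~ condition
class(f)       # "formula"
all.vars(f)    # response first, then predictors

# Fit a plain linear model with that formula
fit <- lm(f, data = toy)
coef(fit)      # intercept = control mean; conditiontreatment = treatment minus control
```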
In DESeq2, the left side is omitted because the count matrix is already the response —
you only need to specify which variable explains the differences between samples.
~ condition therefore tells DESeq2 to test whether gene counts differ between your
experimental groups.
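To see what a one-sided formula such as ~ condition encodes, base R's model.matrix() can expand it into a design matrix. The sketch below uses a hypothetical four-sample factor with the same level order as above. It illustrates R's general formula handling; it is not a view into DESeq2's internals.

```r
# Hypothetical four-sample design with "control" as the first (reference) level
condition <- factor(c("control", "control", "treatment", "treatment"),
                    levels = c("control", "treatment"))

# Expand the one-sided formula into a design matrix
design_matrix <- model.matrix(~ condition)
design_matrix

colnames(design_matrix)   # "(Intercept)" "conditiontreatment"
levels(condition)[1]      # "control"
```

Because "control" is the first factor level, it acts as the baseline, and the conditiontreatment column marks the samples that are compared against it. This is why the level order set in factor() matters.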