3.3 Loading Count Data
The nf-core/rnaseq pipeline was run with -profile prokaryotic, which uses Bowtie2 for alignment and Salmon for quantification. Since E. coli has no spliceosomal introns*, splice-aware aligners such as STAR are unnecessary. We load the SummarizedExperiment object produced by the pipeline, extract the raw count matrix, and assign gene symbols as row names (falling back to gene IDs where a symbol is missing).
*Dai & Zimmerly (2002), "The dispersal of five group II introns among natural populations of Escherichia coli": at least five distinct group II introns occur naturally in E. coli strains. These are self-splicing group II introns (retroelements), not spliceosomal introns like those of eukaryotes, so they do not affect RNA-seq quantification the way eukaryotic introns do. This is why Bowtie2, which is not splice-aware, works well for E. coli.
⭐ Important: Raw counts must be integers, because DESeq2's negative binomial model requires this. Salmon reports fractional estimated counts, so they are converted to integers before use.
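How the fractional estimates are converted to integers matters. The following base-R sketch uses a made-up 2 × 2 matrix (the values and the names `est`, `truncated`, and `rounded` are illustrative only, not pipeline output) to compare `as.integer()`, which truncates toward zero, with rounding to the nearest integer first.

```r
# Toy matrix of fractional "estimated counts" (made-up values, not pipeline output)
est <- matrix(
  c(10.7, 0.9, 3.2, 99.6),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("sample1", "sample2"))
)

# as.integer() drops the fractional part, so 0.9 becomes 0 and 99.6 becomes 99.
# apply() also loses the row names here, because as.integer() strips names.
truncated <- apply(est, 2, as.integer)

# Rounding first keeps each value at its nearest integer;
# storage.mode<- changes the type while preserving the dimnames.
rounded <- round(est)
storage.mode(rounded) <- "integer"

print(truncated)
print(rounded)
```

Rounding is arguably the safer default, since truncation systematically pushes every estimate downward and can turn a low but non-zero count into zero.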
💡 Tip: If you are unsure which assay name or rowData columns are available in your RDS, inspect them first with assayNames(count_x) and names(rowData(count_x)).
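# Load the gene-level SummarizedExperiment written by nf-core/rnaseq
count_x <- readRDS(
  file.path(git_root, "data", "nf-core_rnaseq",
            "salmon.merged.gene.SummarizedExperiment.rds")
)

# Take the first assay as a plain matrix; confirm with assayNames(count_x)
# that this assay holds the raw counts
count_genes <- as.matrix(assay(count_x, assayNames(count_x)[1]))

gene_symbols <- as.character(rowData(count_x)$gene_name)
gene_ids <- as.character(rowData(count_x)$gene_id)

# Prefer the gene symbol, fall back to the gene ID when the symbol is missing
# or empty, then de-duplicate once so the final row names are unique
gene_symbols_saved <- make.unique(ifelse(
  !is.na(gene_symbols) & nzchar(gene_symbols),
  gene_symbols,
  gene_ids
))

# Salmon counts are fractional estimates: round to the nearest integer
# (as.integer() alone would truncate) and store the matrix as integer
count_genes <- round(count_genes)
storage.mode(count_genes) <- "integer"
rownames(count_genes) <- gene_symbols_saved

cat("Dimensions (genes × samples):", dim(count_genes), "\n")
## Dimensions (genes × samples): 4523 6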
## [1] "b0001" "b0002" "b0003" "b0004" "b0005" "b0006"
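The row-naming step (use the gene symbol when available, otherwise the gene ID, and keep the names unique) can be checked on a small example. This is a minimal sketch with made-up vectors: `toy_symbols`, `toy_ids`, and the values in them are hypothetical and do not come from the pipeline output.

```r
# Hypothetical annotation vectors: one missing symbol, one empty, one duplicated
toy_symbols <- c("geneA", NA, "", "geneB", "geneB")
toy_ids     <- c("id1", "id2", "id3", "id4", "id5")

# Use the symbol when present, otherwise fall back to the gene ID
has_symbol <- !is.na(toy_symbols) & nzchar(toy_symbols)
toy_names  <- ifelse(has_symbol, toy_symbols, toy_ids)

# De-duplicate once, after the fallback, so the combined vector is unique
toy_names <- make.unique(toy_names)
print(toy_names)
## [1] "geneA"   "id2"     "id3"     "geneB"   "geneB.1"

# The same two checks can be run on the real row names
stopifnot(!anyNA(toy_names), anyDuplicated(toy_names) == 0)
```

Applying `make.unique()` after the fallback, rather than to symbols and IDs separately, also covers the unlikely case where a symbol collides with another gene's ID.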