5.8 Summary
| Step | Choice made | Rationale |
|---|---|---|
| ORA package | mulea | Empirical FDR correction; accounts for gene set interdependence |
| GSEA package | fgsea | Widely used implementation of preranked GSEA; reports NES and permutation-based p-values |
| Gene sets | KEGG pathways | Matches Dedios 2025; biologically interpretable for bacteria |
| GMT source | KEGGREST API + local RDS cache | No pre-built KEGG GMT for E. coli; built from API and cached |
| ID conversion | Locus tag to gene symbol via KEGGREST | nf-core uses locus tags when GTF lacks gene_name; KEGG GMT uses symbols |
| ORA background | All DESeq2-tested genes | Correct statistical background: only genes that could have been called significant, not the full genome |
| GSEA rank metric | sign(LFC) × -log10(pvalue) | Encodes direction and significance; down-weights noisy fold changes from lowly expressed genes |
| ORA FDR | Empirical FDR (eFDR) | Accounts for gene set interdependence |
| GSEA FDR | Benjamini-Hochberg (fgsea default) | Standard for preranked GSEA |
| Significance threshold | 0.05 | Standard; consistent with DE analysis |
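
The GSEA rank metric in the table can be illustrated with a minimal base R sketch. The data frame `res`, its gene names, and its values are hypothetical and only mimic the `log2FoldChange` and `pvalue` columns of a DESeq2 results table. The zero-p-value guard is an assumption added here, not necessarily what the pipeline does.

```r
# Hypothetical DESeq2-style results; all values are made up for illustration.
res <- data.frame(
  gene           = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  log2FoldChange = c(2.1, -1.4, 0.3, -0.2, NA),
  pvalue         = c(1e-8, 1e-4, 0.5, 0, NA)
)

# Drop genes without a usable test result (NA p-value or NA fold change).
res <- res[!is.na(res$pvalue) & !is.na(res$log2FoldChange), ]

# Assumption: clamp p-values that underflowed to 0, so -log10() stays finite.
res$pvalue <- pmax(res$pvalue, .Machine$double.xmin)

# Signed significance: direction from the LFC, magnitude from the p-value.
res$rank_metric <- sign(res$log2FoldChange) * -log10(res$pvalue)

# Named numeric vector, sorted from most up- to most down-regulated.
ranks <- sort(setNames(res$rank_metric, res$gene), decreasing = TRUE)
print(ranks)
```

A named numeric vector of this form is what the `stats` argument of `fgsea()` takes. Note that a gene whose p-value underflowed to zero ends up with an extreme score under this guard, so it may be worth checking how many genes are affected before ranking.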
Enrichment results are saved and ready for biological interpretation and reporting.
## R version 4.4.1 (2024-06-14)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sonoma 14.3
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Europe/Copenhagen
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] KEGGREST_1.46.0 fgsea_1.32.4
## [3] mulea_1.1.1 plotly_4.12.0
## [5] DT_0.34.0 kableExtra_1.4.0
## [7] knitr_1.51 factoextra_2.0.0
## [9] pheatmap_1.0.13 RColorBrewer_1.1-3
## [11] ggpubr_0.6.3 DESeq2_1.46.0
## [13] SummarizedExperiment_1.36.0 Biobase_2.66.0
## [15] MatrixGenerics_1.18.1 matrixStats_1.5.0
## [17] GenomicRanges_1.58.0 GenomeInfoDb_1.42.3
## [19] IRanges_2.40.1 S4Vectors_0.44.0
## [21] BiocGenerics_0.52.0 reshape2_1.4.5
## [23] lubridate_1.9.5 forcats_1.0.1
## [25] stringr_1.6.0 dplyr_1.2.1
## [27] purrr_1.2.2 readr_2.2.0
## [29] tidyr_1.3.2 tibble_3.3.1
## [31] ggplot2_4.0.3 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] rlang_1.2.0 magrittr_2.0.5 otel_0.2.0
## [4] compiler_4.4.1 png_0.1-9 systemfonts_1.3.2
## [7] vctrs_0.7.3 pkgconfig_2.0.3 crayon_1.5.3
## [10] fastmap_1.2.0 backports_1.5.1 XVector_0.46.0
## [13] labeling_0.4.3 rmarkdown_2.31 tzdb_0.5.0
## [16] UCSC.utils_1.2.0 xfun_0.57 zlibbioc_1.52.0
## [19] cachem_1.1.0 jsonlite_2.0.0 DelayedArray_0.32.0
## [22] BiocParallel_1.40.2 broom_1.0.12 parallel_4.4.1
## [25] R6_2.6.1 bslib_0.10.0 stringi_1.8.7
## [28] car_3.1-5 numDeriv_2016.8-1.1 jquerylib_0.1.4
## [31] Rcpp_1.1.1-1.1 bookdown_0.46 Matrix_1.7-5
## [34] timechange_0.4.0 tidyselect_1.2.1 rstudioapi_0.18.0
## [37] abind_1.4-8 yaml_2.3.12 codetools_0.2-20
## [40] lattice_0.22-9 plyr_1.8.9 withr_3.0.2
## [43] S7_0.2.2 coda_0.19-4.1 evaluate_1.0.5
## [46] xml2_1.5.2 Biostrings_2.74.1 pillar_1.11.1
## [49] carData_3.0-6 generics_0.1.4 emdbook_1.3.14
## [52] hms_1.1.4 scales_1.4.0 glue_1.8.1
## [55] lazyeval_0.2.3 tools_4.4.1 apeglm_1.28.0
## [58] data.table_1.18.2.1 locfit_1.5-9.12 ggsignif_0.6.4
## [61] mvtnorm_1.3-6 fastmatch_1.1-8 cowplot_1.2.0
## [64] grid_4.4.1 bbmle_1.0.25.1 crosstalk_1.2.2
## [67] bdsmatrix_1.3-7 colorspace_2.1-2 GenomeInfoDbData_1.2.13
## [70] Formula_1.2-5 cli_3.6.6 textshaping_1.0.5
## [73] S4Arrays_1.6.0 viridisLite_0.4.3 svglite_2.2.2
## [76] gtable_0.3.6 rstatix_0.7.3 sass_0.4.10
## [79] digest_0.6.39 SparseArray_1.6.2 ggrepel_0.9.8
## [82] htmlwidgets_1.6.4 farver_2.1.2 htmltools_0.5.9
## [85] lifecycle_1.0.5 httr_1.4.8 MASS_7.3-65