7 EXTRA: Running nf-core/differentialabundance
We will use pipeline release v1.5, which takes the count matrix and sample metadata produced by nf-core/rnaseq and performs differential expression analysis using DESeq2.
To process the data, run the following command:
nextflow run nf-core/differentialabundance \
    -r 1.5.0 \
    -profile docker \
    -params-file params_degs.json \
    --outdir results/differentialabundance

The Nextflow command is stored in a bash script and can be executed by running that script.
The processing time is about 5 minutes.
7.0.1 Key parameters
The pipeline is configured via params_degs.json. The most important parameters we set are described below.
Inputs
input: sample sheet linking each sample to its condition
matrix: raw gene count matrix from Salmon (salmon.merged.gene_counts.tsv)
transcript_length_matrix: gene length matrix for TPM normalisation
contrasts: CSV file defining which groups to compare (treatment vs control)
gtf: genome annotation file for E. coli K-12 MG1655
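To make the contrasts input more concrete, the snippet below writes a minimal contrasts file in the CSV layout documented for nf-core/differentialabundance (columns id, variable, reference, target, blocking). It is only a sketch: the file name contrasts.sketch.csv, the contrast id, and the level names treatment and control are assumptions for illustration. The variable name group is the one used for filtering_grouping_var in this section and has to match a column of the sample sheet.

```shell
# Write a one-contrast file (hypothetical file name and contrast id).
# The trailing comma leaves the optional "blocking" column empty.
cat > contrasts.sketch.csv <<'EOF'
id,variable,reference,target,blocking
treatment_vs_control,group,control,treatment,
EOF

# Print the file for a quick visual check.
cat contrasts.sketch.csv
```

Each further comparison would be one more row with its own id.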
Feature annotation
features_id_col: "gene_id" - locus tags (e.g. b0001) are used as the primary identifier
features_name_col: "gene_name" - gene symbols (e.g. thrL) are used as display names
features_gtf_feature_type: "transcript" - annotations are extracted at transcript level
Filtering (genes must satisfy these settings to be retained):
filtering_min_abundance: 10 - minimum count of 10
filtering_min_samples: 2 - the minimum count must be reached in at least 2 samples
filtering_grouping_var: "group" - filtering applied per experimental group
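The effect of the two numeric thresholds can be illustrated on a toy matrix. The sketch below is not the pipeline's own implementation, which runs inside the workflow. It only applies the rule "at least 10 counts in at least 2 samples" with awk, and it ignores filtering_grouping_var. The file names, sample names, and count values are invented for illustration; the column layout (gene_id, gene_name, then one column per sample) is assumed to follow salmon.merged.gene_counts.tsv.

```shell
# Build a toy count matrix (invented values, hypothetical file name).
printf 'gene_id\tgene_name\tS1\tS2\tS3\tS4\n'  > toy_counts.tsv
printf 'b0001\tthrL\t12\t15\t3\t0\n'          >> toy_counts.tsv
printf 'b0002\tthrA\t11\t2\t1\t0\n'           >> toy_counts.tsv

# Keep the header, then keep every gene whose count reaches
# min_abundance in at least min_samples sample columns (columns 3..NF).
awk -F'\t' -v min_abundance=10 -v min_samples=2 '
NR == 1 { print; next }
{
    n = 0
    for (i = 3; i <= NF; i++) {
        if ($i + 0 >= min_abundance) n++
    }
    if (n >= min_samples) print
}' toy_counts.tsv > toy_counts.filtered.tsv

cat toy_counts.filtered.tsv
```

In this toy example b0001 reaches 10 counts in two samples and is kept, while b0002 does so in only one sample and is dropped.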
DESeq2
deseq2_fit_type: "local" - uses a local dispersion fit, more appropriate for smaller sample sizes than the default parametric fit
Differential expression threshold
differential_min_fold_change: 1.0 - the minimum fold change a gene must reach to be reported as differentially expressed (DE)
Disabled modules
gprofiler2_run: false and gsea_run: false - the pipeline’s built-in functional enrichment steps are turned off to save time here.
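Put together, the parameters described in this section could be written to a params file along the following lines. This is a sketch, not the actual params_degs.json used in the course: every value under path/to/ is a placeholder, the file names samplesheet.csv, contrasts.csv, annotation.gtf and salmon.merged.gene_lengths.tsv are assumptions, and the output is deliberately named params_degs.sketch.json so that it cannot overwrite the real file.

```shell
# Write the parameters from this section to a sketch file.
# All "path/to/..." values are placeholders to be replaced with real paths.
cat > params_degs.sketch.json <<'EOF'
{
    "input": "path/to/samplesheet.csv",
    "matrix": "path/to/salmon.merged.gene_counts.tsv",
    "transcript_length_matrix": "path/to/salmon.merged.gene_lengths.tsv",
    "contrasts": "path/to/contrasts.csv",
    "gtf": "path/to/annotation.gtf",
    "features_id_col": "gene_id",
    "features_name_col": "gene_name",
    "features_gtf_feature_type": "transcript",
    "filtering_min_abundance": 10,
    "filtering_min_samples": 2,
    "filtering_grouping_var": "group",
    "deseq2_fit_type": "local",
    "differential_min_fold_change": 1.0,
    "gprofiler2_run": false,
    "gsea_run": false
}
EOF

# List the parameter names that were written, one per line.
grep -o '"[a-z0-9_]*":' params_degs.sketch.json | tr -d '":'
```

A file of this shape is what the -params-file option of nextflow run reads; keeping the settings in one file rather than on the command line makes the run easier to repeat.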