3 Running the nf-core/rnaseq pipeline v3.23.0

Processing the original dataset would require more computing power than is available here. Instead, we will process only two size-reduced samples, each created by randomly sampling 50,000 reads from the original data (sub-sampling script: util/subsample_50k_PRJNA1158806.sh).
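
To illustrate the idea of random sub-sampling, here is a minimal shell sketch. It is not the content of util/subsample_50k_PRJNA1158806.sh, which is not reproduced here. The function name subsample_fastq and the file names are hypothetical, and the sketch assumes an uncompressed, single-end FASTQ file with four-line records and GNU shuf being installed.

```shell
# Hypothetical helper: draw N random reads from an uncompressed FASTQ file.
# Assumes each record spans exactly four lines and contains no tab characters.
subsample_fastq() {
    local in_fastq="$1" n_reads="$2" out_fastq="$3"
    # Join every 4-line record into one tab-separated line, pick N lines
    # at random, then split the selected records back into four lines each.
    paste - - - - < "$in_fastq" \
        | shuf -n "$n_reads" \
        | tr '\t' '\n' > "$out_fastq"
}

# Example call (file names are placeholders):
# subsample_fastq sample.fastq 50000 sample_50k.fastq
```

For paired-end data, both mate files would have to be sampled with the same selection of reads, which this sketch does not handle.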

We will use pipeline release v3.23.0. As of this release, Bowtie2 can be used for read alignment, which is advantageous for processing prokaryotic RNA-seq data. For simplicity, we specify the “prokaryotic” profile, which automatically selects Bowtie2 for read alignment and Salmon for read quantification.

To process the data, run the following command:

nextflow run 'https://github.com/nf-core/rnaseq' \
    -name 'Ecoli_MG1655_saccharin_2_samples' \
    --outdir '/workspaces/dsp_transcriptomics_training/results/nfcore_rnaseq_processing_subsampled' \
    --input '/workspaces/dsp_transcriptomics_training/data/seq_files_subsampled/samplesheet_50k_subsampled_2samples.csv' \
    --fasta '/workspaces/dsp_transcriptomics_training/data/genome_files/GCF_000005845.2_ASM584v2_genomic.fna.gz' \
    --gtf '/workspaces/dsp_transcriptomics_training/data/genome_files/GCF_000005845.2_ASM584v2_genomic.gtf.gz' \
    -r 3.23.0 \
    -profile prokaryotic,docker \
    -c /workspaces/dsp_transcriptomics_training/01_scripts/custom.config

Parameter descriptions:

  • -name: name of the processing run
  • --outdir: (absolute) path to the output directory where results will be saved (the sub-directory is created automatically)
  • --input: (absolute) path to the sample sheet
  • --fasta: (absolute) path to the gzipped genome FASTA file
  • --gtf: (absolute) path to the gzipped genome annotation file
  • -r: nf-core/rnaseq pipeline release/version
  • -profile: configuration profile(s) to apply; here the “prokaryotic” profile, combined with “docker” so that the tools run in Docker containers
  • -c: (absolute) path to the custom configuration file; used here to limit the number of CPUs and memory
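
As an illustration of what such a custom configuration file might contain, the sketch below writes an example file that caps the resources any single process may request. The file name custom.config.example and the limits (2 CPUs, 6 GB, 2 hours) are placeholders, not the values used in this training, and the sketch assumes a Nextflow version that supports the resourceLimits process directive (24.04 or later).

```shell
# Write a hypothetical example config; the real 01_scripts/custom.config
# may look different. The quoted 'EOF' prevents shell expansion in the body.
cat > custom.config.example <<'EOF'
process {
    // Upper bounds for every process; larger requests are reduced to these values.
    resourceLimits = [
        cpus: 2,
        memory: '6.GB',
        time: '2.h'
    ]
}
EOF
```

A file like this would be passed to Nextflow with -c, in the same way as custom.config in the command above.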

The Nextflow command is stored in a bash script and can be executed by running:

bash 01_scripts/00_nfcore_rnaseq_processing.sh

The processing time is about 7 minutes.

📌 Remember: You do not have to configure parameters manually, because nf-core offers automatic parameter configuration. Go to the pipeline page and press Launch version 3.23.0 to see all pipeline parameters and change them as needed. A configuration file can then be generated automatically for use with your Nextflow command.
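
As a rough sketch of how a generated parameter file could be used, the example below writes a JSON file holding the same pipeline parameters as the command in this section. The file name nf-params.example.json is a placeholder, and the exact content produced by the launch page may differ.

```shell
# Write a hypothetical parameter file with the pipeline parameters
# (the double-dash options) used in this section.
cat > nf-params.example.json <<'EOF'
{
    "input": "/workspaces/dsp_transcriptomics_training/data/seq_files_subsampled/samplesheet_50k_subsampled_2samples.csv",
    "outdir": "/workspaces/dsp_transcriptomics_training/results/nfcore_rnaseq_processing_subsampled",
    "fasta": "/workspaces/dsp_transcriptomics_training/data/genome_files/GCF_000005845.2_ASM584v2_genomic.fna.gz",
    "gtf": "/workspaces/dsp_transcriptomics_training/data/genome_files/GCF_000005845.2_ASM584v2_genomic.gtf.gz"
}
EOF
```

Such a file can be passed to Nextflow with -params-file nf-params.example.json in place of the individual --input, --outdir, --fasta and --gtf options. Nextflow options with a single dash (-r, -profile, -c, -name) still go on the command line.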