Metagenomics data processing
Here we are going to process the dataset presented in the paper “Metagenomic insights into the prokaryotic communities of heavy metal-contaminated hypersaline soils”.
Longer explanation about the process in here
But for now, let’s get to the practical part. Let’s double-check a few things in your terminal.
We are going to use Nextflow to run a pipeline that performs taxonomic profiling of our samples: nf-core/taxprofiler. Nextflow is a workflow system for creating scalable, portable, and reproducible workflows. To run Nextflow you need Java and Docker installed on your system. Let’s verify that everything is installed:
java -version
docker --version
nextflow -v
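If you prefer a single check, the three commands above can be bundled into a small script. This is only a sketch: the helper name check_tool is made up for illustration, and calling docker info is one assumed way to confirm that the Docker daemon is actually running and not just installed.

```shell
# Report whether a command is available on PATH.
# check_tool is a hypothetical helper name, not part of Nextflow or Docker.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: MISSING"
  fi
}

# Check the three tools this tutorial relies on.
for tool in java docker nextflow; do
  check_tool "$tool"
done

# Docker must also be running; docker info fails if the daemon is unreachable.
if docker info >/dev/null 2>&1; then
  echo "docker daemon: running"
else
  echo "docker daemon: not reachable"
fi
```

Any line reporting MISSING or "not reachable" should be fixed before continuing.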
Let’s get some info about Nextflow and run our first pipeline.
nextflow info
nextflow run hello
If everything went well, you have a fully functional Nextflow environment for today.
Running taxprofiler:
nextflow run nf-core/taxprofiler -r 1.2.3 -profile test,docker --outdir processed_data
Now that it is running, let’s study the command:
nextflow run –> runs a pipeline
nf-core/taxprofiler –> the pipeline to run; Nextflow downloads it from its GitHub repository
-r 1.2.3 –> the version (release) of the pipeline we are running
-profile test,docker –> test means we are running a test, so the input data (in a real run these would be our FASTQ files) and the reference database to map reads to (MetaPhlAn 4.0) are downloaded from a GitHub repository maintained by nf-core. docker means each step runs inside an isolated container that already includes the software for that step, which ensures reproducibility
--outdir processed_data –> intermediate files and final results will be saved in processed_data
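If the run gets interrupted, you do not have to start from scratch: Nextflow's -resume option reuses the results of the tasks that already finished. The sketch below only assembles and prints the command so you can read it before launching it; the variable names VERSION, OUTDIR and launch are illustrative assumptions, not something the pipeline requires.

```shell
# Illustrative variables (hypothetical names) holding the values used above.
VERSION="1.2.3"
OUTDIR="processed_data"

# Same command as before, plus -resume to reuse already completed tasks.
launch="nextflow run nf-core/taxprofiler -r $VERSION -profile test,docker --outdir $OUTDIR -resume"

# Print the command; copy and run it in the same directory as the first launch.
echo "$launch"
```

Note that -resume takes a single dash because it is an option of Nextflow itself, while --outdir takes two dashes because it is a parameter of the pipeline.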
Notice that a directory called work is created. It contains the logs of every pipeline step, each in its own folder named with a hexadecimal hash. In the folder processed_data you will find the results of each step.
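To get a feel for the layout of work, you can list the log file that Nextflow writes for each task. This is a minimal sketch: the function name list_task_logs is made up for illustration, and it assumes the usual layout in which each task directory sits two levels below work and holds a hidden .command.log file.

```shell
# Print the log file of every task directory under a Nextflow work directory.
# Assumed layout: <work>/<2 hex chars>/<rest of hash>/.command.log
# list_task_logs is a hypothetical helper name.
list_task_logs() {
  for log in "$1"/*/*/.command.log; do
    # Skip the unexpanded pattern when no task directory exists yet.
    if [ -e "$log" ]; then
      echo "$log"
    fi
  done
}

# Only look inside work if the pipeline has already created it.
if [ -d work ]; then
  list_task_logs work
fi
```

Next to each .command.log you will also find .command.sh, the exact script that was executed for that step, which is handy when a step fails.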