Docker Hands-on training (26-11-2025)

Initial considerations

In this course, you can work on your own laptop if Docker is installed on it. To set up locally:

Clone the repository that contains the data, the scripts and all the course information:

git clone git@github.com:biosustain/dsp_docker_training.git

Alternatively, use GitHub Codespaces to avoid system incompatibilities. You can open a GitHub codespace directly for this specific repository here:

Data, biological context and script to run in a Docker container

The biological context for this hands-on is presented in these slides.

Shell script that we want to run inside a container.

Running a container

First of all, if you installed Docker Desktop on your machine, check that Docker Desktop is open and shows “Running” in the status bar.

You can also test with:

docker -v
docker info
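The two commands above can be combined into a small check that tells a missing client apart from a daemon that is not running. This is only a sketch: the variable name docker_status and the three status labels are illustrative, and it assumes that docker info exits with a non-zero status when it cannot reach the daemon, as current Docker releases do.

```shell
#!/bin/sh
# Report whether the Docker client is installed and the daemon is reachable.
# Assumption: `docker info` exits non-zero when it cannot contact the daemon.
if ! command -v docker >/dev/null 2>&1; then
    docker_status="client-missing"
elif docker info >/dev/null 2>&1; then
    docker_status="daemon-running"
else
    docker_status="daemon-unreachable"
fi
echo "Docker status: ${docker_status}"
```

If the status is daemon-unreachable, starting Docker Desktop (or the Docker service) and re-running the check is usually the next step.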

Once the Docker daemon is running, you can create and run a new container from an image. We will start with the hello-world image, which is hosted on Docker Hub.

Docker Hub is Docker’s public registry service where we can store, share, and manage container images. It hosts Docker Official Images (like hello-world), Verified Publisher content, and community-contributed images.

In your command line, run the publicly available hello-world image:

docker run hello-world

We usually run this to check that the installation works correctly. If you see 'Hello from Docker!', everything is fine so far.

Please read the information given to understand what happened.

What happens:

The client (your terminal) sends the run hello-world request to the daemon.
The daemon (the engine) checks whether the image exists locally; if not, it pulls it.
The daemon creates and runs a container from the image (here, hello-world).
The output is sent back to the client (“Hello from Docker!”).

Pulling an image

Pull the publicly available Debian image debian:bookworm-slim and verify that it has been downloaded. By default, images are pulled from Docker Hub.

docker pull debian:bookworm-slim

In Docker image names like debian:bookworm-slim, the colon separates the image name from its tag.
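The name:tag convention can be illustrated with plain shell string handling. The sketch below assumes a simple reference with a tag and no registry host or port in front of the name. When the tag is omitted, Docker falls back to the tag latest.

```shell
#!/bin/sh
# Split an image reference of the form name:tag with POSIX parameter expansion.
# Assumption: the reference contains a tag and no registry host:port prefix.
image="debian:bookworm-slim"
name="${image%:*}"     # drop the shortest suffix matching ":*"
tag="${image##*:}"     # drop the longest prefix matching "*:"
echo "name=${name} tag=${tag}"
```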

“Bookworm” is the codename for Debian 12, the stable release that preceded Debian 13 (“trixie”). The “slim” variant is a stripped-down, minimal version of the base image.

To check which images you have pulled, use the images command with the following syntax:

docker images

Writing our Dockerfile

Docker images are created from a text file called Dockerfile.

This text file contains a list of instructions that assemble and configure the image with the required software.

For our Dockerfile, we will learn the following instructions: FROM, LABEL, WORKDIR, RUN, COPY and ENV. Go to the Dockerfile reference and find out what these instructions are for.

The first thing we need is a base image. A Dockerfile can start from an existing image and build on top of it. The base image is the one you designate with the FROM instruction in the Dockerfile; a Dockerfile with FROM scratch starts from an empty base image. When we build our image from debian:13, we get something very close to a freshly installed Debian distribution (version 13).

What do we need to add to our Dockerfile?

Base image
Working directory
Instructions to download and install the software and put it on the executable search path (PATH)
- curl: download featureCounts
- bowtie2: align reads to a reference
- featureCounts: count reads
- samtools: handle .bam files (sorting and indexing them)
A copy of the example script

Open a file called Dockerfile:

code Dockerfile

First let’s specify which base image we want to use. We are going to use one of the latest Debian versions as base image:

FROM debian:13

Define the working directory:

WORKDIR /app

Next, we refresh the package index with apt-get (the package manager included in the Debian image) and install the software that is available through it, which is the easiest route. In our case we need curl (to download featureCounts), bowtie2 (the aligner) and samtools (to sort and index BAM files). The RUN instruction executes commands while the image is being built.

RUN apt-get update && apt-get -y install \
    bowtie2 \
    curl \
    samtools

However, not all the tools we need can be installed with apt-get. In our case, featureCounts has to be downloaded from a specific URL, which is a bit more involved. We need to:

download featureCounts using curl
uncompress the package
remove the compressed file
add the directory that contains the executable to PATH

# Download feature counts (package called subread)
RUN curl -L -o subread-2.0.8-Linux-x86_64.tar.gz https://sourceforge.net/projects/subread/files/subread-2.0.8/subread-2.0.8-Linux-x86_64.tar.gz/download

# Uncompress and remove the compressed file
RUN tar -xzf subread-2.0.8-Linux-x86_64.tar.gz && \
    rm subread-2.0.8-Linux-x86_64.tar.gz

Specify the path to the feature counts executable

# Specify the path to the executable
ENV PATH="${PATH}:/app/subread-2.0.8-Linux-x86_64/bin/"
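The ENV instruction above appends one directory to PATH so that featureCounts can be called by name from anywhere in the container. The same mechanism can be tried outside Docker. In this sketch, demo_tool is a made-up stand-in for a real executable and the temporary directory plays the role of the subread bin folder.

```shell
#!/bin/sh
# Demonstrate the PATH mechanism behind the ENV instruction, outside Docker.
# demo_tool is a made-up stand-in for an executable such as featureCounts.
tool_dir="$(mktemp -d)"
printf '#!/bin/sh\necho "demo_tool ran"\n' > "${tool_dir}/demo_tool"
chmod +x "${tool_dir}/demo_tool"

# Same pattern as the ENV line: keep the existing entries and append one.
PATH="${PATH}:${tool_dir}"
export PATH

# The shell now finds the command by searching the directories in PATH.
tool_output="$(demo_tool)"
echo "${tool_output}"
```

Appending rather than prepending means that a tool with the same name already on PATH would still take precedence.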

Finally, we copy the scripts folder, which contains the script we are going to run, into the image.

COPY scripts/ ./scripts/
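For reference, the instructions from this section can be collected into one file. The sketch below writes them to Dockerfile.assembled, a placeholder name chosen here so that an existing Dockerfile is not overwritten. The content is only the lines introduced above, in the same order.

```shell
#!/bin/sh
# Write the instructions from this section into a single file.
# Dockerfile.assembled is a placeholder name, not a file from the repository.
# The quoted 'EOF' stops the shell from expanding ${PATH} while writing.
cat > Dockerfile.assembled <<'EOF'
FROM debian:13

WORKDIR /app

RUN apt-get update && apt-get -y install \
    bowtie2 \
    curl \
    samtools

# Download feature counts (package called subread)
RUN curl -L -o subread-2.0.8-Linux-x86_64.tar.gz https://sourceforge.net/projects/subread/files/subread-2.0.8/subread-2.0.8-Linux-x86_64.tar.gz/download

# Uncompress and remove the compressed file
RUN tar -xzf subread-2.0.8-Linux-x86_64.tar.gz && \
    rm subread-2.0.8-Linux-x86_64.tar.gz

# Specify the path to the executable
ENV PATH="${PATH}:/app/subread-2.0.8-Linux-x86_64/bin/"

COPY scripts/ ./scripts/
EOF
echo "wrote Dockerfile.assembled"
```

Because WORKDIR is /app, the relative paths in the RUN and COPY instructions resolve under /app, which is why the PATH entry points to /app/subread-2.0.8-Linux-x86_64/bin/.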

Building an image from the Dockerfile

When you have a Dockerfile, you can build an image from it with the build command, using the following syntax: docker build --tag <my-image-name:my-optional-image-tag> <my-location-path>. Here --tag (equivalent to -t) names the image, and <my-location-path> is the build context, the folder whose files are available to instructions such as COPY; the current folder can be referenced with a dot (.). The optional -f flag gives the path to the Dockerfile when it is not located at the root of the build context.

Let’s build an image from your Dockerfile:

docker build -t align_count -f docker_file/Dockerfile .
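In the build command, -f points to the Dockerfile while the final dot is the build context, and COPY scripts/ ./scripts/ is resolved relative to that context. A quick pre-check can catch a build started from the wrong folder. This is a sketch only: check_build_context is a hypothetical helper name, and the paths are those used in the build command and in the COPY instruction.

```shell
#!/bin/sh
# check_build_context DIR: verify that DIR holds what the build command expects.
# The function name is illustrative; the paths come from the build command and
# from the COPY instruction used in this tutorial.
check_build_context() {
    context_dir="$1"
    if [ ! -f "${context_dir}/docker_file/Dockerfile" ]; then
        echo "missing ${context_dir}/docker_file/Dockerfile" >&2
        return 1
    fi
    if [ ! -d "${context_dir}/scripts" ]; then
        echo "missing ${context_dir}/scripts (needed by COPY scripts/ ./scripts/)" >&2
        return 1
    fi
    echo "build context looks complete"
}
```

It could be chained with the build, for example check_build_context . && docker build -t align_count -f docker_file/Dockerfile .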

Running a container in interactive mode

While we would recommend virtual environments for working interactively in a specific environment, it is handy to know that Docker can run a container in interactive mode. This is mainly useful for checking that the expected software is present in a container. We do not recommend changing anything inside the running container; make changes in the Dockerfile instead.

This is done by adding two options to the docker run command: --interactive and --tty, which have the short forms -i and -t.

Remember you can always run the following to find out what the parameters used in docker run are:

docker run --help

The interactive mode can be run with the following syntax: docker run -it <image-name> <command>

We will launch the Bash shell (bash) in a container created from the align_count image:

docker run -it align_count bash

Then you can check what is in there with ls -la, or find out which directory you are in with pwd. Which folder can you see?

After looking around, leave the shell with exit.

Check which software versions you included in your Docker image:

docker run align_count bowtie2 --version
docker run align_count featureCounts -v

We know that our images are available locally. We also want to check whether our containers still exist.

To do this, try: docker ps -a

The output shows that the stopped containers are still there. There is no need to keep them, because each of these containers is only meant to do one job based on the Docker image.

We can keep our space tidy by adding --rm to the run command to make sure the container is wiped after usage.
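As a sketch, the version checks from above can be repeated with --rm so that no stopped containers are left behind, and docker ps can be filtered to show only leftovers from our image. The two function names are illustrative helpers, not Docker commands. The --filter keys ancestor and status are standard docker ps filters.

```shell
#!/bin/sh
# Sketch: print tool versions in throwaway containers, then list leftovers.
# Both function names are illustrative helpers, not Docker commands.

# --rm removes each container as soon as its command exits.
print_tool_versions() {
    docker run --rm align_count bowtie2 --version
    docker run --rm align_count featureCounts -v
}

# List exited containers that were created from the align_count image.
list_exited_align_count() {
    docker ps -a --filter "ancestor=align_count" --filter "status=exited"
}
```

Containers that were started earlier without --rm can be deleted individually with docker rm followed by the container ID shown in the listing.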

File system mounts

Containers are isolated: each one runs with its own separate file system and cannot access the host file system by default.

We can use the --volume <host-directory>:<container-directory> syntax to make a file or a folder available inside our container. -v is equivalent to --volume. Its argument consists of two fields separated by a colon (:), optionally followed by a third field with options such as ro (read-only):

Host source directory path
Container target directory path

In Docker, there is no fixed hard limit on how many -v (volume mounts) you can attach to a container.
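The structure of the -v argument can be made explicit with a small helper that builds it from its parts and turns the host folder into an absolute path. This is a sketch under stated assumptions: mount_spec is a hypothetical function name, and the host folder must already exist.

```shell
#!/bin/sh
# mount_spec HOST_DIR CONTAINER_DIR [OPTIONS]
# Print a value for -v/--volume with an absolute host path.
# mount_spec is a hypothetical helper name; HOST_DIR must already exist.
mount_spec() {
    host_abs="$(cd "$1" && pwd)" || return 1
    if [ -n "${3:-}" ]; then
        printf '%s:%s:%s\n' "${host_abs}" "$2" "$3"
    else
        printf '%s:%s\n' "${host_abs}" "$2"
    fi
}
```

It would be used as docker run --rm -v "$(mount_spec ./data /app/data ro)" followed by the image name and command.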

Running the container mounting the data

Finally, we assemble the command that runs our align-and-count script inside a Docker container.

We will now mount two host folders: a read-only one for the data (ro) and a writable one for the results.

Mount paths must be absolute on the container side.

docker run --rm -v ./data/:/app/data:ro -v ./results/:/app/results align_count ./scripts/align_and_count_prok.sh
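Two practical details are worth a sketch. Some older Docker CLI versions do not accept relative host paths such as ./data in -v, so absolute paths built with pwd are the more portable form. In addition, when the host folder given to -v does not exist, Docker creates it, and on Linux it then typically belongs to root. The function name run_align_count is illustrative, and the function is meant to be called from the repository root.

```shell
#!/bin/sh
# run_align_count: the same command as above, with absolute host paths.
# The function name is illustrative. Call it from the repository root,
# where the data/ folder lives.
run_align_count() {
    # Create results/ ourselves so that it belongs to the current user.
    mkdir -p results
    docker run --rm \
        -v "$(pwd)/data:/app/data:ro" \
        -v "$(pwd)/results:/app/results" \
        align_count ./scripts/align_and_count_prok.sh
}
```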

Result files and interpretation

Because the results folder is mounted into the container, the files written by the script appear directly in the local results folder (host). Let’s have a look.

In total, four output files are generated:

Running bowtie2/samtools generates the sorted alignment file aligned_sorted.bam and the bam alignment index aligned_sorted.bam.bai (this index is used to accelerate read counting by featureCounts).
Running featureCounts generates gene_counts.txt, which holds unnormalised read counts per gene along with additional information such as gene length and strand.
- Out of 4305 genes, 2720 have a read count of at least 1; the remaining 1585 genes have zero counts.
The gene_counts.txt.summary file details count statistics.
- It shows that 39913 reads were assigned to a genomic feature (gene) and counted.
- 3830 reads could not be assigned to any genomic feature, and 1840 reads were ambiguous (overlapping more than one genomic feature).
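The figures above can be checked directly from the output files with awk. The helpers below are a sketch with illustrative function names. They assume the usual featureCounts layout for a single BAM file: a first line starting with #, a header line, then one tab-separated row per gene with the read count in column 7. The summary file is assumed to hold one tab-separated category and value per line.

```shell
#!/bin/sh
# Sketch helpers for inspecting featureCounts output with awk.
# Function names are illustrative. Assumed layout of gene_counts.txt for a
# single BAM file: one "#" comment line, one header line, then one row per
# gene with the read count in column 7.

# Number of genes with at least one read.
count_expressed_genes() {
    awk -F '\t' 'NR > 2 && $7 > 0 { n++ } END { print n + 0 }' "$1"
}

# Total number of genes in the table.
count_all_genes() {
    awk 'NR > 2 { n++ } END { print n + 0 }' "$1"
}

# Value of one category (for example Assigned) from the .summary file.
summary_value() {
    awk -F '\t' -v key="$2" '$1 == key { print $2 }' "$1"
}
```

For example, count_expressed_genes results/gene_counts.txt and summary_value results/gene_counts.txt.summary Assigned should match the numbers quoted above if the files follow the assumed layout.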
