Organisation: Data Science Platform
Responsible: Juliana Assis (jasge@dtu.dk)
In today’s data-driven world, the ability to visualize data effectively is essential for uncovering trends, making informed decisions, and communicating complex information. This workshop will guide you through the theoretical foundations and the practical applications of data visualization, ensuring you leave with a solid understanding of the subject.
The importance of data visualization for making complex data accessible and understandable
Good practices for creating clear, compelling visualizations.
Practical skills in data visualization using R, focusing on RNotebook, ggplot2, dplyr, tidyr, and Plotly, with hands-on experience in creating both static and interactive plots
If you prefer to run the Workshop in your own computer, please, follow the instructions at the Github.
Otherwise you are welcome to join our cloud. The link will be not pasted here, it will be only available in our workshop (for security reasons).
Independent of the method you chose, the next steps will be the same to everyone.
This section ensures all the necessary R packages for the workshop are installed and loaded. Here’s what each part of the code does:
install.packages
.BiocManager
package is installed, which is needed for some specialized packages.BiocManager
to install the ComplexHeatmap
package, which is not available on CRAN.Below is the R code implementing these steps:
# List of packages to install
packages <- c("ggpubr", "grid", "tidyr", "reshape2", "reshape", "ggrepel", "ggh4x", "pheatmap", "RColorBrewer", "patchwork","DT", "kableExtra", "plotly", "bookdown", "heatmaply", "gganimate", "gifski")
# Install missing packages
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)
# Load the libraries
lapply(packages, library, character.only = TRUE)
# BiocManager
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("ComplexHeatmap")
In this section, we load all the required libraries for the workshop. These libraries provide functions for data manipulation, visualization, and interactive reporting. Here’s what each package is used for:
Code on how to load the packages (one by one, you can also make a loop)
library(ggplot2)
library(ggpubr)
library(grid)
library(tidyr)
library(reshape2)
library(ggrepel)
library(ggh4x)
library(pheatmap)
library(RColorBrewer)
library(patchwork)
library(DT)
library(dplyr)
library(kableExtra)
library(ComplexHeatmap)
library(plotly)
library(bookdown)
library(heatmaply)
library(circlize)
#Extra
library(gganimate)
library(gifski)
Summary
Here, by employing 16S rRNA gene amplicon sequencing, we investigated the adaptation and succession of human fecal microbial communities in an automated multistage fermentor. We performed two independent experiments using different human donor fecal samples, one configured with two units of three colon compartments each studied for 22 days and another with one unit of two colon compartments studied for 31 days. The fermentor maintained a trend of increasing microbial alpha diversity along colon compartments. Within each experiment, microbial compositions followed compartment-specific trajectories and reached independent stable configurations. While compositions were highly similar between replicate units, they were clearly separated between different experiments, showing that they maintained the individuality of fecal inoculum rather than converging on a fermentor-specific composition. While some fecal amplicon sequence variants (ASVs) were undetected in the fermentor, many ASVs undetected in the fecal samples flourished in vitro. These bloomer ASVs accounted for significant proportions of the population and included prominent health-associated microbes such as Bacteroides fragilis and Akkermansia muciniphila. Turnover in community compositions is likely explained by feed composition and pH, suggesting that these communities can be easily modulated. Our results suggest that in vitro fermentors are promising tools to study complex microbial communities harboring important members of human gut microbiota.
Experimental Design
Alpha Diversity: To be reproduced
In this section, we are automating the setup for our workshop by downloading necessary files from the GitHub repository. Here’s what the script does:
Below is the R code that performs these tasks:
# Base URLs for each directory on GitHub
base_urls <- list(
"assets" = "https://github.com/biosustain/dsp_workshop_datavizR/raw/main/01_Code/assets/",
"assets/fonts" = "https://github.com/biosustain/dsp_workshop_datavizR/raw/main/01_Code/assets/fonts/",
"data/raw" = "https://github.com/biosustain/dsp_workshop_datavizR/raw/main/01_Code/data/raw/"
)
# List of files to download, organized by subdirectory
file_paths <- list(
"assets" = c(
"_footer-lab.Rmd",
"_header-lab.Rmd",
"logo.png",
"logo.svg",
"report.css"
),
"assets/fonts" = c(
"JetBrainsMono-Regular.ttf",
"SourceSansPro-Italic.ttf",
"SourceSansPro-Regular.ttf",
"SourceSansPro-SemiBold.ttf",
"SourceSansPro-SemiBoldItalic.ttf"
),
"data/raw" = c(
"01_Alpha_Diversity.tsv",
"02_ord_DataFrame.tsv",
"03_HeatMap_Exp1_DCs.tsv",
"04_HeatMap.rds",
"data.csv"
)
)
# Create necessary local directories with the specified structure
dir.create("./assets/fonts", recursive = TRUE, showWarnings = FALSE)
dir.create("./data/raw", recursive = TRUE, showWarnings = FALSE)
# Loop through each directory and download files
for (subdir in names(file_paths)) {
for (file_name in file_paths[[subdir]]) {
# Construct the full URL for each file
url <- paste0(base_urls[[subdir]], file_name)
# Define the local destination path
dest <- file.path(".", subdir, file_name)
# Download the file
download.file(url, destfile = dest, mode = "wb")
}
}
message("All files have been downloaded successfully into the specified directory structure!")
Here, we are loading the alpha diversity data from a TSV file using the read.table
function.
The arguments specify the following:
- header=T
: The first row contains column headers.
- sep="\t"
: The file is tab-delimited.
- row.names=1
: The first column contains row names.
- check.names=T
: Ensures column names are syntactically valid in R.
Code
# Load the TSV files
alpha_info_tab <- read.table("data/raw/01_Alpha_Diversity.tsv", header=T, sep="\t", row.names=1, check.names=T)
#ord_dataframe <- read.table("data/raw/02_ord_DataFrame.tsv", header = TRUE, sep = "\t")
#heatmap_exp1 <- read.table("data/raw/03_HeatMap_Exp1_DCs.tsv", header = TRUE, sep = "\t")
# Load the RDS file
#heatmap_data <- readRDS("data/raw/04_HeatMap.rds")
# Load the CSV file
#data_csv <- read.csv("data/raw/data.csv")
In this section, we will display the alpha_info_tab
dataset as an interactive table using the DT
package. The table will include features such as:
Here, you can see the code and the output bellow.
This section filters the dataset to include only rows where the Study
column is equal to "Exp1"
. The dplyr
package’s filter()
function is used for this purpose, allowing for easy subsetting of the data. After filtering, we display the dataset.
# Filter the data for Study == "Exp1"
alpha_info_tab_filtered <- alpha_info_tab %>%
filter(Study == "Exp1")
# View the filtered data
alpha_info_tab_filtered
BoxPlot of Observed Values by Compartment for Exp1
In this section, we filter the dataset for Study == "Exp1"
and create a boxplot of the Observed
values, grouped by the Compartment
column. The ggplot2
package is used for plotting, with geom_boxplot()
creating the boxplot and theme_pubr()
applied to give the plot a professional appearance. The labs()
function is used to label the x-axis and y-axis, while the plot title is set with ggtitle()
.
The code also includes an option to add individual data points with geom_jitter()
, which has been commented out for now. You can remove the comment and see what happens.
# Filter for Exp1 and create the boxplot
alpha_info_tab %>%
filter(Study == "Exp1") %>%
ggplot(aes(x = Compartment, y = Observed, fill = Compartment)) +
geom_boxplot() +
#geom_jitter(width = 0.2) + # Adds individual data points
theme_pubr(border = TRUE) +
labs(x = "Compartment", y = "Observed") +
ggtitle("Boxplot of Observed Values in Exp1 by Compartment")
In this section, we define two methods for assigning colors to the Compartment
variable in the dataset.
Manual Color Assignment: The first method manually assigns specific colors to each compartment using a named vector (cols_compartment
). This is useful for smaller dataset.
Automatic Color Assignment: The second method uses the RColorBrewer
package to automatically generate a color palette based on the unique values in the Compartment
column. The brewer.pal()
function generates a set of colors from the “Set3” palette, and setNames()
associates the colors with the unique compartments. For more details about available color palettes, you can visit the RColorBrewer website.
cols_compartment <- c("Fecal" = "#D46C4E", "AC" = "#77A515", "TC" = "#264D59", "DC" = "#43978D")
# Using color brewer
# Get the unique compartments
unique_compartments <- unique(alpha_info_tab_filtered$Compartment)
# Automatically generate a color palette (you can use a predefined set or any other color scale)
cols_compartment_auto <- setNames(
RColorBrewer::brewer.pal(length(unique_compartments), "Set3"),
unique_compartments
)
We filter the data for Study == "Exp1"
and then reorder the Compartment
factor levels manually to ensure the boxplot appears in a specific order. The plot is created using ggplot2
, with compartments on the x-axis and observed values on the y-axis. We apply custom colors to the boxplot using the scale_fill_manual()
function, which can either use a predefined color vector (cols_compartment
) or an automatically generated color palette (cols_compartment_auto
).
The mutate(Compartment = factor(Compartment, levels = c("Fecal", "AC", "TC", "DC")))
part ensures the compartments appear in a specific order in the boxplot.
geom_boxplot()
creates the actual boxplot, and scale_fill_manual(values = cols_compartment)
applies the custom color scheme to each compartment.
Finally, the plot is saved using ggsave()
(although this line is commented out). Remove the comment if you want to save it.
#BoxPlot<-
alpha_info_tab %>%
filter(Study == "Exp1") %>%
mutate(Compartment = factor(Compartment, levels = c("Fecal", "AC", "TC", "DC"))) %>%
ggplot(aes(x = Compartment, y = Observed, fill = Compartment)) +
geom_boxplot() +
theme_pubr(border = TRUE) +
labs(x = "Compartment", y = "Observed") +
ggtitle("Boxplot of Observed Values in Exp1 by Compartment") +
#scale_fill_manual(values = cols_compartment_auto)
scale_fill_manual(values = cols_compartment)
In this section, we filter and transform the data using dplyr
functions to create a boxplot with the facet_wrap()
function in ggplot2
.
Study == "Exp1"
to focus on this subset of data.Units
that duplicates the Fecal
compartment into two entries: Unit1
and Unit2
. This is achieved using the bind_rows()
function to add rows to the dataset for both Unit1
and Unit2
while maintaining the original Unit
for other compartments.mutate()
function is used to ensure that the Compartment
variable has the desired factor levels (Fecal
, AC
, TC
, DC
) for the boxplot, ensuring that the compartments are ordered.Compartment
or Units
columns have missing values using filter(!is.na(Compartment) & !is.na(Units))
.ggplot()
function is used to create the boxplot of Observed
values by Compartment
. We use the fill
aesthetic to color the boxes by compartment, and apply a custom color scheme using scale_fill_manual()
.facet_wrap(~ Units, scales = "free_x")
creates separate plots (facets) for each value of the Units
variable. This allows us to view the distributions for Unit1
and Unit2
separately, with free x-axis scaling to accommodate different numbers of categories across the facets.The result is a faceted boxplot showing observed values for Exp1
across the Compartment
categories, separated by Units
.
test <- alpha_info_tab %>%
filter(Study == "Exp1") %>%
bind_rows(
alpha_info_tab %>%
filter(Compartment == "Fecal") %>%
mutate(Units = "Unit1"),
alpha_info_tab %>%
filter(Compartment == "Fecal") %>%
mutate(Units = "Unit2"),
alpha_info_tab %>%
filter(Compartment != "Fecal") %>%
mutate(Units = as.character(Unit))
) %>%
mutate(Compartment = factor(Compartment, levels = c("Fecal", "AC", "TC", "DC"))) %>%
filter(!is.na(Compartment) & !is.na(Units)) %>%
ggplot(aes(x = Compartment, y = Observed, fill = Compartment)) +
geom_boxplot() +
geom_jitter(width = 0.2) + # Adds individual data points
theme_pubr(border = TRUE) +
labs(x = "Compartment", y = "Observed") +
ggtitle("Boxplot of Observed Values in Exp1 by Compartment and Units") +
scale_fill_manual(values = cols_compartment) +
facet_wrap(~ Units, scales = "free_x")
test
Here, we are visualizing the longitudinal trends of Observed
(Richness) and Shannon
diversity across different time points (Day
) for the Exp1
study. The data is reshaped and plotted with several customizations.
Study == "Exp1"
using filter()
.gather()
function is used to reshape the data from wide format (separate columns for Observed
and Shannon
) to long format, where both metrics are stored in a single column (metric
), and their values in another (value
).mutate()
function is used to rename the metrics directly in the data, so that Observed
is shown as “Richness” and Shannon
is retained as “Shannon”.ggplot()
function is used to create the plot with Day
on the x-axis and value
on the y-axis, colored by Compartment
and grouped by Study
and Compartment_Unit
using interaction()
.geom_vline(xintercept = 1:23)
) to visually separate different days.geom_point()
) and lines (geom_line()
) are plotted for each combination of Study
, Compartment
, and Unit
.theme()
and scale_colour_manual()
.scale_x_discrete()
to set the limits, which include the Fecal
compartment and days 1 to 23.facet_grid()
function is used to create separate panels for each metric (Observed
and Shannon
), with scales = "free_y"
to allow for different y-axis scales for each metric.This visualization provides insights into the changes in microbial diversity across different compartments and units over time, making it easier to compare the trends for both richness and Shannon diversity in the same plot.
The plot shows the longitudinal trends of Richness
and Shannon
diversity, faceted by metric, with custom color, line types, and axis adjustments.
# Extra
# Define the y-axis scale
#scales <- scale_y_continuous(limits = c(0, 129))
alpha_info_tab %>%
filter(Study == "Exp1") %>% # Filter for Exp1
gather(key = "metric", value = "value", c("Observed", "Shannon")) %>% # Reshape data
mutate(
metric = recode(metric, "Observed" = "Richness", "Shannon" = "Shannon") # Change names directly in the data
) %>%
ggplot(aes(x = Day, y = value, color = Compartment, group = interaction(Study, Compartment_Unit))) +
# Add vertical lines for day intervals
geom_vline(xintercept = 1:23, linetype = 'solid', colour = "grey", alpha = 0.3) +
# Plot points and lines
geom_point(size = 4) +
geom_line(aes(linetype = Unit)) +
# Customize plot theme
theme_pubr(border = TRUE) +
theme(
axis.text.x = element_text(size = 8, hjust = 0.5),
axis.text.y = element_text(size = 8, hjust = 1),
legend.position = "top"
) +
# Custom color and line types
scale_colour_manual(values = cols_compartment) +
scale_linetype_manual(values = c("dotted", "solid", "dashed", "longdash")) +
# Define x-axis labels
scale_x_discrete(limits = c("Fecal", as.character(1:23))) +
# Add labels for axes
labs(x = "", y = "") +
# Facet the plot by metric
facet_grid(rows = vars(metric), scales = "free_y", space = "free_x")
In this section, we are focusing on the ordination data to visualize Beta Diversity through Principal Coordinate Analysis (PCoA).
ord_DataFrame.tsv
file is loaded into R using the read.table()
function. This file contains the results from ordination analysis, where the rows represent individual samples and the columns represent the principal coordinates (or other variables).header = TRUE
argument indicates that the first row contains column names. The row.names = 1
argument designates the first column as row names (e.g., sample IDs).sep = "\t"
option specifies that the file is tab-delimited. The check.names = TRUE
ensures that column names are properly formatted.knitr::kable()
. This function outputs the data in a readable format for the R Markdown document.kableExtra::kable_styling()
function is applied to make the table more visually appealing, with bootstrap_options = c("striped", "hover")
to add alternating row colors and hover effects.ord_DataFrame <- read.table("data/raw/02_ord_DataFrame.tsv", header=T, sep="\t", row.names=1, check.names=T)
ord_DataFrame %>%
knitr::kable() %>%
kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
Beta Diversity
ggplot(ord_DataFrame, aes(x = X45.7., y = X19.4., color = Compartment_Unit, shape = Unit)) +
geom_point(size = 5) #+
In this section, we are visualizing the Beta Diversity results from the Principal Coordinate Analysis (PCoA) using a scatter plot. The plot is customized to display samples from different compartments and units, with specific color and shape assignments for better visualization.
unique()
function is used to inspect the unique values in the Compartment_Unit
column of the ordination data frame (ord_DataFrame
). This helps identify the different categories that will be represented in the plot.compartment_unit_col
vector is defined to assign specific colors to each unique Compartment_Unit
combination. The colors chosen are visually distinct for each compartment and unit combination.ggplot()
function is used to create the PCoA plot. The x and y axes are set to X45.7.
and X19.4.
, which represent the first two principal coordinates (PC1 and PC2).aes()
function is used to map the Compartment_Unit
to color and Unit
to shape. This allows for clear differentiation of the points based on their respective categories.x = 0
and y = 0
are added using geom_hline()
and geom_vline()
to help visually assess the distribution of the samples.geom_point()
, and the size of the points is set to 5 for better visibility.scale_shape_manual()
, with different shapes for different Unit
values.theme_pubr()
function is used to apply a professional plot theme with borders.coord_fixed(ratio = 1)
to maintain proportionality between the axes.theme()
. Axis labels, title fonts, and legend settings are customized for better readability.scale_color_manual()
with the compartment_unit_col
palette.print(ordplot2)
to visualize the results.## [1] "Fecal" "AC1" "AC2" "DC1" "DC2" "TC1" "TC2"
# compartment Unit
compartment_unit_col <- c("Fecal" = "#264D59", "AC1" = "#77A515", "TC1" = "#D46C4E", "DC1" = "#43978D", "AC2" = "#77A515", "TC2" = "#D46C4E", "DC2" = "#43978D")
ordplot2 <- ggplot(ord_DataFrame, aes(x = X45.7., y = X19.4., color = Compartment_Unit, shape = Unit)) +
geom_hline(yintercept = 0, linetype = 'dashed', alpha = 0.3) +
geom_vline(xintercept = 0, linetype = 'dashed', alpha = 0.3) +
geom_point(size = 5) +
#geom_text_repel(aes(label = Study), nudge_x = 0.06, size = 3.0, segment.alpha = 0.5) +
scale_shape_manual(values = c(18, 19, 17)) +
theme_pubr(border = TRUE) +
coord_fixed(ratio = 1) + # Keep aspect ratio 1:1
theme(axis.text = element_text(size = 14),
axis.text.x = element_text(size = 12, hjust = 0.5),
axis.title.y = element_text(size = 18),
legend.text = element_text(size = 14),
legend.title = element_text(size = 0),
legend.position = "bottom", #top, null etc
axis.title.x = element_text(size = 18),
strip.text.x = element_text(size = 20, face = "bold")) +
scale_color_manual(values = compartment_unit_col) +
labs(x = "PCo1 [45.7%]", y = "PCo2 [19.4%]", color = "Compartment_Unit", shape = "Unit") #+
#annotate("text", x = min(ord_DataFrame$X45.7.), y = min(ord_DataFrame$X19.4.),
#label = "Exp2", hjust = 0.1, vjust = 0.4, size = 5, color = "black")
# To view the plot
print(ordplot2)
In this section, we load and prepare data for heatmap visualization. The distinction between a data frame and a matrix is important because many heatmap plotting functions, such as those in the ComplexHeatmap
or pheatmap
packages, require the input to be in matrix format.
After loading the data
class()
function is used to check the type of the loaded object, which will initially be a data frame.as.matrix()
function converts the data frame into a matrix format.class()
function is then used to confirm that the object is now a matrix.Feature | Data Frame | Matrix |
---|---|---|
Structure | A list of equal-length vectors. | A 2D array with elements of the same type. |
Column Types | Can hold different data types (e.g., numeric, character, etc.). | Must hold elements of the same type. |
Usage in Heatmaps | Often not directly usable in heatmap functions. | Preferred for heatmap plotting. |
Row and Column Names | Both row and column names are supported. | Row and column names are supported. |
Conversion | Can be converted to a matrix with as.matrix() . |
Cannot be directly converted to a data frame. |
Why Convert to a Matrix? Heatmap functions require uniform data types (e.g., numeric values) for all elements, as they rely on mathematical operations such as clustering or scaling. Matrices ensure that all entries are of the same type (typically numeric), avoiding issues caused by mixed data types in a data frame.
heat_df <- read.table("data/raw/03_HeatMap_Exp1_DCs.tsv", header=T, sep="\t", row.names=1, check.names=T)
class(heat_df)
## [1] "data.frame"
## [1] "matrix" "array"
This code filters the heatmap data to include only the top 20 taxa based on their abundance across samples. This is a common preprocessing step to focus the visualization on the most significant or abundant features, reducing clutter in the heatmap.
Key Steps:
Total
, is added to the data frame. This column contains the row sums for each taxon, representing its total abundance across all samples.rowMeans()
if you want to base the selection on average abundance instead of the sum.arrange(desc(Total))
function sorts the taxa in descending order of their total abundance.slice(1:20)
function selects the top 20 taxa with the highest abundance.select(-Total)
function removes the Total
column, as it is no longer needed.as.matrix()
to ensure compatibility with heatmap functions.# Step 1: Calculate row sums or another measure to identify top taxa
heat_df$Total <- rowSums(heat_df) # or use `rowMeans` if preferred
# Step 2: Filter the top 20 rows based on Total
top20_data <- heat_df %>%
arrange(desc(Total)) %>%
slice(1:20) %>%
select(-Total)
# Step 3: Convert to matrix for ComplexHeatmap
data_matrix <- as.matrix(top20_data)
This code generates a heatmap to visualize the abundance of the top 20 taxa across samples. Heatmaps are a powerful tool for exploring patterns, clusters, and relationships within the data.
data_matrix
created earlier is used as the input for the heatmap. It represents the abundance of the top 20 taxa across samples.colorRamp2()
function defines a gradient color scheme for the heatmap:
cluster_rows = TRUE
and cluster_columns = TRUE
options. This groups similar taxa and samples together, revealing patterns in the data.Heatmap(data_matrix,
name = "Abundance",
col = colorRamp2(c(min(data_matrix), median(data_matrix), max(data_matrix)),
c("blue", "white", "red")),
show_row_names = TRUE,
show_column_names = TRUE,
cluster_rows = TRUE,
cluster_columns = TRUE
)
This section builds upon the previous heatmap by adding annotations for the samples, specifically their compartments. Sample annotations provide additional context to the heatmap, enhancing interpretability.
compartment
variable labels each column (sample) in the data_matrix
with its corresponding compartment."DC1"
.compartment
vector is checked against the number of columns in data_matrix
."DC1"
mapped to "#43978D"
.HeatmapAnnotation()
function is used to define an annotation for the heatmap. It assigns each sample to its respective compartment with the specified color.top_annotation
argument. This places a color-coded bar above the heatmap to represent the compartments of the samples.## [1] 13
# Create Compartment labels for the samples, repeating "DC1" for all columns in data_matrix
compartment <- rep("DC1", ncol(data_matrix))
# Check if the lengths match
length(compartment) == ncol(data_matrix) # Should return TRUE
## [1] TRUE
# Colors for compartments
compartment_colors <- c("DC1" = "#43978D")
# Create the annotation
sample_annotation <- HeatmapAnnotation(
Compartment = compartment,
col = list(Compartment = compartment_colors)
)
# Create the heatmap with the annotation
Heatmap(
data_matrix,
name = "Abundance",
col = colorRamp2(
c(min(data_matrix), median(data_matrix), max(data_matrix)), # Color scale points
c("blue", "white", "red") # Colors corresponding to min, median, and max values
),
show_row_names = TRUE, # Show row names
show_column_names = TRUE, # Show column names
cluster_rows = TRUE, # Cluster rows
cluster_columns = TRUE, # Cluster columns
top_annotation = HeatmapAnnotation(
Compartment = anno_simple(
compartment, # Variable for compartment annotation
col = compartment_colors # Custom colors for compartments
)
)
)
pheatmap
This section demonstrates how to create a heatmap using the pheatmap
package. It includes customizations such as annotation and color schemes, providing a comprehensive approach to visualizing the data.
## Change colors
colsHeat<- c("#F7F7F7", "#92C5DE", "#0571B0", "#F4A582", "#CA0020")
# Create an annotation dataframe
annotation_df <- data.frame(Compartment = compartment)
# Define annotation colors
annotation_colors <- list(Compartment = c("DC1" = "#43978D"))
# Plot heatmap using pheatmap
pheatmap(data_matrix,
cluster_cols = FALSE,
cluster_rows = TRUE,
#scale = "column",
#gaps_row = 5,
clustering_distance_rows = "euclidean",
clustering_distance_cols = "euclidean",
annotation_colors = annotation_colors,
annotation_col = annotation_df,
show_colnames = TRUE,
color = colorRampPalette(c(colsHeat))(50),
border_color = "#f8edeb",
display_numbers = FALSE)
The code is performing a column-wise normalization on data_matrix by dividing each element in a column by the mean of that column.
Purpose of the Code The result, stored in data, is a transformed version of data_matrix where each column has been scaled so that the average value in each column is 1. This transformation is often done to normalize the data across columns to account for differences in scale or to control for sample-specific variation in abundance data.
# Row-wise normalization
#data <- apply(data_matrix, 1, function(x) { x / mean(x) })
# Column-wise normalization
data <- apply(data_matrix, 2, function(x) { x / mean(x) })
# Different normalization
#sequencing_depth <- 20000
#read_counts_matrix <- data_matrix * sequencing_depth
# Step 2: Log-normalize each value in the matrix
#data <- log10(1 + read_counts_matrix) / max(log10(1 + read_counts_matrix))
# Sort by Days
colnames(data) <- colnames(data)[order(as.numeric(gsub("Exp1_(\\d+).*", "\\1", colnames(data))))]
pheatmap(data,
cluster_cols = FALSE,
cluster_rows = TRUE,
#scale = "column",
#gaps_row = 5,
clustering_distance_rows = "euclidean",
clustering_distance_cols = "euclidean",
annotation_colors = annotation_colors,
annotation_col = annotation_df, # Add annotation dataframe here
show_colnames = TRUE,
color = colorRampPalette(c(colsHeat))(50),
border_color = "#f8edeb",
display_numbers = FALSE)
This section demonstrates how to create an interactive heatmap using the plotly
package. This approach allows users to interact with the data visualization, enhancing the analysis experience.
colnames(data)
: Used as labels for the x-axis, representing the columns of the data matrix.rownames(data)
: Used as labels for the y-axis, representing the rows of the data matrix.data
: The actual data matrix to be visualized.colors
argument specifies the color palette for the heatmap. In this case, colsHeat
is used as a custom color scheme for data representation.margin = list(l = 120)
: Adds space on the left for row labels.xaxis
and yaxis
: Customize axis ticks and labels for clarity."Abundance"
) indicating the meaning of the color gradient.p <- plot_ly(
x = colnames(data),
y = rownames(data),
z = data,
type = "heatmap",
colors = colsHeat,
showscale = TRUE,
colorbar = list(title = "Abundance") # Optional colorbar title
) %>%
layout(
margin = list(l = 120), # Space for row names on the left
xaxis = list(showticklabels = TRUE, ticks = ""),
yaxis = list(tickvals = 1:nrow(data), ticktext = rownames(data))
)
p
This section demonstrates how to enhance a heatmap visualization with an additional annotation layer. Specifically, a single-row heatmap is added to indicate a compartment (e.g., “DC1”) for all columns of the main heatmap.
colorscale
is fixed to a single color (#43978D
), representing “DC1.”0
).plot_ly
is used to create both the main heatmap (p
) and the annotation heatmap (p_compartment
).subplot
function combines these two visualizations, stacking the annotation on top of the main heatmap.heights = c(0.1, 0.9)
), giving a smaller proportion to the annotation heatmap.shareX = TRUE
), ensuring alignment between the main heatmap and the annotation strip.# Create the compartment annotation heatmap (only DC1, shown in #43978D)
compartment_annotation <- rep(0, ncol(data)) # Only 0s for DC1
p_compartment <- plot_ly(
x = colnames(data),
y = "Compartment", # Annotation row title
z = matrix(compartment_annotation, nrow = 1), # Annotation as a single-row matrix
type = "heatmap",
colorscale = list(list(0, "#43978D"), list(1, "#43978D")), # Fixed color for DC1 only
showscale = FALSE # No colorbar for annotation
) %>%
layout(
xaxis = list(showticklabels = TRUE, ticks = ""),
yaxis = list(
tickvals = c(0), # Position the annotation title
ticktext = "Compartment"
)
)
# Combine the main heatmap and the annotation strip
subplot(p_compartment, p, nrows = 2, heights = c(0.1, 0.9), shareX = TRUE)
Extra activities for those who finish the activities or for those who want to do homework
Animate the PCoA Plot for Beta Diversity
ord_DataFrame$Compartment_Day <- paste(ord_DataFrame$Compartment_Unit, ord_DataFrame$Day, sep = " - ")
# Ensure 'Day' is numeric and sorted
ord_DataFrame$Day <- as.numeric(as.character(ord_DataFrame$Day))
ord_DataFrame <- ord_DataFrame[order(ord_DataFrame$Day), ]
# Create the animation
anim <- ggplot(ord_DataFrame, aes(x = X45.7., y = X19.4., color = Compartment, shape = Unit)) +
geom_point(size=5, alpha=0.8) +
geom_text_repel(aes(label = sprintf("%.1f", Day)), nudge_x = 0.06, size = 5.0, segment.alpha = 0.8) + # Format Day to 1 decimal
scale_color_manual(values = cols_compartment) +
scale_shape_manual(values = c(18, 19, 17)) +
theme_pubr(border = TRUE) +
theme(legend.position="right") +
transition_states(Day) + # transition in numeric order of Day
facet_wrap(~ Study) +
ease_aes('linear')
animate(anim, duration = 25, fps = 20, renderer = gifski_renderer())
This section showcases the use of the heatmaply
package to create an interactive heatmap with enhanced functionality. The focus is on exploring data visually, including options for clustering, color customization, and annotations.
heatmaply(data,
show_dendrogram = c(TRUE, FALSE), # Cluster rows and columns
column_text_angle = 90, # Orientation of the column labels
colors = colorRampPalette(c("#F7F7F7", "#92C5DE", "#0571B0", "#F4A582", "#CA0020"))(50),
show_row_names = TRUE,
show_column_names = FALSE,
annotation_col = data.frame(Compartment = compartment_annotation),
row_text_angle = 0)
## R version 4.4.1 (2024-06-14)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sonoma 14.3
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Europe/Copenhagen
## tzcode source: internal
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] gifski_1.32.0-1 gganimate_1.0.9 circlize_0.4.16
## [4] heatmaply_1.5.0 viridis_0.6.5 viridisLite_0.4.2
## [7] bookdown_0.40 plotly_4.10.4 ComplexHeatmap_2.20.0
## [10] kableExtra_1.4.0 dplyr_1.1.4 DT_0.33
## [13] patchwork_1.3.0 RColorBrewer_1.1-3 pheatmap_1.0.12
## [16] ggh4x_0.2.8 ggrepel_0.9.6 reshape2_1.4.4
## [19] tidyr_1.3.1 ggpubr_0.6.0 ggplot2_3.5.1
##
## loaded via a namespace (and not attached):
## [1] gridExtra_2.3 rlang_1.1.4 magrittr_2.0.3
## [4] clue_0.3-65 GetoptLong_1.0.5 matrixStats_1.4.1
## [7] compiler_4.4.1 png_0.1-8 systemfonts_1.1.0
## [10] vctrs_0.6.5 stringr_1.5.1 pkgconfig_2.0.3
## [13] shape_1.4.6.1 crayon_1.5.3 fastmap_1.2.0
## [16] backports_1.5.0 labeling_0.4.3 ca_0.71.1
## [19] utf8_1.2.4 rmarkdown_2.28 purrr_1.0.2
## [22] xfun_0.48 cachem_1.1.0 jsonlite_1.8.9
## [25] progress_1.2.3 highr_0.11 tweenr_2.0.3
## [28] prettyunits_1.2.0 broom_1.0.7 parallel_4.4.1
## [31] cluster_2.1.6 R6_2.5.1 bslib_0.8.0
## [34] stringi_1.8.4 car_3.1-3 jquerylib_0.1.4
## [37] Rcpp_1.0.13 assertthat_0.2.1 iterators_1.0.14
## [40] knitr_1.48 IRanges_2.38.1 tidyselect_1.2.1
## [43] rstudioapi_0.16.0 abind_1.4-8 yaml_2.3.10
## [46] TSP_1.2-4 doParallel_1.0.17 codetools_0.2-20
## [49] tibble_3.2.1 plyr_1.8.9 withr_3.0.1
## [52] evaluate_1.0.0 xml2_1.3.6 pillar_1.9.0
## [55] carData_3.0-5 foreach_1.5.2 stats4_4.4.1
## [58] generics_0.1.3 hms_1.1.3 S4Vectors_0.42.1
## [61] munsell_0.5.1 scales_1.3.0 glue_1.8.0
## [64] lazyeval_0.2.2 tools_4.4.1 dendextend_1.18.0
## [67] data.table_1.16.0 webshot_0.5.5 ggsignif_0.6.4
## [70] registry_0.5-1 crosstalk_1.2.1 seriation_1.5.6
## [73] colorspace_2.1-1 Formula_1.2-5 cli_3.6.3
## [76] fansi_1.0.6 svglite_2.1.3 gtable_0.3.5
## [79] rstatix_0.7.2 sass_0.4.9 digest_0.6.37
## [82] BiocGenerics_0.50.0 rjson_0.2.23 htmlwidgets_1.6.4
## [85] farver_2.1.2 htmltools_0.5.8.1 lifecycle_1.0.4
## [88] httr_1.4.7 GlobalOptions_0.1.2