Data Analysis PXD040621#
Plan
read data and log2 transform intensity values
aggregate peptide intensities to protein intensities
format data from long to wide format
remove contaminant proteins
check for missing values
Clustermap of sample and proteins
differential analysis (Volcano Plots)
Enrichment Analysis
check for maltose update pathway (Fig. 3 in paper)
%pip install acore vuecore
Show code cell output
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: acore in /home/runner/.local/lib/python3.12/site-packages (0.2.0)
Requirement already satisfied: vuecore in /home/runner/.local/lib/python3.12/site-packages (0.0.5)
Requirement already satisfied: numpy<2.1.0 in /home/runner/.local/lib/python3.12/site-packages (from acore) (2.0.2)
Requirement already satisfied: pandas in /home/runner/.local/lib/python3.12/site-packages (from acore) (2.3.1)
Requirement already satisfied: dsp-pandas in /home/runner/.local/lib/python3.12/site-packages (from acore) (0.0.5)
Requirement already satisfied: scipy in /home/runner/.local/lib/python3.12/site-packages (from acore) (1.15.3)
Requirement already satisfied: scikit-learn>=1.5 in /home/runner/.local/lib/python3.12/site-packages (from acore) (1.7.0)
Requirement already satisfied: networkx in /home/runner/.local/lib/python3.12/site-packages (from acore) (3.5)
Requirement already satisfied: biopython in /home/runner/.local/lib/python3.12/site-packages (from acore) (1.85)
Requirement already satisfied: combat in /home/runner/.local/lib/python3.12/site-packages (from acore) (0.3.3)
Requirement already satisfied: gseapy!=1.1.5 in /home/runner/.local/lib/python3.12/site-packages (from acore) (1.1.9)
Requirement already satisfied: kmapper in /home/runner/.local/lib/python3.12/site-packages (from acore) (2.1.0)
Requirement already satisfied: lifelines in /home/runner/.local/lib/python3.12/site-packages (from acore) (0.30.0)
Requirement already satisfied: pingouin in /home/runner/.local/lib/python3.12/site-packages (from acore) (0.5.5)
Requirement already satisfied: python-louvain in /home/runner/.local/lib/python3.12/site-packages (from acore) (0.16)
Requirement already satisfied: PyWGCNA!=2.2.0 in /home/runner/.local/lib/python3.12/site-packages (from acore) (1.16.0)
Requirement already satisfied: snfpy in /home/runner/.local/lib/python3.12/site-packages (from acore) (0.2.2)
Requirement already satisfied: umap-learn in /home/runner/.local/lib/python3.12/site-packages (from acore) (0.5.9.post2)
Requirement already satisfied: statsmodels in /home/runner/.local/lib/python3.12/site-packages (from acore) (0.14.5)
Requirement already satisfied: inmoose in /home/runner/.local/lib/python3.12/site-packages (from acore) (0.7.8)
Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from acore) (2.31.0)
Requirement already satisfied: rarfile in /home/runner/.local/lib/python3.12/site-packages (from acore) (4.2)
Requirement already satisfied: plotly in /home/runner/.local/lib/python3.12/site-packages (from vuecore) (5.15.0)
Requirement already satisfied: beautifulsoup4 in /home/runner/.local/lib/python3.12/site-packages (from vuecore) (4.13.4)
Requirement already satisfied: dash in /home/runner/.local/lib/python3.12/site-packages (from vuecore) (3.1.1)
Requirement already satisfied: matplotlib in /home/runner/.local/lib/python3.12/site-packages (from vuecore) (3.10.3)
Requirement already satisfied: kaleido in /home/runner/.local/lib/python3.12/site-packages (from vuecore) (0.2.0)
Requirement already satisfied: pyvis in /home/runner/.local/lib/python3.12/site-packages (from vuecore) (0.3.2)
Requirement already satisfied: wordcloud in /home/runner/.local/lib/python3.12/site-packages (from vuecore) (1.9.4)
Requirement already satisfied: cyjupyter in /home/runner/.local/lib/python3.12/site-packages (from vuecore) (0.2.0)
Requirement already satisfied: nltk in /home/runner/.local/lib/python3.12/site-packages (from vuecore) (3.9.1)
Requirement already satisfied: webweb in /home/runner/.local/lib/python3.12/site-packages (from vuecore) (0.0.37)
Requirement already satisfied: dash-cytoscape in /home/runner/.local/lib/python3.12/site-packages (from vuecore) (1.0.2)
Requirement already satisfied: contourpy>=1.0.1 in /home/runner/.local/lib/python3.12/site-packages (from matplotlib->vuecore) (1.3.2)
Requirement already satisfied: cycler>=0.10 in /home/runner/.local/lib/python3.12/site-packages (from matplotlib->vuecore) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /home/runner/.local/lib/python3.12/site-packages (from matplotlib->vuecore) (4.58.5)
Requirement already satisfied: kiwisolver>=1.3.1 in /home/runner/.local/lib/python3.12/site-packages (from matplotlib->vuecore) (1.4.8)
Requirement already satisfied: packaging>=20.0 in /home/runner/.local/lib/python3.12/site-packages (from matplotlib->vuecore) (24.2)
Requirement already satisfied: pillow>=8 in /home/runner/.local/lib/python3.12/site-packages (from matplotlib->vuecore) (10.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/lib/python3/dist-packages (from matplotlib->vuecore) (3.1.1)
Requirement already satisfied: python-dateutil>=2.7 in /usr/lib/python3/dist-packages (from matplotlib->vuecore) (2.8.2)
Requirement already satisfied: anndata>=0.8.0 in /home/runner/.local/lib/python3.12/site-packages (from PyWGCNA!=2.2.0->acore) (0.11.4)
Requirement already satisfied: biomart>=0.9.2 in /home/runner/.local/lib/python3.12/site-packages (from PyWGCNA!=2.2.0->acore) (0.9.2)
Requirement already satisfied: reactome2py>=3.0.0 in /home/runner/.local/lib/python3.12/site-packages (from PyWGCNA!=2.2.0->acore) (3.0.0)
Requirement already satisfied: rsrc>=0.1.3 in /home/runner/.local/lib/python3.12/site-packages (from PyWGCNA!=2.2.0->acore) (0.1.3)
Requirement already satisfied: seaborn>=0.11.1 in /home/runner/.local/lib/python3.12/site-packages (from PyWGCNA!=2.2.0->acore) (0.13.2)
Requirement already satisfied: setuptools>=59.8.0 in /usr/lib/python3/dist-packages (from PyWGCNA!=2.2.0->acore) (68.1.2)
Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas->acore) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in /home/runner/.local/lib/python3.12/site-packages (from pandas->acore) (2025.2)
Requirement already satisfied: ipython>=5.3.0 in /home/runner/.local/lib/python3.12/site-packages (from pyvis->vuecore) (9.4.0)
Requirement already satisfied: jinja2>=2.9.6 in /usr/lib/python3/dist-packages (from pyvis->vuecore) (3.1.2)
Requirement already satisfied: jsonpickle>=1.4.1 in /home/runner/.local/lib/python3.12/site-packages (from pyvis->vuecore) (4.1.1)
Requirement already satisfied: joblib>=1.2.0 in /home/runner/.local/lib/python3.12/site-packages (from scikit-learn>=1.5->acore) (1.5.1)
Requirement already satisfied: threadpoolctl>=3.1.0 in /home/runner/.local/lib/python3.12/site-packages (from scikit-learn>=1.5->acore) (3.6.0)
Requirement already satisfied: patsy>=0.5.6 in /home/runner/.local/lib/python3.12/site-packages (from statsmodels->acore) (1.0.1)
Requirement already satisfied: soupsieve>1.2 in /home/runner/.local/lib/python3.12/site-packages (from beautifulsoup4->vuecore) (2.7)
Requirement already satisfied: typing-extensions>=4.0.0 in /usr/lib/python3/dist-packages (from beautifulsoup4->vuecore) (4.10.0)
Requirement already satisfied: mpmath>=1.1.0 in /home/runner/.local/lib/python3.12/site-packages (from combat->acore) (1.3.0)
Requirement already satisfied: ipywidgets>=7.0.0 in /home/runner/.local/lib/python3.12/site-packages (from cyjupyter->vuecore) (8.1.7)
Requirement already satisfied: Flask<3.2,>=1.0.4 in /home/runner/.local/lib/python3.12/site-packages (from dash->vuecore) (3.1.1)
Requirement already satisfied: Werkzeug<3.2 in /home/runner/.local/lib/python3.12/site-packages (from dash->vuecore) (3.1.3)
Requirement already satisfied: importlib-metadata in /home/runner/.local/lib/python3.12/site-packages (from dash->vuecore) (8.7.0)
Requirement already satisfied: retrying in /home/runner/.local/lib/python3.12/site-packages (from dash->vuecore) (1.4.0)
Requirement already satisfied: nest-asyncio in /home/runner/.local/lib/python3.12/site-packages (from dash->vuecore) (1.6.0)
Requirement already satisfied: tenacity>=6.2.0 in /home/runner/.local/lib/python3.12/site-packages (from plotly->vuecore) (9.1.2)
Requirement already satisfied: openpyxl in /home/runner/.local/lib/python3.12/site-packages (from dsp-pandas->acore) (3.1.5)
Requirement already satisfied: fastcluster in /home/runner/.local/lib/python3.12/site-packages (from inmoose->acore) (1.3.0)
Requirement already satisfied: autograd>=1.5 in /home/runner/.local/lib/python3.12/site-packages (from lifelines->acore) (1.8.0)
Requirement already satisfied: autograd-gamma>=0.3 in /home/runner/.local/lib/python3.12/site-packages (from lifelines->acore) (0.5.0)
Requirement already satisfied: formulaic>=0.2.2 in /home/runner/.local/lib/python3.12/site-packages (from lifelines->acore) (1.1.1)
Requirement already satisfied: click in /usr/lib/python3/dist-packages (from nltk->vuecore) (8.1.6)
Requirement already satisfied: regex>=2021.8.3 in /home/runner/.local/lib/python3.12/site-packages (from nltk->vuecore) (2024.11.6)
Requirement already satisfied: tqdm in /home/runner/.local/lib/python3.12/site-packages (from nltk->vuecore) (4.67.1)
Requirement already satisfied: pandas-flavor in /home/runner/.local/lib/python3.12/site-packages (from pingouin->acore) (0.7.0)
Requirement already satisfied: tabulate in /home/runner/.local/lib/python3.12/site-packages (from pingouin->acore) (0.9.0)
Requirement already satisfied: numba>=0.51.2 in /home/runner/.local/lib/python3.12/site-packages (from umap-learn->acore) (0.61.2)
Requirement already satisfied: pynndescent>=0.5 in /home/runner/.local/lib/python3.12/site-packages (from umap-learn->acore) (0.5.13)
Requirement already satisfied: array-api-compat!=1.5,>1.4 in /home/runner/.local/lib/python3.12/site-packages (from anndata>=0.8.0->PyWGCNA!=2.2.0->acore) (1.12.0)
Requirement already satisfied: h5py>=3.7 in /home/runner/.local/lib/python3.12/site-packages (from anndata>=0.8.0->PyWGCNA!=2.2.0->acore) (3.14.0)
Requirement already satisfied: natsort in /home/runner/.local/lib/python3.12/site-packages (from anndata>=0.8.0->PyWGCNA!=2.2.0->acore) (8.4.0)
Requirement already satisfied: blinker>=1.9.0 in /home/runner/.local/lib/python3.12/site-packages (from Flask<3.2,>=1.0.4->dash->vuecore) (1.9.0)
Requirement already satisfied: itsdangerous>=2.2.0 in /home/runner/.local/lib/python3.12/site-packages (from Flask<3.2,>=1.0.4->dash->vuecore) (2.2.0)
Requirement already satisfied: markupsafe>=2.1.1 in /usr/lib/python3/dist-packages (from Flask<3.2,>=1.0.4->dash->vuecore) (2.1.5)
Requirement already satisfied: interface-meta>=1.2.0 in /home/runner/.local/lib/python3.12/site-packages (from formulaic>=0.2.2->lifelines->acore) (1.3.0)
Requirement already satisfied: wrapt>=1.0 in /home/runner/.local/lib/python3.12/site-packages (from formulaic>=0.2.2->lifelines->acore) (1.17.2)
Requirement already satisfied: decorator in /home/runner/.local/lib/python3.12/site-packages (from ipython>=5.3.0->pyvis->vuecore) (5.2.1)
Requirement already satisfied: ipython-pygments-lexers in /home/runner/.local/lib/python3.12/site-packages (from ipython>=5.3.0->pyvis->vuecore) (1.1.1)
Requirement already satisfied: jedi>=0.16 in /home/runner/.local/lib/python3.12/site-packages (from ipython>=5.3.0->pyvis->vuecore) (0.19.2)
Requirement already satisfied: matplotlib-inline in /home/runner/.local/lib/python3.12/site-packages (from ipython>=5.3.0->pyvis->vuecore) (0.1.7)
Requirement already satisfied: pexpect>4.3 in /usr/lib/python3/dist-packages (from ipython>=5.3.0->pyvis->vuecore) (4.9.0)
Requirement already satisfied: prompt_toolkit<3.1.0,>=3.0.41 in /home/runner/.local/lib/python3.12/site-packages (from ipython>=5.3.0->pyvis->vuecore) (3.0.51)
Requirement already satisfied: pygments>=2.4.0 in /usr/lib/python3/dist-packages (from ipython>=5.3.0->pyvis->vuecore) (2.17.2)
Requirement already satisfied: stack_data in /home/runner/.local/lib/python3.12/site-packages (from ipython>=5.3.0->pyvis->vuecore) (0.6.3)
Requirement already satisfied: traitlets>=5.13.0 in /home/runner/.local/lib/python3.12/site-packages (from ipython>=5.3.0->pyvis->vuecore) (5.14.3)
Requirement already satisfied: comm>=0.1.3 in /home/runner/.local/lib/python3.12/site-packages (from ipywidgets>=7.0.0->cyjupyter->vuecore) (0.2.2)
Requirement already satisfied: widgetsnbextension~=4.0.14 in /home/runner/.local/lib/python3.12/site-packages (from ipywidgets>=7.0.0->cyjupyter->vuecore) (4.0.14)
Requirement already satisfied: jupyterlab_widgets~=3.0.15 in /home/runner/.local/lib/python3.12/site-packages (from ipywidgets>=7.0.0->cyjupyter->vuecore) (3.0.15)
Requirement already satisfied: llvmlite<0.45,>=0.44.0dev0 in /home/runner/.local/lib/python3.12/site-packages (from numba>=0.51.2->umap-learn->acore) (0.44.0)
Requirement already satisfied: json5>=0.8.4 in /home/runner/.local/lib/python3.12/site-packages (from reactome2py>=3.0.0->PyWGCNA!=2.2.0->acore) (0.12.0)
Requirement already satisfied: memoir>=0.0.3 in /home/runner/.local/lib/python3.12/site-packages (from rsrc>=0.1.3->PyWGCNA!=2.2.0->acore) (0.0.3)
Requirement already satisfied: reprit>=0.3.0 in /home/runner/.local/lib/python3.12/site-packages (from rsrc>=0.1.3->PyWGCNA!=2.2.0->acore) (0.9.0)
Requirement already satisfied: zipp>=3.20 in /home/runner/.local/lib/python3.12/site-packages (from importlib-metadata->dash->vuecore) (3.23.0)
Requirement already satisfied: et-xmlfile in /home/runner/.local/lib/python3.12/site-packages (from openpyxl->dsp-pandas->acore) (2.0.0)
Requirement already satisfied: xarray in /home/runner/.local/lib/python3.12/site-packages (from pandas-flavor->pingouin->acore) (2025.7.0)
Requirement already satisfied: parso<0.9.0,>=0.8.4 in /home/runner/.local/lib/python3.12/site-packages (from jedi>=0.16->ipython>=5.3.0->pyvis->vuecore) (0.8.4)
Requirement already satisfied: wcwidth in /home/runner/.local/lib/python3.12/site-packages (from prompt_toolkit<3.1.0,>=3.0.41->ipython>=5.3.0->pyvis->vuecore) (0.2.13)
Requirement already satisfied: executing>=1.2.0 in /home/runner/.local/lib/python3.12/site-packages (from stack_data->ipython>=5.3.0->pyvis->vuecore) (2.2.0)
Requirement already satisfied: asttokens>=2.1.0 in /home/runner/.local/lib/python3.12/site-packages (from stack_data->ipython>=5.3.0->pyvis->vuecore) (3.0.0)
Requirement already satisfied: pure-eval in /home/runner/.local/lib/python3.12/site-packages (from stack_data->ipython>=5.3.0->pyvis->vuecore) (0.2.3)
Note: you may need to restart the kernel to use updated packages.
from pathlib import Path
import acore.differential_regulation
import acore.enrichment_analysis
import acore.normalization
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import scipy.stats
import seaborn as sns
import vuecore
from acore.io.uniprot import fetch_annotations, process_annotations
from vuecore.viz import get_enrichment_plots
Read in the data#
file_in
: input file with the quantified peptide data in MSstats format as provided by quantms
The file will be loaded from the repository if it is not present.
file_in = Path("data/PXD040621/processed/PXD040621.sdrf_openms_design_msstats_in.csv")
if not file_in.exists():
file_in = (
"https://raw.githubusercontent.com/biosustain/dsp_course_proteomics_intro/HEAD"
"/data/PXD040621/processed/PXD040621.sdrf_openms_design_msstats_in.csv"
)
df = pd.read_csv(file_in, sep=",", header=0) # .set_index([])
df.head()
ProteinName | PeptideSequence | PrecursorCharge | FragmentIon | ProductCharge | IsotopeLabelType | Condition | BioReplicate | Run | Intensity | Reference | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | sp|P00959|SYM_ECOLI | AAAAPVTGPLADDPIQETITFDDFAK | 2 | NaN | 0 | L | Control | 1 | 1 | 201,065,600.000 | 20220830_JL-4884_Forster_Ecoli_DMSO_rep1_EG-1.... |
1 | sp|P00959|SYM_ECOLI | AAAAPVTGPLADDPIQETITFDDFAK | 2 | NaN | 0 | L | Control | 2 | 2 | 74,844,780.000 | 20220830_JL-4884_Forster_Ecoli_DMSO_rep2_EG-2.... |
2 | sp|P00959|SYM_ECOLI | AAAAPVTGPLADDPIQETITFDDFAK | 2 | NaN | 0 | L | Control | 3 | 3 | 67,591,390.000 | 20220830_JL-4884_Forster_Ecoli_DMSO_rep3_EG-3.... |
3 | sp|P00959|SYM_ECOLI | AAAAPVTGPLADDPIQETITFDDFAK | 2 | NaN | 0 | L | Control | 4 | 4 | 76,388,810.000 | 20220830_JL-4884_Forster_Ecoli_DMSO_rep4_EG-4.... |
4 | sp|P00959|SYM_ECOLI | AAAAPVTGPLADDPIQETITFDDFAK | 2 | NaN | 0 | L | Sulforaphane | 5 | 5 | 116,247,100.000 | 20220830_JL-4884_Forster_Ecoli_Suf_rep1_EG-5.mzML |
define the output folder for our VueGen report which we will create later
out_dir = "data/PXD040621/report/"
out_dir = Path(out_dir)
out_dir.mkdir(parents=True, exist_ok=True)
We have the following columns in the data:
cols = [
"ProteinName",
"PeptideSequence",
"PrecursorCharge",
"FragmentIon",
"ProductCharge",
"IsotopeLabelType",
"Condition",
"BioReplicate",
"Run",
"Intensity",
"Reference",
]
Log2 transform the intensity values#
log2 transformations are common for lognormal distributed data
df["Intensity"] = np.log2(df["Intensity"].astype(float))
df.head()
ProteinName | PeptideSequence | PrecursorCharge | FragmentIon | ProductCharge | IsotopeLabelType | Condition | BioReplicate | Run | Intensity | Reference | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | sp|P00959|SYM_ECOLI | AAAAPVTGPLADDPIQETITFDDFAK | 2 | NaN | 0 | L | Control | 1 | 1 | 27.583 | 20220830_JL-4884_Forster_Ecoli_DMSO_rep1_EG-1.... |
1 | sp|P00959|SYM_ECOLI | AAAAPVTGPLADDPIQETITFDDFAK | 2 | NaN | 0 | L | Control | 2 | 2 | 26.157 | 20220830_JL-4884_Forster_Ecoli_DMSO_rep2_EG-2.... |
2 | sp|P00959|SYM_ECOLI | AAAAPVTGPLADDPIQETITFDDFAK | 2 | NaN | 0 | L | Control | 3 | 3 | 26.010 | 20220830_JL-4884_Forster_Ecoli_DMSO_rep3_EG-3.... |
3 | sp|P00959|SYM_ECOLI | AAAAPVTGPLADDPIQETITFDDFAK | 2 | NaN | 0 | L | Control | 4 | 4 | 26.187 | 20220830_JL-4884_Forster_Ecoli_DMSO_rep4_EG-4.... |
4 | sp|P00959|SYM_ECOLI | AAAAPVTGPLADDPIQETITFDDFAK | 2 | NaN | 0 | L | Sulforaphane | 5 | 5 | 26.793 | 20220830_JL-4884_Forster_Ecoli_Suf_rep1_EG-5.mzML |
Exploratory and Data Quality Plots (peptide level)#
df[“BioReplicate”] = df[“BioReplicate”].replace({5: 1, 6: 2, 7: 3, 8: 4}) fg = sns.displot( data=df.rename(columns={“BioReplicate”: “Rep”, “Condition”: “C.”}), x=”Intensity”, col=”C.”, row=”Rep”, # hue=”Reactor_ID”, kind=”kde”, height=2, aspect=1.1, )
Aggregate the peptide intensities to protein intensities#
we use the median of the peptide intensities for each protein
There are more sophisticated ways to do this, e.g. using MaxLFQ, iBAQ, FlashLFQ, DirectLFQ, etc.
shorten sample name for readability
proteins = (
df.groupby(["ProteinName", "Reference"])["Intensity"].median().unstack(level=0)
)
proteins.index = proteins.index.str.split("_").str[4:6].str.join("_")
proteins
ProteinName | CON_ENSEMBL:ENSBTAP00000024462;CON_ENSEMBL:ENSBTAP00000024466 | CON_O76013 | CON_P00761 | CON_P01966 | CON_P02070 | CON_P02533 | CON_P02538 | CON_P02662 | CON_P02663 | CON_P02666 | ... | sp|Q47319|TAPT_ECOLI | sp|Q47536|YAIP_ECOLI | sp|Q47622|SAPA_ECOLI | sp|Q47679|YAFV_ECOLI | sp|Q47710|YQJK_ECOLI | sp|Q57261|TRUD_ECOLI | sp|Q59385-2|COPA_ECOLI | sp|Q59385|COPA_ECOLI | sp|Q7DFV3|YMGG_ECOLI | sp|Q93K97|ADPP_ECOLI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Reference | |||||||||||||||||||||
DMSO_rep1 | NaN | NaN | 31.055 | 24.202 | 26.890 | 28.336 | 25.828 | NaN | 26.762 | NaN | ... | NaN | NaN | 25.343 | NaN | 27.038 | 28.411 | 23.555 | 27.640 | 28.513 | 27.223 |
DMSO_rep2 | NaN | 25.647 | 30.117 | 25.711 | NaN | 25.595 | 26.210 | NaN | 27.593 | NaN | ... | NaN | NaN | NaN | NaN | 26.841 | 27.941 | 25.240 | 27.244 | 27.621 | 25.291 |
DMSO_rep3 | 25.529 | NaN | 33.767 | NaN | NaN | 26.992 | 25.441 | 28.129 | 28.895 | 29.559 | ... | NaN | NaN | 24.576 | NaN | 26.609 | 27.070 | NaN | 27.525 | 27.679 | 24.359 |
DMSO_rep4 | NaN | NaN | 33.702 | NaN | NaN | 27.205 | 26.093 | NaN | 27.764 | NaN | ... | NaN | NaN | 25.945 | 23.902 | 27.164 | 26.680 | 22.524 | 27.404 | 27.256 | 25.767 |
Suf_rep1 | NaN | NaN | 34.068 | NaN | NaN | NaN | 24.442 | NaN | 25.938 | NaN | ... | NaN | NaN | 25.836 | NaN | 26.819 | 27.995 | NaN | 27.499 | 28.090 | 25.956 |
Suf_rep2 | NaN | NaN | 33.252 | NaN | 26.571 | 26.495 | 25.070 | NaN | NaN | NaN | ... | NaN | NaN | NaN | 24.162 | 27.268 | 27.055 | NaN | 27.667 | 27.526 | 25.231 |
Suf_rep3 | NaN | NaN | 31.844 | 25.927 | NaN | 26.249 | 28.169 | 22.968 | 26.466 | NaN | ... | 25.463 | NaN | NaN | NaN | 24.741 | 27.313 | NaN | 27.708 | 27.814 | 26.103 |
Suf_rep4 | NaN | NaN | 33.857 | NaN | NaN | 27.208 | 24.690 | NaN | NaN | NaN | ... | NaN | 24.468 | 24.757 | 24.040 | 27.071 | 26.643 | NaN | 27.848 | 27.605 | 26.178 |
8 rows × 2306 columns
Remove contaminant proteins#
Remove the contaminant proteins which were added to the fasta file used in the data processing. Contaminant proteins are e.g. creation which gets into the sample from the human skin or hair when the sample is prepared.
These are filtered out as they are most of the time not relevant, but a contamination.
decoy_proteins = proteins.filter(like="CON_", axis=1)
proteins = proteins.drop(decoy_proteins.columns, axis=1)
proteins
ProteinName | sp|A5A613|YCIY_ECOLI | sp|P00350|6PGD_ECOLI | sp|P00363|FRDA_ECOLI | sp|P00370|DHE4_ECOLI | sp|P00393|NDH_ECOLI | sp|P00448|SODM_ECOLI | sp|P00452|RIR1_ECOLI | sp|P00490|PHSM_ECOLI | sp|P00509|AAT_ECOLI | sp|P00547|KHSE_ECOLI | ... | sp|Q47319|TAPT_ECOLI | sp|Q47536|YAIP_ECOLI | sp|Q47622|SAPA_ECOLI | sp|Q47679|YAFV_ECOLI | sp|Q47710|YQJK_ECOLI | sp|Q57261|TRUD_ECOLI | sp|Q59385-2|COPA_ECOLI | sp|Q59385|COPA_ECOLI | sp|Q7DFV3|YMGG_ECOLI | sp|Q93K97|ADPP_ECOLI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Reference | |||||||||||||||||||||
DMSO_rep1 | 27.180 | 28.152 | 30.247 | 27.459 | 26.824 | 25.610 | NaN | 27.864 | 29.979 | 26.065 | ... | NaN | NaN | 25.343 | NaN | 27.038 | 28.411 | 23.555 | 27.640 | 28.513 | 27.223 |
DMSO_rep2 | NaN | 27.926 | 30.262 | 26.873 | 26.757 | 24.901 | NaN | 26.439 | 29.048 | NaN | ... | NaN | NaN | NaN | NaN | 26.841 | 27.941 | 25.240 | 27.244 | 27.621 | 25.291 |
DMSO_rep3 | NaN | 27.653 | 29.970 | 26.600 | 25.442 | 25.054 | 27.172 | 26.382 | 28.777 | NaN | ... | NaN | NaN | 24.576 | NaN | 26.609 | 27.070 | NaN | 27.525 | 27.679 | 24.359 |
DMSO_rep4 | NaN | 27.152 | 29.471 | 26.439 | 25.799 | 24.790 | NaN | 26.820 | 29.485 | 25.524 | ... | NaN | NaN | 25.945 | 23.902 | 27.164 | 26.680 | 22.524 | 27.404 | 27.256 | 25.767 |
Suf_rep1 | NaN | 27.442 | 30.005 | 27.400 | 26.671 | 25.564 | NaN | 27.685 | 29.295 | NaN | ... | NaN | NaN | 25.836 | NaN | 26.819 | 27.995 | NaN | 27.499 | 28.090 | 25.956 |
Suf_rep2 | NaN | 27.032 | 30.086 | 27.189 | 26.886 | 25.378 | 27.364 | 27.531 | 29.284 | NaN | ... | NaN | NaN | NaN | 24.162 | 27.268 | 27.055 | NaN | 27.667 | 27.526 | 25.231 |
Suf_rep3 | NaN | 27.815 | 29.904 | 27.139 | 26.711 | 25.318 | 26.062 | 27.545 | 29.357 | 26.265 | ... | 25.463 | NaN | NaN | NaN | 24.741 | 27.313 | NaN | 27.708 | 27.814 | 26.103 |
Suf_rep4 | NaN | 27.587 | 29.575 | 27.224 | 26.321 | 25.360 | 25.101 | 27.705 | 29.584 | 26.427 | ... | NaN | 24.468 | 24.757 | 24.040 | 27.071 | 26.643 | NaN | 27.848 | 27.605 | 26.178 |
8 rows × 2269 columns
Create a label for each sample based on the metadata.
we will use a string in the sample name, but you can see how the metadata is organized
meta = df[["Condition", "BioReplicate", "Run", "Reference"]].drop_duplicates()
meta
Condition | BioReplicate | Run | Reference | |
---|---|---|---|---|
0 | Control | 1 | 1 | 20220830_JL-4884_Forster_Ecoli_DMSO_rep1_EG-1.... |
1 | Control | 2 | 2 | 20220830_JL-4884_Forster_Ecoli_DMSO_rep2_EG-2.... |
2 | Control | 3 | 3 | 20220830_JL-4884_Forster_Ecoli_DMSO_rep3_EG-3.... |
3 | Control | 4 | 4 | 20220830_JL-4884_Forster_Ecoli_DMSO_rep4_EG-4.... |
4 | Sulforaphane | 5 | 5 | 20220830_JL-4884_Forster_Ecoli_Suf_rep1_EG-5.mzML |
5 | Sulforaphane | 6 | 6 | 20220830_JL-4884_Forster_Ecoli_Suf_rep2_EG-6.mzML |
6 | Sulforaphane | 7 | 7 | 20220830_JL-4884_Forster_Ecoli_Suf_rep3_EG-7.mzML |
7 | Sulforaphane | 8 | 8 | 20220830_JL-4884_Forster_Ecoli_Suf_rep4_EG-8.mzML |
label_encoding = {0: "control", 1: "10 µm sulforaphane"}
label_suf = pd.Series(
proteins.index.str.contains("Suf_").astype(int),
index=proteins.index,
name="label_suf",
dtype=np.int8,
).map(label_encoding)
label_suf
Reference
DMSO_rep1 control
DMSO_rep2 control
DMSO_rep3 control
DMSO_rep4 control
Suf_rep1 10 µm sulforaphane
Suf_rep2 10 µm sulforaphane
Suf_rep3 10 µm sulforaphane
Suf_rep4 10 µm sulforaphane
Name: label_suf, dtype: object
Plot the data completeness for each protein.#
view_name = "Protein"
out_dir_subsection = out_dir / "1_data" / "completeness"
out_dir_subsection.mkdir(parents=True, exist_ok=True)
view_name = "Protein"
ax = (
proteins.notna()
.sum()
.sort_values()
.plot(
rot=45,
ylabel=f"Number of Samples {view_name.lower()} was observed in",
)
)
ax.get_figure().savefig(
out_dir_subsection / f"data_completeness_step_plot.png",
bbox_inches="tight",
dpi=300,
)

view_name = "Protein"
ax = (
proteins.notna()
.sum()
.value_counts()
.sort_index(ascending=False)
.plot(
kind="bar",
title=f"Data Completeness per {view_name}",
xlabel=f"Number of Samples {view_name.lower()} was observed in",
ylabel=f"Number of {view_name}s",
color="steelblue",
figsize=(10, 6),
)
)
ax.get_figure().savefig(
out_dir_subsection / f"data_completeness_bar_plot.png",
bbox_inches="tight",
dpi=300,
)

# Explode column names to examine split by '|'
proteins_meta = (
proteins.columns.str.split("|", expand=True)
.to_frame()
.dropna(how="any", axis=1)
.reset_index(drop=True)
)
proteins_meta.columns = ["Source", "ProteinName", "GeneName"]
proteins_meta["GeneName"] = proteins_meta["GeneName"].str.split("_").str[0]
proteins_meta.index = proteins.columns
proteins_meta.index.name = "identifier"
proteins_meta
Source | ProteinName | GeneName | |
---|---|---|---|
identifier | |||
sp|A5A613|YCIY_ECOLI | sp | A5A613 | YCIY |
sp|P00350|6PGD_ECOLI | sp | P00350 | 6PGD |
sp|P00363|FRDA_ECOLI | sp | P00363 | FRDA |
sp|P00370|DHE4_ECOLI | sp | P00370 | DHE4 |
sp|P00393|NDH_ECOLI | sp | P00393 | NDH |
... | ... | ... | ... |
sp|Q57261|TRUD_ECOLI | sp | Q57261 | TRUD |
sp|Q59385-2|COPA_ECOLI | sp | Q59385-2 | COPA |
sp|Q59385|COPA_ECOLI | sp | Q59385 | COPA |
sp|Q7DFV3|YMGG_ECOLI | sp | Q7DFV3 | YMGG |
sp|Q93K97|ADPP_ECOLI | sp | Q93K97 | ADPP |
2269 rows × 3 columns
For later in the enrichment analysis let’s replace the protein identifier from the Fasta file with the UNIPROT ID
proteins.columns = proteins_meta["ProteinName"].rename("UniprotID")
proteins
UniprotID | A5A613 | P00350 | P00363 | P00370 | P00393 | P00448 | P00452 | P00490 | P00509 | P00547 | ... | Q47319 | Q47536 | Q47622 | Q47679 | Q47710 | Q57261 | Q59385-2 | Q59385 | Q7DFV3 | Q93K97 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Reference | |||||||||||||||||||||
DMSO_rep1 | 27.180 | 28.152 | 30.247 | 27.459 | 26.824 | 25.610 | NaN | 27.864 | 29.979 | 26.065 | ... | NaN | NaN | 25.343 | NaN | 27.038 | 28.411 | 23.555 | 27.640 | 28.513 | 27.223 |
DMSO_rep2 | NaN | 27.926 | 30.262 | 26.873 | 26.757 | 24.901 | NaN | 26.439 | 29.048 | NaN | ... | NaN | NaN | NaN | NaN | 26.841 | 27.941 | 25.240 | 27.244 | 27.621 | 25.291 |
DMSO_rep3 | NaN | 27.653 | 29.970 | 26.600 | 25.442 | 25.054 | 27.172 | 26.382 | 28.777 | NaN | ... | NaN | NaN | 24.576 | NaN | 26.609 | 27.070 | NaN | 27.525 | 27.679 | 24.359 |
DMSO_rep4 | NaN | 27.152 | 29.471 | 26.439 | 25.799 | 24.790 | NaN | 26.820 | 29.485 | 25.524 | ... | NaN | NaN | 25.945 | 23.902 | 27.164 | 26.680 | 22.524 | 27.404 | 27.256 | 25.767 |
Suf_rep1 | NaN | 27.442 | 30.005 | 27.400 | 26.671 | 25.564 | NaN | 27.685 | 29.295 | NaN | ... | NaN | NaN | 25.836 | NaN | 26.819 | 27.995 | NaN | 27.499 | 28.090 | 25.956 |
Suf_rep2 | NaN | 27.032 | 30.086 | 27.189 | 26.886 | 25.378 | 27.364 | 27.531 | 29.284 | NaN | ... | NaN | NaN | NaN | 24.162 | 27.268 | 27.055 | NaN | 27.667 | 27.526 | 25.231 |
Suf_rep3 | NaN | 27.815 | 29.904 | 27.139 | 26.711 | 25.318 | 26.062 | 27.545 | 29.357 | 26.265 | ... | 25.463 | NaN | NaN | NaN | 24.741 | 27.313 | NaN | 27.708 | 27.814 | 26.103 |
Suf_rep4 | NaN | 27.587 | 29.575 | 27.224 | 26.321 | 25.360 | 25.101 | 27.705 | 29.584 | 26.427 | ... | NaN | 24.468 | 24.757 | 24.040 | 27.071 | 26.643 | NaN | 27.848 | 27.605 | 26.178 |
8 rows × 2269 columns
And let’s save a table with the data for inspection
proteins_meta.to_csv(out_dir_subsection / "proteins_identifiers.csv")
proteins.to_csv(out_dir_subsection / "proteins.csv")
Hierarchical Clustering of data#
using completely observed data only Find correlations in data
out_dir_subsection = out_dir / "1_data" / "clustermap"
out_dir_subsection.mkdir(parents=True, exist_ok=True)
_group_labels = label_encoding.values()
lut = dict(zip(_group_labels, [f"C{i}" for i in range(len(_group_labels))]))
row_colors = label_suf.map(lut).rename("group color")
row_colors
Reference
DMSO_rep1 C0
DMSO_rep2 C0
DMSO_rep3 C0
DMSO_rep4 C0
Suf_rep1 C1
Suf_rep2 C1
Suf_rep3 C1
Suf_rep4 C1
Name: group color, dtype: object
vuecore.set_font_sizes(7)
cg = sns.clustermap(
proteins.dropna(how="any", axis=1),
method="ward",
row_colors=row_colors,
figsize=(11, 6),
robust=True,
xticklabels=True,
yticklabels=True,
)
fig = cg.figure
cg.ax_heatmap.set_xlabel("Proteins")
cg.ax_heatmap.set_ylabel("Sample ID")
vuecore.select_xticks(cg.ax_heatmap)
handles = [
plt.Line2D([0], [0], marker="o", color="w", markerfacecolor=lut[name], markersize=8)
for name in lut
]
cg.ax_cbar.legend(
handles, _group_labels, title="Groups", loc="lower left", bbox_to_anchor=(2, 0.5)
)
fname = out_dir_subsection / "clustermap_ward.png"
# vuecore.savefig(fig, fname, pdf=True, dpi=600, tight_layout=False)
fig.savefig(
out_dir_subsection / "clustermap_ward.png",
bbox_inches="tight",
dpi=300,
)

Analytical Plots#
data distribution (e.g. histogram)
coefficient of variation (CV)
number of identified proteins per sample
# ToDo: bin width functionaity: bins should match between all plots (see pimms)
ax = proteins.T.hist(layout=(2, 4), bins=20, sharex=True, sharey=True, figsize=(8, 4))

Coefficient of Variation (CV)#
CV = standard deviation / mean
per group
df_cvs = (
proteins.groupby(label_suf) # .join(metadata[grouping])
# .agg(scipy.stats.variation)
.agg([scipy.stats.variation, "mean"]) # .rename_axis(["feat", "stat"], axis=1)
)
df_cvs
UniprotID | A5A613 | P00350 | P00363 | P00370 | P00393 | ... | Q57261 | Q59385-2 | Q59385 | Q7DFV3 | Q93K97 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
variation | mean | variation | mean | variation | mean | variation | mean | variation | mean | ... | variation | mean | variation | mean | variation | mean | variation | mean | variation | mean | |
label_suf | |||||||||||||||||||||
10 µm sulforaphane | NaN | NaN | 0.010 | 27.469 | 0.006 | 29.892 | 0.004 | 27.238 | 0.008 | 26.647 | ... | 0.018 | 27.252 | NaN | NaN | 0.004 | 27.680 | 0.008 | 27.759 | 0.015 | 25.867 |
control | NaN | 27.180 | 0.013 | 27.721 | 0.011 | 29.987 | 0.014 | 26.843 | 0.023 | 26.205 | ... | 0.025 | 27.525 | NaN | 23.773 | 0.005 | 27.453 | 0.017 | 27.767 | 0.040 | 25.660 |
2 rows × 4538 columns
df_cvs = df_cvs.stack(0, future_stack=True).reset_index().dropna()
df_cvs
label_suf | UniprotID | variation | mean | |
---|---|---|---|---|
1 | 10 µm sulforaphane | P00350 | 0.010 | 27.469 |
2 | 10 µm sulforaphane | P00363 | 0.006 | 29.892 |
3 | 10 µm sulforaphane | P00370 | 0.004 | 27.238 |
4 | 10 µm sulforaphane | P00393 | 0.008 | 26.647 |
5 | 10 µm sulforaphane | P00448 | 0.004 | 25.405 |
... | ... | ... | ... | ... |
4,532 | control | Q47710 | 0.008 | 26.913 |
4,533 | control | Q57261 | 0.025 | 27.525 |
4,535 | control | Q59385 | 0.005 | 27.453 |
4,536 | control | Q7DFV3 | 0.017 | 27.767 |
4,537 | control | Q93K97 | 0.040 | 25.660 |
3144 rows × 4 columns
default_args = dict(
facet_col="label_suf",
# facet_row="Time",
labels={
"label_suf": "group",
"variation": "CV",
},
)
fig = px.scatter(
data_frame=df_cvs,
x="variation",
y="mean",
trendline="ols",
**default_args,
)
fname = "cv_vs_mean"
# ? save
fig
Hierarchical Clustering of normalized data#
using completely observed data only Checkout the recipe on normalization methods.
normalization_method = "median"
X = acore.normalization.normalize_data(
proteins.dropna(how="any", axis=1), normalization_method
)
X
UniprotID | P00350 | P00363 | P00370 | P00393 | P00448 | P00490 | P00509 | P00550 | P00561 | P00562 | ... | Q46868 | Q46893 | Q46920 | Q46948 | Q47147 | Q47710 | Q57261 | Q59385 | Q7DFV3 | Q93K97 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Reference | |||||||||||||||||||||
DMSO_rep1 | 27.858 | 29.953 | 27.165 | 26.530 | 25.316 | 27.570 | 29.685 | 26.429 | 26.826 | 27.150 | ... | 29.139 | 27.865 | 27.059 | 26.919 | 26.365 | 26.744 | 28.117 | 27.346 | 28.219 | 26.929 |
DMSO_rep2 | 28.064 | 30.399 | 27.011 | 26.894 | 25.039 | 26.576 | 29.185 | 25.603 | 26.598 | 26.720 | ... | 29.868 | 27.814 | 26.840 | 27.330 | 26.516 | 26.978 | 28.078 | 27.381 | 27.758 | 25.429 |
DMSO_rep3 | 27.854 | 30.170 | 26.800 | 25.643 | 25.254 | 26.582 | 28.977 | 27.217 | 26.693 | 27.039 | ... | 29.720 | 28.852 | 27.007 | 27.230 | 26.680 | 26.809 | 27.271 | 27.725 | 27.879 | 24.559 |
DMSO_rep4 | 27.319 | 29.638 | 26.606 | 25.966 | 24.957 | 26.987 | 29.652 | 26.835 | 26.685 | 26.529 | ... | 28.653 | 28.294 | 27.308 | 27.574 | 26.441 | 27.331 | 26.847 | 27.571 | 27.423 | 25.935 |
Suf_rep1 | 27.261 | 29.824 | 27.219 | 26.490 | 25.383 | 27.504 | 29.114 | 26.637 | 26.764 | 27.028 | ... | 29.604 | 28.407 | 26.867 | 27.706 | 26.765 | 26.638 | 27.815 | 27.318 | 27.910 | 25.775 |
Suf_rep2 | 27.029 | 30.084 | 27.187 | 26.884 | 25.375 | 27.529 | 29.281 | 27.030 | 27.099 | 26.015 | ... | 28.981 | 28.579 | 26.920 | 27.505 | 26.613 | 27.266 | 27.053 | 27.665 | 27.523 | 25.228 |
Suf_rep3 | 27.817 | 29.907 | 27.141 | 26.714 | 25.321 | 27.548 | 29.359 | 27.011 | 26.773 | 26.364 | ... | 28.755 | 28.379 | 27.096 | 27.254 | 26.284 | 24.743 | 27.316 | 27.711 | 27.817 | 26.106 |
Suf_rep4 | 27.533 | 29.521 | 27.169 | 26.266 | 25.306 | 27.650 | 29.529 | 26.921 | 26.931 | 26.808 | ... | 27.866 | 28.746 | 27.093 | 27.622 | 26.954 | 27.017 | 26.589 | 27.793 | 27.551 | 26.123 |
8 rows × 1446 columns
X.median(axis="columns")
Reference
DMSO_rep1 27.278
DMSO_rep2 27.278
DMSO_rep3 27.278
DMSO_rep4 27.278
Suf_rep1 27.278
Suf_rep2 27.278
Suf_rep3 27.278
Suf_rep4 27.278
dtype: float64
vuecore.set_font_sizes(7)
cg = sns.clustermap(
X,
method="ward",
row_colors=row_colors,
figsize=(11, 6),
robust=True,
xticklabels=True,
yticklabels=True,
)
fig = cg.figure
cg.ax_heatmap.set_xlabel("Proteins")
cg.ax_heatmap.set_ylabel("Sample ID")
vuecore.select_xticks(cg.ax_heatmap)
handles = [
plt.Line2D([0], [0], marker="o", color="w", markerfacecolor=lut[name], markersize=8)
for name in lut
]
cg.ax_cbar.legend(
handles, _group_labels, title="Groups", loc="lower left", bbox_to_anchor=(2, 0.5)
)
fname = out_dir_subsection / "clustermap_ward.png"
# vuecore.savefig(fig, fname, pdf=True, dpi=600, tight_layout=False)
fig.savefig(
out_dir_subsection / f"clustermap_ward_{normalization_method}.png",
bbox_inches="tight",
dpi=300,
)

Differential Regulation#
out_dir_subsection = out_dir / "2_differential_regulation"
out_dir_subsection.mkdir(parents=True, exist_ok=True)
Retain all proteins with at least 3 observations in each group#
this is a requirement for a standard t-test
you could look into imputation methods to fill in missing values)
protein in at least two samples per group?
missing all in one condition?
Let’s not impute, but filter for proteins with at least 3 observations in each group
group_counts = proteins.groupby(label_suf).count()
group_counts
UniprotID | A5A613 | P00350 | P00363 | P00370 | P00393 | P00448 | P00452 | P00490 | P00509 | P00547 | ... | Q47319 | Q47536 | Q47622 | Q47679 | Q47710 | Q57261 | Q59385-2 | Q59385 | Q7DFV3 | Q93K97 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
label_suf | |||||||||||||||||||||
10 µm sulforaphane | 0 | 4 | 4 | 4 | 4 | 4 | 3 | 4 | 4 | 2 | ... | 1 | 1 | 2 | 2 | 4 | 4 | 0 | 4 | 4 | 4 |
control | 1 | 4 | 4 | 4 | 4 | 4 | 1 | 4 | 4 | 2 | ... | 0 | 0 | 3 | 1 | 4 | 4 | 3 | 4 | 4 | 4 |
2 rows × 2269 columns
Then we can filter the proteins to only those with at least 3 observations in each grou
mask = group_counts.groupby("label_suf").transform(lambda x: x >= 3).all(axis=0)
mask
UniprotID
A5A613 False
P00350 True
P00363 True
P00370 True
P00393 True
...
Q57261 True
Q59385-2 False
Q59385 True
Q7DFV3 True
Q93K97 True
Length: 2269, dtype: bool
view = proteins.loc[:, mask].join(label_suf)
group = "label_suf"
diff_reg = acore.differential_regulation.run_anova(
view,
alpha=0.15,
drop_cols=[],
subject=None,
group=group,
).sort_values("pvalue", ascending=True)
diff_reg["rejected"] = diff_reg["rejected"].astype(bool)
diff_reg.sort_values("pvalue")
identifier | T-statistics | pvalue | mean_group1 | mean_group2 | std(group1) | std(group2) | log2FC | test | correction | padj | rejected | group1 | group2 | FC | -log10 pvalue | Method | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
957 | P10384 | -11.514 | 0.000 | 26.430 | 27.475 | 0.131 | 0.088 | -1.045 | t-Test | FDR correction BH | 0.044 | True | control | 10 µm sulforaphane | 0.485 | 4.589 | Unpaired t-test |
1,678 | Q46835 | -8.146 | 0.000 | 26.723 | 27.677 | 0.194 | 0.058 | -0.954 | t-Test | FDR correction BH | 0.127 | True | control | 10 µm sulforaphane | 0.516 | 3.735 | Unpaired t-test |
120 | P08200 | -7.471 | 0.000 | 28.759 | 29.455 | 0.072 | 0.145 | -0.696 | t-Test | FDR correction BH | 0.127 | True | control | 10 µm sulforaphane | 0.617 | 3.527 | Unpaired t-test |
1,542 | P75726 | 7.460 | 0.000 | 28.082 | 26.852 | 0.222 | 0.180 | 1.230 | t-Test | FDR correction BH | 0.127 | True | control | 10 µm sulforaphane | 2.345 | 3.524 | Unpaired t-test |
38 | P02943 | -7.006 | 0.000 | 26.117 | 27.920 | 0.428 | 0.124 | -1.802 | t-Test | FDR correction BH | 0.143 | True | control | 10 µm sulforaphane | 0.287 | 3.375 | Unpaired t-test |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
983 | P14407 | -0.003 | 0.998 | 28.561 | 28.562 | 0.393 | 0.656 | -0.001 | t-Test | FDR correction BH | 0.999 | False | control | 10 µm sulforaphane | 0.999 | 0.001 | Unpaired t-test |
560 | P0ABB0 | 0.002 | 0.998 | 31.306 | 31.305 | 0.318 | 0.051 | 0.000 | t-Test | FDR correction BH | 0.999 | False | control | 10 µm sulforaphane | 1.000 | 0.001 | Unpaired t-test |
1,531 | P69797 | 0.002 | 0.998 | 30.255 | 30.254 | 0.391 | 0.328 | 0.001 | t-Test | FDR correction BH | 0.999 | False | control | 10 µm sulforaphane | 1.000 | 0.001 | Unpaired t-test |
452 | P0A9L3 | -0.002 | 0.998 | 30.452 | 30.453 | 0.433 | 0.106 | -0.001 | t-Test | FDR correction BH | 0.999 | False | control | 10 µm sulforaphane | 1.000 | 0.001 | Unpaired t-test |
1,059 | P23538 | -0.002 | 0.999 | 26.283 | 26.284 | 0.109 | 0.192 | -0.000 | t-Test | FDR correction BH | 0.999 | False | control | 10 µm sulforaphane | 1.000 | 0.001 | Unpaired t-test |
1696 rows × 17 columns
diff_reg.sort_values("pvalue").head(20)
identifier | T-statistics | pvalue | mean_group1 | mean_group2 | std(group1) | std(group2) | log2FC | test | correction | padj | rejected | group1 | group2 | FC | -log10 pvalue | Method | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
957 | P10384 | -11.514 | 0.000 | 26.430 | 27.475 | 0.131 | 0.088 | -1.045 | t-Test | FDR correction BH | 0.044 | True | control | 10 µm sulforaphane | 0.485 | 4.589 | Unpaired t-test |
1,678 | Q46835 | -8.146 | 0.000 | 26.723 | 27.677 | 0.194 | 0.058 | -0.954 | t-Test | FDR correction BH | 0.127 | True | control | 10 µm sulforaphane | 0.516 | 3.735 | Unpaired t-test |
120 | P08200 | -7.471 | 0.000 | 28.759 | 29.455 | 0.072 | 0.145 | -0.696 | t-Test | FDR correction BH | 0.127 | True | control | 10 µm sulforaphane | 0.617 | 3.527 | Unpaired t-test |
1,542 | P75726 | 7.460 | 0.000 | 28.082 | 26.852 | 0.222 | 0.180 | 1.230 | t-Test | FDR correction BH | 0.127 | True | control | 10 µm sulforaphane | 2.345 | 3.524 | Unpaired t-test |
38 | P02943 | -7.006 | 0.000 | 26.117 | 27.920 | 0.428 | 0.124 | -1.802 | t-Test | FDR correction BH | 0.143 | True | control | 10 µm sulforaphane | 0.287 | 3.375 | Unpaired t-test |
1,455 | P62768 | -6.288 | 0.001 | 28.999 | 29.682 | 0.147 | 0.118 | -0.682 | t-Test | FDR correction BH | 0.213 | False | control | 10 µm sulforaphane | 0.623 | 3.123 | Unpaired t-test |
1,508 | P68066 | -5.856 | 0.001 | 31.551 | 32.216 | 0.101 | 0.169 | -0.665 | t-Test | FDR correction BH | 0.235 | False | control | 10 µm sulforaphane | 0.630 | 2.960 | Unpaired t-test |
579 | P0ABK5 | -5.841 | 0.001 | 29.630 | 30.524 | 0.246 | 0.100 | -0.894 | t-Test | FDR correction BH | 0.235 | False | control | 10 µm sulforaphane | 0.538 | 2.955 | Unpaired t-test |
1,099 | P25437 | -5.304 | 0.002 | 29.088 | 29.744 | 0.183 | 0.112 | -0.657 | t-Test | FDR correction BH | 0.324 | False | control | 10 µm sulforaphane | 0.634 | 2.739 | Unpaired t-test |
1,108 | P25539 | -5.230 | 0.002 | 26.862 | 27.166 | 0.061 | 0.080 | -0.304 | t-Test | FDR correction BH | 0.324 | False | control | 10 µm sulforaphane | 0.810 | 2.708 | Unpaired t-test |
553 | P0AB80 | -5.149 | 0.002 | 28.177 | 28.853 | 0.205 | 0.098 | -0.675 | t-Test | FDR correction BH | 0.324 | False | control | 10 µm sulforaphane | 0.626 | 2.674 | Unpaired t-test |
1,382 | P45577 | -5.069 | 0.002 | 29.028 | 29.467 | 0.049 | 0.142 | -0.439 | t-Test | FDR correction BH | 0.324 | False | control | 10 µm sulforaphane | 0.738 | 2.640 | Unpaired t-test |
950 | P0CK95 | -4.969 | 0.003 | 26.408 | 27.282 | 0.295 | 0.076 | -0.874 | t-Test | FDR correction BH | 0.330 | False | control | 10 µm sulforaphane | 0.546 | 2.597 | Unpaired t-test |
610 | P0AC33 | -4.725 | 0.003 | 26.764 | 27.356 | 0.206 | 0.069 | -0.592 | t-Test | FDR correction BH | 0.364 | False | control | 10 µm sulforaphane | 0.663 | 2.489 | Unpaired t-test |
80 | P06721 | -4.719 | 0.003 | 25.529 | 26.244 | 0.198 | 0.172 | -0.715 | t-Test | FDR correction BH | 0.364 | False | control | 10 µm sulforaphane | 0.609 | 2.487 | Unpaired t-test |
1,268 | P36943 | 6.160 | 0.004 | NaN | 28.375 | NaN | 0.133 | NaN | t-Test | FDR correction BH | 0.364 | False | control | 10 µm sulforaphane | NaN | 2.397 | Unpaired t-test |
812 | P0AEZ9 | -4.510 | 0.004 | 28.393 | 28.921 | 0.178 | 0.098 | -0.528 | t-Test | FDR correction BH | 0.364 | False | control | 10 µm sulforaphane | 0.693 | 2.392 | Unpaired t-test |
873 | P0AFU8 | -4.479 | 0.004 | 28.742 | 29.429 | 0.230 | 0.132 | -0.686 | t-Test | FDR correction BH | 0.364 | False | control | 10 µm sulforaphane | 0.621 | 2.377 | Unpaired t-test |
491 | P0AA39 | -4.438 | 0.004 | 25.367 | 26.097 | 0.271 | 0.087 | -0.730 | t-Test | FDR correction BH | 0.364 | False | control | 10 µm sulforaphane | 0.603 | 2.358 | Unpaired t-test |
1,057 | P23331 | -4.404 | 0.005 | 27.976 | 28.253 | 0.077 | 0.078 | -0.277 | t-Test | FDR correction BH | 0.364 | False | control | 10 µm sulforaphane | 0.825 | 2.342 | Unpaired t-test |
diff_reg.plot(x="log2FC", y="-log10 pvalue", kind="scatter", title=group)
<Axes: title={'center': 'label_suf'}, xlabel='log2FC', ylabel='-log10 pvalue'>

Interactive Volcano Plot#
str_cols = diff_reg.dtypes[diff_reg.dtypes == "object"].index.tolist()
hover_data = {
"rejected": ":.0f",
**{
c: ":.4f"
for c in [
"padj",
"FC",
]
},
**{c: True for c in str_cols},
}
fig = px.scatter(
diff_reg,
x="log2FC",
y="-log10 pvalue",
color="rejected",
hover_data=hover_data,
width=1200,
height=800,
title=f"Volcano plot for {view_name}s",
)
fig
Save result to subsection folder
fig.write_json(
out_dir_subsection / "0_volcano_plot.json",
pretty=False,
)
diff_reg.to_csv(out_dir_subsection / "1_differential_regulation.csv")
Enrichment Analysis#
out_dir_subsection = out_dir / "uniprot_annotations"
out_dir_subsection.mkdir(parents=True, exist_ok=True)
Fetch the annotations from UniProt API.#
fname_annotations = out_dir_subsection / "annotations.csv"
try:
annotations = pd.read_csv(fname_annotations, index_col=0)
print(f"Loaded annotations from {fname_annotations}")
except FileNotFoundError:
print(f"Fetching annotations for {proteins.columns.size} UniProt IDs.")
FIELDS = "go_p,go_c,go_f"
annotations = fetch_annotations(proteins.columns, fields=FIELDS)
annotations = process_annotations(annotations, fields=FIELDS)
# cache the annotations
fname_annotations.parent.mkdir(exist_ok=True, parents=True)
annotations.to_csv(fname_annotations, index=True)
annotations
Fetching annotations for 2269 UniProt IDs.
Fetched: 500 / 2269
Fetched: 1000 / 2269
Fetched: 1500 / 2269
Fetched: 2000 / 2269
Fetched: 2269 / 2269
identifier | source | annotation | |
---|---|---|---|
3 | P00350 | Gene Ontology (biological process) | D-gluconate metabolic process [GO:0019521] |
4 | P00350 | Gene Ontology (cellular component) | cytosol [GO:0005829] |
5 | P00350 | Gene Ontology (molecular function) | guanosine tetraphosphate binding [GO:0097216] |
6 | P00350 | Gene Ontology (molecular function) | identical protein binding [GO:0042802] |
7 | P00350 | Gene Ontology (molecular function) | NADP binding [GO:0050661] |
... | ... | ... | ... |
15,989 | Q93K97 | Gene Ontology (molecular function) | magnesium ion binding [GO:0000287] |
15,990 | Q93K97 | Gene Ontology (molecular function) | protein homodimerization activity [GO:0042803] |
15,991 | Q93K97 | Gene Ontology (molecular function) | pyrophosphatase activity [GO:0016462] |
15,992 | Q93K97 | Gene Ontology (biological process) | response to heat [GO:0009408] |
15,993 | Q93K97 | Gene Ontology (biological process) | ribose phosphate metabolic process [GO:0019693] |
14994 rows × 3 columns
Run the enrichment analysis#
background is the set of identified proteins in the experiment (not the whole proteome of the organisim, here E. coli)
The enrichment is performed separately for the up- and down-regulated proteins (‘rejected’), which are few in our example where we had to set the adjusted p-value to 0.15.
In the enrichment we set the cutoff for the adjusted p-value to 0.2, which is a bit arbitrary to see some results.
enriched = acore.enrichment_analysis.run_up_down_regulation_enrichment(
regulation_data=diff_reg,
annotation=annotations,
min_detected_in_set=1,
lfc_cutoff=1,
pval_col="padj", # toggle if it does not work
correction_alpha=0.2, # adjust the p-value to see more or less results
)
enriched
terms | identifiers | foreground | background | foreground_pop | background_pop | pvalue | padj | rejected | direction | comparison | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | citrate CoA-transferase activity [GO:0008814] | P75726 | 1 | 0 | 1 | 1,696 | 0.001 | 0.002 | True | upregulated | control~10 µm sulforaphane |
1 | citrate metabolic process [GO:0006101] | P75726 | 1 | 2 | 1 | 1,696 | 0.002 | 0.002 | True | upregulated | control~10 µm sulforaphane |
3 | ATP-independent citrate lyase complex [GO:0009... | P75726 | 1 | 2 | 1 | 1,696 | 0.002 | 0.002 | True | upregulated | control~10 µm sulforaphane |
4 | acetyl-CoA metabolic process [GO:0006084] | P75726 | 1 | 2 | 1 | 1,696 | 0.002 | 0.002 | True | upregulated | control~10 µm sulforaphane |
5 | citrate (pro-3S)-lyase activity [GO:0008815] | P75726 | 1 | 1 | 1 | 1,696 | 0.001 | 0.002 | True | upregulated | control~10 µm sulforaphane |
2 | cytoplasm [GO:0005737] | P75726 | 1 | 29 | 1 | 1,696 | 0.018 | 0.018 | True | upregulated | control~10 µm sulforaphane |
0 | long-chain fatty acid transporting porin acti... | P10384 | 1 | 0 | 2 | 1,696 | 0.001 | 0.002 | True | downregulated | control~10 µm sulforaphane |
2 | maltodextrin transmembrane transporter activi... | P02943 | 1 | 0 | 2 | 1,696 | 0.001 | 0.002 | True | downregulated | control~10 µm sulforaphane |
3 | maltose transmembrane transport [GO:1904981] | P02943 | 1 | 0 | 2 | 1,696 | 0.001 | 0.002 | True | downregulated | control~10 µm sulforaphane |
4 | maltose transmembrane transporter activity [G... | P02943 | 1 | 0 | 2 | 1,696 | 0.001 | 0.002 | True | downregulated | control~10 µm sulforaphane |
5 | maltose transporting porin activity [GO:0015481] | P02943 | 1 | 0 | 2 | 1,696 | 0.001 | 0.002 | True | downregulated | control~10 µm sulforaphane |
14 | cell outer membrane [GO:0009279] | P02943,P10384 | 2 | 43 | 2 | 1,696 | 0.001 | 0.002 | True | downregulated | control~10 µm sulforaphane |
15 | ligand-gated channel activity [GO:0022834] | P10384 | 1 | 0 | 2 | 1,696 | 0.001 | 0.002 | True | downregulated | control~10 µm sulforaphane |
8 | polysaccharide transport [GO:0015774] | P02943 | 1 | 0 | 2 | 1,696 | 0.001 | 0.002 | True | downregulated | control~10 µm sulforaphane |
16 | long-chain fatty acid transport [GO:0015909] | P10384 | 1 | 0 | 2 | 1,696 | 0.001 | 0.002 | True | downregulated | control~10 µm sulforaphane |
13 | carbohydrate transmembrane transporter activit... | P02943 | 1 | 1 | 2 | 1,696 | 0.002 | 0.003 | True | downregulated | control~10 µm sulforaphane |
1 | maltodextrin transmembrane transport [GO:0042... | P02943 | 1 | 1 | 2 | 1,696 | 0.002 | 0.003 | True | downregulated | control~10 µm sulforaphane |
11 | virus receptor activity [GO:0001618] | P02943 | 1 | 1 | 2 | 1,696 | 0.002 | 0.003 | True | downregulated | control~10 µm sulforaphane |
7 | outer membrane protein complex [GO:0106234] | P02943 | 1 | 2 | 2 | 1,696 | 0.004 | 0.004 | True | downregulated | control~10 µm sulforaphane |
6 | monoatomic ion transport [GO:0006811] | P02943 | 1 | 2 | 2 | 1,696 | 0.004 | 0.004 | True | downregulated | control~10 µm sulforaphane |
10 | porin activity [GO:0015288] | P02943 | 1 | 5 | 2 | 1,696 | 0.007 | 0.008 | True | downregulated | control~10 µm sulforaphane |
9 | pore complex [GO:0046930] | P02943 | 1 | 5 | 2 | 1,696 | 0.007 | 0.008 | True | downregulated | control~10 µm sulforaphane |
12 | DNA damage response [GO:0006974] | P02943 | 1 | 51 | 2 | 1,696 | 0.060 | 0.060 | True | downregulated | control~10 µm sulforaphane |
fig = get_enrichment_plots(
enriched,
identifier="anything", # ToDo: figure out what this does
args=dict(title="Enrichment Analysis"),
)
fig = fig[0]
fig.write_json(
out_dir_subsection / "enrichment_analysis.json",
pretty=True,
)
fig
Check for Maltose Uptake#
out_dir_subsection = out_dir / "3_maltose_uptake"
out_dir_subsection.mkdir(parents=True, exist_ok=True)
apply filtering of ‘differentially abundant proteins’ as described in the paper
Differentially abundant proteins were determined as those with log2 fold-change
1 and < -1, and p < 0.05 This means not multiple testing correction was applied.
view = diff_reg.query("pvalue < 0.05 and FC > 1") # .shape[0]
view.to_csv(
out_dir_subsection / "1_differently_regulated_as_in_paper.csv",
index=False,
)
view
identifier | T-statistics | pvalue | mean_group1 | mean_group2 | std(group1) | std(group2) | log2FC | test | correction | padj | rejected | group1 | group2 | FC | -log10 pvalue | Method | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1,542 | P75726 | 7.460 | 0.000 | 28.082 | 26.852 | 0.222 | 0.180 | 1.230 | t-Test | FDR correction BH | 0.127 | True | control | 10 µm sulforaphane | 2.345 | 3.524 | Unpaired t-test |
1,553 | P75825 | 4.344 | 0.005 | 26.903 | 25.378 | 0.327 | 0.512 | 1.525 | t-Test | FDR correction BH | 0.364 | False | control | 10 µm sulforaphane | 2.877 | 2.314 | Unpaired t-test |
580 | P0ABK9 | 4.238 | 0.005 | 26.363 | 25.637 | 0.230 | 0.187 | 0.726 | t-Test | FDR correction BH | 0.364 | False | control | 10 µm sulforaphane | 1.654 | 2.264 | Unpaired t-test |
1,288 | P37349 | 3.868 | 0.008 | 25.935 | 25.251 | 0.296 | 0.079 | 0.683 | t-Test | FDR correction BH | 0.438 | False | control | 10 µm sulforaphane | 1.606 | 2.082 | Unpaired t-test |
1,610 | P76440 | 3.796 | 0.009 | 27.182 | 26.639 | 0.233 | 0.084 | 0.543 | t-Test | FDR correction BH | 0.438 | False | control | 10 µm sulforaphane | 1.457 | 2.045 | Unpaired t-test |
961 | P11349 | 3.722 | 0.010 | 27.459 | 26.939 | 0.223 | 0.094 | 0.521 | t-Test | FDR correction BH | 0.445 | False | control | 10 µm sulforaphane | 1.435 | 2.007 | Unpaired t-test |
1,632 | P77252 | 3.624 | 0.011 | 26.540 | 26.012 | 0.131 | 0.215 | 0.528 | t-Test | FDR correction BH | 0.445 | False | control | 10 µm sulforaphane | 1.441 | 1.957 | Unpaired t-test |
142 | P09152 | 3.542 | 0.012 | 26.822 | 26.250 | 0.228 | 0.162 | 0.572 | t-Test | FDR correction BH | 0.445 | False | control | 10 µm sulforaphane | 1.487 | 1.914 | Unpaired t-test |
1,098 | P25397 | 3.191 | 0.019 | 27.632 | 27.003 | 0.177 | 0.292 | 0.629 | t-Test | FDR correction BH | 0.577 | False | control | 10 µm sulforaphane | 1.546 | 1.726 | Unpaired t-test |
261 | P0A7E3 | 3.181 | 0.019 | 27.335 | 26.631 | 0.146 | 0.355 | 0.704 | t-Test | FDR correction BH | 0.577 | False | control | 10 µm sulforaphane | 1.629 | 1.720 | Unpaired t-test |
64 | P05637 | 2.910 | 0.027 | 26.621 | 26.034 | 0.262 | 0.232 | 0.587 | t-Test | FDR correction BH | 0.589 | False | control | 10 µm sulforaphane | 1.503 | 1.569 | Unpaired t-test |
443 | P0A9I1 | 2.907 | 0.027 | 28.099 | 26.997 | 0.555 | 0.351 | 1.102 | t-Test | FDR correction BH | 0.589 | False | control | 10 µm sulforaphane | 2.147 | 1.567 | Unpaired t-test |
1,535 | P69829 | 2.725 | 0.034 | 26.906 | 26.284 | 0.058 | 0.391 | 0.621 | t-Test | FDR correction BH | 0.614 | False | control | 10 µm sulforaphane | 1.538 | 1.463 | Unpaired t-test |
1,190 | P31142 | 2.582 | 0.042 | 28.110 | 27.149 | 0.203 | 0.612 | 0.961 | t-Test | FDR correction BH | 0.647 | False | control | 10 µm sulforaphane | 1.947 | 1.380 | Unpaired t-test |
1,094 | P24232 | 2.577 | 0.042 | 27.136 | 26.324 | 0.530 | 0.129 | 0.811 | t-Test | FDR correction BH | 0.647 | False | control | 10 µm sulforaphane | 1.755 | 1.377 | Unpaired t-test |
1,275 | P37127 | 2.483 | 0.048 | 25.203 | 24.641 | 0.226 | 0.320 | 0.561 | t-Test | FDR correction BH | 0.680 | False | control | 10 µm sulforaphane | 1.476 | 1.322 | Unpaired t-test |
Let’s find the proteins highlighted in the volcano plot in Figure 3.
highlighted_genes = ["LamB", "MalE", "Malk", "CitF", "CitT", "CitE", "Frd"]
highlighted_genes = "|".join([p.upper() for p in highlighted_genes])
highlighted_genes = proteins_meta.query(
f"`GeneName`.str.contains('{highlighted_genes}')"
)
highlighted_genes
Source | ProteinName | GeneName | |
---|---|---|---|
identifier | |||
sp|P00363|FRDA_ECOLI | sp | P00363 | FRDA |
sp|P02943|LAMB_ECOLI | sp | P02943 | LAMB |
sp|P0A8Q0|FRDC_ECOLI | sp | P0A8Q0 | FRDC |
sp|P0A8Q3|FRDD_ECOLI | sp | P0A8Q3 | FRDD |
sp|P0A9I1|CITE_ECOLI | sp | P0A9I1 | CITE |
sp|P0AC47|FRDB_ECOLI | sp | P0AC47 | FRDB |
sp|P0AE74|CITT_ECOLI | sp | P0AE74 | CITT |
sp|P0AEX9|MALE_ECOLI | sp | P0AEX9 | MALE |
sp|P68187|MALK_ECOLI | sp | P68187 | MALK |
highlighted_proteins = "|".join([p.upper() for p in highlighted_genes["ProteinName"]])
view = diff_reg.query(f"`identifier`.str.contains('{highlighted_proteins}')")
view = view.set_index("identifier").join(proteins_meta.set_index("ProteinName"))
view.to_csv(
out_dir_subsection / "2_highlighted_proteins_in_figure3.csv",
index=False,
)
sel_cols = [
"identifier",
"GeneName",
"log2FC",
"pvalue",
"padj",
"rejected",
"group1",
"group2",
"Method",
]
view.reset_index()[sel_cols].sort_values("log2FC", ascending=False)
identifier | GeneName | log2FC | pvalue | padj | rejected | group1 | group2 | Method | |
---|---|---|---|---|---|---|---|---|---|
1 | P0A9I1 | CITE | 1.102 | 0.027 | 0.589 | False | control | 10 µm sulforaphane | Unpaired t-test |
2 | P0AE74 | CITT | 0.872 | 0.110 | 0.754 | False | control | 10 µm sulforaphane | Unpaired t-test |
3 | P0A8Q0 | FRDC | 0.815 | 0.159 | 0.756 | False | control | 10 µm sulforaphane | Unpaired t-test |
4 | P00363 | FRDA | 0.095 | 0.676 | 0.965 | False | control | 10 µm sulforaphane | Unpaired t-test |
5 | P0AC47 | FRDB | 0.062 | 0.808 | 0.981 | False | control | 10 µm sulforaphane | Unpaired t-test |
6 | P0AEX9 | MALE | 0.012 | 0.959 | 0.997 | False | control | 10 µm sulforaphane | Unpaired t-test |
0 | P02943 | LAMB | -1.802 | 0.000 | 0.143 | True | control | 10 µm sulforaphane | Unpaired t-test |
Let’s see their original data
view_proteins = (
(
proteins[highlighted_genes["ProteinName"].to_list()].T.join(
proteins_meta.set_index("ProteinName")["GeneName"]
)
)
.set_index("GeneName", append=True)
.T
) # to check]
view_proteins.to_csv(
out_dir_subsection / "3_highlighted_proteins_in_figure3_intensities.csv",
index=True,
)
view_proteins
UniprotID | P00363 | P02943 | P0A8Q0 | P0A8Q3 | P0A9I1 | P0AC47 | P0AE74 | P0AEX9 | P68187 |
---|---|---|---|---|---|---|---|---|---|
GeneName | FRDA | LAMB | FRDC | FRDD | CITE | FRDB | CITT | MALE | MALK |
DMSO_rep1 | 30.247 | 26.752 | 30.348 | 27.389 | 29.031 | 31.223 | 27.544 | 27.099 | NaN |
DMSO_rep2 | 30.262 | 26.012 | 28.815 | 27.061 | 27.933 | 30.947 | 26.364 | 26.785 | 24.881 |
DMSO_rep3 | 29.970 | 26.151 | 30.569 | 27.291 | 27.569 | 30.664 | 25.292 | 26.854 | NaN |
DMSO_rep4 | 29.471 | 25.555 | 29.097 | 24.906 | 27.865 | 30.815 | 26.608 | 27.037 | 25.423 |
Suf_rep1 | 30.005 | 27.844 | 28.350 | 26.703 | 27.296 | 31.360 | 25.485 | 26.518 | 26.047 |
Suf_rep2 | 30.086 | 28.095 | 29.444 | NaN | 26.744 | 30.317 | 25.691 | 26.864 | 24.964 |
Suf_rep3 | 29.904 | 27.971 | 29.172 | NaN | 26.563 | 30.818 | 25.549 | 27.536 | 25.545 |
Suf_rep4 | 29.575 | 27.769 | 28.602 | 25.294 | 27.386 | 30.907 | 25.594 | 26.809 | 26.134 |
How to explain the differences?