proteometer.abundance

Functions

get_prot_abund_scalars(→ dict[str, float])

Return a dictionary of protein abundance scalars for the given pairwise t-test.

prot_abund_correction(→ pandas.DataFrame)

Perform protein abundance correction based on the provided parameters.

prot_abund_correction_sig_only(→ pandas.DataFrame)

Adjusts peptide abundance values based on protein abundance scalars for significant pairwise t-test groups.

prot_abund_correction_matched(→ pandas.DataFrame)

Correct the peptide abundance data using the protein abundance values.

global_prot_normalization_and_stats(→ pandas.DataFrame)

Perform global protein normalization and statistical analysis.

Module Contents

proteometer.abundance.get_prot_abund_scalars(prot: pandas.DataFrame, pairwise_ttest_name: str, sig_type: str = 'pval', sig_thr: float = 0.05) → dict[str, float]

Return a dictionary of protein abundance scalars for the given pairwise t-test.

Parameters:
  • prot (pd.DataFrame) – DataFrame containing protein-level data.

  • pairwise_ttest_name (str) – Name of the pairwise t-test.

  • sig_type (str, optional) – Type of significance metric to use for filtering. Defaults to “pval”.

  • sig_thr (float, optional) – Threshold for significance filtering. Defaults to 0.05.

Returns:

Dictionary of protein abundance scalars.

Return type:

dict[str, float]
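The filtering step described above can be pictured with a minimal pandas sketch. It is not the library's implementation: the helper name `sketch_prot_abund_scalars`, the column names `trt/ctrl` and `trt/ctrl_pval`, the use of the index as protein identifier, and the strict `<` comparison are all assumptions made for illustration.

```python
import pandas as pd


def sketch_prot_abund_scalars(prot, fc_col, sig_col, sig_thr=0.05):
    """Hypothetical helper: map protein ID (index) to a fold-change scalar.

    Only proteins whose significance value is below sig_thr are kept.
    """
    significant = prot[prot[sig_col] < sig_thr]
    return significant[fc_col].astype(float).to_dict()


# Toy protein table; the column names are assumptions, not proteometer's.
prot = pd.DataFrame(
    {
        "trt/ctrl": [1.5, -0.2, 0.8],
        "trt/ctrl_pval": [0.01, 0.40, 0.03],
    },
    index=["P12345", "Q67890", "O11111"],
)

scalars = sketch_prot_abund_scalars(prot, "trt/ctrl", "trt/ctrl_pval")
```

Here the protein with a p-value of 0.40 is dropped, so only two entries remain in `scalars`.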

proteometer.abundance.prot_abund_correction(pept: pandas.DataFrame, prot: pandas.DataFrame, par: proteometer.params.Params, columns_to_correct: collections.abc.Iterable[str] | None = None, pairwise_ttest_groups: collections.abc.Iterable[proteometer.stats.TTestGroup] | None = None, non_tt_cols: collections.abc.Iterable[str] | None = None) → pandas.DataFrame

Perform protein abundance correction based on the provided parameters.

This function applies either paired or unpaired sample abundance correction depending on the abundance_correction_paired_samples attribute of the par parameter.

Parameters:
  • pept (pd.DataFrame) – A DataFrame containing peptide-level data.

  • prot (pd.DataFrame) – A DataFrame containing protein-level data.

  • par (Params) – A parameter object containing configuration for abundance correction.

  • columns_to_correct (Iterable[str] | None, optional) – Columns to correct for paired sample abundance correction. Required if par.abundance_correction_paired_samples is True.

  • pairwise_ttest_groups (Iterable[stats.TTestGroup] | None, optional) – Groups for pairwise t-tests in unpaired sample abundance correction. Required if par.abundance_correction_paired_samples is False.

  • non_tt_cols (Iterable[str] | None, optional) – Columns that should not be included in the t-test correction.

Returns:

The pept DataFrame with abundances corrected for protein-level changes.

Return type:

pd.DataFrame

Raises:
  • ValueError – If columns_to_correct is not provided for paired sample correction.

  • ValueError – If pairwise_ttest_groups is not provided for unpaired sample correction.
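The argument checks described above amount to a small dispatch. The sketch below models only that decision; `SketchParams` and `choose_correction` are hypothetical stand-ins, and the returned labels replace the actual correction calls, which are not shown in this documentation.

```python
from dataclasses import dataclass


@dataclass
class SketchParams:
    # Hypothetical stand-in for proteometer.params.Params; only the one
    # attribute named in the documentation is modeled.
    abundance_correction_paired_samples: bool


def choose_correction(par, columns_to_correct=None, pairwise_ttest_groups=None):
    """Return which correction mode would run, or raise on a missing argument."""
    if par.abundance_correction_paired_samples:
        if columns_to_correct is None:
            raise ValueError(
                "columns_to_correct is required for paired sample correction"
            )
        return "paired"
    if pairwise_ttest_groups is None:
        raise ValueError(
            "pairwise_ttest_groups is required for unpaired sample correction"
        )
    return "unpaired"


mode = choose_correction(SketchParams(True), columns_to_correct=["trt_1"])
```

Judging by their signatures, the paired branch presumably corresponds to prot_abund_correction_matched and the unpaired branch to prot_abund_correction_sig_only, but that mapping is an inference rather than something stated here.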

proteometer.abundance.prot_abund_correction_sig_only(pept: pandas.DataFrame, prot: pandas.DataFrame, pairwise_ttest_groups: collections.abc.Iterable[proteometer.stats.TTestGroup], uniprot_col: str, sig_type: str = 'pval', sig_thr: float = 0.05) → pandas.DataFrame

Adjusts peptide abundance values based on protein abundance scalars for significant pairwise t-test groups.

This function iterates over a collection of pairwise t-test groups, computes or retrieves protein abundance scalars, and applies these scalars to adjust the peptide abundance values for the specified treatment samples.

Parameters:
  • pept (pd.DataFrame) – DataFrame containing peptide-level data. Must include a column corresponding to uniprot_col for mapping protein identifiers.

  • prot (pd.DataFrame) – DataFrame containing protein-level data. Must include columns for protein abundance scalars or data required to compute them.

  • pairwise_ttest_groups (Iterable[stats.TTestGroup]) – An iterable of TTestGroup objects, each representing a pairwise t-test group with associated metadata (e.g., labels and treatment samples).

  • uniprot_col (str) – Column name in pept that contains UniProt identifiers for mapping to protein abundance data.

  • sig_type (str, optional) – Type of significance metric to use for filtering (e.g., “pval” for p-value or “adj-p” for adjusted p-value). Defaults to “pval”.

  • sig_thr (float, optional) – Threshold for significance filtering. Only proteins meeting this threshold will have their abundance scalars applied. Defaults to 0.05.

Returns:

Updated pept DataFrame with adjusted abundance values for treatment samples and additional columns for protein abundance scalars.

Return type:

pd.DataFrame
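A rough sketch of applying such scalars to the treatment samples of one t-test group follows. It assumes log2-scale abundances, so the correction is a subtraction, and it assumes that proteins without a significant scalar are left unchanged. The helper `sketch_apply_scalars` and all column names are hypothetical; the library's actual arithmetic may differ.

```python
import pandas as pd


def sketch_apply_scalars(pept, scalars, uniprot_col, treat_samples, scalar_col):
    """Hypothetical helper: subtract per-protein scalars from treatment columns."""
    out = pept.copy()
    # Peptides whose protein has no significant scalar get 0.0 (no correction).
    out[scalar_col] = out[uniprot_col].map(scalars).fillna(0.0)
    # Row-wise subtraction of the scalar from every treatment sample column.
    out[treat_samples] = out[treat_samples].sub(out[scalar_col], axis=0)
    return out


pept = pd.DataFrame(
    {
        "uniprot": ["P1", "P1", "P2"],
        "ctrl_1": [10.0, 11.0, 12.0],
        "trt_1": [12.0, 12.5, 12.0],
    }
)
scalars = {"P1": 1.5}  # assumed: only P1 passed the significance filter

corrected = sketch_apply_scalars(
    pept, scalars, "uniprot", ["trt_1"], "trt/ctrl_scalar"
)
```

The control column is untouched, and the scalar that was applied is kept as an extra column, which mirrors the return description above.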

proteometer.abundance.prot_abund_correction_matched(pept: pandas.DataFrame, prot: pandas.DataFrame, columns_to_correct: collections.abc.Iterable[str], uniprot_col: str, non_tt_cols: collections.abc.Iterable[str] | None = None) → pandas.DataFrame

Correct the peptide abundance data using the protein abundance values.

This function takes the peptide data and corrects the intensity values for each peptide using the protein abundance values from the protein data. The correction is only applied to the treatment samples.

Parameters:
  • pept (pd.DataFrame) – A DataFrame containing peptide-level data.

  • prot (pd.DataFrame) – A DataFrame containing protein-level data.

  • columns_to_correct (Iterable[str]) – Columns to correct for protein abundance changes. Must be shared by pept and prot.

  • uniprot_col (str) – Column name for the UniProt ID in both pept and prot.

  • non_tt_cols (Iterable[str] | None, optional) – Columns that should not be included in the abundance correction. Must be shared by pept and prot.

Returns:

Updated pept DataFrame with adjusted abundance values for treatment samples and additional columns for protein abundance scalars.

Return type:

pd.DataFrame
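One way to picture a sample-matched correction is to look up each peptide's parent protein and subtract the protein value measured in the same sample column. This sketch assumes log2-scale data, one row per protein in prot, and no change for peptides without a matching protein. The helper `sketch_matched_correction` and the column names are hypothetical, and the exact quantity the library subtracts is not specified in this documentation.

```python
import pandas as pd


def sketch_matched_correction(pept, prot, columns_to_correct, uniprot_col):
    """Hypothetical helper: subtract same-sample protein values from peptides."""
    cols = list(columns_to_correct)
    out = pept.copy()
    # One row per protein, restricted to the shared sample columns.
    prot_by_id = prot.set_index(uniprot_col)[cols]
    # Align protein rows to peptide rows; unmatched proteins contribute 0.0.
    matched = prot_by_id.reindex(out[uniprot_col].to_numpy()).fillna(0.0)
    out[cols] = out[cols].to_numpy() - matched.to_numpy()
    return out


pept = pd.DataFrame(
    {
        "uniprot": ["P1", "P1", "P3"],
        "trt_1": [12.0, 12.5, 9.0],
        "trt_2": [11.0, 13.0, 9.5],
    }
)
prot = pd.DataFrame(
    {
        "uniprot": ["P1", "P2"],
        "trt_1": [20.0, 18.0],
        "trt_2": [21.0, 18.5],
    }
)

corrected = sketch_matched_correction(pept, prot, ["trt_1", "trt_2"], "uniprot")
```

Because the columns are matched by name, this only works when pept and prot share the sample columns, which is the requirement stated for columns_to_correct.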

proteometer.abundance.global_prot_normalization_and_stats(global_prot: pandas.DataFrame, int_cols: list[str], anova_cols: list[str], pairwise_ttest_groups: collections.abc.Iterable[proteometer.stats.TTestGroup], metadata: pandas.DataFrame, par: proteometer.params.Params) → pandas.DataFrame

Perform global protein normalization and statistical analysis.

This function applies normalization and statistical tests to global proteomics data. It handles both median normalization and batch correction, depending on the parameters provided in the par object. It also performs ANOVA and pairwise t-tests.

Parameters:
  • global_prot (pd.DataFrame) – DataFrame containing global protein-level data.

  • int_cols (list[str]) – List of column names representing intensity data to normalize.

  • anova_cols (list[str]) – List of column names for ANOVA analysis.

  • pairwise_ttest_groups (Iterable[stats.TTestGroup]) – Iterable of TTestGroup objects for performing pairwise t-tests (each defines a control-treatment pair).

  • metadata (pd.DataFrame) – DataFrame containing metadata for batch correction and ANOVA analysis.

  • par (Params) – Parameter object containing configuration for normalization and statistical analysis.

Returns:

The normalized and statistically analyzed global protein data.

Return type:

pd.DataFrame
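As an illustration of the median normalization step only, the sketch below shifts each intensity column so that all columns share the same median. It assumes log2-scale intensities (hence an additive shift) and uses a hypothetical helper name, `sketch_median_normalize`; batch correction, ANOVA, and the pairwise t-tests are not covered, and the library's own procedure may differ.

```python
import pandas as pd


def sketch_median_normalize(df, int_cols):
    """Hypothetical helper: align the medians of the intensity columns."""
    out = df.copy()
    col_medians = out[int_cols].median(axis=0, skipna=True)
    target = col_medians.median()  # common median shared by all samples
    # The Series of medians aligns with the column labels during subtraction.
    out[int_cols] = out[int_cols] - col_medians + target
    return out


global_prot = pd.DataFrame(
    {
        "s1": [10.0, 12.0, 14.0],
        "s2": [11.0, 13.0, 15.0],
    },
    index=["P1", "P2", "P3"],
)

normalized = sketch_median_normalize(global_prot, ["s1", "s2"])
```

After the shift, both sample columns have the same median, so remaining differences between samples are not driven by a global offset.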