proteometer.lip

Attributes

AggDictFloat

Functions

`filter_contaminants_reverse_pept`(→ pandas.DataFrame)	Filters out contaminants and reverse hits from a peptide DataFrame.
`filter_contaminants_reverse_prot`(→ pandas.DataFrame)	Filters out contaminants and reverse hits from a protein DataFrame.
`filtering_protein_based_on_peptide_number`(...)	Filters proteins based on the minimum number of peptides.
`get_clean_peptides`(→ pandas.DataFrame)	Cleans peptide sequences by removing modifications and returns a DataFrame with cleaned peptides.
`get_tryptic_types`(→ pandas.DataFrame)	Analyzes the tryptic pattern of peptides and classifies them as tryptic, semi-tryptic, or non-tryptic.
`select_tryptic_pattern`(→ pandas.DataFrame)	Selects peptides based on their digestion pattern.
`analyze_tryptic_pattern`(→ pandas.DataFrame)	Analyzes tryptic patterns and calculates statistics for peptides.
`rollup_to_lytic_site`(→ pandas.DataFrame)	Converts the double-peptide data frame to a site-level data frame.
`rollup_single_protein_to_lytic_site`(→ pandas.DataFrame)	Rolls up peptide-level limited proteolysis data to lytic sites.
`select_lytic_sites`(→ pandas.DataFrame)	Selects lytic sites based on the specified site type.
`delta_prok_site`(→ pandas.DataFrame)	Computes exposure values for each lytic (ProK) site.

Module Contents

proteometer.lip.AggDictFloat

proteometer.lip.filter_contaminants_reverse_pept(df: pandas.DataFrame, search_tool: Literal['maxquant', 'msfragger', 'fragpipe'], protein_id_col_pept: str, uniprot_col: str) → pandas.DataFrame

Filters out contaminants and reverse hits from a peptide DataFrame.

Parameters:

df (pd.DataFrame) – Input DataFrame containing peptide data.
search_tool (Literal["maxquant", "msfragger", "fragpipe"]) – The search tool used for data generation.
protein_id_col_pept (str) – Column name containing protein IDs in the peptide DataFrame.
uniprot_col (str) – Column name to store UniProt IDs.

Returns:

Filtered DataFrame with contaminants and reverse hits removed.

Return type:

pd.DataFrame
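
As a rough illustration of this kind of filtering, the sketch below drops flagged rows with plain pandas. It does not call proteometer. The `CON__` and `REV__` prefixes are the markers MaxQuant typically uses for contaminant and decoy entries; the helper name `drop_contaminants_reverse` and the column name `Proteins` are hypothetical, and dropping a whole protein group when any member is flagged is an assumption that may differ from the library's behavior.

```python
import pandas as pd


def drop_contaminants_reverse(df: pd.DataFrame, protein_id_col: str) -> pd.DataFrame:
    """Drop rows whose protein ID carries a contaminant or reverse-hit prefix."""
    ids = df[protein_id_col].astype(str)
    # Assumption: MaxQuant-style prefixes, with protein groups joined by ';'.
    # Other search tools use different markers.
    flagged = ids.str.contains(r"(?:^|;)(?:CON__|REV__)", regex=True)
    return df.loc[~flagged].reset_index(drop=True)


peptides = pd.DataFrame(
    {
        "Proteins": ["P12345", "CON__P00761", "REV__Q9Y6K9", "P67890;CON__P02769"],
        "Sequence": ["AAK", "GGR", "LLK", "TTR"],
    }
)
kept = drop_contaminants_reverse(peptides, "Proteins")
```

Only the row for `P12345` survives here, because every other row contains at least one flagged entry.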

proteometer.lip.filter_contaminants_reverse_prot(df: pandas.DataFrame, search_tool: Literal['maxquant', 'msfragger', 'fragpipe'], protein_id_col_prot: str, uniprot_col: str) → pandas.DataFrame

Filters out contaminants and reverse hits from a protein DataFrame.

Parameters:

df (pd.DataFrame) – Input DataFrame containing protein data.
search_tool (Literal["maxquant", "msfragger", "fragpipe"]) – The search tool used for data generation.
protein_id_col_prot (str) – Column name containing protein IDs in the protein DataFrame.
uniprot_col (str) – Column name to store UniProt IDs.

Returns:

Filtered DataFrame with contaminants and reverse hits removed.

Return type:

pd.DataFrame

proteometer.lip.filtering_protein_based_on_peptide_number(df2filter: pandas.DataFrame, peptide_counts_col: str, search_tool: Literal['maxquant', 'msfragger', 'fragpipe'], min_pept_count: int = 2) → pandas.DataFrame

Filters proteins based on the minimum number of peptides.

Parameters:

df2filter (pd.DataFrame) – Input DataFrame containing proteomics data.
peptide_counts_col (str) – Column name containing peptide counts.
search_tool (Literal["maxquant", "msfragger", "fragpipe"]) – The search tool used for data generation.
min_pept_count (int, optional) – Minimum number of peptides required. Defaults to 2.

Returns:

Filtered DataFrame with proteins having at least min_pept_count peptides.

Return type:

pd.DataFrame
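
The threshold logic can be pictured with a minimal pandas sketch. It is not the library's implementation: the helper name `keep_proteins_with_min_peptides` and the column names are hypothetical, and it assumes the count column holds one number per row, which may not match every search tool's output format.

```python
import pandas as pd


def keep_proteins_with_min_peptides(
    df: pd.DataFrame, peptide_counts_col: str, min_pept_count: int = 2
) -> pd.DataFrame:
    """Keep rows whose peptide count is at least min_pept_count."""
    # Assumption: one numeric count per row; unparseable values become NaN
    # and are dropped by the comparison below.
    counts = pd.to_numeric(df[peptide_counts_col], errors="coerce")
    return df.loc[counts >= min_pept_count].reset_index(drop=True)


proteins = pd.DataFrame(
    {"Protein": ["A", "B", "C"], "Peptide counts": [1, 2, 5]}
)
filtered = keep_proteins_with_min_peptides(proteins, "Peptide counts")
```

With the default threshold of 2, protein `A` is removed and `B` and `C` are kept.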

proteometer.lip.get_clean_peptides(pept_df: pandas.DataFrame, peptide_col: str, clean_pept_col: str = 'clean_pept') → pandas.DataFrame

Cleans peptide sequences by removing modifications and returns a DataFrame with cleaned peptides.

Parameters:

pept_df (pd.DataFrame) – Input DataFrame containing peptide data.
peptide_col (str) – Column name containing peptide sequences.
clean_pept_col (str, optional) – Column name to store cleaned peptide sequences. Defaults to “clean_pept”.

Returns:

DataFrame with an additional column for cleaned peptide sequences.

Return type:

pd.DataFrame
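
To make "removing modifications" concrete, here is a small regex-based sketch. It does not reproduce the library's parsing rules. It assumes modifications are written as bracketed or parenthesised annotations without nesting, and that anything other than an uppercase letter (flanking underscores, a lowercase terminal marker) is not part of the sequence. The helper name `strip_modifications` is hypothetical.

```python
import re

import pandas as pd

# Assumption: modifications look like "[57.0215]" or "(ox)" and are not nested.
_MOD_PATTERN = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def strip_modifications(peptide: str) -> str:
    """Return the bare amino-acid sequence of an annotated peptide string."""
    without_mods = _MOD_PATTERN.sub("", peptide)
    # Drop everything that is not an uppercase residue letter.
    return re.sub(r"[^A-Z]", "", without_mods)


pept_df = pd.DataFrame(
    {"Sequence": ["_AC[57.0215]DEFK_", "n[42.0106]PEPTIDER", "M(ox)LSK"]}
)
pept_df["clean_pept"] = pept_df["Sequence"].map(strip_modifications)
```

The new `clean_pept` column holds `ACDEFK`, `PEPTIDER`, and `MLSK`, while the original `Sequence` column is left unchanged.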

proteometer.lip.get_tryptic_types(pept_df: pandas.DataFrame, prot_seq: str, peptide_col: str, clean_pept_col: str = 'clean_pept') → pandas.DataFrame

Analyzes the tryptic pattern of peptides and classifies them as tryptic, semi-tryptic, or non-tryptic.

Parameters:

pept_df (pd.DataFrame) – Input DataFrame containing peptide data.
prot_seq (str) – Protein sequence to analyze against.
peptide_col (str) – Column name containing peptide sequences.
clean_pept_col (str, optional) – Column name for cleaned peptide sequences. Defaults to “clean_pept”.

Returns:

DataFrame with additional columns for peptide start, end, and type.

Return type:

pd.DataFrame
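
The classification can be sketched as follows. This is an illustration under stated assumptions, not the library's code: a terminus counts as tryptic when it follows or ends in K or R, or coincides with a protein terminus; the proline exception is ignored; only the first match in the protein is used. The helper name `classify_peptide` and the type labels are hypothetical, and the output column names are borrowed from defaults documented elsewhere on this page.

```python
import pandas as pd


def classify_peptide(clean_pept: str, prot_seq: str) -> tuple[int, int, str]:
    """Return (start, end, type) using 1-based inclusive positions."""
    idx = prot_seq.find(clean_pept) if clean_pept else -1
    if idx < 0:
        return (-1, -1, "Not-matched")
    start, end = idx + 1, idx + len(clean_pept)
    # N-terminus is tryptic if the peptide starts the protein or follows K/R.
    n_ok = start == 1 or prot_seq[start - 2] in "KR"
    # C-terminus is tryptic if the peptide ends the protein or ends in K/R.
    c_ok = end == len(prot_seq) or clean_pept[-1] in "KR"
    labels = {2: "Tryptic", 1: "Semi-tryptic", 0: "Non-tryptic"}
    return (start, end, labels[int(n_ok) + int(c_ok)])


prot_seq = "MKTAYIAKQRQISFVK"
pept_df = pd.DataFrame({"clean_pept": ["TAYIAK", "AYIAK", "YIA", "QISFVK"]})
types = pd.DataFrame(
    [classify_peptide(p, prot_seq) for p in pept_df["clean_pept"]],
    columns=["pept_start", "pept_end", "pept_type"],
    index=pept_df.index,
)
pept_df = pept_df.join(types)
```

Here `TAYIAK` and `QISFVK` have two tryptic termini, `AYIAK` has one, and `YIA` has none.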

proteometer.lip.select_tryptic_pattern(pept_df: pandas.DataFrame, prot_seq: str, tryptic_pattern: str = 'all', peptide_col: str = 'Sequence', clean_pept_col: str = 'clean_pept') → pandas.DataFrame

Selects peptides based on their digestion pattern.

Parameters:

pept_df (pd.DataFrame) – Input DataFrame containing peptide data.
prot_seq (str) – Protein sequence to analyze against.
tryptic_pattern (str, optional) – Digestion pattern to filter peptides. Must be one of: all, any-tryptic, tryptic, semi-tryptic, non-tryptic. Defaults to “all”.
peptide_col (str, optional) – Column name containing peptide sequences. Defaults to “Sequence”.
clean_pept_col (str, optional) – Column name for cleaned peptide sequences. Defaults to “clean_pept”.

Returns:

Filtered DataFrame with peptides matching the specified digestion pattern.

Return type:

pd.DataFrame

proteometer.lip.analyze_tryptic_pattern(protein: pandas.DataFrame, sequence: str, pairwise_ttest_groups: collections.abc.Iterable[proteometer.stats.TTestGroup], peptide_col: str, description: str = '', anova_type: str = '[Group]', keep_non_tryptic: bool = True, id_separator: str = '@', sig_type: str = 'pval', sig_thr: float = 0.05) → pandas.DataFrame

Analyzes tryptic patterns and calculates statistics for peptides.

Parameters:

protein (pd.DataFrame) – Input DataFrame containing proteomics data.
sequence (str) – Protein sequence to analyze against.
pairwise_ttest_groups (Iterable[TTestGroup]) – Groups for pairwise t-tests.
peptide_col (str) – Column name containing peptide sequences.
description (str, optional) – Protein description to add to data frame. Defaults to “”.
anova_type (str, optional) – Type of ANOVA analysis. Defaults to “[Group]”.
keep_non_tryptic (bool, optional) – Whether to keep non-tryptic peptides. Defaults to True.
id_separator (str, optional) – Separator for peptide IDs. Defaults to “@”.
sig_type (str, optional) – Significance type (e.g., “pval”). Defaults to “pval”.
sig_thr (float, optional) – Significance threshold. Defaults to 0.05.

Returns:

DataFrame with analyzed tryptic patterns and statistics.

Return type:

pd.DataFrame

proteometer.lip.rollup_to_lytic_site(double_pept: pandas.DataFrame, prot_seqs: list[proteometer.fasta.SeqRecord], int_cols: collections.abc.Iterable[str], par: proteometer.params.Params) → pandas.DataFrame

Converts the double-peptide data frame to a site-level data frame.

Parameters:

double_pept (pd.DataFrame) – The double-peptide data frame.
prot_seqs (list[fasta.SeqRecord]) – The list of protein sequences.
int_cols (Iterable[str]) – The names of the columns with intensity values.
anova_cols (list[str]) – The columns for ANOVA.
pairwise_ttest_groups (Iterable[stats.TTestGroup]) – The pairwise T-test groups.
metadata (pd.DataFrame) – The metadata data frame.
par (Params) – The parameters for limited proteolysis analysis.

Returns:

A data frame with the site-level data.

Return type:

pd.DataFrame

proteometer.lip.rollup_single_protein_to_lytic_site(df: pandas.DataFrame, int_cols: collections.abc.Iterable[str], uniprot_col: str, sequence: str, residue_col: str = 'Residue', description: str = '', tryptic_pattern: str = 'all', peptide_col: str = 'Sequence', clean_pept_col: str = 'clean_pept', id_separator: str = '@', id_col: str = 'id', pept_type_col: str = 'pept_type', site_col: str = 'Site', pos_col: str = 'Pos', multiply_rollup_counts: bool = True, ignore_NA: bool = True, alternative_protease: str = 'ProK', rollup_func: Literal['median', 'mean', 'sum'] = 'sum') → pandas.DataFrame

Rolls up peptide-level limited proteolysis data to lytic sites.

Parameters:

df (pd.DataFrame) – Input DataFrame containing peptide data.
int_cols (Iterable[str]) – Columns with intensity values to aggregate.
uniprot_col (str) – Column name for UniProt IDs.
sequence (str) – Protein sequence to analyze against.
residue_col (str, optional) – Column name for lytic residues. Defaults to “Residue”.
description (str, optional) – Protein description to add to data frame. Defaults to “”.
tryptic_pattern (str, optional) – Digestion pattern to filter peptides. Defaults to “all”.
peptide_col (str, optional) – Column name containing peptide sequences. Defaults to “Sequence”.
clean_pept_col (str, optional) – Column name for cleaned peptide sequences. Defaults to “clean_pept”.
id_separator (str, optional) – Separator for IDs. Defaults to “@”.
id_col (str, optional) – Column name for IDs. Defaults to “id”.
pept_type_col (str, optional) – Column name for peptide types. Defaults to “pept_type”.
site_col (str, optional) – Column name for lytic sites. Defaults to “Site”.
pos_col (str, optional) – Column name for positions. Defaults to “Pos”.
multiply_rollup_counts (bool, optional) – Whether to multiply rollup counts. Defaults to True.
ignore_NA (bool, optional) – Whether to ignore NA values. Defaults to True.
alternative_protease (str, optional) – Name of the alternative protease. Defaults to “ProK”.
rollup_func (Literal["median", "mean", "sum"], optional) – Aggregation function. Defaults to “sum”.

Returns:

DataFrame with rolled-up lytic site data and aggregated statistics.

Return type:

pd.DataFrame
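
The idea of rolling peptides up to cleavage sites can be pictured with the simplified sketch below. It is not the library's algorithm and ignores most of the parameters above (pattern filtering, count multiplication, NA handling). It assumes each peptide reports on two cleavage events, one before its first residue and one after its last, that a site is numbered by the residue on its N-terminal side, and that intensities are on a linear scale. The helper name `rollup_to_sites` and the input column names are hypothetical.

```python
import pandas as pd


def rollup_to_sites(
    pept_df: pd.DataFrame, int_cols: list[str], sequence: str, rollup_func: str = "sum"
) -> pd.DataFrame:
    """Aggregate peptide intensities onto the cleavage sites at their termini."""
    # Cleavage before the first residue is numbered by the preceding residue.
    n_side = pept_df.assign(Pos=pept_df["pept_start"] - 1)
    # Cleavage after the last residue is numbered by that residue.
    c_side = pept_df.assign(Pos=pept_df["pept_end"])
    ends = pd.concat([n_side, c_side], ignore_index=True)
    # The protein's own termini are not cleavage sites.
    ends = ends[(ends["Pos"] >= 1) & (ends["Pos"] < len(sequence))]
    site_df = ends.groupby("Pos")[list(int_cols)].agg(rollup_func).reset_index()
    site_df["Residue"] = [sequence[p - 1] for p in site_df["Pos"]]
    site_df["Site"] = site_df["Residue"] + site_df["Pos"].astype(str)
    return site_df


sequence = "MKTAYIAKQRQISFVK"
pept_df = pd.DataFrame(
    {
        "pept_start": [3, 4, 9],
        "pept_end": [8, 8, 16],
        "S1": [100.0, 50.0, 30.0],
    }
)
site_df = rollup_to_sites(pept_df, ["S1"], sequence)
```

All three peptides have a terminus at the K8 cleavage, so their intensities are summed there, while K2 and T3 each receive a single peptide.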

proteometer.lip.select_lytic_sites(site_df: pandas.DataFrame, site_type: str = 'prok', site_type_col: str = 'Lytic site type') → pandas.DataFrame

Selects lytic sites based on the specified site type.

Parameters:

site_df (pd.DataFrame) – Input DataFrame containing lytic site data.
site_type (str, optional) – Type of lytic site to select. Defaults to “prok”.
site_type_col (str, optional) – Column name for lytic site types. Defaults to “Lytic site type”.

Returns:

Filtered DataFrame with selected lytic sites.

Return type:

pd.DataFrame

proteometer.lip.delta_prok_site(peptide_df: pandas.DataFrame, site_df: pandas.DataFrame, int_cols: list[str], site_type_col: str = 'Type', site_protein_col: str = 'Protein', pept_protein_col: str = 'Protein', protein_length_col: str = 'Protein length', site_pept_col: str = 'Peptide', pept_pept_col: str = 'Peptide', position_col: str = 'Pos', pept_start_col: str = 'pept_start', pept_end_col: str = 'pept_end', rollup_method: Literal['median', 'mean', 'sum'] = 'median') → pandas.DataFrame

Computes exposure values for each lytic (ProK) site.

The exposure value is the average log intensity of the peptides for which the site is a lytic site, minus the average log intensity of the peptides that contain the site within their sequence. The averaging function is set by the rollup_method parameter.

Parameters:

peptide_df (pd.DataFrame) – DataFrame containing peptide data.
site_df (pd.DataFrame) – DataFrame containing lytic site data.
int_cols (list[str]) – List of columns to aggregate.
site_type_col (str, optional) – Column name for lytic site types. Defaults to “Type”.
site_protein_col (str, optional) – Column name for protein IDs in the lytic site DataFrame. Defaults to “Protein”.
pept_protein_col (str, optional) – Column name for protein IDs in the peptide DataFrame. Defaults to “Protein”.
protein_length_col (str, optional) – Column name for protein lengths. Defaults to “Protein length”.
site_pept_col (str, optional) – Column name for peptides in the lytic site DataFrame. Defaults to “Peptide”.
pept_pept_col (str, optional) – Column name for peptides in the peptide DataFrame. Defaults to “Peptide”.
position_col (str, optional) – Column name for positions in the lytic site DataFrame. Defaults to “Pos”.
pept_start_col (str, optional) – Column name for start positions in the peptide DataFrame. Defaults to “pept_start”.
pept_end_col (str, optional) – Column name for end positions in the peptide DataFrame. Defaults to “pept_end”.
rollup_method (Literal["median", "mean", "sum"], optional) – Aggregation method to use. Defaults to “median”. The “sum” is done in linear space.

Returns:

DataFrame with delta values for each lytic site.

Return type:

pd.DataFrame
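
The delta described above can be worked through for a single site with a short sketch. It is an illustration, not the library's implementation: it assumes intensities are log2-transformed, that a site at position p is the cleavage after residue p (so a cleaved peptide either ends at p or starts at p + 1), and that a peptide contains the site when it covers both residue p and residue p + 1. The helper name `site_delta` and the sample column names `S1` and `S2` are hypothetical.

```python
import numpy as np
import pandas as pd


def site_delta(
    peptide_df: pd.DataFrame,
    position: int,
    int_cols: list[str],
    rollup_method: str = "median",
) -> pd.Series:
    """Cleaved-minus-intact log intensity for one site, per intensity column."""
    start, end = peptide_df["pept_start"], peptide_df["pept_end"]
    # Peptides with a terminus created by cleavage at this site.
    cleaved = peptide_df.loc[(end == position) | (start == position + 1), int_cols]
    # Peptides that span the site, i.e. the bond was left intact.
    intact = peptide_df.loc[(start <= position) & (end > position), int_cols]

    def rollup(values: pd.DataFrame) -> pd.Series:
        if rollup_method == "sum":
            # Sum in linear space, then return to log2 (assumed log base).
            return np.log2((2.0 ** values).sum())
        return getattr(values, rollup_method)()

    return rollup(cleaved) - rollup(intact)


peptide_df = pd.DataFrame(
    {
        "pept_start": [3, 9, 3, 1],
        "pept_end": [8, 16, 16, 10],
        "S1": [20.0, 22.0, 18.0, 16.0],
        "S2": [21.0, 23.0, 19.0, 17.0],
    }
)
delta = site_delta(peptide_df, position=8, int_cols=["S1", "S2"])
```

In this toy data the first two peptides are produced by cleavage at position 8 and the last two span it, so a positive delta indicates that cleaved forms are more abundant than intact ones at that site.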
