proteometer.lip
Attributes
Functions
filter_contaminants_reverse_pept | Filters out contaminants and reverse hits from a peptide DataFrame.
filter_contaminants_reverse_prot | Filters out contaminants and reverse hits from a protein DataFrame.
filtering_protein_based_on_peptide_number | Filters proteins based on the minimum number of peptides.
get_clean_peptides | Cleans peptide sequences by removing modifications and returns a DataFrame with cleaned peptides.
get_tryptic_types | Analyzes the tryptic pattern of peptides and classifies them as tryptic, semi-tryptic, or non-tryptic.
select_tryptic_pattern | Selects peptides based on their digestion pattern.
analyze_tryptic_pattern | Analyzes tryptic patterns and calculates statistics for peptides.
rollup_to_lytic_site | Converts the double-peptide data frame to a site-level data frame.
rollup_single_protein_to_lytic_site | Rolls up peptide-level limited proteolysis data to lytic sites.
select_lytic_sites | Selects lytic sites based on the specified site type.
delta_prok_site | Computes exposure values for each lytic (ProK) site.
Module Contents
- proteometer.lip.filter_contaminants_reverse_pept(df: pandas.DataFrame, search_tool: Literal['maxquant', 'msfragger', 'fragpipe'], protein_id_col_pept: str, uniprot_col: str) -> pandas.DataFrame
Filters out contaminants and reverse hits from a peptide DataFrame.
- Parameters:
df (pd.DataFrame) – Input DataFrame containing peptide data.
search_tool (Literal["maxquant", "msfragger", "fragpipe"]) – The search tool used for data generation.
protein_id_col_pept (str) – Column name containing protein IDs in the peptide DataFrame.
uniprot_col (str) – Column name to store UniProt IDs.
- Returns:
Filtered DataFrame with contaminants and reverse hits removed.
- Return type:
pd.DataFrame
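The kind of filtering described above can be sketched in plain Python. This sketch does not call proteometer and is not its implementation: the `CON__` and `REV__` prefixes follow the MaxQuant naming convention and are assumed here, `is_contaminant_or_reverse` is a hypothetical helper, and the rows and column names are made up for illustration. How the library detects these entries for each search tool is not documented above.

```python
# Hypothetical helper, not part of proteometer: flags protein ID groups that
# carry MaxQuant-style contaminant ("CON__") or reverse ("REV__") prefixes.
CONTAMINANT_PREFIX = "CON__"  # assumption: MaxQuant convention
REVERSE_PREFIX = "REV__"      # assumption: MaxQuant convention


def is_contaminant_or_reverse(protein_ids: str, sep: str = ";") -> bool:
    """Return True if any ID in a separator-delimited group is flagged."""
    return any(
        pid.startswith((CONTAMINANT_PREFIX, REVERSE_PREFIX))
        for pid in protein_ids.split(sep)
    )


# Made-up peptide rows; "Proteins" stands in for protein_id_col_pept.
rows = [
    {"Sequence": "AAAK", "Proteins": "P12345"},
    {"Sequence": "CCCR", "Proteins": "CON__P02769"},
    {"Sequence": "DDDK", "Proteins": "REV__Q9Y6K9"},
    {"Sequence": "EEEK", "Proteins": "P12345;Q00001"},
]

kept = [row for row in rows if not is_contaminant_or_reverse(row["Proteins"])]
print([row["Sequence"] for row in kept])
```

The sketch drops a row when any ID in its protein group is flagged. That is one possible policy, chosen here for simplicity.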
- proteometer.lip.filter_contaminants_reverse_prot(df: pandas.DataFrame, search_tool: Literal['maxquant', 'msfragger', 'fragpipe'], protein_id_col_prot: str, uniprot_col: str) -> pandas.DataFrame
Filters out contaminants and reverse hits from a protein DataFrame.
- Parameters:
df (pd.DataFrame) – Input DataFrame containing protein data.
search_tool (Literal["maxquant", "msfragger", "fragpipe"]) – The search tool used for data generation.
protein_id_col_prot (str) – Column name containing protein IDs in the protein DataFrame.
uniprot_col (str) – Column name to store UniProt IDs.
- Returns:
Filtered DataFrame with contaminants and reverse hits removed.
- Return type:
pd.DataFrame
- proteometer.lip.filtering_protein_based_on_peptide_number(df2filter: pandas.DataFrame, peptide_counts_col: str, search_tool: Literal['maxquant', 'msfragger', 'fragpipe'], min_pept_count: int = 2) -> pandas.DataFrame
Filters proteins based on the minimum number of peptides.
- Parameters:
df2filter (pd.DataFrame) – Input DataFrame containing proteomics data.
peptide_counts_col (str) – Column name containing peptide counts.
search_tool (Literal["maxquant", "msfragger", "fragpipe"]) – The search tool used for data generation.
min_pept_count (int, optional) – Minimum number of peptides required. Defaults to 2.
- Returns:
Filtered DataFrame with proteins having at least min_pept_count peptides.
- Return type:
pd.DataFrame
- proteometer.lip.get_clean_peptides(pept_df: pandas.DataFrame, peptide_col: str, clean_pept_col: str = 'clean_pept') -> pandas.DataFrame
Cleans peptide sequences by removing modifications and returns a DataFrame with cleaned peptides.
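- Parameters:
pept_df (pd.DataFrame) – Input DataFrame containing peptide data.
peptide_col (str) – Column name containing peptide sequences.
clean_pept_col (str, optional) – Column name for cleaned peptide sequences. Defaults to “clean_pept”.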
- Returns:
DataFrame with an additional column for cleaned peptide sequences.
- Return type:
pd.DataFrame
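To illustrate what removing modifications can look like, here is a minimal sketch in plain Python. It is not the library's implementation: `strip_modifications` is a hypothetical helper, and the two annotation styles shown (parenthesised modification names and bracketed mass shifts) are assumptions about typical search-tool output.

```python
import re


def strip_modifications(peptide: str) -> str:
    """Remove parenthesised or bracketed annotations, then keep residues only."""
    kept = []
    depth = 0  # nesting depth inside (...) or [...]
    for char in peptide:
        if char in "([":
            depth += 1
        elif char in ")]":
            depth = max(depth - 1, 0)
        elif depth == 0:
            kept.append(char)
    # Drop anything that is not an uppercase residue letter (e.g. "_" or ".").
    return re.sub(r"[^A-Z]", "", "".join(kept))


print(strip_modifications("_(Acetyl (Protein N-term))AM(Oxidation (M))PEPTIDEK_"))
print(strip_modifications("PEPT[79.9663]IDEK"))
```

A depth counter is used instead of a single regular expression because modification names can themselves contain parentheses.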
- proteometer.lip.get_tryptic_types(pept_df: pandas.DataFrame, prot_seq: str, peptide_col: str, clean_pept_col: str = 'clean_pept') -> pandas.DataFrame
Analyzes the tryptic pattern of peptides and classifies them as tryptic, semi-tryptic, or non-tryptic.
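- Parameters:
pept_df (pd.DataFrame) – Input DataFrame containing peptide data.
prot_seq (str) – Protein sequence to analyze against.
peptide_col (str) – Column name containing peptide sequences.
clean_pept_col (str, optional) – Column name for cleaned peptide sequences. Defaults to “clean_pept”.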
- Returns:
DataFrame with additional columns for peptide start, end, and type.
- Return type:
pd.DataFrame
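The classification into tryptic, semi-tryptic, and non-tryptic peptides can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `tryptic_type` is a hypothetical helper, a terminus counts as tryptic when it follows K or R or coincides with a protein terminus, the proline rule is ignored, and the protein sequence is made up.

```python
def tryptic_type(peptide: str, protein: str) -> tuple[int, int, str]:
    """Return (start, end, type) with 1-based inclusive positions."""
    index = protein.find(peptide)  # first occurrence only
    if index < 0:
        raise ValueError(f"{peptide!r} not found in protein sequence")
    start, end = index + 1, index + len(peptide)
    # N-terminal side: protein start, or preceded by K/R.
    n_tryptic = start == 1 or protein[index - 1] in "KR"
    # C-terminal side: protein end, or the peptide itself ends in K/R.
    c_tryptic = end == len(protein) or peptide[-1] in "KR"
    tryptic_ends = int(n_tryptic) + int(c_tryptic)
    label = {2: "tryptic", 1: "semi-tryptic", 0: "non-tryptic"}[tryptic_ends]
    return start, end, label


protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # made-up sequence
for pept in ["TAYIAK", "AYIAK", "ISFVKSH"]:
    print(pept, tryptic_type(pept, protein))
```

The three peptides in the loop are classified as tryptic, semi-tryptic, and non-tryptic respectively.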
- proteometer.lip.select_tryptic_pattern(pept_df: pandas.DataFrame, prot_seq: str, tryptic_pattern: str = 'all', peptide_col: str = 'Sequence', clean_pept_col: str = 'clean_pept') -> pandas.DataFrame
Selects peptides based on their digestion pattern.
- Parameters:
pept_df (pd.DataFrame) – Input DataFrame containing peptide data.
prot_seq (str) – Protein sequence to analyze against.
tryptic_pattern (str, optional) – Digestion pattern to filter peptides. Must be one of: all, any-tryptic, tryptic, semi-tryptic, non-tryptic. Defaults to “all”.
peptide_col (str, optional) – Column name containing peptide sequences. Defaults to “Sequence”.
clean_pept_col (str, optional) – Column name for cleaned peptide sequences. Defaults to “clean_pept”.
- Returns:
Filtered DataFrame with peptides matching the specified digestion pattern.
- Return type:
pd.DataFrame
- proteometer.lip.analyze_tryptic_pattern(protein: pandas.DataFrame, sequence: str, pairwise_ttest_groups: collections.abc.Iterable[proteometer.stats.TTestGroup], peptide_col: str, description: str = '', anova_type: str = '[Group]', keep_non_tryptic: bool = True, id_separator: str = '@', sig_type: str = 'pval', sig_thr: float = 0.05) -> pandas.DataFrame
Analyzes tryptic patterns and calculates statistics for peptides.
- Parameters:
protein (pd.DataFrame) – Input DataFrame containing proteomics data.
sequence (str) – Protein sequence to analyze against.
pairwise_ttest_groups (Iterable[TTestGroup]) – Groups for pairwise t-tests.
peptide_col (str) – Column name containing peptide sequences.
description (str, optional) – Protein description to add to data frame. Defaults to “”.
anova_type (str, optional) – Type of ANOVA analysis. Defaults to “[Group]”.
keep_non_tryptic (bool, optional) – Whether to keep non-tryptic peptides. Defaults to True.
id_separator (str, optional) – Separator for peptide IDs. Defaults to “@”.
sig_type (str, optional) – Significance type (e.g., “pval”). Defaults to “pval”.
sig_thr (float, optional) – Significance threshold. Defaults to 0.05.
- Returns:
DataFrame with analyzed tryptic patterns and statistics.
- Return type:
pd.DataFrame
- proteometer.lip.rollup_to_lytic_site(double_pept: pandas.DataFrame, prot_seqs: list[proteometer.fasta.SeqRecord], int_cols: collections.abc.Iterable[str], par: proteometer.params.Params) -> pandas.DataFrame
Converts the double-peptide data frame to a site-level data frame.
- Parameters:
double_pept (pd.DataFrame) – The double-peptide data frame.
prot_seqs (list[fasta.SeqRecord]) – The list of protein sequences.
int_cols (Iterable[str]) – The names of the columns with intensity values.
par (Params) – The parameters for limited proteolysis analysis.
- Returns:
A data frame with the site-level data.
- Return type:
pd.DataFrame
- proteometer.lip.rollup_single_protein_to_lytic_site(df: pandas.DataFrame, int_cols: collections.abc.Iterable[str], uniprot_col: str, sequence: str, residue_col: str = 'Residue', description: str = '', tryptic_pattern: str = 'all', peptide_col: str = 'Sequence', clean_pept_col: str = 'clean_pept', id_separator: str = '@', id_col: str = 'id', pept_type_col: str = 'pept_type', site_col: str = 'Site', pos_col: str = 'Pos', multiply_rollup_counts: bool = True, ignore_NA: bool = True, alternative_protease: str = 'ProK', rollup_func: Literal['median', 'mean', 'sum'] = 'sum') -> pandas.DataFrame
Rolls up peptide-level limited proteolysis data to lytic sites.
- Parameters:
df (pd.DataFrame) – Input DataFrame containing peptide data.
int_cols (Iterable[str]) – Columns with intensity values to aggregate.
uniprot_col (str) – Column name for UniProt IDs.
sequence (str) – Protein sequence to analyze against.
residue_col (str, optional) – Column name for lytic residues. Defaults to “Residue”.
description (str, optional) – Protein description to add to data frame. Defaults to “”.
tryptic_pattern (str, optional) – Digestion pattern to filter peptides. Defaults to “all”.
peptide_col (str, optional) – Column name containing peptide sequences. Defaults to “Sequence”.
clean_pept_col (str, optional) – Column name for cleaned peptide sequences. Defaults to “clean_pept”.
id_separator (str, optional) – Separator for IDs. Defaults to “@”.
id_col (str, optional) – Column name for IDs. Defaults to “id”.
pept_type_col (str, optional) – Column name for peptide types. Defaults to “pept_type”.
site_col (str, optional) – Column name for lytic sites. Defaults to “Site”.
pos_col (str, optional) – Column name for positions. Defaults to “Pos”.
multiply_rollup_counts (bool, optional) – Whether to multiply rollup counts. Defaults to True.
ignore_NA (bool, optional) – Whether to ignore NA values. Defaults to True.
alternative_protease (str, optional) – Name of the alternative protease. Defaults to “ProK”.
rollup_func (Literal["median", "mean", "sum"], optional) – Aggregation function. Defaults to “sum”.
- Returns:
DataFrame with rolled-up lytic site data and aggregated statistics.
- Return type:
pd.DataFrame
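The idea of rolling peptide intensities up to lytic sites can be sketched in plain Python. This is a simplified illustration, not the library's algorithm: `rollup_to_sites` is a hypothetical helper, the site ID format (UniProt ID, separator, residue, position) is an assumption suggested by the `id_separator` default, and the accession, sequence, and intensities are made up. Options such as `multiply_rollup_counts` and `ignore_NA` are not modeled.

```python
from collections import defaultdict
from statistics import mean, median

ROLLUP = {"sum": sum, "mean": mean, "median": median}


def rollup_to_sites(peptides, protein, uniprot_id, rollup_func="sum", sep="@"):
    """Aggregate peptide intensities at the cleavage sites flanking each peptide.

    `peptides` is a list of (start, end, intensity) tuples with 1-based
    inclusive positions. A site is named by the residue after which the
    protein was cleaved.
    """
    by_site = defaultdict(list)
    for start, end, intensity in peptides:
        # The cut N-terminal of the peptide follows residue start - 1;
        # the cut C-terminal of the peptide follows residue end.
        for pos in (start - 1, end):
            if 1 <= pos < len(protein):  # protein termini are not cleavage sites
                by_site[pos].append(intensity)
    aggregate = ROLLUP[rollup_func]
    return {
        f"{uniprot_id}{sep}{protein[pos - 1]}{pos}": aggregate(values)
        for pos, values in sorted(by_site.items())
    }


protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # made-up sequence
peptides = [(3, 8, 100.0), (4, 8, 40.0)]  # TAYIAK and AYIAK
print(rollup_to_sites(peptides, protein, "P12345"))
```

Both peptides end at K8, so their intensities are combined there, while K2 and T3 each receive the intensity of a single peptide.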
- proteometer.lip.select_lytic_sites(site_df: pandas.DataFrame, site_type: str = 'prok', site_type_col: str = 'Lytic site type') -> pandas.DataFrame
Selects lytic sites based on the specified site type.
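- Parameters:
site_df (pd.DataFrame) – DataFrame containing lytic site data.
site_type (str, optional) – Lytic site type to select. Defaults to “prok”.
site_type_col (str, optional) – Column name for lytic site types. Defaults to “Lytic site type”.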
- Returns:
Filtered DataFrame with selected lytic sites.
- Return type:
pd.DataFrame
- proteometer.lip.delta_prok_site(peptide_df: pandas.DataFrame, site_df: pandas.DataFrame, int_cols: list[str], site_type_col: str = 'Type', site_protein_col: str = 'Protein', pept_protein_col: str = 'Protein', protein_length_col: str = 'Protein length', site_pept_col: str = 'Peptide', pept_pept_col: str = 'Peptide', position_col: str = 'Pos', pept_start_col: str = 'pept_start', pept_end_col: str = 'pept_end', rollup_method: Literal['median', 'mean', 'sum'] = 'median') -> pandas.DataFrame
Computes exposure values for each lytic (ProK) site.
The exposure is computed as the average log intensity of the peptides for which the site is a lytic site, minus the average log intensity of the peptides that contain the site in their sequence. The averaging function is set by the rollup_method parameter.
- Parameters:
peptide_df (pd.DataFrame) – DataFrame containing peptide data.
site_df (pd.DataFrame) – DataFrame containing lytic site data.
int_cols (list[str]) – Columns with intensity values.
site_type_col (str, optional) – Column name for lytic site types. Defaults to “Type”.
site_protein_col (str, optional) – Column name for protein IDs in the lytic site DataFrame. Defaults to “Protein”.
pept_protein_col (str, optional) – Column name for protein IDs in the peptide DataFrame. Defaults to “Protein”.
protein_length_col (str, optional) – Column name for protein lengths. Defaults to “Protein length”.
site_pept_col (str, optional) – Column name for peptides in the lytic site DataFrame. Defaults to “Peptide”.
pept_pept_col (str, optional) – Column name for peptides in the peptide DataFrame. Defaults to “Peptide”.
position_col (str, optional) – Column name for positions in the lytic site DataFrame. Defaults to “Pos”.
pept_start_col (str, optional) – Column name for start positions in the peptide DataFrame. Defaults to “pept_start”.
pept_end_col (str, optional) – Column name for end positions in the peptide DataFrame. Defaults to “pept_end”.
rollup_method (Literal["median", "mean", "sum"], optional) – Aggregation method to use. Defaults to “median”. The “sum” option is computed in linear space.
- Returns:
DataFrame with delta values for each lytic site.
- Return type:
pd.DataFrame
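The exposure calculation described above can be worked through on a small example. This is a sketch under stated assumptions, not the library's implementation: `aggregate_log2` and `site_exposure` are hypothetical helpers, intensities are assumed to be log2-transformed, a peptide whose last residue is the site is counted as cleaved there and not as containing it, and the numbers are made up.

```python
import math
from statistics import mean, median


def aggregate_log2(values, method="median"):
    """Aggregate log2 intensities; "sum" is computed in linear space."""
    if method == "sum":
        return math.log2(sum(2.0 ** value for value in values))
    return {"mean": mean, "median": median}[method](values)


def site_exposure(site_pos, peptides, method="median"):
    """Exposure of one site from (start, end, log2_intensity) tuples.

    Positions are 1-based and inclusive.
    """
    # Peptides produced by cleavage at the site: the site is the residue just
    # before the peptide start, or the peptide's last residue.
    cleaved = [x for start, end, x in peptides if site_pos in (start - 1, end)]
    # Peptides that span the site without being cleaved there.
    spanning = [x for start, end, x in peptides if start <= site_pos < end]
    if not cleaved or not spanning:
        return math.nan  # no contrast can be formed
    return aggregate_log2(cleaved, method) - aggregate_log2(spanning, method)


peptides = [
    (3, 8, 20.0),   # ends at the site
    (9, 15, 22.0),  # starts right after the site
    (5, 12, 18.0),  # spans the site
    (1, 10, 16.0),  # spans the site
]
print(site_exposure(8, peptides))  # median: 21.0 - 17.0 = 4.0
```

A positive value means that peptides cleaved at the site are more abundant than peptides spanning it, which suggests the site is accessible to the protease.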