proteometer.residue
Functions
get_res_names | Extracts residue names from an iterable of residue strings.
get_res_pos | Extracts residue positions from an iterable of residue strings.
count_site_number | Counts the number of sites per protein in a given DataFrame.
count_site_number_with_global_proteomics | Counts the number of sites per protein in a given DataFrame, with the global proteomics data used as the reference.
Module Contents
- proteometer.residue.get_res_names(residues: collections.abc.Iterable[str]) → list[list[str]]
Extracts residue names from an iterable of residue strings.
- Parameters:
residues (Iterable[str]) – An iterable of residue strings, each containing an uppercase letter followed by digits and optional lowercase letters or hyphens.
- Returns:
A list of lists, where each inner list contains the extracted residue names from the corresponding input string.
- Return type:
list[list[str]]
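The input format described above can be illustrated with a small standalone sketch. It does not call proteometer; `sketch_res_names` and the regular expression are hypothetical stand-ins that assume a residue "name" is the single uppercase letter preceding the digits, and the library's actual parsing may differ.

```python
import re
from collections.abc import Iterable

# Assumed pattern (not taken from proteometer): one uppercase residue letter,
# one or more digits, then optional lowercase letters or hyphens,
# e.g. "S15", "T20p", "K100-ac".
_RESIDUE = re.compile(r"([A-Z])\d+[a-z\-]*")


def sketch_res_names(residues: Iterable[str]) -> list[list[str]]:
    """Return the uppercase residue letters found in each input string."""
    # findall with a single capture group returns only the captured letters.
    return [_RESIDUE.findall(res) for res in residues]


names = sketch_res_names(["S15", "S15T20p", "K100-ac"])
print(names)
```

Each input string yields one inner list, so a string that encodes several residues produces several names in the same position of the result.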
- proteometer.residue.get_res_pos(residues: collections.abc.Iterable[str]) → list[list[int]]
Extracts residue positions from an iterable of residue strings.
- Parameters:
residues (Iterable[str]) – An iterable of residue strings, each containing an uppercase letter followed by digits and optional lowercase letters or hyphens.
- Returns:
A list of lists, where each inner list contains the extracted residue positions from the corresponding input string.
- Return type:
list[list[int]]
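A comparable standalone sketch for positions, under the same assumptions: `sketch_res_pos` and its pattern are hypothetical and only mirror the documented input format (uppercase letter, digits, optional lowercase letters or hyphens). The digits are captured and converted to integers.

```python
import re
from collections.abc import Iterable

# Assumed pattern (not taken from proteometer): capture the digits that
# follow an uppercase residue letter, ignoring any lowercase/hyphen suffix.
_RESIDUE_POS = re.compile(r"[A-Z](\d+)[a-z\-]*")


def sketch_res_pos(residues: Iterable[str]) -> list[list[int]]:
    """Return the integer positions found in each input string."""
    return [[int(pos) for pos in _RESIDUE_POS.findall(res)] for res in residues]


positions = sketch_res_pos(["S15", "S15T20p", "K100-ac"])
print(positions)
```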
- proteometer.residue.count_site_number(df: pandas.DataFrame, uniprot_col: str, site_number_col: str = 'site_number') → pandas.DataFrame
Counts the number of sites per protein in a given DataFrame.
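- Parameters:
df (pd.DataFrame) – DataFrame containing protein and site information.
uniprot_col (str) – Column name of the protein identifier.
site_number_col (str, optional) – Name of the column to store the site number. Defaults to 'site_number'.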
- Returns:
DataFrame with the site number added.
- Return type:
pd.DataFrame
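One plausible reading of "number of sites per protein" is a per-protein row count written back to every row. The sketch below illustrates that reading with plain pandas; `sketch_count_site_number`, the toy accessions, and the column names are assumptions for illustration, and the library's own counting rule may differ.

```python
import pandas as pd


def sketch_count_site_number(
    df: pd.DataFrame, uniprot_col: str, site_number_col: str = "site_number"
) -> pd.DataFrame:
    """Add a column holding the number of rows (sites) per protein."""
    out = df.copy()  # leave the caller's DataFrame untouched
    # transform("count") broadcasts each group's size back to its rows.
    out[site_number_col] = out.groupby(uniprot_col)[uniprot_col].transform("count")
    return out


# Toy data: two sites on one protein, one site on another.
sites = pd.DataFrame(
    {"uniprot": ["P1", "P1", "P2"], "site": ["S15", "T20", "K100"]}
)
counted = sketch_count_site_number(sites, "uniprot")
print(counted)
```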
- proteometer.residue.count_site_number_with_global_proteomics(df: pandas.DataFrame, uniprot_col: str, id_col: str, site_number_col: str = 'site_number') → pandas.DataFrame
Counts the number of sites per protein in a given DataFrame, with the global proteomics data used as the reference.
- Parameters:
df (pd.DataFrame) – DataFrame containing protein and site information. The index of this DataFrame must match id_col.
uniprot_col (str) – Column name of the protein identifier.
id_col (str) – Column name of the identifier that matches the index of the DataFrame.
site_number_col (str, optional) – Name of the column to store the site number. Defaults to 'site_number'.
- Returns:
DataFrame with the site number added.
- Return type:
pd.DataFrame
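The requirement that the DataFrame index match id_col can be checked before calling the function. The sketch below uses only pandas; `index_matches_id_col`, the toy data, and the column names are hypothetical and not part of proteometer.

```python
import pandas as pd


def index_matches_id_col(df: pd.DataFrame, id_col: str) -> bool:
    """Return True when the index holds the same values, in order, as id_col."""
    return df.index.equals(pd.Index(df[id_col]))


# Toy data with a default integer index, which does not match the "id" column.
sites = pd.DataFrame(
    {"id": ["P1_S15", "P1_T20", "P2_K100"], "uniprot": ["P1", "P1", "P2"]}
)
before = index_matches_id_col(sites, "id")

# Re-index on id_col while keeping it as a regular column.
aligned = sites.set_index("id", drop=False)
after = index_matches_id_col(aligned, "id")
print(before, after)
```

Using `drop=False` keeps id_col available as a column, which the function signature still expects by name.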