proteometer.residue

Functions

get_res_names(→ list[list[str]])

Extracts residue names from an iterable of residue strings.

get_res_pos(→ list[list[int]])

Extracts residue positions from an iterable of residue strings.

count_site_number(→ pandas.DataFrame)

Counts the number of sites per protein in a given DataFrame.

count_site_number_with_global_proteomics(...)

Counts the number of sites per protein in a given DataFrame, with the global proteomics data used as the reference.

Module Contents

proteometer.residue.get_res_names(residues: collections.abc.Iterable[str]) → list[list[str]]

Extracts residue names from an iterable of residue strings.

Parameters:

residues (Iterable[str]) – An iterable of residue strings, each containing an uppercase letter followed by digits and optional lowercase letters or hyphens.

Returns:

A list of lists, where each inner list contains the extracted residue names from the corresponding input string.

Return type:

list[list[str]]
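The documented input format suggests a regular-expression approach. The sketch below is a hypothetical re-implementation, not proteometer's own code. It assumes that each input string may hold one or more residue tokens matching `[A-Z]\d+[a-z-]*` and that a residue "name" is the whole matched token; the function name `get_res_names_sketch` is illustrative only.

```python
import re
from collections.abc import Iterable

# Assumed token pattern, taken from the documented format: one uppercase
# letter, one or more digits, then optional lowercase letters or hyphens.
RESIDUE_TOKEN = re.compile(r"[A-Z]\d+[a-z-]*")


def get_res_names_sketch(residues: Iterable[str]) -> list[list[str]]:
    """Hypothetical sketch: return every residue token found in each string."""
    # One inner list per input string, in input order.
    return [RESIDUE_TOKEN.findall(res) for res in residues]


print(get_res_names_sketch(["S15", "T21p", "S12T15"]))
```

Under these assumptions, a string carrying two residues such as `"S12T15"` yields a two-element inner list, which is why the return type is a list of lists.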

proteometer.residue.get_res_pos(residues: collections.abc.Iterable[str]) → list[list[int]]

Extracts residue positions from an iterable of residue strings.

Parameters:

residues (Iterable[str]) – An iterable of residue strings, each containing an uppercase letter followed by digits and optional lowercase letters or hyphens.

Returns:

A list of lists, where each inner list contains the extracted residue positions from the corresponding input string.

Return type:

list[list[int]]
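Positions can be pulled out of the same token format with a capture group around the digits. This is again a hypothetical sketch rather than the library's implementation: the pattern `[A-Z](\d+)[a-z-]*` and the name `get_res_pos_sketch` are assumptions, and the position is taken to be the integer formed by the digits of each token.

```python
import re
from collections.abc import Iterable

# Assumed token pattern: uppercase letter, digits, optional lowercase letters
# or hyphens. The single capture group isolates the digits (the position).
RESIDUE_POS = re.compile(r"[A-Z](\d+)[a-z-]*")


def get_res_pos_sketch(residues: Iterable[str]) -> list[list[int]]:
    """Hypothetical sketch: return the integer position of every residue token."""
    # With one capture group, findall returns only the captured digit strings.
    return [[int(pos) for pos in RESIDUE_POS.findall(res)] for res in residues]


print(get_res_pos_sketch(["S15", "T21p", "S12T15"]))
```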

proteometer.residue.count_site_number(df: pandas.DataFrame, uniprot_col: str, site_number_col: str = 'site_number') → pandas.DataFrame

Counts the number of sites per protein in a given DataFrame.

Parameters:
  • df (pd.DataFrame) – DataFrame containing protein and site information.

  • uniprot_col (str) – Column name of the protein identifier.

  • site_number_col (str, optional) – Name of the column to store the site number. Defaults to ‘site_number’.

Returns:

DataFrame with the site number added.

Return type:

pd.DataFrame
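Assuming the "number of sites" for a protein is simply the number of rows that share its identifier in uniprot_col, the behaviour can be sketched with the standard pandas calls `Series.value_counts` and `Series.map`. The helper name `count_site_number_sketch` is hypothetical, and the real function may count sites differently.

```python
import pandas as pd


def count_site_number_sketch(
    df: pd.DataFrame, uniprot_col: str, site_number_col: str = "site_number"
) -> pd.DataFrame:
    """Hypothetical sketch: attach the per-protein row count to every row."""
    out = df.copy()  # leave the caller's DataFrame unmodified
    # value_counts gives rows per protein; map broadcasts that count back to each row.
    counts = out[uniprot_col].value_counts()
    out[site_number_col] = out[uniprot_col].map(counts)
    return out


sites = pd.DataFrame({"uniprot": ["P1", "P1", "P2"], "site": ["S15", "T21", "K4"]})
print(count_site_number_sketch(sites, "uniprot"))
```

Mapping the counts back, rather than aggregating, keeps every original site row, so each row carries the total for its protein.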

proteometer.residue.count_site_number_with_global_proteomics(df: pandas.DataFrame, uniprot_col: str, id_col: str, site_number_col: str = 'site_number') → pandas.DataFrame

Counts the number of sites per protein in a given DataFrame, with the global proteomics data used as the reference.

Parameters:
  • df (pd.DataFrame) – DataFrame containing protein and site information. The index of this DataFrame must match the values in id_col.

  • uniprot_col (str) – Column name of the protein identifier.

  • id_col (str) – Column name of the identifier that matches the index of the DataFrame.

  • site_number_col (str, optional) – Name of the column to store the site number. Defaults to ‘site_number’.

Returns:

DataFrame with the site number added.

Return type:

pd.DataFrame
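The one stated precondition is that the index of df matches the values in id_col. One way to satisfy it with standard pandas calls is to copy id_col into the index before calling the function; the helper `align_index_to_id_col` below is hypothetical and only prepares the input, it does not reproduce the counting logic.

```python
import pandas as pd


def align_index_to_id_col(df: pd.DataFrame, id_col: str) -> pd.DataFrame:
    """Hypothetical helper: make the index equal to id_col while keeping the column."""
    out = df.set_index(id_col, drop=False)  # drop=False keeps id_col as a column too
    # An index named like a column can make later lookups by that name ambiguous.
    out.index.name = None
    return out


table = pd.DataFrame({"id": ["P1_S15", "P2_K4"], "uniprot": ["P1", "P2"]})
print(align_index_to_id_col(table, "id"))
```

The prepared frame can then be passed as df together with the same id_col, e.g. `count_site_number_with_global_proteomics(align_index_to_id_col(table, "id"), uniprot_col="uniprot", id_col="id")`.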