proteometer.utils

Functions

flatten(→ List[Any])

Flattens a nested list into a single list.

generate_index(→ pandas.DataFrame)

Generate a unique index for a DataFrame from a protein identifier column and an optional level identifier column.

check_missingness(→ pandas.DataFrame)

Calculate missingness for specified groups in a DataFrame.

filter_missingness(→ pandas.DataFrame)

Filter rows in a DataFrame based on missingness thresholds for specified groups.

expsum(→ float)

Module Contents

proteometer.utils.flatten(s: List[Any]) → List[Any]

Flattens a nested list into a single list.

Parameters:
  • s (List[Any]) – A list that may contain nested lists.

Returns:

A flattened list containing all elements from the input list.

Return type:

List[Any]
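The docstring does not state whether flattening is recursive. The following is a minimal sketch that assumes arbitrarily deep nesting of `list` objects (tuples, strings, and other iterables are kept as single elements); `flatten_sketch` is a hypothetical name, not the library's implementation.

```python
from typing import Any, List


def flatten_sketch(s: List[Any]) -> List[Any]:
    """Hypothetical sketch: recursively flatten nested lists."""
    flat: List[Any] = []
    for item in s:
        if isinstance(item, list):
            # Descend into the nested list and splice its elements in place
            flat.extend(flatten_sketch(item))
        else:
            # Non-list items (including strings and tuples) are kept as-is
            flat.append(item)
    return flat


print(flatten_sketch([1, [2, [3, 4]], 5]))
# [1, 2, 3, 4, 5]
```

Recursion keeps the sketch short; for extremely deep nesting an explicit stack would avoid Python's recursion limit.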

proteometer.utils.generate_index(df: pandas.DataFrame, prot_col: str, level_col: str | None = None, id_separator: str = '@', id_col: str = 'id') → pandas.DataFrame

Generate a unique index for a DataFrame from a protein identifier column and an optional level identifier column.

Parameters:
  • df (pd.DataFrame) – Input DataFrame.

  • prot_col (str) – Column name for protein identifiers.

  • level_col (str | None, optional) – Column name for level identifiers. Defaults to None.

  • id_separator (str, optional) – Separator for combining protein and level identifiers. Defaults to “@”.

  • id_col (str, optional) – Name of the new column for the generated index. Defaults to “id”.

Returns:

DataFrame with the generated index.

Return type:

pd.DataFrame
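A minimal sketch of how such an identifier could be built, assuming each ID is the protein identifier, optionally followed by `id_separator` and the level identifier, and that the new `id_col` column is also set as the DataFrame index. `generate_index_sketch` is hypothetical; the real function may differ in these details.

```python
from typing import Optional

import pandas as pd


def generate_index_sketch(
    df: pd.DataFrame,
    prot_col: str,
    level_col: Optional[str] = None,
    id_separator: str = "@",
    id_col: str = "id",
) -> pd.DataFrame:
    """Hypothetical sketch: build '<protein><sep><level>' IDs and index by them."""
    out = df.copy()  # avoid mutating the caller's DataFrame
    ids = out[prot_col].astype(str)
    if level_col is not None:
        # Append the level identifier (e.g. a site) after the separator
        ids = ids + id_separator + out[level_col].astype(str)
    out[id_col] = ids
    # Assumption: the new column also becomes the index; drop=False keeps the column
    return out.set_index(id_col, drop=False)


df = pd.DataFrame(
    {"protein": ["P12345", "P12345", "Q67890"], "site": ["S15", "T20", "S7"]}
)
print(generate_index_sketch(df, "protein", "site").index.tolist())
# ['P12345@S15', 'P12345@T20', 'Q67890@S7']
```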

proteometer.utils.check_missingness(df: pandas.DataFrame, groups: collections.abc.Sequence[str], group_cols: collections.abc.Sequence[collections.abc.Sequence[str]]) → pandas.DataFrame

Calculate missingness for specified groups in a DataFrame.

Parameters:
  • df (pd.DataFrame) – Input DataFrame.

  • groups (Sequence[str]) – Names of the groups.

  • group_cols (Sequence[Sequence[str]]) – Columns corresponding to each group.

Returns:

DataFrame with missingness information added.

Return type:

pd.DataFrame
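The docstring does not specify how missingness is recorded. A minimal sketch, assuming one new column per group that counts the NA values across that group's replicate columns; the column-name pattern `"<group> missingness"` and the function name `check_missingness_sketch` are hypothetical.

```python
from collections.abc import Sequence

import pandas as pd


def check_missingness_sketch(
    df: pd.DataFrame,
    groups: Sequence[str],
    group_cols: Sequence[Sequence[str]],
) -> pd.DataFrame:
    """Hypothetical sketch: count missing replicates per group for each row."""
    if len(groups) != len(group_cols):
        raise ValueError("groups and group_cols must have the same length")
    out = df.copy()
    for group, cols in zip(groups, group_cols):
        # Number of NA replicates in this group's columns, row by row
        out[f"{group} missingness"] = out[list(cols)].isna().sum(axis=1)
    return out


df = pd.DataFrame(
    {"ctrl_1": [1.0, None], "ctrl_2": [2.0, None], "trt_1": [None, 3.0], "trt_2": [4.0, 5.0]}
)
res = check_missingness_sketch(
    df, ["ctrl", "trt"], [["ctrl_1", "ctrl_2"], ["trt_1", "trt_2"]]
)
print(res["ctrl missingness"].tolist())
# [0, 2]
```

Whether the real function reports counts or fractions is not documented here; dividing each count by `len(cols)` would give fractions instead.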

proteometer.utils.filter_missingness(df: pandas.DataFrame, groups: collections.abc.Sequence[str], group_cols: collections.abc.Sequence[collections.abc.Sequence[str]], min_replicates_qc: int = 2, method: Literal['all', 'any'] = 'any') → pandas.DataFrame

Filter rows in a DataFrame based on missingness thresholds for specified groups.

Parameters:
  • df (pd.DataFrame) – Input DataFrame.

  • groups (Sequence[str]) – Names of the groups.

  • group_cols (Sequence[Sequence[str]]) – Columns corresponding to each group.

  • min_replicates_qc (int, optional) – Minimum number of non-NA replicates a group must have for a row to pass. Defaults to 2.

  • method (str, optional) – Filtering mode, either “all” or “any”. Defaults to “any”. If “all”, every group must meet the threshold; if “any”, at least one group must meet it.

Returns:

Filtered DataFrame.

Return type:

pd.DataFrame
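A minimal sketch of the filtering rule described above: a group passes for a row when at least `min_replicates_qc` of its columns are non-NA, and `method` decides whether every group (`"all"`) or at least one group (`"any"`) must pass. `filter_missingness_sketch` is a hypothetical name, not the library's implementation.

```python
from collections.abc import Sequence
from typing import Literal

import pandas as pd


def filter_missingness_sketch(
    df: pd.DataFrame,
    groups: Sequence[str],
    group_cols: Sequence[Sequence[str]],
    min_replicates_qc: int = 2,
    method: Literal["all", "any"] = "any",
) -> pd.DataFrame:
    """Hypothetical sketch: keep rows whose groups have enough non-NA replicates."""
    if len(groups) != len(group_cols):
        raise ValueError("groups and group_cols must have the same length")
    # One boolean column per group: True where the row has enough observed values
    passes = pd.DataFrame(
        {
            g: df[list(cols)].notna().sum(axis=1) >= min_replicates_qc
            for g, cols in zip(groups, group_cols)
        },
        index=df.index,
    )
    if method == "all":
        keep = passes.all(axis=1)
    elif method == "any":
        keep = passes.any(axis=1)
    else:
        raise ValueError(f"method must be 'all' or 'any', got {method!r}")
    return df[keep]


df = pd.DataFrame(
    {
        "ctrl_1": [1.0, None, 1.0, 1.0],
        "ctrl_2": [2.0, None, None, 2.0],
        "trt_1": [None, 3.0, None, 3.0],
        "trt_2": [4.0, 5.0, 6.0, 4.0],
    }
)
cols = [["ctrl_1", "ctrl_2"], ["trt_1", "trt_2"]]
print(filter_missingness_sketch(df, ["ctrl", "trt"], cols).index.tolist())
# [0, 1, 3]
print(filter_missingness_sketch(df, ["ctrl", "trt"], cols, method="all").index.tolist())
# [3]
```

Row 2 has only one observed replicate in each group, so it fails under both modes; row 3 is the only row where both groups reach the threshold.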

proteometer.utils.expsum(x: pandas.Series[float]) → float