proteometer.utils

Functions

`flatten`(→ List[Any])	Flattens a nested list into a single list.
`generate_index`(→ pandas.DataFrame)	Generate a unique index for a DataFrame based on protein column identifier and optional level column identifier.
`check_missingness`(→ pandas.DataFrame)	Calculate missingness for specified groups in a DataFrame.
`filter_missingness`(→ pandas.DataFrame)	Filter rows in a DataFrame based on missingness thresholds for specified groups.
`expsum`(→ float)

Module Contents

proteometer.utils.flatten(s: List[Any]) → List[Any]

Flattens a nested list into a single list.

Parameters:

s (List[Any]) – A list that may contain nested lists.

Returns:

A flattened list containing all elements from the input list.

Return type:

List[Any]
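
The behavior described above can be illustrated with a minimal sketch. This is not the library's source: `flatten_sketch` is a hypothetical stand-in, and it assumes that nesting of arbitrary depth is flattened and that only `list` instances are treated as nested (strings and tuples are kept as single elements). The documentation does not state either point.

```python
from typing import Any, List


def flatten_sketch(s: List[Any]) -> List[Any]:
    """Recursively flatten nested lists (illustrative stand-in, not the library code)."""
    flat: List[Any] = []
    for item in s:
        if isinstance(item, list):
            # Assumption: descend into nested lists of any depth.
            flat.extend(flatten_sketch(item))
        else:
            # Non-list items, including strings, are kept as-is.
            flat.append(item)
    return flat


nested = [1, [2, [3, 4]], "ab", [[5]]]
flattened = flatten_sketch(nested)
print(flattened)  # [1, 2, 3, 4, 'ab', 5]
```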

proteometer.utils.generate_index(df: pandas.DataFrame, prot_col: str, level_col: str | None = None, id_separator: str = '@', id_col: str = 'id') → pandas.DataFrame

Generate a unique index for a DataFrame based on protein column identifier and optional level column identifier.

Parameters:

df (pd.DataFrame) – Input DataFrame.
prot_col (str) – Column name for protein identifiers.
level_col (str | None, optional) – Column name for level identifiers. Defaults to None.
id_separator (str, optional) – Separator for combining protein and level identifiers. Defaults to “@”.
id_col (str, optional) – Name of the new column for the generated index. Defaults to “id”.

Returns:

DataFrame with the generated index.

Return type:

pd.DataFrame
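
One plausible reading of the parameters above is that the identifier joins the protein value and the level value with `id_separator`. The sketch below illustrates that reading only. `generate_index_sketch` and the column names `uniprot` and `site` are hypothetical, and the documentation does not say whether the real function also sets the new column as the DataFrame index or what it does when `level_col` is None, so this version simply adds the column and falls back to the protein identifier alone.

```python
from typing import Optional

import pandas as pd


def generate_index_sketch(
    df: pd.DataFrame,
    prot_col: str,
    level_col: Optional[str] = None,
    id_separator: str = "@",
    id_col: str = "id",
) -> pd.DataFrame:
    """Add an identifier column built from prot_col and, optionally, level_col."""
    out = df.copy()  # leave the caller's DataFrame untouched
    if level_col is None:
        # Assumption: without a level column the protein identifier is used alone.
        out[id_col] = out[prot_col].astype(str)
    else:
        out[id_col] = (
            out[prot_col].astype(str) + id_separator + out[level_col].astype(str)
        )
    return out


# Hypothetical example data: protein accessions and modification sites.
sites = pd.DataFrame(
    {"uniprot": ["P12345", "P12345", "Q67890"], "site": ["S10", "T42", "Y7"]}
)
indexed = generate_index_sketch(sites, prot_col="uniprot", level_col="site")
print(indexed["id"].tolist())  # ['P12345@S10', 'P12345@T42', 'Q67890@Y7']
```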

proteometer.utils.check_missingness(df: pandas.DataFrame, groups: collections.abc.Sequence[str], group_cols: collections.abc.Sequence[collections.abc.Sequence[str]]) → pandas.DataFrame

Calculate missingness for specified groups in a DataFrame.

Parameters:

df (pd.DataFrame) – Input DataFrame.
groups (Sequence[str]) – Names of the groups.
group_cols (Sequence[Sequence[str]]) – Columns corresponding to each group.

Returns:

DataFrame with missingness information added.

Return type:

pd.DataFrame

proteometer.utils.filter_missingness(df: pandas.DataFrame, groups: collections.abc.Sequence[str], group_cols: collections.abc.Sequence[collections.abc.Sequence[str]], min_replicates_qc: int = 2, method: Literal['all', 'any'] = 'any') → pandas.DataFrame

Filter rows in a DataFrame based on missingness thresholds for specified groups.

Parameters:

df (pd.DataFrame) – Input DataFrame.
groups (Sequence[str]) – Names of the groups.
group_cols (Sequence[Sequence[str]]) – Columns corresponding to each group.
min_replicates_qc (int, optional) – Minimum number of non-NA replicates a group must have to pass. Defaults to 2.
method (Literal[“all”, “any”], optional) – How the per-group results are combined. If “all”, every group must meet the threshold. If “any”, at least one group must meet the threshold. Defaults to “any”.

Returns:

Filtered DataFrame.

Return type:

pd.DataFrame
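
The filtering rule described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `filter_missingness_sketch` and the column names are hypothetical, and a group is assumed to pass when its count of non-NA values is greater than or equal to `min_replicates_qc`.

```python
from typing import Sequence

import numpy as np
import pandas as pd


def filter_missingness_sketch(
    df: pd.DataFrame,
    groups: Sequence[str],
    group_cols: Sequence[Sequence[str]],
    min_replicates_qc: int = 2,
    method: str = "any",
) -> pd.DataFrame:
    """Keep rows whose groups have enough non-NA replicates."""
    if method not in ("all", "any"):
        raise ValueError("method must be 'all' or 'any'")
    # One boolean column per group: True where the row has enough observed values.
    passes = pd.DataFrame(
        {
            group: df[list(cols)].notna().sum(axis=1) >= min_replicates_qc
            for group, cols in zip(groups, group_cols)
        },
        index=df.index,
    )
    keep = passes.all(axis=1) if method == "all" else passes.any(axis=1)
    return df[keep]


# Hypothetical intensities: two groups with two replicates each.
intensities = pd.DataFrame(
    {
        "ctrl_1": [1.0, np.nan, np.nan],
        "ctrl_2": [2.0, np.nan, 1.0],
        "trt_1": [3.0, 4.0, np.nan],
        "trt_2": [4.0, 5.0, np.nan],
    },
    index=["A", "B", "C"],
)
groups = ["ctrl", "trt"]
group_cols = [["ctrl_1", "ctrl_2"], ["trt_1", "trt_2"]]

kept_any = filter_missingness_sketch(intensities, groups, group_cols, method="any")
kept_all = filter_missingness_sketch(intensities, groups, group_cols, method="all")
print(kept_any.index.tolist())  # ['A', 'B']
print(kept_all.index.tolist())  # ['A']
```

Row B is fully observed in `trt` but empty in `ctrl`, so it survives “any” and is dropped by “all”. Row C has at most one observed value per group and fails both.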

proteometer.utils.expsum(x: pandas.Series[float]) → float
