proteometer.utils
Functions
| Function | Description |
| --- | --- |
| flatten | Flattens a nested list into a single list. |
| generate_index | Generate a unique index for a DataFrame based on protein column identifier and optional level column identifier. |
| check_missingness | Calculate missingness for specified groups in a DataFrame. |
| filter_missingness | Filter rows in a DataFrame based on missingness thresholds for specified groups. |
Module Contents
- proteometer.utils.flatten(s: List[Any]) -> List[Any]
Flattens a nested list into a single list.
- Parameters:
s (List[Any]) – A list that may contain nested lists.
- Returns:
A flattened list containing all elements from the input list.
- Return type:
List[Any]
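The behavior described above can be illustrated with a minimal sketch. `flatten_sketch` is a hypothetical stand-in written for this example, not the library's implementation; it assumes nesting of arbitrary depth and treats only `list` objects as nested, so strings and tuples are kept as single elements.

```python
from typing import Any, List


def flatten_sketch(s: List[Any]) -> List[Any]:
    """Recursively flatten nested lists (illustrative stand-in only)."""
    flat: List[Any] = []
    for item in s:
        if isinstance(item, list):
            # Descend into nested lists and append their elements in order.
            flat.extend(flatten_sketch(item))
        else:
            # Non-list items (including strings) are kept as they are.
            flat.append(item)
    return flat


nested = [1, [2, 3], [4, [5, 6]], "ab"]
result = flatten_sketch(nested)
print(result)
```

Whether `proteometer.utils.flatten` handles deeper nesting or non-list iterables the same way is not stated in the docstring, so check the source before relying on either.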
- proteometer.utils.generate_index(df: pandas.DataFrame, prot_col: str, level_col: str | None = None, id_separator: str = '@', id_col: str = 'id') -> pandas.DataFrame
Generate a unique index for a DataFrame based on protein column identifier and optional level column identifier.
- Parameters:
df (pd.DataFrame) – Input DataFrame.
prot_col (str) – Column name for protein identifiers.
level_col (str | None, optional) – Column name for level identifiers. Defaults to None.
id_separator (str, optional) – Separator for combining protein and level identifiers. Defaults to “@”.
id_col (str, optional) – Name of the new column for the generated index. Defaults to “id”.
- Returns:
DataFrame with the generated index.
- Return type:
pd.DataFrame
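As a rough illustration of how the parameters above might combine, the following sketch builds an identifier column and uses it as the row index. `generate_index_sketch` and the column names `protein` and `site` are hypothetical; the sketch assumes the identifier is the protein value alone when `level_col` is None, and otherwise the protein and level values joined by `id_separator`. The library may handle duplicates or index naming differently.

```python
from typing import Optional

import pandas as pd


def generate_index_sketch(
    df: pd.DataFrame,
    prot_col: str,
    level_col: Optional[str] = None,
    id_separator: str = "@",
    id_col: str = "id",
) -> pd.DataFrame:
    """Add an identifier column and use its values as the index (illustrative only)."""
    out = df.copy()
    if level_col is None:
        # Assumption: with no level column, the protein identifier is the id.
        out[id_col] = out[prot_col].astype(str)
    else:
        # Assumption: protein and level identifiers are joined by the separator.
        out[id_col] = (
            out[prot_col].astype(str) + id_separator + out[level_col].astype(str)
        )
    # Use the identifier values as row labels while keeping the column.
    out.index = out[id_col].tolist()
    return out


sites = pd.DataFrame(
    {
        "protein": ["P12345", "P12345", "Q67890"],
        "site": ["S15", "T20", "Y5"],
        "intensity": [10.2, 11.5, 9.8],
    }
)
indexed = generate_index_sketch(sites, prot_col="protein", level_col="site")
print(indexed)
```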
- proteometer.utils.check_missingness(df: pandas.DataFrame, groups: collections.abc.Sequence[str], group_cols: collections.abc.Sequence[collections.abc.Sequence[str]]) -> pandas.DataFrame
Calculate missingness for specified groups in a DataFrame.
- proteometer.utils.filter_missingness(df: pandas.DataFrame, groups: collections.abc.Sequence[str], group_cols: collections.abc.Sequence[collections.abc.Sequence[str]], min_replicates_qc: int = 2, method: Literal['all', 'any'] = 'any') -> pandas.DataFrame
Filter rows in a DataFrame based on missingness thresholds for specified groups.
- Parameters:
df (pd.DataFrame) – Input DataFrame.
groups (Sequence[str]) – Names of the groups.
group_cols (Sequence[Sequence[str]]) – Columns corresponding to each group.
min_replicates_qc (int, optional) – Minimum number of non-NA replicates a group must have to meet the threshold. Defaults to 2.
method (Literal[“all”, “any”], optional) – Method for filtering. Can be “all” or “any”. Defaults to “any”. If “all”, all groups must meet the threshold. If “any”, at least one group must meet the threshold.
- Returns:
Filtered DataFrame.
- Return type:
pd.DataFrame
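The difference between the “all” and “any” methods can be seen in a small sketch. `filter_missingness_sketch` is a hypothetical re-implementation based only on the parameter descriptions above, and the column names are made up for the example. It assumes a group meets the threshold when it has at least `min_replicates_qc` non-NA values in a row.

```python
from typing import Literal, Sequence

import numpy as np
import pandas as pd


def filter_missingness_sketch(
    df: pd.DataFrame,
    groups: Sequence[str],
    group_cols: Sequence[Sequence[str]],
    min_replicates_qc: int = 2,
    method: Literal["all", "any"] = "any",
) -> pd.DataFrame:
    """Keep rows whose groups have enough non-NA replicates (illustrative only)."""
    # One boolean column per group: does the row reach the replicate threshold?
    passes = pd.concat(
        [df[list(cols)].notna().sum(axis=1) >= min_replicates_qc for cols in group_cols],
        axis=1,
        keys=list(groups),
    )
    # "all": every group must pass; "any": at least one group must pass.
    keep = passes.all(axis=1) if method == "all" else passes.any(axis=1)
    return df[keep]


data = pd.DataFrame(
    {
        "ctrl_1": [1.0, 1.0, 1.0],
        "ctrl_2": [2.0, np.nan, np.nan],
        "ctrl_3": [3.0, 2.0, np.nan],
        "trt_1": [4.0, 5.0, np.nan],
        "trt_2": [5.0, np.nan, np.nan],
        "trt_3": [6.0, np.nan, np.nan],
    },
    index=["A", "B", "C"],
)
groups = ["ctrl", "trt"]
group_cols = [["ctrl_1", "ctrl_2", "ctrl_3"], ["trt_1", "trt_2", "trt_3"]]

kept_any = filter_missingness_sketch(data, groups, group_cols, method="any")
kept_all = filter_missingness_sketch(data, groups, group_cols, method="all")
print(list(kept_any.index))  # rows where at least one group has 2 or more values
print(list(kept_all.index))  # rows where every group has 2 or more values
```

Row B has two control values but only one treatment value, so it survives “any” and is dropped by “all”. Row C fails in both groups and is dropped either way.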