valpas.confidence_evaluation.calculate_edge_confidence

valpas.confidence_evaluation.calculate_edge_confidence(edges_df, positive_interactions, negative_interactions=None, exclude_negative_interactions=None, protein_col1='protein1', protein_col2='protein2', weight_col='weight', calculate_limit=10000, return_all=False, confidence_metric='ppv', additional_metrics=None, min_threshold_samples=1, min_counts=3, negative_ratio=0, normalize_pairs=False, extrapolate_confidence=False, verbose=True, **kwargs)

Calculate confidence scores for edges based on lists of known positive and negative interactions.

Parameters:
  • edges_df (DataFrame) – DataFrame with protein pairs and weights

  • positive_interactions (List[Tuple[str, str]]) – List of (protein1, protein2) tuples for known positives

  • negative_interactions (List[Tuple[str, str]]) – List of (protein1, protein2) tuples for known negatives

  • exclude_negative_interactions (List[Tuple[str, str]]) – List of (protein1, protein2) tuples to exclude from random negatives

  • protein_col1 (str) – Column name for first protein

  • protein_col2 (str) – Column name for second protein

  • weight_col (str) – Column name for edge weights

  • calculate_limit (int) – Only calculate confidence for the top N scoring edges

  • return_all (bool) – When calculate_limit applies, if True, return all edges, both with and without confidence scores

  • confidence_metric (str) – Primary metric (‘ppv’, ‘precision’, ‘recall’, ‘f1’, ‘accuracy’, ‘enrichment’)

  • additional_metrics (List[str]) – List of additional metrics to calculate

  • min_threshold_samples (int) – Minimum samples needed above threshold for reliable confidence

  • min_counts (int) – Minimum number of matching values used in association calculation

  • normalize_pairs (bool) – Whether to normalize protein pair order so that (A, B) and (B, A) are treated as the same pair

  • extrapolate_confidence (bool) – Whether to assign the maximum confidence to predictions that rank before the first computed confidence score

  • verbose (bool) – Whether to print progress information

  • negative_ratio (int)
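
The idea behind normalize_pairs and the default 'ppv' metric can be illustrated with a small standalone sketch. It does not use valpas and is not the library's implementation. The helper names normalize_pair and ppv_at_threshold are hypothetical, and the sketch assumes PPV is computed as TP / (TP + FP) over the labelled edges whose weight is at or above a threshold.

```python
from typing import Iterable, Optional, Tuple

Pair = Tuple[str, str]


def normalize_pair(a: str, b: str) -> Pair:
    """Order a pair so that (A, B) and (B, A) map to the same key (hypothetical helper)."""
    return (a, b) if a <= b else (b, a)


def ppv_at_threshold(
    edges: Iterable[Tuple[str, str, float]],
    positives: Iterable[Pair],
    negatives: Iterable[Pair],
    threshold: float,
) -> Optional[float]:
    """Assumed definition: PPV = TP / (TP + FP) among labelled edges with weight >= threshold."""
    pos = {normalize_pair(*p) for p in positives}
    neg = {normalize_pair(*p) for p in negatives}
    tp = fp = 0
    for a, b, weight in edges:
        if weight < threshold:
            continue
        key = normalize_pair(a, b)
        if key in pos:
            tp += 1
        elif key in neg:
            fp += 1
        # Edges in neither list are unlabelled and do not count.
    return tp / (tp + fp) if (tp + fp) else None


# Toy data: note that ("B", "A") and ("D", "C") only match after normalization.
edges = [("B", "A", 0.9), ("C", "D", 0.8), ("E", "F", 0.7), ("G", "H", 0.4)]
positives = [("A", "B"), ("E", "F")]
negatives = [("D", "C")]

ppv_high = ppv_at_threshold(edges, positives, negatives, 0.85)
ppv_mid = ppv_at_threshold(edges, positives, negatives, 0.7)
ppv_none = ppv_at_threshold(edges, positives, negatives, 0.95)
```

Without normalization, the edge ("B", "A") would not match the known positive ("A", "B"), which is the situation normalize_pairs appears intended to handle.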

Returns:

DataFrame with added confidence scores and metrics

Return type:

DataFrame
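
How calculate_limit and return_all interact, as described in the parameter list, can be sketched in plain Python. This is an illustration under stated assumptions, not valpas code. The function score_top_edges, the score_fn callback, and the "confidence" key are hypothetical, and edges are modelled as dictionaries rather than DataFrame rows.

```python
from typing import Callable, Dict, List


def score_top_edges(
    edges: List[Dict[str, object]],
    score_fn: Callable[[Dict[str, object]], float],
    calculate_limit: int = 10000,
    return_all: bool = False,
    weight_col: str = "weight",
) -> List[Dict[str, object]]:
    """Score only the top-N edges by weight; optionally keep the unscored rest."""
    # Rank edges from highest to lowest weight.
    ranked = sorted(edges, key=lambda e: e[weight_col], reverse=True)
    result = []
    for rank, edge in enumerate(ranked):
        row = dict(edge)  # copy so the input is not mutated
        if rank < calculate_limit:
            row["confidence"] = score_fn(row)
        elif return_all:
            row["confidence"] = None  # kept, but without a confidence score
        else:
            break  # everything past the limit is dropped
        result.append(row)
    return result


edges = [
    {"protein1": "A", "protein2": "B", "weight": 0.9},
    {"protein1": "C", "protein2": "D", "weight": 0.2},
    {"protein1": "E", "protein2": "F", "weight": 0.6},
]

# Placeholder scorer for illustration: every scored edge gets 1.0.
top_only = score_top_edges(edges, lambda e: 1.0, calculate_limit=2)
everything = score_top_edges(edges, lambda e: 1.0, calculate_limit=2, return_all=True)
```

Here top_only holds the two highest-weight edges, while everything also keeps the third edge with its confidence left empty.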