valpas.confidence_evaluation.calculate_edge_confidence
- valpas.confidence_evaluation.calculate_edge_confidence(edges_df, positive_interactions, negative_interactions=None, exclude_negative_interactions=None, protein_col1='protein1', protein_col2='protein2', weight_col='weight', calculate_limit=10000, return_all=False, confidence_metric='ppv', additional_metrics=None, min_threshold_samples=1, min_counts=3, negative_ratio=0, normalize_pairs=False, extrapolate_confidence=False, verbose=True, **kwargs)
Calculate confidence scores for edges based on positive/negative interaction lists
- Parameters:
edges_df (DataFrame) – DataFrame with protein pairs and weights
positive_interactions (List[Tuple[str, str]]) – List of (protein1, protein2) tuples for known positives
negative_interactions (List[Tuple[str, str]]) – List of (protein1, protein2) tuples for known negatives
exclude_negative_interactions (List[Tuple[str, str]]) – List of (protein1, protein2) tuples to exclude from random negatives
protein_col1 (str) – Column name for first protein
protein_col2 (str) – Column name for second protein
weight_col (str) – Column name for edge weights
calculate_limit (int) – Only calculate confidence for the top N scoring edges
return_all (bool) – If True, return all edges when calculate_limit applies, both those with and those without confidence scores
confidence_metric (str) – Primary metric (‘ppv’, ‘precision’, ‘recall’, ‘f1’, ‘accuracy’, ‘enrichment’)
additional_metrics (List[str]) – List of additional metrics to calculate
min_threshold_samples (int) – Minimum samples needed above threshold for reliable confidence
min_counts (int) – Minimum number of matching values used in association calculation
normalize_pairs (bool) – Whether to normalize protein pair order (A,B) = (B,A)
extrapolate_confidence (bool) – Whether to assign the maximum confidence to predictions ranked before the first available confidence score
verbose (bool) – Whether to print progress information
negative_ratio (int)
- Returns:
DataFrame with added confidence scores and metrics
- Return type:
DataFrame
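To make the 'ppv' metric and the normalize_pairs option more concrete, here is a minimal, standard-library sketch of the general idea: rank edges by weight, then compute the positive predictive value, TP / (TP + FP), over all labeled edges at or above each rank. This is an illustration only and not valpas's implementation. The helper names `normalize_pair` and `ppv_by_rank` are hypothetical, ties in weight are not treated specially, and options such as min_threshold_samples, negative_ratio, and extrapolate_confidence are not modeled.

```python
from typing import Iterable, List, Optional, Tuple

Pair = Tuple[str, str]
Edge = Tuple[str, str, float]


def normalize_pair(a: str, b: str) -> Pair:
    """Order a pair alphabetically so that (A, B) and (B, A) compare equal."""
    return (a, b) if a <= b else (b, a)


def ppv_by_rank(
    edges: Iterable[Edge],
    positives: Iterable[Pair],
    negatives: Iterable[Pair],
) -> List[Tuple[str, str, float, Optional[float]]]:
    """Hypothetical helper: cumulative PPV for edges sorted by descending weight.

    Unlabeled edges change neither count, so they inherit the PPV of the
    labeled edges ranked at or above them. PPV is None until at least one
    labeled edge has been seen.
    """
    pos = {normalize_pair(a, b) for a, b in positives}
    neg = {normalize_pair(a, b) for a, b in negatives}

    ranked = sorted(edges, key=lambda e: e[2], reverse=True)
    tp = fp = 0
    out = []
    for a, b, weight in ranked:
        pair = normalize_pair(a, b)
        if pair in pos:
            tp += 1
        elif pair in neg:
            fp += 1
        ppv = tp / (tp + fp) if (tp + fp) > 0 else None
        out.append((pair[0], pair[1], weight, ppv))
    return out


# Toy data (made up for illustration): two known positives, one known negative.
edges = [("A", "B", 0.9), ("C", "D", 0.8), ("F", "E", 0.7), ("G", "H", 0.6)]
positives = [("B", "A"), ("E", "F")]  # reversed order still matches
negatives = [("C", "D")]

scored = ppv_by_rank(edges, positives, negatives)
for row in scored:
    print(row)
```

In this toy run the top edge is a known positive (PPV 1.0), the second is a known negative (PPV 0.5), the third is a positive that matches only because the pair order is normalized (PPV 2/3), and the unlabeled fourth edge keeps the PPV of the rank above it.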