valpas.valpas_core.associate

valpas.valpas_core.associate(association_type='pearson', infile=None, infile2=None, infolder=None, file_type='csv', sheet=None, sheet2=None, output_type='sorted_list', filter_cutoff=0.9, normalization='none', min_counts=3, training_interactions=None, calculate_confidence=False, transform_clr=False, annotation_file=None, overwrite_output=False, outfile=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, report_file=None, annotation_args={'case_sensitive': True, 'handle_missing_annotations': 'keep', 'primary_annotation_column': 'annotation', 'primary_id_column': 'id', 'remove_duplicates': 'warn', 'sheet_name': 0, 'strip_whitespace': True, 'validate_ids': True, 'verbose': True}, learncorr_args={'learning_method': 'empirical', 'missing_strategy': 'median'}, autoencoder_args={'epochs': 200, 'hidden_dims': [256, 128], 'learning_rate': 0.001, 'mask_probability': 0.15, 'protein_embedding_dim': 128, 'sample_embedding_dim': 64, 'scaling_method': 'robust', 'validation_split': 0.2}, subset_args={'inplace': False, 'keep_conds': [], 'nconds': None, 'percentage': None, 'random_state': 0}, confidence_args={'additional_metrics': None, 'calculate_limit': 10000, 'confidence_metric': 'ppv', 'exclude_negative_interactions': None, 'extrapolate_confidence': False, 'min_threshold_samples': 1, 'negative_interactions': None, 'negative_ratio': 0, 'normalize_pairs': False, 'protein_col1': 'protein1', 'protein_col2': 'protein2', 'return_all': False, 'weight_col': 'weight'})

Establishes association values between items (e.g., proteins, lipids, or metabolites).

Parameters:
  • association_type (str) – Type of association metric (e.g., ‘spearman’, ‘pearson’).

  • infile (str or Path) – Path to the primary file containing data points.

  • infile2 (str or Path) – Path to the optional second file (cross-omics).

  • infolder (str or Path) – Path to folder containing multiple files.

  • file_type (str) – File type (‘csv’ or ‘xlsx’).

  • sheet (str) – Name of the Excel sheet (if applicable).

  • sheet2 (str) – Secondary Excel sheet for the second file (if applicable).

  • output_type (str) – Output format (‘sorted_list’, ‘correlation_matrix’).

  • filter_cutoff (float) – Cutoff for filtering missing values.

  • normalization (str) – Normalization mode (‘pre’, ‘post’, ‘none’).

  • min_counts (int) – Minimum number of comparisons; edges with fewer comparisons are filtered out.

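  • transform_clr (bool) – Whether to apply a CLR transform to the data (default: False).

  • training_interactions – Known interactions used to calculate confidence values when calculate_confidence is True (default: None).

  • calculate_confidence (bool) – If True and training_interactions is provided, use training_interactions to calculate confidence values for the predictions.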

  • subset_args (dict) – Keyword arguments to pass to the subsetting function. Defaults are shown in the signature.

  • autoencoder_args (dict) – Keyword arguments to pass to the autoencoder function. Defaults are shown in the signature.

  • learncorr_args (dict) – Keyword arguments to pass to the learn-correlation function. Defaults are shown in the signature.

  • confidence_args (dict) – Keyword arguments to pass to the confidence calculation function. Defaults are shown in the signature.

  • overwrite_output (bool) – Whether to overwrite the existing output.

  • outfile (str or Path) – Path for saving the output file or sys.stdout.

  • annotation_args (dict) – Keyword arguments for annotation handling. Defaults are shown in the signature.
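To make the roles of association_type, min_counts, and the 'sorted_list' output more concrete, here is a minimal pure-Python sketch of a pairwise Pearson association with a minimum-comparisons filter. It is an illustration under stated assumptions, not valpas's implementation: the helper names (pearson, sorted_associations), the toy data, and the exact way missing values and min_counts interact are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)


def sorted_associations(data, min_counts=3):
    """Return (item_a, item_b, r) edges sorted by r, highest first.

    Assumption: an edge is dropped when the two items share fewer than
    min_counts samples in which both values are present (not None).
    """
    edges = []
    for a, b in combinations(sorted(data), 2):
        shared = [(x, y) for x, y in zip(data[a], data[b])
                  if x is not None and y is not None]
        if len(shared) < min_counts:
            continue  # too few comparisons to trust this edge
        r = pearson([p[0] for p in shared], [p[1] for p in shared])
        if r is not None:
            edges.append((a, b, r))
    return sorted(edges, key=lambda e: e[2], reverse=True)


# Hypothetical toy data: items as keys, one value per sample, None = missing.
data = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.0, 6.0, 8.0],
    "P3": [4.0, 3.0, None, None],
}
edges = sorted_associations(data, min_counts=3)
```

In this toy input, only the P1/P2 pair shares at least three samples, so the pairs involving P3 are filtered out before any correlation is computed.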

Returns:

The computed associations as a DataFrame, or the results written to outfile.

Return type:

result
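A call might be assembled as in the sketch below. The keyword names and values come from the signature above; the input file name is hypothetical, and the wrapper run_association is only an illustrative helper that imports valpas lazily so the snippet can be read and checked without the package installed. How the function behaves for this particular combination of arguments is not verified here.

```python
from pathlib import Path

# Keyword arguments taken from the documented signature.
# "abundances.csv" is a hypothetical input file.
associate_kwargs = dict(
    association_type="spearman",
    infile=Path("abundances.csv"),
    file_type="csv",
    output_type="sorted_list",
    filter_cutoff=0.9,
    normalization="none",
    min_counts=3,
)


def run_association(**kwargs):
    """Illustrative wrapper: import valpas only when the call is made."""
    from valpas.valpas_core import associate
    return associate(**kwargs)


# Example call (requires valpas and the input file to exist):
# result = run_association(**associate_kwargs)
```

Any argument left out of associate_kwargs falls back to the defaults listed in the signature, including outfile, which defaults to standard output.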