valpas.valpas_core.associate
- valpas.valpas_core.associate(association_type='pearson', infile=None, infile2=None, infolder=None, file_type='csv', sheet=None, sheet2=None, output_type='sorted_list', filter_cutoff=0.9, normalization='none', min_counts=3, training_interactions=None, calculate_confidence=False, transform_clr=False, annotation_file=None, overwrite_output=False, outfile=sys.stdout, report_file=None, annotation_args={'case_sensitive': True, 'handle_missing_annotations': 'keep', 'primary_annotation_column': 'annotation', 'primary_id_column': 'id', 'remove_duplicates': 'warn', 'sheet_name': 0, 'strip_whitespace': True, 'validate_ids': True, 'verbose': True}, learncorr_args={'learning_method': 'empirical', 'missing_strategy': 'median'}, autoencoder_args={'epochs': 200, 'hidden_dims': [256, 128], 'learning_rate': 0.001, 'mask_probability': 0.15, 'protein_embedding_dim': 128, 'sample_embedding_dim': 64, 'scaling_method': 'robust', 'validation_split': 0.2}, subset_args={'inplace': False, 'keep_conds': [], 'nconds': None, 'percentage': None, 'random_state': 0}, confidence_args={'additional_metrics': None, 'calculate_limit': 10000, 'confidence_metric': 'ppv', 'exclude_negative_interactions': None, 'extrapolate_confidence': False, 'min_threshold_samples': 1, 'negative_interactions': None, 'negative_ratio': 0, 'normalize_pairs': False, 'protein_col1': 'protein1', 'protein_col2': 'protein2', 'return_all': False, 'weight_col': 'weight'})
Establishes association values between items (e.g., proteins, lipids, or metabolites).
- Parameters:
association_type (str) – Type of association metric (e.g., ‘spearman’, ‘pearson’).
infile (str or Path) – Path to the primary file containing data points.
infile2 (str or Path) – Path to the optional second file (cross-omics).
infolder (str or Path) – Path to folder containing multiple files.
file_type (str) – File type (‘csv’ or ‘xlsx’).
sheet (str) – Name of the Excel sheet (if applicable).
sheet2 (str) – Secondary Excel sheet for the second file (if applicable).
output_type (str) – Output format (‘sorted_list’, ‘correlation_matrix’).
filter_cutoff (float) – Cutoff for filtering missing values.
normalization (str) – Normalization mode (‘pre’, ‘post’, ‘none’).
min_counts (int) – Minimum number of comparisons per edge; edges with fewer comparisons are filtered out.
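transform_clr (bool) – Whether to apply a centered log-ratio (CLR) transform to the data.
training_interactions – Known interactions used to calculate confidence values (see calculate_confidence).
calculate_confidence (bool) – If True and training_interactions are provided, use training_interactions to calculate confidence values for predictions.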
subset_args (dict) – Keyword arguments to pass to the subsetting function. Defaults are shown in the signature above.
autoencoder_args (dict) – Keyword arguments to pass to the autoencoder function. Defaults are shown in the signature above.
learncorr_args (dict) – Keyword arguments to pass to the learn correlation function. Defaults are shown in the signature above.
confidence_args (dict) – Keyword arguments to pass to the confidence calculation function. Defaults are shown in the signature above.
overwrite_output (bool) – Whether to overwrite the existing output.
outfile (str or Path) – Path for saving the output file or sys.stdout.
annotation_args (dict) – Keyword arguments for annotation handling. Defaults are shown in the signature above.
- Returns:
The computed associations, either returned as a DataFrame or written to outfile.
- Return type:
result
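To make the roles of association_type='pearson', min_counts, and output_type='sorted_list' more concrete, the following is a minimal, standalone sketch of the general idea. It is not valpas's implementation: the helper names `pearson` and `sorted_associations`, the toy data, the use of None for missing values, and the choice to sort by descending correlation are all assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson(xs, ys, min_counts=3):
    """Pearson r over positions where both values are present.

    Returns None when fewer than min_counts shared observations exist
    (hypothetical stand-in for the min_counts filter) or when either
    series is constant, in which case r is undefined.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < min_counts:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def sorted_associations(table, min_counts=3):
    """Return (item_a, item_b, r) edges sorted by descending r."""
    edges = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b], min_counts)
        if r is not None:
            edges.append((a, b, r))
    return sorted(edges, key=lambda edge: edge[2], reverse=True)


# Toy abundance table: item -> one value per sample (None = missing).
toy = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.0, 6.0, 8.0],
    "P3": [4.0, 3.0, 2.0, 1.0],
    "P4": [1.0, None, None, 2.0],  # only 2 observations, below min_counts
}

edges = sorted_associations(toy, min_counts=3)
for a, b, r in edges:
    print(f"{a}\t{b}\t{r:+.3f}")
```

In this toy table, every pair involving P4 is dropped because it shares only two observations with any other item, which is below the min_counts threshold of 3. The remaining pairs are listed from the most positive to the most negative correlation.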