
Evaluation Procedure in VIFY

The evaluation procedure in the vify repository is implemented in the script scripts/evaluate.py. This script is responsible for assessing the performance of trained models using configurations specified in the config/eval_config.yaml file. Below is a detailed explanation of the evaluation process, its components, and how to use it effectively.


Overview

The evaluation script is designed to:

  1. Load configurations using Hydra and OmegaConf.
  2. Extract features from the dataset.
  3. Evaluate the model by generating predictions on the full dataset.
  4. Compute evaluation metrics such as AUC (Area Under the Curve).
  5. Generate visualizations like ROC curves, scatter plots, and violin plots.
  6. Save evaluation results for further analysis.

Configuration

The evaluation script relies on a configuration file located at config/eval_config.yaml. This file defines all the parameters required for evaluation. A detailed explanation of each parameter can be found in the config file itself.
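
If you want to inspect the available parameters without running the script, the file can be loaded directly with OmegaConf. This is a minimal sketch, not part of scripts/evaluate.py; the path is relative to the repository root.

from omegaconf import OmegaConf

# Print the evaluation parameters and their current values for quick inspection.
cfg = OmegaConf.load("config/eval_config.yaml")
print(OmegaConf.to_yaml(cfg))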


Evaluation Workflow

The evaluation process follows these steps:

  1. Load Configuration:

    • The configuration is loaded using Hydra and converted into a Python object for easier manipulation.
  2. Feature Extraction:

    • Features are extracted from the dataset using the extract_features_from_studies function if extract_features is set to True. Otherwise, features are loaded from disk.
  3. Model Evaluation:

    • The model is evaluated on the full dataset.
  4. Metrics Computation:

    • Metrics such as AUC, sensitivity, and specificity are computed; a minimal sketch of these computations appears after this list.
  5. Visualization:

    • Visualizations such as ROC curves, scatter plots, and violin plots are generated based on the evaluation results.
  6. Save Results:

    • The evaluation results, metrics, and visualizations are saved to the specified output directory.
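
As an illustration of the metrics step, AUC, sensitivity, and specificity can be derived from true labels and predicted probabilities as in the sketch below. This is a generic example, not vify's exact implementation.

import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Generic metrics sketch: binary labels y_true and predicted probabilities y_prob.
def basic_metrics(y_true, y_prob, threshold=0.5):
    auc = roc_auc_score(y_true, y_prob)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return auc, sensitivity, specificity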

Logging

The script supports two logging mechanisms:

  1. ClearML:

    • If use_clearml is enabled, the script initializes a ClearML task and logger using the get_task_and_logger function.
    • This allows for detailed experiment tracking, including metrics, hyperparameters, and artifacts.
  2. Standard Logging:

    • If ClearML is not used, the script sets up standard Python logging with the specified logging level.
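
A minimal sketch of this logic is shown below. The exact signature of get_task_and_logger and the name of the log-level field are assumptions, not taken from the source.

import logging

if config.use_clearml:
    # ClearML path: experiment tracking for metrics, hyperparameters, and artifacts.
    task, logger = get_task_and_logger(config)  # hypothetical signature
else:
    # Standard Python logging at the configured level (field name assumed).
    logging.basicConfig(level=getattr(logging, str(config.log_level).upper(), logging.INFO))
    logger = logging.getLogger("evaluate")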

Output Structure

The outputs of the evaluation process are stored in:

<output_path>/<project_name>/<task_name>/<timestamp>/
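
The directory is typically derived from the configuration and a run timestamp. The sketch below is illustrative only; the field names project_name and task_name and the timestamp format are assumptions.

from datetime import datetime
from pathlib import Path

# Illustrative construction of the experiment directory used throughout the script.
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
experiment_path = Path(config.output_path) / config.project_name / config.task_name / timestamp
experiment_path.mkdir(parents=True, exist_ok=True)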

How to Run the Evaluation Script

To run the evaluation script, use the following command:

python scripts/evaluate.py
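
Because the script is driven by Hydra, individual configuration values can also be overridden on the command line. For example (parameter names taken from the walkthrough below, the path is a placeholder):

python scripts/evaluate.py extract_features=false features_path=/path/to/precomputed/features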

Code Walkthrough

1. Configuration Loading

@hydra.main(
    version_base=None,
    config_path="../config",
    config_name="eval_config",
)
def run_evaluation(omega_config: OmegaConf) -> None:
    config = cast(BaseConfig, OmegaConf.to_object(omega_config))
  • The @hydra.main decorator loads the configuration file.
  • The configuration is converted into a Python object for easier manipulation.
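
For completeness, a Hydra-decorated entry point is typically invoked without arguments at the bottom of the script; Hydra composes the configuration and passes it in. This is a sketch, not necessarily verbatim from scripts/evaluate.py.

if __name__ == "__main__":
    run_evaluation()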

2. Feature Extraction

if config.extract_features:
    for image_type in config.image_types:
        extract_features_from_studies(
            studies_path=config.studies_path,
            patient_csv=config.patient_csv,
            output_path=experiment_path,
            segmentation_method=config.lesion_segmentation_method.id,
            queries=config.data_queries.kwargs[image_type],
            image_type=image_type,
            feature_extraction_methods=config.feature_extraction_methods,
            lesion_cutoff=config.lesion_segmentation_method.softmax_cut_off,
            dilation_factor=config.lesion_segmentation_method.dilation_factor,
        )
    features_path = experiment_path
elif config.features_path is not None:
    features_path = str(config.features_path)
else:
    raise ValueError(
        "Either extract features must be enabled or feature path must be provided."
    )
  • Features are either extracted from the dataset or loaded from disk, depending on the configuration.
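
The all_features_df used in the next step is not shown in this excerpt. Presumably it is assembled from the feature files under features_path, roughly along the lines of the sketch below; the file naming and format are assumptions.

from pathlib import Path
import pandas as pd

# Hypothetical assembly of the combined feature table from per-image-type CSV files.
feature_files = sorted(Path(features_path).glob("*.csv"))
all_features_df = pd.concat([pd.read_csv(f) for f in feature_files], ignore_index=True)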

3. Model Evaluation

with open(config.model_path, "rb") as f:
    model = pickle.load(f)

evaluate_full(
    csv=all_features_df,
    output_path=experiment_path,
    model=model,  # type: ignore
    training_threshold=config.classifier_threshold,
    num_predict_lesions=config.num_predict_lesions,
    plot_dict=config.plots_params.kwargs,
    exclude_num_first_columns=config.exclude_num_first_columns,
)
  • The evaluate_full function runs a single evaluation pass, computing predictions and metrics on the full dataset.
  • The model is unpickled from the path given by model_path in the configuration.
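
The plots listed in the workflow (ROC curves, scatter plots, violin plots) are produced inside evaluate_full according to plots_params.kwargs. As a generic illustration only, and not vify's plotting code, an ROC curve can be drawn from labels and predicted probabilities as follows:

import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

# Generic ROC illustration; vify generates its plots internally via plot_dict.
def plot_roc(y_true, y_prob, out_path="roc_curve.png"):
    fpr, tpr, _ = roc_curve(y_true, y_prob)
    plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.3f}")
    plt.plot([0, 1], [0, 1], linestyle="--", color="grey")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.savefig(out_path)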

This documentation provides a comprehensive guide to the evaluation procedure in the vify repository. For further details, refer to the source code in scripts/evaluate.py and the configuration file at config/eval_config.yaml.