Prediction Procedure in VIFY

The prediction procedure in the vify repository is implemented in scripts/predict_with_studies.py. The script runs a pretrained model over multiple studies: it extracts features from each study and saves the predictions in a structured format. The sections below describe the prediction process, its components, and how to run it.

Overview

The prediction script is designed to:

- Load configurations using Hydra and OmegaConf.
- Prepare the prediction environment, including logging and output directories.
- Process a list of studies and extract features.
- Run predictions for each study using a pretrained model.
- Save the predictions in a structured format for further analysis.

Configuration

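The prediction script relies on a configuration file located at config/predict_with_studies_config.yaml. This file defines all the parameters required for running predictions. The code walkthrough in this document reads the following fields from it: output_base_path, project_name, task_name, logging_level, studies_path, and patient_csv.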
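
As a rough picture of the configuration object, the sketch below collects the fields that the code in this document accesses into a dataclass. The class name PredictConfigSketch, the field types, and all example values are assumptions for illustration; the real VifyPipelineConfig is not shown here and may define more fields or different types.

```python
from dataclasses import dataclass, fields


@dataclass
class PredictConfigSketch:
    """Hypothetical subset of VifyPipelineConfig; the real class may differ."""

    output_base_path: str  # root directory for all prediction outputs
    project_name: str      # first sub-directory under output_base_path
    task_name: str         # second sub-directory, under project_name
    logging_level: str     # passed to setup_python_logger (type assumed)
    studies_path: str      # directory that contains the studies
    patient_csv: str       # metadata file used to build the study list


# Placeholder values for illustration only.
example = PredictConfigSketch(
    output_base_path="outputs",
    project_name="demo_project",
    task_name="demo_task",
    logging_level="INFO",
    studies_path="data/studies",
    patient_csv="data/patients.csv",
)
field_names = [f.name for f in fields(example)]
```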

Prediction Workflow

The prediction process follows these steps:

1. Load Configuration:
   - The configuration is loaded using Hydra and converted into a Python object for easier manipulation.
2. Prepare Output Directory:
   - An output directory is created, named after the project name, task name, and current timestamp.
3. Set Up Logging:
   - Logging is configured to track the progress of the prediction process.
4. Retrieve Study List:
   - The list of studies to be processed is retrieved from the specified directory and metadata file.
5. Process Studies and Run Predictions:
   - Each study is processed, and predictions are generated using the pretrained model.
6. Save Results:
   - The predictions are saved in a structured format. Studies that fail to process are kept in the output as missing values.

Logging

The script uses standard Python logging to track the progress of the prediction process. Logs include information about the studies being processed, errors encountered, and the location of saved results.

Output Structure

The outputs of the prediction process, including predictions.csv, are saved in a timestamped directory:

<output_base_path>/<project_name>/<task_name>/<timestamp>/

How to Run the Prediction Script

To run the prediction script, use the following command:

python scripts/predict_with_studies.py

Code Walkthrough

Below is a detailed explanation of the key components in scripts/predict_with_studies.py:

1. Configuration Loading

@hydra.main(
    version_base=None,
    config_path="../config/",
    config_name="predict_with_studies_config",
)
def run_study_predictions(omega_config: OmegaConf) -> None:
    config = cast(VifyPipelineConfig, OmegaConf.to_object(omega_config))

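The @hydra.main decorator loads predict_with_studies_config from the ../config/ directory, which Hydra resolves relative to the script file.
OmegaConf.to_object converts the loaded configuration into a Python object, which the script treats as a VifyPipelineConfig.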

2. Output Directory Creation

time = datetime.now().strftime("%Y%m%d_%H%M%S")
experiment_path = os.path.join(
    config.output_base_path, config.project_name, config.task_name, time
)
os.makedirs(experiment_path, exist_ok=True)

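The output directory is named after the project name, the task name, and the current timestamp at one-second resolution. Because exist_ok=True is set, two runs of the same project and task that start within the same second share one directory.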
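
The path construction can be tried in isolation. The sketch below wraps the same strftime and os.makedirs calls in a helper; the function name build_experiment_path and the example values are hypothetical, and a temporary directory stands in for output_base_path.

```python
import os
import tempfile
from datetime import datetime


def build_experiment_path(base: str, project: str, task: str, now: datetime) -> str:
    """Return <base>/<project>/<task>/<timestamp> and create it on disk."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    path = os.path.join(base, project, task, stamp)
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    return path


with tempfile.TemporaryDirectory() as base:
    fixed_time = datetime(2024, 1, 31, 9, 5, 7)
    first = build_experiment_path(base, "demo_project", "demo_task", fixed_time)
    second = build_experiment_path(base, "demo_project", "demo_task", fixed_time)
    same_directory = first == second     # same second, so the same path
    leaf_name = os.path.basename(first)  # timestamp component of the path
```

Passing the timestamp in as an argument, rather than calling datetime.now() inside the helper, makes the result reproducible in tests.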

3. Logging Setup

setup_python_logger(config.logging_level)
logger = logging.getLogger(__name__)
logger.info(f"Running on experiment path {experiment_path}")

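setup_python_logger configures logging at the level given by config.logging_level. The script then logs a message about the experiment path so the output location can be found from the log.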
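
The body of setup_python_logger is not shown in this document. A minimal stand-in built on the standard logging module might look like the sketch below; the function name setup_logger_sketch, the log format, and the example path are assumptions, and the repository's helper may behave differently.

```python
import logging


def setup_logger_sketch(level: str) -> logging.Logger:
    """Hypothetical stand-in for setup_python_logger; the real helper may differ."""
    root = logging.getLogger()
    root.setLevel(level)  # accepts a level name such as "INFO" or an int
    if not root.handlers:
        # Attach a console handler only once so repeated calls do not duplicate output.
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        root.addHandler(handler)
    return root


root_logger = setup_logger_sketch("DEBUG")
logger = logging.getLogger(__name__)
logger.info("Running on experiment path %s", "outputs/demo_project/demo_task/20240131_090507")
```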

4. Study List Retrieval

studies = get_study_list(config.studies_path, config.patient_csv)

get_study_list returns the studies to process, based on the directory config.studies_path and the metadata file config.patient_csv.
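
How get_study_list combines the directory and the metadata file is not shown in this document. Purely as an illustration, the sketch below assumes that each study is a sub-directory of studies_path and that the CSV has a column named study_id listing the studies to keep. The function name, the column name, and the matching rule are all hypothetical.

```python
import csv
import os
import tempfile


def get_study_list_sketch(studies_path: str, patient_csv: str) -> list[str]:
    """Hypothetical stand-in: keep sub-directories whose name is listed in the CSV."""
    with open(patient_csv, newline="") as handle:
        wanted = {row["study_id"] for row in csv.DictReader(handle)}  # assumed column
    return sorted(
        os.path.join(studies_path, name)
        for name in os.listdir(studies_path)
        if name in wanted and os.path.isdir(os.path.join(studies_path, name))
    )


with tempfile.TemporaryDirectory() as root:
    # Three study directories on disk, two of them listed in the metadata file.
    for name in ("study_a", "study_b", "study_c"):
        os.makedirs(os.path.join(root, "studies", name))
    csv_path = os.path.join(root, "patients.csv")
    with open(csv_path, "w", newline="") as handle:
        handle.write("study_id\nstudy_a\nstudy_c\n")
    paths = get_study_list_sketch(os.path.join(root, "studies"), csv_path)
    found = [os.path.basename(p) for p in paths]
```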

5. Study Processing and Prediction

def run_single_study(study_path: str, config: VifyPipelineConfig) -> list[float]:
    """Main function to run the pipeline for debugging purposes."""
    # Configure logging
    setup_python_logger(config.logging_level)

    # Load study
    study: Study = Study.from_path(study_path)

    # Create data model and pipeline step
    batches = VifyPipelineDataModel.from_study(study, config)
    classify = VifyPipelineStep(config=config)

    # Run inference
    for batch in batches:
        output = classify.infer(batch)
    return output.misc["virdx_score"]  # type: ignore

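Each study is processed by run_single_study, which loads the study, builds batches from it, runs inference on each batch, and returns the virdx_score entry of the pipeline output. As written, the function returns the score from the last batch only.
Errors encountered while processing a study are logged, and that study's result is recorded as None.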
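
The loop that calls run_single_study for each study is not reproduced in this document. The sketch below shows one way to get the described behavior, where a failing study is logged and stored as None while the remaining studies still run. The names predict_all and fake_run are hypothetical, and fake_run stands in for run_single_study so the example runs without the pipeline.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def predict_all(
    studies: list[str],
    run_fn: Callable[[str], list[float]],
) -> dict[str, Optional[list[float]]]:
    """Run run_fn on every study; record None for studies that raise."""
    results: dict[str, Optional[list[float]]] = {}
    for study_path in studies:
        try:
            results[study_path] = run_fn(study_path)
        except Exception:
            # Log the traceback and continue so one bad study does not stop the run.
            logger.exception("Study %s failed", study_path)
            results[study_path] = None
    return results


def fake_run(study_path: str) -> list[float]:
    """Stand-in for run_single_study that fails on one study."""
    if study_path == "study_b":
        raise ValueError("corrupt study")
    return [0.1, 0.9]


results = predict_all(["study_a", "study_b"], fake_run)
```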

6. Results Saving

max_num_lesions = max([len(v) for v in results.values() if v is not None])
for s, v in results.items():
    if v is not None:
        results[s] = v + [np.nan] * (max_num_lesions - len(v))
    else:
        results[s] = [np.nan] * max_num_lesions
preds = np.array([v for v in results.values()])
studies = list(results.keys())
results_df = pd.DataFrame(
    preds,
    index=studies,
    columns=[f"Lesion_{i}" for i in range(max_num_lesions)],
)
results_df.to_csv(os.path.join(experiment_path, "predictions.csv"))

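Each study's predictions are padded with NaN up to the largest lesion count, so every row has the same length even when studies contain different numbers of lesions. Studies that failed to process become rows of NaN. The table is written to predictions.csv in the experiment directory, with one row per study and one column per lesion.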
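
The padding step can be checked on a small example without the pipeline. The sketch below reimplements it with the standard library only; the function name pad_results and the sample scores are hypothetical. It also passes default=0 to max(), since calling max() on an empty sequence raises ValueError, which is what happens in the snippet above when every study has failed.

```python
import math
from typing import Optional


def pad_results(results: dict[str, Optional[list[float]]]) -> dict[str, list[float]]:
    """Pad every entry with NaN to the longest list; failed studies become all-NaN."""
    # default=0 covers the case where no study produced any prediction.
    width = max((len(v) for v in results.values() if v is not None), default=0)
    padded: dict[str, list[float]] = {}
    for study, values in results.items():
        values = values or []  # a failed study (None) contributes no scores
        padded[study] = values + [math.nan] * (width - len(values))
    return padded


padded = pad_results({"study_a": [0.2], "study_b": None, "study_c": [0.7, 0.4]})
row_lengths = {study: len(row) for study, row in padded.items()}
all_failed = pad_results({"study_x": None})
```

Because every row now has the same length, the padded dictionary can be turned into a table directly, for example with pd.DataFrame.from_dict(padded, orient="index").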

For further details, refer to the source code in scripts/predict_with_studies.py and the configuration file at config/predict_with_studies_config.yaml.
