Prediction Procedure in VIFY
The prediction procedure in the vify repository is implemented in the script scripts/predict_with_studies.py. The script runs a pretrained model over a set of studies. It processes each study, extracts features, generates predictions and writes them to a CSV file. The sections below explain the prediction process, its components and how to run it.
Overview
The prediction script is designed to:
- Load configurations using Hydra and OmegaConf.
- Prepare the prediction environment, including logging and output directories.
- Process a list of studies and extract features.
- Run predictions for each study using a pretrained model.
- Save the predictions in a structured format for further analysis.
Configuration
The prediction script relies on a configuration file located at config/predict_with_studies_config.yaml. This file defines all the parameters required for running predictions.
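The sketch below gives a rough idea of which keys the YAML file must provide. It collects the fields that the code walkthrough reads from the config object. It is a hypothetical stand-in, not the real VifyPipelineConfig. The class name PredictConfigSketch and the helper required_keys are invented, and the field types are assumptions. The real config almost certainly contains further fields, such as model settings, that are not shown here.

```python
from dataclasses import dataclass, fields


@dataclass
class PredictConfigSketch:
    """Hypothetical subset of VifyPipelineConfig.

    Only the field names come from the script; the types are assumptions.
    """

    output_base_path: str  # root directory for all prediction runs
    project_name: str      # first path component below output_base_path
    task_name: str         # second path component below output_base_path
    logging_level: str     # passed to setup_python_logger, e.g. "INFO"
    studies_path: str      # directory containing the studies to process
    patient_csv: str       # metadata file used when building the study list


def required_keys() -> list[str]:
    """Return the key names a matching YAML file would need to define."""
    return [f.name for f in fields(PredictConfigSketch)]


print(required_keys())
```

A matching YAML file would need to define at least these six keys. Whether they sit at the top level of predict_with_studies_config.yaml is an assumption.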
Prediction Workflow
The prediction process follows these steps:
1. Load Configuration: The configuration is loaded with Hydra and converted into a Python object for easier manipulation.
2. Prepare Output Directory: An output directory is created from the output base path, project name, task name and current timestamp.
3. Set Up Logging: Logging is configured to track the progress of the prediction process.
4. Retrieve Study List: The studies to process are collected from the configured studies directory and patient metadata file.
5. Process Studies and Run Predictions: Each study is processed, and predictions are generated with the pretrained model.
6. Save Results: The predictions are written to a CSV file. Studies that fail to process are kept as rows of NaN values.
Logging
The script uses standard Python logging to track the progress of the prediction process. Logs include information about the studies being processed, errors encountered, and the location of saved results.
Output Structure
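Each run writes its outputs to a new directory:
```
<output_base_path>/<project_name>/<task_name>/<timestamp>/
```
Here <timestamp> has the format YYYYMMDD_HHMMSS. The predictions are stored in predictions.csv inside this directory.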
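The following sketch makes the layout concrete. It rebuilds the run directory path from its components, using the same timestamp format as the script (%Y%m%d_%H%M%S). The helper name experiment_dir and the example values are hypothetical. The timestamp is passed in explicitly here so that the result is reproducible.

```python
import os
from datetime import datetime


def experiment_dir(output_base_path: str, project_name: str, task_name: str,
                   when: datetime) -> str:
    """Build <output_base_path>/<project_name>/<task_name>/<timestamp>."""
    timestamp = when.strftime("%Y%m%d_%H%M%S")
    return os.path.join(output_base_path, project_name, task_name, timestamp)


path = experiment_dir("outputs", "vify", "prediction", datetime(2024, 3, 1, 14, 5, 9))
# On POSIX systems this prints outputs/vify/prediction/20240301_140509
print(path)
```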
How to Run the Prediction Script
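To run the prediction script, use the following command:
```shell
python scripts/predict_with_studies.py
```
The script is a Hydra application, so values from the configuration file can be overridden by appending key=value arguments to this command.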
Code Walkthrough
Below is a detailed explanation of the key components in scripts/predict_with_studies.py:
1. Configuration Loading
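```python
from typing import cast
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(
    version_base=None,
    config_path="../config/",
    config_name="predict_with_studies_config",
)
def run_study_predictions(omega_config: DictConfig) -> None:
    # Convert to a typed VifyPipelineConfig (defined in the vify package)
    config = cast(VifyPipelineConfig, OmegaConf.to_object(omega_config))
    ...  # remaining steps are shown in the following sections
```
- The @hydra.main decorator loads config/predict_with_studies_config.yaml and passes it to the function as a DictConfig. Hydra resolves config_path relative to the script file.
- OmegaConf.to_object converts the configuration into a VifyPipelineConfig object for easier manipulation. The cast call only informs the type checker and has no runtime effect.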
2. Output Directory Creation
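```python
import os
from datetime import datetime

# Timestamp such as 20240301_140509
time = datetime.now().strftime("%Y%m%d_%H%M%S")
experiment_path = os.path.join(
    config.output_base_path, config.project_name, config.task_name, time
)
os.makedirs(experiment_path, exist_ok=True)  # no error if it already exists
```
- The run directory is built from the output base path, project name, task name and current timestamp. Each run therefore gets its own directory, unless two runs start within the same second.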
3. Logging Setup
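```python
import logging

setup_python_logger(config.logging_level)  # helper defined elsewhere in vify
logger = logging.getLogger(__name__)
logger.info(f"Running on experiment path {experiment_path}")
```
- Logging is configured at the level given by config.logging_level, and the experiment path is logged at the start of the run.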
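The helper setup_python_logger comes from the vify codebase, and its implementation is not shown here. Assuming it does roughly what its name suggests, a minimal stand-in built only on the standard logging module could look like the sketch below. The name setup_logger_sketch and the format string are hypothetical.

```python
import logging


def setup_logger_sketch(level: str = "INFO") -> None:
    """Hypothetical stand-in for vify's setup_python_logger."""
    logging.basicConfig(
        level=level.upper(),  # basicConfig accepts level names such as "INFO"
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace existing root handlers, so repeated calls are safe (Python 3.8+)
    )


setup_logger_sketch("debug")
logger = logging.getLogger(__name__)
logger.debug("Logger configured")
```

The force=True argument matters because the script configures logging more than once: both run_study_predictions and run_single_study call setup_python_logger.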
4. Study List Retrieval
```python
studies = get_study_list(config.studies_path, config.patient_csv)
```
- get_study_list returns the studies to process. It builds the list from the studies directory (config.studies_path) and the patient metadata file (config.patient_csv).
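get_study_list is also a vify helper whose exact behaviour is not documented here. Given only its arguments, one plausible reading is that it reads study identifiers from the metadata CSV and keeps those that exist as subdirectories of studies_path. The sketch below implements that reading. The column name study_id, the function name and the filtering rule are all assumptions.

```python
import csv
import os


def get_study_list_sketch(studies_path: str, patient_csv: str,
                          id_column: str = "study_id") -> list[str]:
    """Hypothetical stand-in for get_study_list.

    Returns the paths of studies listed in `patient_csv` (column `id_column`)
    that exist as directories under `studies_path`, in CSV order.
    """
    studies = []
    with open(patient_csv, newline="") as f:
        for row in csv.DictReader(f):
            study_dir = os.path.join(studies_path, row[id_column])
            # Skip studies listed in the CSV but missing on disk
            if os.path.isdir(study_dir):
                studies.append(study_dir)
    return studies
```

The sketch returns full paths rather than bare identifiers, because run_single_study takes a study path.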
5. Study Processing and Prediction
```python
# Study, VifyPipelineDataModel, VifyPipelineStep and setup_python_logger come from vify
def run_single_study(study_path: str, config: VifyPipelineConfig) -> list[float]:
    """Run the pipeline on one study and return its per-lesion virdx scores."""
    # Set up logging
    setup_python_logger(config.logging_level)
    # Load study
    study: Study = Study.from_path(study_path)
    # Create data model and pipeline step
    batches = VifyPipelineDataModel.from_study(study, config)
    classify = VifyPipelineStep(config=config)
    # Run inference; only the output of the last batch is kept
    for batch in batches:
        output = classify.infer(batch)
    return output.misc["virdx_score"]  # type: ignore
```
- The script calls run_single_study for each study. The returned list holds one virdx score per lesion.
- If processing a study raises an error, the error is logged and that study's result is set to None.
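The loop that drives run_single_study and records failures is not reproduced above. The sketch below shows a minimal version of the behaviour described. It assumes that results are keyed by study path and that any exception marks the study as failed. collect_predictions and its run_fn parameter are hypothetical names. Injecting run_fn lets the logic run without the vify model.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def collect_predictions(
    studies: list[str],
    run_fn: Callable[[str], list[float]],
) -> dict[str, Optional[list[float]]]:
    """Run `run_fn` on each study; store None for studies that raise."""
    results: dict[str, Optional[list[float]]] = {}
    for study_path in studies:
        try:
            results[study_path] = run_fn(study_path)
        except Exception:
            # logger.exception logs the message together with the traceback
            logger.exception("Failed to process study %s", study_path)
            results[study_path] = None
    return results
```

In the script itself, run_single_study also takes the config. It could be adapted to this signature with functools.partial(run_single_study, config=config).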
6. Results Saving
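```python
import os

import numpy as np
import pandas as pd

# results: study -> list of lesion scores, or None if the study failed.
# default=0 avoids a ValueError when every study failed.
max_num_lesions = max([len(v) for v in results.values() if v is not None], default=0)
# Pad each study to max_num_lesions entries; failed studies become all-NaN rows
for s, v in results.items():
    if v is not None:
        results[s] = v + [np.nan] * (max_num_lesions - len(v))
    else:
        results[s] = [np.nan] * max_num_lesions
preds = np.array(list(results.values()))
studies = list(results.keys())
results_df = pd.DataFrame(
    preds,
    index=studies,
    columns=[f"Lesion_{i}" for i in range(max_num_lesions)],
)
results_df.to_csv(os.path.join(experiment_path, "predictions.csv"))
```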
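- The predictions are written to predictions.csv in the experiment directory, with one row per study and one Lesion_<i> column per lesion. Studies with fewer lesions are padded with NaN, and studies that failed to process appear as rows of NaN. This keeps the table consistent across studies with different numbers of lesions.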
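The sketch below illustrates the resulting table layout without NumPy or pandas. It applies the same padding rule in plain Python and produces CSV text with the same shape: one row per study, one Lesion_<i> column per lesion slot and an unnamed index column. It passes default=0 to max, so a run in which every study failed does not raise ValueError. The function name and example scores are hypothetical. Missing values are written as empty cells, which mirrors the default na_rep="" of pandas' DataFrame.to_csv.

```python
import csv
import io
import math
from typing import Optional


def predictions_csv_sketch(results: dict[str, Optional[list[float]]]) -> str:
    """Pad per-study lesion scores with NaN and return them as CSV text."""
    max_num_lesions = max(
        (len(v) for v in results.values() if v is not None), default=0
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header: empty index cell, then one column per lesion slot
    writer.writerow([""] + [f"Lesion_{i}" for i in range(max_num_lesions)])
    for study, scores in results.items():
        scores = scores or []  # failed studies (None) become all-NaN rows
        padded = scores + [math.nan] * (max_num_lesions - len(scores))
        # Write NaN as an empty cell, like pandas' default na_rep=""
        writer.writerow([study] + ["" if math.isnan(x) else x for x in padded])
    return buffer.getvalue()


print(predictions_csv_sketch({"s1": [0.1, 0.2], "s2": None, "s3": [0.7]}))
```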
This documentation provides a comprehensive guide to the prediction procedure in the vify repository. For further details, refer to the source code in scripts/predict_with_studies.py and the configuration file at config/predict_with_studies_config.yaml.