Module `evaluation`

Function `_compute_roc_worker`

_compute_roc_worker(train_test_indices: list, seed: int, features: DataFrame, gt: DataFrame, patient_ids: list, model: LogisticRegression | RandomForestClassifier | Pipeline, calculate_feature_importances: bool = False) -> tuple

Worker function to compute ROC curve for a single cross-validation split.

Args:

train_test_indices (list): List containing two arrays of indices for training and test sets in the format [train_indices, test_indices].
seed (int): Random seed for reproducibility when calculating feature importances.
features (pd.DataFrame): DataFrame containing feature values for all samples.
gt (pd.DataFrame): DataFrame containing ground truth labels (0/1) for all samples.
patient_ids (list): List of patient identifiers corresponding to each sample.
model (Union[LogisticRegression, RandomForestClassifier, Pipeline]): Sklearn model or pipeline to evaluate. The model is cloned before fitting.
calculate_feature_importances (bool, optional): If True, calculates permutation feature importance scores using the test set. Defaults to False.

Returns: tuple: Tuple containing:

fpr (array): False positive rates for ROC curve
tpr (array): True positive rates for ROC curve
thresholds (array): Classification thresholds for ROC curve
roc_auc (float): Area under ROC curve
feature_importances (array or None): Mean permutation importance scores if calculated, None otherwise
misclassified (list): List of patient IDs for misclassified samples
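
The per-split steps described above (clone, fit, score, build the ROC curve, optionally compute permutation importance) can be sketched with public scikit-learn calls. This is an illustrative stand-in, not the module's implementation: the helper name `roc_for_split`, the `n_repeats=5` setting, the use of `predict()` to decide which samples are misclassified, and the synthetic data are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve


def roc_for_split(train_test_indices, seed, features, gt, patient_ids, model,
                  calculate_feature_importances=False):
    """Hypothetical per-split worker: fit a clone on train, score on test."""
    train_idx, test_idx = train_test_indices
    X_train, X_test = features.iloc[train_idx], features.iloc[test_idx]
    y_train = gt.iloc[train_idx].to_numpy().ravel()
    y_test = gt.iloc[test_idx].to_numpy().ravel()

    # Clone so the estimator passed in by the caller is never mutated.
    fitted = clone(model).fit(X_train, y_train)
    y_score = fitted.predict_proba(X_test)[:, 1]

    fpr, tpr, thresholds = roc_curve(y_test, y_score)
    roc_auc = auc(fpr, tpr)

    importances = None
    if calculate_feature_importances:
        # Assumption: 5 permutation repeats; the real setting is not documented.
        result = permutation_importance(
            fitted, X_test, y_test, n_repeats=5, random_state=seed
        )
        importances = result.importances_mean

    # Assumption: "misclassified" means the classifier's own predict() is wrong.
    y_pred = fitted.predict(X_test)
    misclassified = [
        patient_ids[i] for i, ok in zip(test_idx, y_pred == y_test) if not ok
    ]
    return fpr, tpr, thresholds, roc_auc, importances, misclassified


# Synthetic data: alternating labels guarantee both classes in each subset.
rng = np.random.default_rng(0)
labels = np.arange(40) % 2
X = pd.DataFrame({
    "f0": labels + rng.normal(scale=0.5, size=40),
    "f1": rng.normal(size=40),
    "f2": rng.normal(size=40),
})
y = pd.DataFrame({"cancer": labels})
ids = [f"p{i}" for i in range(40)]
split = [np.arange(0, 30), np.arange(30, 40)]

fpr, tpr, thr, roc_auc, imp, missed = roc_for_split(
    split, 0, X, y, ids, LogisticRegression(), calculate_feature_importances=True
)
```

The six returned values follow the order documented above, so a caller can unpack them directly.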

Function `evaluate`

evaluate(csv: DataFrame, output_path: str, image_type: Literal[combined, t2, adc, dwi, verdict, mismo_adc, Any], feature_extraction_method: str, model: LogisticRegression | RandomForestClassifier | Pipeline, model_type: str, features_used: list, num_cv_repeats: int, num_cv_folds: int, num_workers: int, exclude_num_first_columns: int, use_individual_lesions: bool, plot_dict: dict[str, Any], clearml_logger: UnionType[Logger, None] = None) -> dict

Evaluate a pretrained machine learning model on medical imaging data.

Args:

csv (pd.DataFrame): DataFrame containing features, ground truth labels and patient metadata. Must include columns 'cancer', 'path', 'isup', and imaging features.
output_path (str): Directory path where evaluation plots and results will be saved.
image_type (Literal["combined", "t2", "adc", "dwi", "verdict", "mismo_adc"]): Type of MRI sequence used:

combined: All sequences combined
t2: T2-weighted imaging
adc: Apparent Diffusion Coefficient maps
dwi: Diffusion Weighted Imaging
verdict: VERDICT imaging
mismo_adc: MISMO ADC maps

use_feature (str): Name of the specific feature subset to use for evaluation.
config (BaseConfig): Configuration object containing model parameters, training settings and evaluation options.
model (Union[LogisticRegression, RandomForestClassifier, Pipeline]): Pre-trained sklearn model or pipeline to evaluate. Supported models are LogisticRegression and RandomForestClassifier.
model_type (str): String identifier for the model type (e.g. "RandomForest", "LogisticRegression").
clearml_logger (ClearmlLogger | None): Optional ClearML logger for experiment tracking. Defaults to None.

Returns: dict: Dictionary containing evaluation metrics and results with keys:

fprs: False positive rates
tprs: True positive rates
thresholds: Classification thresholds
aucs: Area under ROC curves
f_importances: Feature importance scores
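
A common way to summarize a result of this shape is to interpolate every ROC curve onto a shared false-positive-rate grid and average. The sketch below assumes that `fprs` and `tprs` are lists with one array per cross-validation split; the `results` dictionary holds made-up values, and `mean_roc` is a hypothetical helper, not part of this module.

```python
import numpy as np


def mean_roc(fprs, tprs, grid_size=101):
    """Average several ROC curves on a common FPR grid (hypothetical helper)."""
    grid = np.linspace(0.0, 1.0, grid_size)
    curves = []
    for fpr, tpr in zip(fprs, tprs):
        curve = np.interp(grid, fpr, tpr)  # linear interpolation of TPR over FPR
        curve[0] = 0.0                     # every ROC curve starts at (0, 0)
        curves.append(curve)
    curves = np.vstack(curves)
    return grid, curves.mean(axis=0), curves.std(axis=0)


# Made-up stand-in for the dictionary described above.
results = {
    "fprs": [np.array([0.0, 0.1, 1.0]), np.array([0.0, 0.5, 1.0])],
    "tprs": [np.array([0.0, 0.9, 1.0]), np.array([0.0, 0.5, 1.0])],
    "aucs": [0.9, 0.5],
}

grid, mean_tpr, std_tpr = mean_roc(results["fprs"], results["tprs"])
mean_auc = float(np.mean(results["aucs"]))
```

The standard deviation returned alongside the mean can be used to draw a variability band around the averaged curve.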

Function `evaluate_bootstrapped_model`

evaluate_bootstrapped_model(features: DataFrame, patient_ids: list, gt: Series, model: LogisticRegression | RandomForestClassifier | Pipeline, num_repeats: int, random_state: int = 42, num_workers: int = 32, num_cv_folds: int = 3) -> tuple

Evaluate a machine learning model using bootstrapped cross-validation.

Performs repeated stratified k-fold cross-validation to generate multiple ROC curves and compute feature importance scores. Uses parallel processing for efficiency.

Args:

features (pd.DataFrame): Feature matrix where rows are samples and columns are features.
patient_ids (list): List of patient identifiers corresponding to each sample.
gt (pd.Series): Ground truth labels (0/1) for each sample.
model (Union[LogisticRegression, RandomForestClassifier, Pipeline]): Sklearn model or pipeline that implements fit() and predict_proba(). Will be cloned for each fold.
num_repeats (int): Number of times to repeat the k-fold cross-validation.
random_state (int, optional): Random seed for reproducibility. Defaults to 42.
num_workers (int, optional): Number of parallel processes to use. Defaults to 32.
num_cv_folds (int, optional): Number of folds for cross-validation. Defaults to 3.
calculate_feature_importances (bool, optional): If True, computes permutation feature importance scores on test sets. Can be computationally expensive. Defaults to False.

Returns: tuple: Contains:

fprs (list): False positive rates for each ROC curve
tprs (list): True positive rates for each ROC curve
thresholds (list): Classification thresholds for each ROC curve
aucs (list): Area under ROC curve scores
feature_importances (list): Feature importance scores if calculated, empty list otherwise
misclassified (list): Patient IDs of misclassified samples across all folds
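
Repeated stratified k-fold cross-validation as described above can be sketched with scikit-learn's `RepeatedStratifiedKFold`. This is a minimal sequential version under stated assumptions: the helper name `repeated_cv_aucs` and the synthetic data are invented for illustration, only AUCs are collected, and the parallel dispatch that the module performs with `num_workers` is left out.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RepeatedStratifiedKFold


def repeated_cv_aucs(features, gt, model, num_repeats, num_cv_folds=3,
                     random_state=42):
    """Hypothetical helper: one AUC per (repeat, fold) pair, run sequentially."""
    splitter = RepeatedStratifiedKFold(
        n_splits=num_cv_folds, n_repeats=num_repeats, random_state=random_state
    )
    aucs = []
    for train_idx, test_idx in splitter.split(features, gt):
        # A fresh clone per fold keeps the folds independent of each other.
        fitted = clone(model).fit(features.iloc[train_idx], gt.iloc[train_idx])
        y_score = fitted.predict_proba(features.iloc[test_idx])[:, 1]
        aucs.append(roc_auc_score(gt.iloc[test_idx], y_score))
    return aucs


# Synthetic data with balanced classes so every stratified fold has both labels.
rng = np.random.default_rng(1)
labels = np.arange(60) % 2
X = pd.DataFrame({
    "f0": labels + rng.normal(scale=0.7, size=60),
    "f1": rng.normal(size=60),
})
y = pd.Series(labels, name="cancer")

aucs = repeated_cv_aucs(X, y, LogisticRegression(), num_repeats=4)
```

With `num_repeats=4` and the default three folds, the list holds twelve AUC values, one per split.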

Function `evaluate_full`

evaluate_full(csv: DataFrame, output_path: str, model: LogisticRegression | RandomForestClassifier | Pipeline, training_threshold: float, exclude_num_first_columns: int, num_predict_lesions: int, plot_dict: dict[str, Any]) -> None

Evaluates a trained model on the full dataset without cross-validation splitting.

Performs prediction on all samples and generates evaluation plots and metrics including ROC curves, scatter plots, and violin plots. Saves results to CSV files.

Args:

csv (pd.DataFrame): DataFrame containing features, ground truth labels and metadata. Must include columns 'cancer', 'path', 'isup', 'pirads' and feature columns.
output_path (str): Directory path where evaluation plots and results will be saved. Plots and CSV files are created in this location.
config (BaseConfig): Configuration object containing model parameters and evaluation options like number of lesions to predict and plotting settings.
model (Union[LogisticRegression, RandomForestClassifier, Pipeline]): Pre-trained sklearn model or pipeline that implements predict() and predict_proba().
training_threshold (float): Classification threshold determined during model training. Used to convert probabilities to binary predictions.
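
To illustrate how a fixed threshold turns probabilities into binary predictions, here is a small sketch. The probabilities, labels and threshold value are made up, and treating a probability equal to the threshold as positive (`>=`) is an assumption; the module may use a strict comparison instead.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up ground truth and predicted probabilities for five samples.
y_true = np.array([0, 0, 1, 1, 1])
y_proba = np.array([0.10, 0.40, 0.35, 0.80, 0.65])
training_threshold = 0.5  # hypothetical value carried over from training

# Assumption: probabilities at or above the threshold count as positive.
y_pred = (y_proba >= training_threshold).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
```

Here the third sample (probability 0.35, label 1) falls below the threshold and becomes the single false negative.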

Function `evaluate_hp`

evaluate_hp(csv: DataFrame, model: LogisticRegression | RandomForestClassifier | Pipeline, use_individual_lesions: bool, features_used: list, num_hp_cv_repeats: int, num_cv_folds: int, num_workers: int, exclude_num_first_columns: int) -> dict

Evaluate model performance within hyperparameter tuning without generating plots.

Performs repeated k-fold cross-validation on the provided data and returns the resulting performance metrics. Uses the same feature preprocessing as the main evaluation function but skips the visualization steps and computes only a single metric (AUC).

Args:

csv (pd.DataFrame): DataFrame containing features, ground truth labels and patient metadata. Must include columns 'cancer', 'path', and feature columns.
config (BaseConfig): Configuration object containing evaluation parameters including:

num_hp_cv_repeats: Number of cross-validation repeats
num_workers: Number of parallel processes
features_used: List of feature prefixes to include
use_individual_lesions: Whether to use per-lesion features

model (Union[LogisticRegression, RandomForestClassifier, Pipeline]): Sklearn model or pipeline to evaluate. Must implement fit() and predict_proba().

Returns: dict: Dictionary containing evaluation metrics:

aucs: List of AUC scores from each cross-validation fold
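
In a tuning loop, the list of AUCs typically has to be reduced to one score per candidate. The sketch below assumes the mean is used for that reduction; the candidate names and AUC values are invented, and each inner dictionary only mimics the shape of the return value described above.

```python
import numpy as np

# Hypothetical results: one dictionary per hyperparameter candidate.
candidates = {
    "C=0.1": {"aucs": [0.70, 0.74, 0.72]},
    "C=1.0": {"aucs": [0.78, 0.75, 0.81]},
}

# Reduce each list of per-fold AUCs to a single score (assumption: the mean).
scores = {name: float(np.mean(res["aucs"])) for name, res in candidates.items()}
best = max(scores, key=scores.get)
```

Other reductions, such as the median or a lower confidence bound, fit the same pattern.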

Function `predict`

predict(model: UnionType[RandomForestClassifier, LogisticRegression, Pipeline], features: DataFrame, num_predict_lesions: int) -> tuple

Determines sample probabilities using the given model.

Args:

model (RandomForestClassifier | LogisticRegression | Pipeline): Trained model to use for prediction.
features (pd.DataFrame): Features to predict on.
num_predict_lesions (int): Number of lesions to predict per patient. If 1, uses single lesion prediction, otherwise predicts on multiple lesions and takes maximum probability.

Returns: tuple: Tuple containing:

y_proba (pd.Series): Prediction probabilities
features (pd.DataFrame): Features used for prediction
y_probas (pd.DataFrame): Per-lesion probabilities if multiple lesions, empty DataFrame if single lesion
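
The multi-lesion case (predict each lesion, then take the maximum probability per patient) can be sketched as follows. The column layout is an assumption made purely for illustration: lesion `i` is taken to carry the suffix `_l{i}`, and the model is taken to have been trained on the unsuffixed names. The helper `predict_max_over_lesions` and all data are hypothetical, and missing lesions are not handled.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression


def predict_max_over_lesions(model, features, num_predict_lesions):
    """Hypothetical helper: per-lesion probabilities and their row-wise maximum."""
    per_lesion = {}
    for i in range(num_predict_lesions):
        suffix = f"_l{i}"  # assumed naming scheme, not taken from the module
        cols = [c for c in features.columns if c.endswith(suffix)]
        # Strip the suffix so the column names match those seen during fit.
        lesion_x = features[cols].rename(columns=lambda c: c[: -len(suffix)])
        per_lesion[f"lesion_{i}"] = model.predict_proba(lesion_x)[:, 1]
    y_probas = pd.DataFrame(per_lesion, index=features.index)
    return y_probas.max(axis=1), y_probas


# Toy per-lesion training data with unsuffixed feature names.
train = pd.DataFrame({"size": [1.0, 2.0, 8.0, 9.0], "adc": [0.9, 0.8, 0.2, 0.1]})
model = LogisticRegression().fit(train, [0, 0, 1, 1])

# Four patients with two lesions each, in the assumed suffixed layout.
patients = pd.DataFrame({
    "size_l0": [1.5, 8.5, 2.0, 7.0],
    "adc_l0": [0.85, 0.15, 0.80, 0.30],
    "size_l1": [7.5, 1.0, 2.5, 9.0],
    "adc_l1": [0.25, 0.90, 0.75, 0.10],
})

y_proba, y_probas = predict_max_over_lesions(model, patients, num_predict_lesions=2)
```

Taking the maximum means a patient is scored by the most suspicious lesion, which matches the description of `num_predict_lesions` above.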
