Module evaluation
Function _compute_roc_worker
_compute_roc_worker(train_test_indices: list, seed: int, features: DataFrame, gt: DataFrame, patient_ids: list, model: LogisticRegression | RandomForestClassifier | Pipeline, calculate_feature_importances: bool = False) -> tuple
Worker function to compute ROC curve for a single cross-validation split.
Args:
- train_test_indices (list): Two arrays of indices for the training and test sets, in the format [train_indices, test_indices].
- seed (int): Random seed for reproducibility when calculating feature importances.
- features (pd.DataFrame): DataFrame containing feature values for all samples.
- gt (pd.DataFrame): DataFrame containing ground truth labels (0/1) for all samples.
- patient_ids (list): List of patient identifiers corresponding to each sample.
- model (Union[LogisticRegression, RandomForestClassifier, Pipeline]): Sklearn model or pipeline to evaluate. The model is cloned before fitting.
- calculate_feature_importances (bool, optional): If True, calculates permutation feature importance scores using the test set. Defaults to False.
Returns: tuple: Tuple containing:
- fpr (array): False positive rates for ROC curve
- tpr (array): True positive rates for ROC curve
- thresholds (array): Classification thresholds for ROC curve
- roc_auc (float): Area under ROC curve
- feature_importances (array or None): Mean permutation importance scores if calculated, None otherwise
- misclassified (list): List of patient IDs for misclassified samples
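The per-split procedure described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the name `roc_for_split`, the toy data, the use of `n_repeats=5`, and the definition of "misclassified" as a disagreement between `predict()` and the label are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve


def roc_for_split(train_test_indices, seed, features, gt, patient_ids, model,
                  calculate_feature_importances=False):
    """Hypothetical stand-in for _compute_roc_worker (illustration only)."""
    train_idx, test_idx = train_test_indices
    x_train, x_test = features.iloc[train_idx], features.iloc[test_idx]
    y_train = gt.iloc[train_idx].to_numpy().ravel()
    y_test = gt.iloc[test_idx].to_numpy().ravel()

    # Fit a clone so the estimator passed in by the caller is left untouched.
    fitted = clone(model).fit(x_train, y_train)
    y_proba = fitted.predict_proba(x_test)[:, 1]
    fpr, tpr, thresholds = roc_curve(y_test, y_proba)
    roc_auc = auc(fpr, tpr)

    importances = None
    if calculate_feature_importances:
        # Permutation importance on the test set, seeded for reproducibility.
        result = permutation_importance(
            fitted, x_test, y_test, n_repeats=5, random_state=seed
        )
        importances = result.importances_mean

    # Assumption: a sample is misclassified when predict() disagrees with gt.
    wrong = fitted.predict(x_test) != y_test
    misclassified = [patient_ids[i] for i, w in zip(test_idx, wrong) if w]
    return fpr, tpr, thresholds, roc_auc, importances, misclassified


# Toy data: alternating labels, one informative feature and two noise features.
rng = np.random.default_rng(0)
labels = np.arange(40) % 2
features = pd.DataFrame({
    "f0": labels + rng.normal(scale=0.5, size=40),
    "f1": rng.normal(size=40),
    "f2": rng.normal(size=40),
})
gt = pd.DataFrame({"cancer": labels})
patient_ids = [f"p{i}" for i in range(40)]
split = [np.arange(0, 30), np.arange(30, 40)]

fpr, tpr, thresholds, roc_auc, importances, misclassified = roc_for_split(
    split, 0, features, gt, patient_ids, LogisticRegression(), True
)
```

Cloning inside the worker keeps each split independent, which matters when the same model object is shared across many splits.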
Function evaluate
evaluate(csv: DataFrame, output_path: str, image_type: Literal[combined, t2, adc, dwi, verdict, mismo_adc, Any], feature_extraction_method: str, model: LogisticRegression | RandomForestClassifier | Pipeline, model_type: str, features_used: list, num_cv_repeats: int, num_cv_folds: int, num_workers: int, exclude_num_first_columns: int, use_individual_lesions: bool, plot_dict: dict[str, Any], clearml_logger: Logger | None = None) -> dict
Evaluate a pretrained machine learning model on medical imaging data.
Args:
- csv (pd.DataFrame): DataFrame containing features, ground truth labels and patient metadata. Must include columns 'cancer', 'path', 'isup', and imaging features.
- output_path (str): Directory path where evaluation plots and results will be saved.
- image_type (Literal["combined", "t2", "adc", "dwi", "verdict", "mismo_adc"]): Type of MRI sequence used:
  - combined: All sequences combined
  - t2: T2-weighted imaging
  - adc: Apparent Diffusion Coefficient maps
  - dwi: Diffusion Weighted Imaging
  - verdict: VERDICT imaging
  - mismo_adc: MISMO ADC maps
- model (Union[LogisticRegression, RandomForestClassifier, Pipeline]): Pre-trained sklearn model or pipeline to evaluate. Supported models are LogisticRegression and RandomForestClassifier.
- model_type (str): String identifier for the model type (e.g. "RandomForest", "LogisticRegression").
- features_used (list): List of feature prefixes to include.
- num_cv_repeats (int): Number of cross-validation repeats.
- num_cv_folds (int): Number of folds for cross-validation.
- num_workers (int): Number of parallel processes.
- use_individual_lesions (bool): Whether to use per-lesion features.
- plot_dict (dict[str, Any]): Plotting settings.
- clearml_logger (Logger | None, optional): ClearML logger for experiment tracking. Defaults to None.
Returns: dict: Dictionary containing evaluation metrics and results with keys:
- fprs: False positive rates
- tprs: True positive rates
- thresholds: Classification thresholds
- aucs: Area under ROC curves
- f_importances: Feature importance scores
Function evaluate_bootstrapped_model
evaluate_bootstrapped_model(features: DataFrame, patient_ids: list, gt: Series, model: LogisticRegression | RandomForestClassifier | Pipeline, num_repeats: int, random_state: int = 42, num_workers: int = 32, num_cv_folds: int = 3) -> tuple
Evaluate a machine learning model using bootstrapped cross-validation.
Performs repeated stratified k-fold cross-validation to generate multiple ROC curves and compute feature importance scores. Uses parallel processing for efficiency.
Args:
- features (pd.DataFrame): Feature matrix where rows are samples and columns are features.
- patient_ids (list): List of patient identifiers corresponding to each sample.
- gt (pd.Series): Ground truth labels (0/1) for each sample.
- model (Union[LogisticRegression, RandomForestClassifier, Pipeline]): Sklearn model or pipeline that implements fit() and predict_proba(). Will be cloned for each fold.
- num_repeats (int): Number of times to repeat the k-fold cross-validation.
- random_state (int, optional): Random seed for reproducibility. Defaults to 42.
- num_workers (int, optional): Number of parallel processes to use. Defaults to 32.
- num_cv_folds (int, optional): Number of folds for cross-validation. Defaults to 3.
Returns: tuple: Contains:
- fprs (list): False positive rates for each ROC curve
- tprs (list): True positive rates for each ROC curve
- thresholds (list): Classification thresholds for each ROC curve
- aucs (list): Area under ROC curve scores
- feature_importances (list): Feature importance scores if calculated, empty list otherwise
- misclassified (list): Patient IDs of misclassified samples across all folds
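The repeated stratified k-fold scheme described above can be illustrated with scikit-learn's `RepeatedStratifiedKFold`. This is a sketch under assumptions: the source does not name the splitter the module uses, and the toy labels and parameter values here are made up.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

num_repeats, num_cv_folds, random_state = 4, 3, 42  # illustrative values

y = np.array([0, 1] * 15)      # 30 toy labels, 15 per class
X = np.zeros((len(y), 1))      # feature values do not affect how splits are drawn

rskf = RepeatedStratifiedKFold(
    n_splits=num_cv_folds, n_repeats=num_repeats, random_state=random_state
)
# One [train_indices, test_indices] pair per fold per repeat,
# the same shape the worker's train_test_indices argument expects.
splits = [[train_idx, test_idx] for train_idx, test_idx in rskf.split(X, y)]

# Stratification keeps the class ratio of y in every test fold.
positives_per_test_fold = [int(y[test_idx].sum()) for _, test_idx in splits]
```

Each entry of `splits` is independent of the others, so the splits can be handed to separate worker processes, which is presumably what `num_workers` controls.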
Function evaluate_full
evaluate_full(csv: DataFrame, output_path: str, model: LogisticRegression | RandomForestClassifier | Pipeline, training_threshold: float, exclude_num_first_columns: int, num_predict_lesions: int, plot_dict: dict[str, Any]) -> None
Evaluates a trained model on the full dataset without cross-validation splitting.
Performs prediction on all samples and generates evaluation plots and metrics including ROC curves, scatter plots, and violin plots. Saves results to CSV files.
Args:
- csv (pd.DataFrame): DataFrame containing features, ground truth labels and metadata. Must include columns 'cancer', 'path', 'isup', 'pirads' and feature columns.
- output_path (str): Directory path where evaluation plots and results will be saved. Plots and CSV files are created in this location.
- model (Union[LogisticRegression, RandomForestClassifier, Pipeline]): Pre-trained sklearn model or pipeline that implements predict() and predict_proba().
- training_threshold (float): Classification threshold determined during model training. Used to convert probabilities to binary predictions.
- num_predict_lesions (int): Number of lesions to predict.
- plot_dict (dict[str, Any]): Plotting settings.
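Converting probabilities to binary predictions with `training_threshold` might look like the sketch below. The threshold value and probabilities are made up, and treating a probability equal to the threshold as positive (`>=`) is an assumption; the source does not say which comparison the module uses.

```python
import numpy as np

training_threshold = 0.35                     # hypothetical value from training
y_proba = np.array([0.10, 0.35, 0.80, 0.34])  # toy predicted probabilities

# Assumption: probabilities at or above the threshold count as positive.
y_pred = (y_proba >= training_threshold).astype(int)
```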
Function evaluate_hp
evaluate_hp(csv: DataFrame, model: LogisticRegression | RandomForestClassifier | Pipeline, use_individual_lesions: bool, features_used: list, num_hp_cv_repeats: int, num_cv_folds: int, num_workers: int, exclude_num_first_columns: int) -> dict
Evaluate model performance within hyperparameter tuning without generating plots.
Performs repeated k-fold cross-validation on the provided data and returns the resulting AUC scores. Uses the same feature preprocessing as the main evaluation function, but skips the visualization steps and computes only a single metric (AUC).
Args:
- csv (pd.DataFrame): DataFrame containing features, ground truth labels and patient metadata. Must include columns 'cancer', 'path', and feature columns.
- model (Union[LogisticRegression, RandomForestClassifier, Pipeline]): Sklearn model or pipeline to evaluate. Must implement fit() and predict_proba().
- use_individual_lesions (bool): Whether to use per-lesion features.
- features_used (list): List of feature prefixes to include.
- num_hp_cv_repeats (int): Number of cross-validation repeats.
- num_cv_folds (int): Number of folds for cross-validation.
- num_workers (int): Number of parallel processes.
Returns: dict: Dictionary containing evaluation metrics:
- aucs: List of AUC scores from each cross-validation fold
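A hyperparameter search typically needs one scalar per candidate, so the returned `aucs` list would usually be reduced by the caller. The sketch below assumes a result dictionary of the documented shape; the AUC values are made up and the choice of the mean as the objective is an assumption.

```python
from statistics import mean, stdev

# Hypothetical return value of evaluate_hp for one hyperparameter candidate.
result = {"aucs": [0.81, 0.78, 0.84, 0.80, 0.79, 0.83]}

mean_auc = mean(result["aucs"])    # scalar objective for the tuner
auc_spread = stdev(result["aucs"]) # fold-to-fold variability
```

Reporting the spread alongside the mean helps tell a genuinely better candidate from one that only won on a few lucky folds.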
Function predict
predict(model: RandomForestClassifier | LogisticRegression | Pipeline, features: DataFrame, num_predict_lesions: int) -> tuple
Determines sample probabilities using the given model.
Args:
- model (RandomForestClassifier | LogisticRegression | Pipeline): Trained model to use for prediction.
- features (pd.DataFrame): Features to predict on.
- num_predict_lesions (int): Number of lesions to predict per patient. If 1, uses single-lesion prediction; otherwise predicts on multiple lesions and takes the maximum probability.
Returns: tuple: Tuple containing:
- y_proba (pd.Series): Prediction probabilities
- features (pd.DataFrame): Features used for prediction
- y_probas (pd.DataFrame): Per-lesion probabilities if multiple lesions, empty DataFrame if single lesion
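The multi-lesion case, where the patient-level probability is the maximum over per-lesion probabilities, can be sketched as below. The column names, index and values are hypothetical; only the "take the maximum" rule comes from the description above.

```python
import pandas as pd

# Hypothetical per-lesion probabilities: one row per patient, one column per lesion.
y_probas = pd.DataFrame(
    {"lesion_1": [0.2, 0.7, 0.1], "lesion_2": [0.6, 0.3, 0.05]},
    index=["p0", "p1", "p2"],
)

# Patient-level probability: the most suspicious lesion decides.
y_proba = y_probas.max(axis=1)
```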