Module feature_extraction

Function extract_features_from_studies

extract_features_from_studies(studies_path: str, patient_csv: str | None, output_path: str, segmentation_method: QueryObject, queries: QueryObject, image_type: Literal["t2", "adc", "dwi", "verdict", "mismo_adc"], num_workers: int = 32, feature_extraction_methods: list[Literal["mirp", "simple"]] = ["simple"]) -> None

Extract radiomic and/or simple features from a set of MRI studies.

Args:

  • studies_path (str): Root directory path containing all the studies to be processed.
  • patient_csv (str | None): Path to a CSV file containing patient clinical data. If None, study metadata is used instead.
  • output_path (str): Directory path where the extracted feature CSVs will be saved.
  • segmentation_method (QueryObject): Query object specifying the segmentation method to use.
  • queries (QueryObject): List of query strings to filter and extract specific feature sets.
  • image_type (Literal["t2", "adc", "dwi", "verdict", "mismo_adc"]): Type of MRI sequence to analyze:
      • t2: T2-weighted imaging
      • adc: Apparent Diffusion Coefficient
      • dwi: Diffusion Weighted Imaging
      • verdict: VERDICT model parameters
      • mismo_adc: MISMO-derived ADC
  • num_workers (int, optional): Number of parallel processes for feature extraction. Defaults to 32.
  • feature_extraction_methods (list[Literal["mirp", "simple"]], optional): List of feature extraction methods to use. Defaults to ["simple"].
      • mirp: PyRadiomics-based comprehensive feature set
      • simple: Basic intensity and morphological features

Returns: None: Saves feature CSV files to output_path with format: __features.csv
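
Because a run over many studies can take a long time, it may help to check the literal-valued arguments before calling the function. The following is a minimal sketch, not part of the module: `check_arguments` is a hypothetical helper, and the allowed values are copied from the signature above.

```python
from typing import Literal, get_args

# Allowed values copied from the documented signature.
ImageType = Literal["t2", "adc", "dwi", "verdict", "mismo_adc"]
FeatureMethod = Literal["mirp", "simple"]


def check_arguments(image_type, feature_extraction_methods=("simple",)):
    """Hypothetical pre-flight check; raises ValueError on unknown values."""
    if image_type not in get_args(ImageType):
        raise ValueError(f"unknown image_type: {image_type!r}")
    unknown = [m for m in feature_extraction_methods
               if m not in get_args(FeatureMethod)]
    if unknown:
        raise ValueError(f"unknown feature extraction methods: {unknown}")
    return image_type, list(feature_extraction_methods)


# Both values are accepted, so the arguments come back unchanged.
checked = check_arguments("adc", ["simple", "mirp"])
```

A typo such as `check_arguments("t1")` then fails immediately with a ValueError instead of part-way through the extraction.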

Function extract_mirp_features

extract_mirp_features(img_vol: Volume, mask_vol: Volume, image_type: str, drop_meta: bool = True, max_lesions: int = 3, mask_select_largest_region: bool = False) -> DataFrame

Extracts features from the input volume and mask.

TODO: Maybe remove (no tests).

Args:

  • img_vol (Volume): The input volume.
  • mask_vol (Volume): The mask volume.
  • image_type (str): The image type.
  • drop_meta (bool, optional): Whether to drop the metadata columns. Defaults to True.
  • max_lesions (int, optional): The maximum number of lesions. Defaults to 3.
  • mask_select_largest_region (bool, optional): Whether to select the largest region in the mask. Defaults to False.

Returns: pd.DataFrame: The extracted features.

Function extract_simple_features

extract_simple_features(img_vol: Volume, mask_vol: Volume, softmax_vol: Volume, anatomical_vol: Volume, image_type: str, num_lesions: int) -> DataFrame

Extracts a focused set of simple features from MRI volumes on a per-lesion basis.

Computes intensity statistics, morphological metrics, and location-based features for each detected lesion, in order of decreasing size.

TODO: Add a fix to handle the case where no lesions are present in the image.

Args:

  • img_vol (Volume): Input MRI volume containing image intensities.
  • mask_vol (Volume): Binary lesion mask volume (0=background, 1=lesion).
  • softmax_vol (Volume): Probability map volume from the segmentation model.
  • anatomical_vol (Volume): Labeled anatomical segmentation volume.
  • image_type (str): Type of MRI sequence ('t2', 'adc', 'dwi', etc.).
  • num_lesions (int): Maximum number of largest lesions to analyze. If fewer lesions are present, the features for the missing lesions are set to NaN.

Returns: pd.DataFrame: DataFrame containing extracted features with columns:

  • Global metrics across all lesions
  • Per-lesion features (intensity stats, size, location)
  • Anatomical zone distributions
  • Model confidence scores
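
The per-lesion behavior described above (lesions handled in order of decreasing size, features for missing lesions set to NaN) can be sketched as follows. This is a dependency-free illustration on a small 2D nested list, not the module's implementation: `label_lesions`, `simple_features_sketch`, the 4-connected neighborhood, and the column names are all assumptions made for the example.

```python
import math
from collections import deque


def label_lesions(mask):
    """Return connected lesion regions as lists of (row, col), largest first."""
    rows, cols = len(mask), len(mask[0])
    seen, lesions = set(), []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or (r, c) in seen:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue, voxels = deque([(r, c)]), []
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                voxels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            lesions.append(voxels)
    return sorted(lesions, key=len, reverse=True)


def simple_features_sketch(img, mask, num_lesions):
    """Build one flat feature row; missing lesions are filled with NaN."""
    lesions = label_lesions(mask)
    row = {"lesions_found": len(lesions)}
    for i in range(num_lesions):
        prefix = f"lesion{i + 1}"  # hypothetical column naming
        if i < len(lesions):
            values = [img[y][x] for y, x in lesions[i]]
            row[f"{prefix}_size"] = len(values)
            row[f"{prefix}_mean"] = sum(values) / len(values)
        else:
            row[f"{prefix}_size"] = math.nan
            row[f"{prefix}_mean"] = math.nan
    return row


# Illustrative toy data: one 4-voxel lesion and one 1-voxel lesion.
img = [[10, 20, 0, 0], [30, 40, 0, 7], [0, 0, 0, 0]]
mask = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 0, 0]]
features = simple_features_sketch(img, mask, num_lesions=3)
```

With an all-zero mask the same sketch returns NaN for every per-lesion column, which is one possible way to treat the no-lesion case mentioned in the TODO.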

Function generate_feature_df

generate_feature_df(study_path: str, data_dict: dict, mask_dict: dict, clinical_dict: dict, psa_dict: dict, num_workers: int = 32, use_feature: Literal["mirp", "simple"] = "simple", num_lesions: int = 2) -> DataFrame

Generate a feature dataframe combining patient data, clinical metrics, and radiomic features.

Args:

  • study_path (str): Path to the MRI study directory containing image data.
  • data_dict (dict): Dictionary mapping query names to image volumes (Volume objects).
  • mask_dict (dict): Dictionary containing lesion masks ('les_mask'), probability maps ('les_softmax'), and anatomical segmentations ('anatomy').
  • clinical_dict (dict): Dictionary containing patient clinical data, including:
      • cancer (bool): Cancer status
      • isup (int): ISUP grade
      • pirads (int): PI-RADS score
      • psa (float): PSA level
      • age (int): Patient age
  • psa_dict (dict): Dictionary containing PSA density metrics:
      • psad (float): Overall PSA density
      • psad_pz (float): PSA density in peripheral zone
      • psad_cg (float): PSA density in central gland
  • num_workers (int, optional): Number of parallel processes for feature extraction. Defaults to 32.
  • use_feature (Literal["mirp", "simple"], optional): Feature extraction method to use. Defaults to "simple".
      • mirp: PyRadiomics-based comprehensive feature set
      • simple: Basic intensity and morphological features
  • num_lesions (int, optional): Number of lesions to extract features for. Defaults to 2.

Returns: pd.DataFrame: Combined dataframe containing:

  • Patient/clinical information
  • PSA density metrics
  • Extracted radiomic features per lesion
  • Global image features
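
The way the clinical data, PSA density metrics, and extracted features end up in one combined row can be sketched as below. This is an illustration under stated assumptions, not the module's code: `build_feature_row` is a hypothetical helper, all values are made-up example data, and a plain dict plus the standard `csv` module stand in for the pandas DataFrame that the real function returns.

```python
import csv
import io


def build_feature_row(study_path, clinical_dict, psa_dict, lesion_features):
    """Merge the per-study dictionaries into one flat row (hypothetical helper)."""
    row = {"study_path": study_path}
    for source in (clinical_dict, psa_dict, lesion_features):
        # Refuse to silently overwrite a column that is already present.
        overlap = row.keys() & source.keys()
        if overlap:
            raise KeyError(f"duplicate columns: {sorted(overlap)}")
        row.update(source)
    return row


# Made-up example values using the keys documented above.
clinical = {"cancer": True, "isup": 2, "pirads": 4, "psa": 6.5, "age": 63}
psa_density = {"psad": 0.15, "psad_pz": 0.40, "psad_cg": 0.25}
lesion_features = {"lesion1_size": 4, "lesion1_mean": 25.0}

row = build_feature_row("study_001", clinical, psa_density, lesion_features)

# Serialise the single row as CSV text, header first.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
```

Raising on duplicate keys is a design choice of this sketch; it keeps a feature column from overwriting a clinical column of the same name.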