Module model_factories

Function get_logistic_regression

get_logistic_regression(params: dict = , random_state: int = 42) -> Pipeline

Returns a logistic regression model wrapped in a preprocessing pipeline that handles missing values and applies optional feature transforms.

Args:

params (dict, optional): Parameters for logistic regression configuration, including:

  • impute_strategy: Strategy for imputing missing values ('mean', 'median', 'most_frequent', 'constant')
  • transforms: List of feature transformations, including sigmoid and nan replacement options
  • Standard LogisticRegression parameters (penalty, C, class_weight, etc.)

random_state (int, optional): Seed for random number generation to ensure reproducibility. Defaults to 42.

Returns: Pipeline: Scikit-learn pipeline containing:

  1. Optional custom transformations (sigmoid, nan replacement)
  2. SimpleImputer for handling missing values
  3. StandardScaler for feature scaling
  4. LogisticRegression model
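The four steps above could be assembled roughly as follows. This is a minimal sketch, not the module's actual source: the function name `sketch_logistic_regression`, the assumption that `transforms` is a list of strings such as `"sigmoid"`, and the default `impute_strategy` of `"mean"` are all hypothetical choices made for illustration. The nan replacement transform is left out because its options are not documented here.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def sigmoid(X):
    """Element-wise logistic sigmoid; NaN entries stay NaN for the imputer."""
    return 1.0 / (1.0 + np.exp(-X))


def sketch_logistic_regression(params=None, random_state=42):
    """Hypothetical stand-in for get_logistic_regression (assumed behaviour)."""
    params = dict(params or {})  # copy so the caller's dict is not mutated
    # Assumption: pipeline-level keys are removed before the remaining
    # entries are forwarded to LogisticRegression.
    impute_strategy = params.pop("impute_strategy", "mean")
    transforms = params.pop("transforms", [])

    steps = []
    if "sigmoid" in transforms:  # assumed encoding of the transform option
        steps.append(("sigmoid", FunctionTransformer(sigmoid)))
    steps.append(("imputer", SimpleImputer(strategy=impute_strategy)))
    steps.append(("scaler", StandardScaler()))
    steps.append(("model", LogisticRegression(random_state=random_state, **params)))
    return Pipeline(steps)


# Tiny data set with missing values to exercise the imputer.
X = np.array([[0.5, np.nan], [1.5, 2.0], [np.nan, 0.1], [3.0, 4.0]])
y = np.array([0, 0, 1, 1])

pipe = sketch_logistic_regression({"impute_strategy": "median", "C": 0.5})
pipe.fit(X, y)
proba = pipe.predict_proba(X)  # one row per sample, one column per class
```

Placing the imputer before the scaler means the scaler's statistics are computed on fully populated columns. Whether the real implementation orders or names its steps this way should be checked against the source.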

Function get_model

get_model(model_type: str, params: dict, random_state: int = 42) -> RandomForestClassifier | Pipeline

Returns a model object for the requested model type, configured with the provided parameters. Any model added to the factory in the future must implement the .fit() and .predict_proba() methods to remain fully compatible. Every model must also handle missing values (nan) automatically, either through a preprocessing step or internally.

Args:

model_type (str): Type of model to initialize. Currently supports "RandomForest" and "LogisticRegression".

params (dict): Model-specific parameters passed to the constructor. For LogisticRegression, additional parameters like 'impute_strategy' and 'transforms' can be specified.

random_state (int, optional): Random seed for reproducibility. Defaults to 42.

Raises: NotImplementedError: If the requested model_type is not implemented in the factory.

Returns: Union[RandomForestClassifier, Pipeline]: Initialized model instance:

  • RandomForestClassifier for "RandomForest" type
  • Pipeline (with preprocessing) for "LogisticRegression" type
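The dispatch behaviour described above, including the NotImplementedError for unknown types, might look something like the sketch below. The name `sketch_get_model`, the step names, the error message, and the assumed default imputation strategy are hypothetical; only the two supported type strings and the exception type come from the documentation.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def sketch_get_model(model_type, params, random_state=42):
    """Hypothetical stand-in for get_model; not the module's actual code."""
    if model_type == "RandomForest":
        return RandomForestClassifier(random_state=random_state, **params)
    if model_type == "LogisticRegression":
        lr_params = dict(params)  # copy so the caller's dict is not mutated
        strategy = lr_params.pop("impute_strategy", "mean")  # assumed default
        lr_params.pop("transforms", None)  # custom transforms omitted in this sketch
        return Pipeline([
            ("imputer", SimpleImputer(strategy=strategy)),
            ("scaler", StandardScaler()),
            ("model", LogisticRegression(random_state=random_state, **lr_params)),
        ])
    # Unknown types fail loudly instead of silently falling back to a default.
    raise NotImplementedError(f"Model type {model_type!r} is not implemented.")


forest = sketch_get_model("RandomForest", {"n_estimators": 50})
logreg = sketch_get_model("LogisticRegression", {"impute_strategy": "median", "C": 1.0})

try:
    sketch_get_model("GradientBoosting", {})
except NotImplementedError as err:
    unsupported_message = str(err)
```

Because both return types expose .fit() and .predict_proba(), calling code can treat them interchangeably. In this sketch, passing `random_state` inside `params` as well would raise a TypeError for the duplicate keyword; how the real factory treats that case is not documented here.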

Function get_random_forest

get_random_forest(params: dict, random_state: int = 42) -> RandomForestClassifier

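Initializes a Random Forest classifier with the provided parameters. Missing values (nan) are handled by the classifier itself rather than by a separate preprocessing step. Note that native nan support in RandomForestClassifier requires scikit-learn 1.4 or later.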

Args:

params (dict): Dictionary of parameters passed to RandomForestClassifier. Common parameters include: n_estimators, max_depth, min_samples_split, min_samples_leaf, max_features, class_weight.

random_state (int, optional): Seed for random number generation to ensure reproducibility. Defaults to 42.

Returns: RandomForestClassifier: Initialized random forest classifier instance, ready to be trained with its .fit() method.
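As an illustration of the params dictionary, the sketch below builds a classifier directly with scikit-learn in the way this function is described as doing. The name `sketch_random_forest` and the parameter values are hypothetical. The toy data deliberately contains no nan entries, since native missing-value support depends on the installed scikit-learn version.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def sketch_random_forest(params, random_state=42):
    """Hypothetical stand-in for get_random_forest (assumed behaviour)."""
    # Assumption: params is forwarded unchanged to the constructor.
    return RandomForestClassifier(random_state=random_state, **params)


params = {
    "n_estimators": 10,
    "max_depth": 3,
    "min_samples_leaf": 1,
    "class_weight": "balanced",
}
clf = sketch_random_forest(params)

# Small, fully observed data set with two classes.
X = np.array([[0.0, 1.0], [0.2, 0.9], [0.1, 1.1], [1.0, 0.0], [0.9, 0.2], [1.1, 0.1]])
y = np.array([0, 0, 0, 1, 1, 1])

clf.fit(X, y)
rf_proba = clf.predict_proba(X)  # shape: (n_samples, n_classes)
```

Fixing `random_state` makes the bootstrap samples and feature subsets repeatable, so two runs with the same params and data produce the same forest.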