Module model_factories
Function get_logistic_regression
get_logistic_regression(params: dict = , random_state: int = 42) -> Pipeline
Returns a logistic regression model wrapped in a scikit-learn preprocessing pipeline that imputes missing values and applies optional feature transforms.
Args: params (dict, optional): Parameters for configuring the logistic regression, including:
- impute_strategy: Strategy for imputing missing values ('mean', 'median', 'most_frequent', 'constant')
- transforms: List of feature transformations, including sigmoid and NaN-replacement options
- Standard LogisticRegression parameters (penalty, C, class_weight, etc.)
random_state (int, optional): Seed for random number generation to ensure reproducibility. Defaults to 42.
Returns: Pipeline: Scikit-learn pipeline containing:
- Optional custom transformations (sigmoid, nan replacement)
- SimpleImputer for handling missing values
- StandardScaler for feature scaling
- LogisticRegression model
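The pipeline described above can be sketched as follows. This is a minimal, hypothetical re-implementation (the name `get_logistic_regression_sketch` is illustrative, not part of the module), assuming that `impute_strategy` defaults to 'mean' and that every key other than `impute_strategy` and `transforms` is forwarded to `LogisticRegression`. The custom `transforms` step is deliberately omitted because its exact format is not documented here.

```python
from typing import Optional

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def get_logistic_regression_sketch(params: Optional[dict] = None, random_state: int = 42) -> Pipeline:
    """Hypothetical sketch: imputer -> scaler -> logistic regression."""
    # Copy so popping factory-only keys does not mutate the caller's dict.
    lr_params = dict(params or {})
    impute_strategy = lr_params.pop("impute_strategy", "mean")  # assumed default
    lr_params.pop("transforms", None)  # assumed key; custom transforms are not modeled here
    return Pipeline([
        ("imputer", SimpleImputer(strategy=impute_strategy)),
        ("scaler", StandardScaler()),
        ("model", LogisticRegression(random_state=random_state, **lr_params)),
    ])


# Usage: the imputer fills the NaNs before scaling and fitting.
X = np.array([[0.0, 1.0], [np.nan, 2.0], [1.0, np.nan], [2.0, 3.0]])
y = np.array([0, 0, 1, 1])
pipe = get_logistic_regression_sketch({"impute_strategy": "median", "C": 0.5})
pipe.fit(X, y)
proba = pipe.predict_proba(X)
```

Placing the imputer before the scaler matters: `StandardScaler` would otherwise compute its statistics over columns that still contain NaN.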
Function get_model
get_model(model_type: str, params: dict, random_state: int = 42) -> RandomForestClassifier | Pipeline
Returns a model object configured with the provided parameters. Any model added to the factory must implement the .fit() and .predict_proba() methods to be fully compatible, and must handle missing values (NaN) automatically, either through preprocessing or internally.
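The compatibility requirements above can be checked with a small helper. This is a sketch under stated assumptions: `satisfies_factory_contract` is a hypothetical name, and the check simply fits on a tiny binary dataset containing NaN and verifies that `predict_proba` returns one column per class.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def satisfies_factory_contract(model) -> bool:
    """Hypothetical check: model exposes fit/predict_proba and copes with NaN input."""
    if not (hasattr(model, "fit") and hasattr(model, "predict_proba")):
        return False
    X = np.array([[0.0, np.nan], [1.0, 2.0], [np.nan, 3.0], [2.0, 1.0]])
    y = np.array([0, 1, 0, 1])
    try:
        model.fit(X, y)
        proba = model.predict_proba(X)
    except ValueError:
        # scikit-learn estimators without NaN support raise ValueError on NaN input.
        return False
    return proba.shape == (len(X), 2)


# A bare LogisticRegression fails the NaN requirement; an imputing pipeline passes.
bare_ok = satisfies_factory_contract(LogisticRegression())
piped_ok = satisfies_factory_contract(make_pipeline(SimpleImputer(), LogisticRegression()))
```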
Args: model_type (str): Type of model to initialize. Currently supports "RandomForest" and "LogisticRegression".
params (dict): Model-specific parameters passed to the constructor. For LogisticRegression, additional parameters such as 'impute_strategy' and 'transforms' can be specified.
random_state (int, optional): Random seed for reproducibility. Defaults to 42.
Raises: NotImplementedError: If the requested model_type is not implemented in the factory.
Returns: Union[RandomForestClassifier, Pipeline]: Initialized model instance.
- RandomForestClassifier for "RandomForest" type
- Pipeline (with preprocessing) for "LogisticRegression" type
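The dispatch behavior described above, including the `NotImplementedError` for unknown types, can be sketched with a registry dictionary. This is an illustrative sketch, not the module's actual code: `get_model_sketch`, `_BUILDERS`, and the two private builders are hypothetical, and the logistic regression builder assumes the same `impute_strategy` handling as elsewhere in this document while ignoring `transforms`.

```python
from typing import Union

from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def _random_forest(params: dict, random_state: int) -> RandomForestClassifier:
    return RandomForestClassifier(random_state=random_state, **params)


def _logistic_regression(params: dict, random_state: int) -> Pipeline:
    lr_params = dict(params)
    strategy = lr_params.pop("impute_strategy", "mean")  # assumed default
    lr_params.pop("transforms", None)  # assumed key; not modeled in this sketch
    return Pipeline([
        ("imputer", SimpleImputer(strategy=strategy)),
        ("scaler", StandardScaler()),
        ("model", LogisticRegression(random_state=random_state, **lr_params)),
    ])


# Registry mapping model_type strings to builder functions.
_BUILDERS = {
    "RandomForest": _random_forest,
    "LogisticRegression": _logistic_regression,
}


def get_model_sketch(model_type: str, params: dict, random_state: int = 42) -> Union[RandomForestClassifier, Pipeline]:
    try:
        builder = _BUILDERS[model_type]
    except KeyError:
        raise NotImplementedError(f"Model type {model_type!r} is not implemented") from None
    return builder(params, random_state)


rf = get_model_sketch("RandomForest", {"n_estimators": 10})
lr = get_model_sketch("LogisticRegression", {"impute_strategy": "median"})
```

With a registry, supporting a new model type means adding one builder and one dictionary entry; the builder is then responsible for meeting the .fit()/.predict_proba() and NaN-handling requirements.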
Function get_random_forest
get_random_forest(params: dict, random_state: int = 42) -> RandomForestClassifier
Initializes a Random Forest classifier with the provided parameters. The model handles missing values (NaN) natively, without a separate imputation step; native NaN support in RandomForestClassifier requires scikit-learn 1.4 or later.
Args: params (dict): Dictionary of parameters passed to RandomForestClassifier. Common parameters include n_estimators, max_depth, min_samples_split, min_samples_leaf, max_features, and class_weight.
random_state (int, optional): Seed for random number generation, ensuring reproducibility. Defaults to 42.
Returns: RandomForestClassifier: Initialized random forest classifier instance ready for training with .fit() method.
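A minimal sketch of the behavior described above, assuming the function simply forwards `params` to `RandomForestClassifier` together with `random_state` (`get_random_forest_sketch` is a hypothetical name). The example data contains no NaN so that it runs on scikit-learn versions older than 1.4 as well.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def get_random_forest_sketch(params: dict, random_state: int = 42) -> RandomForestClassifier:
    # Forward every user-supplied parameter; random_state is set separately for reproducibility.
    return RandomForestClassifier(random_state=random_state, **params)


params = {"n_estimators": 50, "max_depth": 3, "min_samples_leaf": 2, "class_weight": "balanced"}
clf = get_random_forest_sketch(params)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = (X[:, 0] > 0).astype(int)
clf.fit(X, y)
proba = clf.predict_proba(X)
```

In this sketch, also putting `random_state` inside `params` would raise a `TypeError` (multiple values for the same keyword argument), so the seed should be passed only through the dedicated argument.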