How to penalize False Negatives more than False Positives

From the business perspective, false negatives lead to about tenfold higher costs (real money) than false positives. Given my standard binary classification models (logit, random forest, etc.), how can I incorporate this into my model?

Do I have to change (weight) the loss function in favor of the 'preferred' error (FP)? If so, how do I do that?

about 3 years ago · Santiago Trujillo
2 Answers


There are several options for you:

  • As suggested in the comments, class_weight lets you weight the loss function in favor of the preferred class. This option is supported by many estimators, including sklearn.linear_model.LogisticRegression, sklearn.svm.SVC, sklearn.ensemble.RandomForestClassifier, and others. Note that there's no theoretical limit to the weight ratio, so even if 1 to 100 isn't strong enough for you, you can go on with 1 to 500, etc. (a minimal class_weight sketch is included at the end of this answer).

  • You can also set the decision threshold very low during cross-validation to pick the model that gives the highest recall (though possibly low precision). Recall close to 1.0 effectively means false negatives close to 0.0, which is what you want. For that, use the sklearn.model_selection.cross_val_predict and sklearn.metrics.precision_recall_curve functions:

    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import precision_recall_curve

    # Out-of-fold decision scores for every training sample
    y_scores = cross_val_predict(classifier, x_train, y_train, cv=3,
                                 method="decision_function")
    precisions, recalls, thresholds = precision_recall_curve(y_train, y_scores)
    

    If you plot the precisions and recalls against the thresholds, you should see a picture like this:

    [Figure: precision-recall trade-off as a function of the decision threshold]

    After picking the best threshold, you can use the raw scores from the classifier.decision_function() method for your final classification.
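
    As a minimal sketch of that last step (the recall target, x_test, and the refit call are assumptions added here for illustration; it builds on the snippet above):

    # Example recall target (arbitrary); pick it from the plotted curve
    target_recall = 0.99

    # recalls[:-1] aligns element-wise with thresholds, so this picks the
    # highest (strictest) threshold that still reaches the target recall
    threshold = thresholds[recalls[:-1] >= target_recall].max()

    # Refit on the full training set, then classify with the chosen threshold
    classifier.fit(x_train, y_train)
    y_pred = classifier.decision_function(x_test) >= threshold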

Finally, try not to over-optimize your classifier, because you can easily end up with a trivial constant classifier (which never misses a positive, but is useless).
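
As referenced in the first bullet, here is a minimal class_weight sketch; the 1:10 ratio mirrors the tenfold cost difference from the question, and the estimators and 0/1 labels are example assumptions:

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Weight errors on the positive class (missed positives, i.e. false negatives)
# ten times as heavily as errors on the negative class
weights = {0: 1, 1: 10}

logit = LogisticRegression(class_weight=weights)
forest = RandomForestClassifier(class_weight=weights)

logit.fit(x_train, y_train)
forest.fit(x_train, y_train)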

about 3 years ago · Santiago Trujillo


As @Maxim mentioned, there are two stages at which you can apply this kind of tuning: the model-training stage (e.g. custom class weights) and the prediction stage (e.g. lowering the decision threshold).

Another option for the model-training stage is using a recall scorer. You can use it in grid-search cross-validation (GridSearchCV) to tune your classifier's hyper-parameters towards high recall.

The GridSearchCV scoring parameter accepts either the 'recall' string or a scorer built from recall_score with make_scorer (shown further below).

Since you're doing binary classification, both options work out of the box and call recall_score with its default values, which suit the binary case:

  • average: 'binary' (i.e. one simple recall value)
  • pos_label: 1 (like numpy's True value)
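
For the string form, a minimal sketch (the estimator and the parameter grid are placeholders, not part of the original answer):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical grid; tune whatever hyper-parameters matter for your model
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10]}

search = GridSearchCV(
    estimator=RandomForestClassifier(class_weight={0: 1, 1: 10}),
    param_grid=param_grid,
    scoring="recall",  # built-in scorer: binary recall with pos_label=1
    cv=3,
)
search.fit(x_train, y_train)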

Should you need to customize it, you can wrap an existing metric, or a custom one, with make_scorer and pass it to the scoring parameter.

For example:

from sklearn.metrics import recall_score, make_scorer
from sklearn.model_selection import GridSearchCV

# Recall scorer for string labels where the positive class is 'yes'
recall_custom_scorer = make_scorer(
    lambda y, y_pred, **kwargs: recall_score(y, y_pred, pos_label='yes')
)

GridSearchCV(estimator=est, param_grid=param_grid, scoring=recall_custom_scorer, ...)
about 3 years ago · Santiago Trujillo