Tuning Confidence Thresholds

This page covers the practical decisions: which threshold to start with, how to encode it in your application, and when to retune. For what a confidence score is and how to read the numbers, see Confidence scores in Concepts.

Starting thresholds by use case

The right threshold depends on the cost of a wrong prediction. Start conservative — you can always loosen later as you learn the model’s behavior in production.

Use case                                         | Cost of a wrong prediction | Suggested auto-accept threshold
Medical triage, legal review, fraud action       | Very high                  | ≥ 0.95, route everything else
Content moderation, financial routing            | High                       | ≥ 0.90
Support ticket routing, document classification  | Medium                     | ≥ 0.85
Product tagging, recommendation filtering        | Low                        | ≥ 0.75
Social-media surfacing, exploratory analytics    | Very low                   | ≥ 0.60 or no threshold
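
One way to encode these starting points in an application is a small lookup keyed by cost tier. The sketch below is illustrative only: the tier names, the STARTING_THRESHOLDS dictionary, and the auto_accept helper are hypothetical names chosen for this example, not part of any Nyckel SDK.

```python
# Starting thresholds copied from the table above, keyed by the cost of a
# wrong prediction. The tier names are hypothetical; rename them to fit.
STARTING_THRESHOLDS = {
    "very_high": 0.95,
    "high": 0.90,
    "medium": 0.85,
    "low": 0.75,
    "very_low": 0.60,
}


def auto_accept(confidence: float, cost_tier: str) -> bool:
    """Return True when the score clears the starting threshold for the tier."""
    return confidence >= STARTING_THRESHOLDS[cost_tier]


# The same score can pass one tier and fail a stricter one.
passes_high = auto_accept(0.92, "high")
passes_very_high = auto_accept(0.92, "very_high")
```

Keeping the numbers in one place makes later retuning a one-line change rather than a search through routing code.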

Three-zone routing in code

The standard pattern uses two thresholds to create three routing zones: auto-accept, human review, and reject.

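# Example thresholds; pick values that fit your use case.
AUTO_ACCEPT_THRESHOLD = 0.90
REVIEW_THRESHOLD = 0.60

# invoke() and the three handlers below stand in for your own functions.
result = invoke(input_data)
confidence = result["confidence"]

if confidence >= AUTO_ACCEPT_THRESHOLD:
    take_automated_action(result["labelName"])  # auto-accept zone
elif confidence >= REVIEW_THRESHOLD:
    queue_for_human_review(result)  # human review zone
else:
    treat_as_uncertain(result)  # reject zone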

The exact numbers are your call. The pattern of three zones with two thresholds is what generalizes — it gives you an auto-accept path for throughput, a review queue that generates training data, and a reject path for inputs that don’t fit any of your defined labels.
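
To make the pattern easier to test in isolation, the zone decision can be separated from the actions it triggers. The following is a minimal sketch under that assumption; the Zone enum and the route function are hypothetical names introduced here, and the result dictionary is hand-written to match the shape used in the snippet above.

```python
from enum import Enum


class Zone(Enum):
    AUTO_ACCEPT = "auto_accept"
    HUMAN_REVIEW = "human_review"
    REJECT = "reject"


def route(confidence: float, accept_at: float = 0.90, review_at: float = 0.60) -> Zone:
    """Map a confidence score to one of the three routing zones."""
    # Guard against thresholds that are out of range or in the wrong order.
    if not 0.0 <= review_at <= accept_at <= 1.0:
        raise ValueError("expected 0 <= review_at <= accept_at <= 1")
    if confidence >= accept_at:
        return Zone.AUTO_ACCEPT
    if confidence >= review_at:
        return Zone.HUMAN_REVIEW
    return Zone.REJECT


# Hand-written result for illustration; a real one would come from invoke().
result = {"labelName": "billing", "confidence": 0.72}
zone = route(result["confidence"])
```

Because route only returns a zone, the thresholds can be unit-tested and retuned without touching the code that performs the automated action or fills the review queue.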

When to retune

Threshold settings are not “set and forget.” Revisit them when:

Look at the distribution, not just averages

The Nyckel console shows the confidence distribution across recent predictions. Use it to set thresholds based on your actual traffic rather than guessing:
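
If you also log confidence scores on your side, a rough offline version of the same view can help when comparing candidate thresholds. The sketch below assumes you have a plain list of scores; zone_shares and histogram are hypothetical helpers written for this example, and the sample values are made up.

```python
def zone_shares(confidences, accept_at=0.90, review_at=0.60):
    """Fraction of predictions that fall into each routing zone."""
    total = len(confidences)
    if total == 0:
        raise ValueError("need at least one confidence score")
    accepted = sum(c >= accept_at for c in confidences)
    reviewed = sum(review_at <= c < accept_at for c in confidences)
    return {
        "auto_accept": accepted / total,
        "human_review": reviewed / total,
        "reject": (total - accepted - reviewed) / total,
    }


def histogram(confidences, n_bins=10):
    """Counts per equal-width bin over [0, 1]; a score of 1.0 lands in the last bin."""
    counts = [0] * n_bins
    for c in confidences:
        counts[min(int(c * n_bins), n_bins - 1)] += 1
    return counts


# Made-up scores for illustration only; substitute your own logged values.
sample = [0.97, 0.93, 0.88, 0.72, 0.65, 0.41, 0.95, 0.58]
shares = zone_shares(sample)
bins = histogram(sample)
```

Running zone_shares with a few different threshold pairs shows how much traffic each choice would send to the review queue, which an average confidence alone does not reveal.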

Tip: When reviewing predictions, prioritize the ones closest to your threshold boundary. These are the inputs where a correction has the highest impact on future model behavior.
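
One possible way to apply this prioritization is to sort the review queue by distance to the nearest threshold. This is a sketch, not a built-in feature: review_order is a hypothetical helper, the "id" field is assumed for illustration, and the thresholds reuse the example values from the three-zone pattern.

```python
def review_order(predictions, thresholds=(0.60, 0.90)):
    """Sort predictions so those nearest any threshold come first."""
    def distance(pred):
        return min(abs(pred["confidence"] - t) for t in thresholds)

    return sorted(predictions, key=distance)


# Made-up queue entries; only the "confidence" key is required by review_order.
queue = [
    {"id": "a", "confidence": 0.99},
    {"id": "b", "confidence": 0.89},
    {"id": "c", "confidence": 0.75},
    {"id": "d", "confidence": 0.63},
]
ordered_ids = [p["id"] for p in review_order(queue)]
```

Here "b" sits just under the auto-accept threshold and "d" just above the review threshold, so they sort ahead of the clearly confident "a" and the mid-zone "c".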