Review and Improve Predictions

This is the step that makes a function actually get better over time. Reviewing predictions converts each one into a training sample. As samples accumulate, Nyckel trains a private model on your data in the background and swaps it in automatically when it outperforms the current model.

You don’t trigger any of this. You just review.

Open the review queue

In the console, go to your function’s Review tab (sometimes labeled Test for functions that have not yet been invoked from production). Every prediction made by the endpoint appears here, whether it came from the console, from your application, or from anywhere else.

For each prediction you’ll see:

The input
The predicted label and confidence
Buttons to confirm (the prediction is right) or correct (change the label to the right one)

Confirm or correct

Click through the queue. For each prediction:

If the predicted label is correct, confirm it. The prediction becomes a sample with the predicted label.
If the predicted label is wrong, correct it to the right label. The prediction becomes a sample with the corrected label.

Corrections are a much stronger training signal than confirmations, because they tell the model exactly where its current decision boundary is wrong. If you only have time to review some predictions, prioritize the ones with low confidence, since those are the ones the model is most likely to have gotten wrong.
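If you export predictions and want to decide which ones to review first, the low-confidence rule can be sketched as below. This is only an illustration: the `Prediction` shape and the `review_order` helper are hypothetical names, not part of the Nyckel API.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    # Hypothetical shape for an exported prediction; not a Nyckel API type.
    sample_id: str
    label: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def review_order(predictions, budget):
    """Return up to `budget` predictions, least confident first."""
    return sorted(predictions, key=lambda p: p.confidence)[:budget]


queue = [
    Prediction("a", "Refund request", 0.97),
    Prediction("b", "Billing question", 0.41),
    Prediction("c", "Refund request", 0.63),
]

# With time for only two reviews, the two least confident predictions come first.
to_review = review_order(queue, budget=2)
```

Here `to_review` holds predictions `b` and `c`, in that order, and the high-confidence prediction `a` is left for later.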

What happens behind the scenes

Once you have 2 confirmed samples per label, Nyckel automatically begins building a private model trained on your data. You’ll see a training indicator in the console. The zero-shot classifier continues serving predictions while the private model trains in the background, so the endpoint stays live the whole time.

When the private model is ready, Nyckel benchmarks it against the zero-shot baseline on held-out samples. If it performs better, the endpoint automatically switches to it.

The same thing happens every time you’ve added enough new samples to potentially improve the current model. Better models replace worse ones; worse models never ship. There’s no manual retraining and no deploy step.
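The "better models replace worse ones" rule can be pictured as a simple gate. The sketch below is not Nyckel's implementation: the metric (plain accuracy), the tiny held-out set, and the function names are all assumptions chosen for illustration.

```python
def accuracy(predict, held_out):
    """Fraction of (input, label) pairs that `predict` labels correctly."""
    correct = sum(1 for text, label in held_out if predict(text) == label)
    return correct / len(held_out)


def choose_model(current, candidate, held_out):
    """Keep the current model unless the candidate scores strictly higher."""
    if accuracy(candidate, held_out) > accuracy(current, held_out):
        return candidate
    return current


# Toy stand-ins for the two models; real models would be far more capable.
def zero_shot(text):
    return "Refund request"


def private_model(text):
    if "refund" in text or "money" in text:
        return "Refund request"
    return "Billing question"


held_out = [
    ("refund please", "Refund request"),
    ("where is my invoice", "Billing question"),
    ("money back", "Refund request"),
]

serving = choose_model(zero_shot, private_model, held_out)
```

Because the comparison is strict, a candidate that only ties the current model is not promoted, which matches the idea that worse or merely equal models never ship.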

How much to review, and how often

A few practical guidelines:

Aim for 5–10 confirmed samples per label to get a first private model with meaningfully better accuracy than the zero-shot baseline.
Review continuously rather than in one large batch. A steady drip of 10–20 reviewed predictions per day usually beats one big labeling sprint, because the model retrains between review sessions and you see the effect of each round.
Prioritize low confidence and corrections. These are the predictions that, once labeled, move the model the most.
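To see how far each label is from the 5–10 sample range, you could tally reviewed samples per label. This is a minimal sketch under stated assumptions: the helper name, the `target` default of 5, and the input lists are hypothetical, and how you obtain the labels of your reviewed samples is up to you.

```python
from collections import Counter


def labels_below_target(sample_labels, all_labels, target=5):
    """Map each under-sampled label to the number of samples it still needs."""
    counts = Counter(sample_labels)
    return {
        label: target - counts[label]
        for label in all_labels
        if counts[label] < target
    }


# Labels of the samples reviewed so far (illustrative data).
reviewed = ["Refund request"] * 6 + ["Billing question"] * 2
all_labels = ["Refund request", "Billing question", "Other"]

missing = labels_below_target(reviewed, all_labels)
```

In this example `missing` reports that "Billing question" needs 3 more samples and "Other" needs 5, which tells you where the next review session should focus.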

Bulk-add samples (optional)

If you already have a CSV or JSONL of labeled data, you can skip ahead. Open the Samples tab and click Import. Each row should contain an input and the correct label. This is the fastest way to bootstrap if you’ve already done a labeling project.
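If your labeled data lives in a CSV and you would rather import JSONL, a conversion could look like the sketch below. The column names `input` and `label`, and the matching JSONL keys, are assumptions made for this example; check the Import dialog for the field names it actually expects.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text):
    """Convert CSV text with `input` and `label` columns to JSONL text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        text = (row.get("input") or "").strip()
        label = (row.get("label") or "").strip()
        if not text or not label:
            # Fail loudly rather than import a sample without a label.
            raise ValueError(f"line {reader.line_num}: missing input or label")
        lines.append(json.dumps({"input": text, "label": label}))
    return "\n".join(lines)


csv_text = (
    "input,label\n"
    '"I want my money back",Refund request\n'
    "Where is my invoice?,Billing question\n"
)
jsonl_text = csv_to_jsonl(csv_text)
```

Rejecting rows with an empty input or label before the import keeps unlabeled rows out of the training set.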

Annotate from the API

You don’t have to review in the console — you can also send annotations directly from your application. If your app surfaces user feedback (“this is wrong” buttons, support agent corrections, etc.), pipe those into the annotation API:

POST https://www.nyckel.com/v1/functions/{functionId}/annotations
Authorization: Bearer {your_access_token}
Content-Type: application/json

{
  "sampleId": "smp_abc123",
  "annotation": { "labelName": "Refund request" }
}
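From application code, the same request might be built as in the following sketch, which uses only the Python standard library. The URL, headers, and body fields are copied from the request above; the helper name and the placeholder function ID and token are hypothetical.

```python
import json
import urllib.request


def build_annotation_request(function_id, access_token, sample_id, label_name):
    """Build (but do not send) the annotation POST request shown above."""
    url = f"https://www.nyckel.com/v1/functions/{function_id}/annotations"
    body = json.dumps(
        {"sampleId": sample_id, "annotation": {"labelName": label_name}}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; substitute your own function ID and access token.
request = build_annotation_request(
    "my_function_id", "my_access_token", "smp_abc123", "Refund request"
)

# To actually send it (requires network access and valid credentials):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Keeping the request construction separate from the send makes it easy to unit-test the payload your feedback buttons produce without calling the live endpoint.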

For a deeper guide, see Build a Feedback Loop under Developer Platform.

What you just did

Step	What happened
Confirmed and corrected predictions	Each one became a labeled training sample
Hit 2 samples per label	Nyckel began training a private model in the background
Reviewed more	The private model benchmarked better and replaced the zero-shot model on the endpoint

Next Steps: what to do once your function is live in production.