How Functions Work

A Nyckel function has five parts that feed into one another. They’re introduced below in the order you’ll encounter them: the endpoint is live within seconds, the review queue fills as soon as you invoke, accuracy becomes meaningful once reviews accumulate, the samples store grows from those reviews, and the model pipeline is the long-run payoff.

The five parts

1. The endpoint

The endpoint is a REST URL that takes an input and returns a prediction. It’s live from the moment you create the function — you don’t wait for training, you don’t deploy anything. For classification functions the endpoint can return useful results before you’ve added a single labeled example.
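As a rough illustration of what calling the endpoint might look like, here is a minimal Python sketch using only the standard library. The URL shape, the `data` field in the payload, and bearer-token authentication are assumptions made for this example, and `function_id` and `token` are placeholders; check the API reference for the exact request format your function expects.

```python
import json
import urllib.request

# Assumed URL shape; confirm against the API reference for your function.
INVOKE_URL = "https://www.nyckel.com/v1/functions/{function_id}/invoke"


def build_invoke_request(function_id: str, text: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the endpoint for a prediction."""
    # The "data" field name is an assumption for this sketch.
    body = json.dumps({"data": text}).encode("utf-8")
    return urllib.request.Request(
        INVOKE_URL.format(function_id=function_id),
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def invoke(function_id: str, text: str, token: str) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_invoke_request(function_id, text, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the network call in one place, which makes the request easy to inspect or log before anything goes over the wire.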

2. The review queue

Every prediction the endpoint makes is captured in the review queue. You can open it in the console and confirm the predictions that are right or fix the ones that are wrong. Each action does two things at once: it scores the function (was that prediction right?) and creates a new training sample. The queue is how production usage feeds back into both testing and training.

3. Testing and accuracy metrics

As your reviews accumulate, Nyckel scores the function against them and surfaces accuracy directly in the console — overall and per label. You don’t have to build a separate eval set or wire up a benchmarking script. The number you see is the function’s real-world performance on the predictions it has actually been asked to make, and it updates as you review more.
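To make the scoring concrete, the sketch below shows one way overall and per-label accuracy could be derived from a list of reviews. It is an illustration, not Nyckel's implementation: the `Review` record is hypothetical, and grouping the per-label figure by the reviewer's label (rather than by the predicted label) is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    predicted: str  # label the endpoint returned
    correct: str    # label the reviewer confirmed or corrected to


def accuracy_report(reviews: list[Review]) -> dict:
    """Return overall accuracy plus accuracy grouped by the reviewer's label."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for review in reviews:
        totals[review.correct] += 1
        if review.predicted == review.correct:
            hits[review.correct] += 1
    total = sum(totals.values())
    return {
        # None signals that there are no reviews to score against yet.
        "overall": sum(hits.values()) / total if total else None,
        "per_label": {label: hits[label] / totals[label] for label in totals},
    }
```

A confirmed prediction is simply a review where `predicted` and `correct` match, so confirmations and corrections flow through the same calculation.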

4. The samples store

Samples are labeled examples: an input paired with the correct output. They are the source of truth for what the function should learn. Most of them come from reviews and corrections, but you can also seed the store directly. Samples enter the store in three ways:

Uploading samples in bulk (CSV or JSONL)
Adding them one by one in the console
Confirming or correcting predictions in the review queue (every annotation becomes a new sample)
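For the bulk-upload route, a JSONL file holds one JSON object per line. The sketch below serializes (input, label) pairs in that shape. The field names `data` and `label` are illustrative assumptions, not a documented schema, so match them to whatever the upload screen or API reference specifies.

```python
import json


def samples_to_jsonl(samples: list[tuple[str, str]]) -> str:
    """Serialize (input, label) pairs as one JSON object per line."""
    lines = []
    for text, label in samples:
        # "data" and "label" are assumed field names for this sketch.
        lines.append(json.dumps({"data": text, "label": label}, ensure_ascii=False))
    # Terminate every record with a newline; an empty list yields an empty string.
    return "".join(line + "\n" for line in lines)
```

Writing the returned string to a file with UTF-8 encoding gives a file where each line can be parsed independently, which is what makes JSONL convenient for large uploads.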

5. The model pipeline

The model pipeline is the part you never have to think about. Nyckel watches your sample count, retrains when there is enough new data that the model could improve, benchmarks the new model against the current one on held-out data, and swaps the endpoint over to the new model only if it performs better. You don’t trigger any of this; it happens automatically.
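The swap decision can be pictured as a simple comparison on held-out data. The sketch below is a simplified illustration of that idea, not Nyckel's pipeline: the `Model` type, the function names, and the choice of plain accuracy with a strict "greater than" rule are all assumptions.

```python
from typing import Callable, Sequence

Model = Callable[[str], str]   # maps an input to a predicted label
Example = tuple[str, str]      # (input, correct label)


def holdout_accuracy(model: Model, holdout: Sequence[Example]) -> float:
    """Fraction of held-out examples the model labels correctly."""
    if not holdout:
        raise ValueError("holdout set must not be empty")
    correct = sum(1 for text, label in holdout if model(text) == label)
    return correct / len(holdout)


def pick_model(current: Model, candidate: Model, holdout: Sequence[Example]) -> Model:
    """Keep the current model unless the candidate scores strictly higher."""
    if holdout_accuracy(candidate, holdout) > holdout_accuracy(current, holdout):
        return candidate
    return current
```

Requiring a strict improvement means a tie leaves the current model in place, so the endpoint never changes behavior without a measured gain.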

The loop

1. Application invokes endpoint
2. Endpoint returns prediction + confidence
3. Application acts
4. Reviewer confirms or corrects
5. Accuracy updates + new sample is added
6. Nyckel retrains and benchmarks
7. Better model is swapped in

This loop is how classification functions work today, across every input type — text, images, and structured data. What differs from one classification function to the next is the input format, the set of labels, and how often you review.

What makes this useful

Three consequences of this design matter for how you build with Nyckel:

You start with predictions, not with training data. The endpoint is live immediately. For classification you get a zero-shot model out of the box; for prebuilt functions you get a model trained on millions of examples. This means you can prototype end-to-end before you’ve gathered a single label.

Production usage is testing data. Every review is also an evaluation, so you always know how well the function is performing on real inputs — not on a frozen benchmark from six months ago. No separate eval harness to build or maintain.

Production usage is training data. Every invocation you review also becomes a sample. A function that handles 1,000 predictions a day, with 5% of them reviewed, gains 50 new training examples a day without anyone running a labeling project.
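The arithmetic behind that figure is easy to adapt to your own traffic. The helper names below are hypothetical and exist only for this back-of-the-envelope sketch.

```python
import math


def new_samples_per_day(predictions_per_day: int, review_rate: float) -> float:
    """Expected number of reviewed predictions (and so new samples) per day."""
    if not 0.0 <= review_rate <= 1.0:
        raise ValueError("review_rate must be between 0 and 1")
    return predictions_per_day * review_rate


def days_to_reach(target_samples: int, predictions_per_day: int, review_rate: float) -> int:
    """Whole days needed to accumulate target_samples at a steady review rate."""
    per_day = new_samples_per_day(predictions_per_day, review_rate)
    if per_day <= 0:
        raise ValueError("no samples are produced at this volume and review rate")
    return math.ceil(target_samples / per_day)
```

With the figures from the text (1,000 predictions a day, 5% reviewed) the first function gives 50 samples a day, so reaching 500 samples would take about 10 days at that pace.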

Next

Function Types: pick the right type for the decision you want to make.
How Functions Improve from Feedback: what’s happening inside the model pipeline.
Build and Improve a Spam Classifier: see the loop in action end-to-end.
