How Functions Work
A Nyckel function has five parts that all talk to each other. They’re introduced below in the order you’ll actually encounter them — the endpoint is live within seconds, the review queue fills up as soon as you invoke, accuracy becomes meaningful once reviews accumulate, the samples store grows from those reviews, and the model pipeline is the long-run payoff.
The five parts
1. The endpoint
The endpoint is a REST URL that takes an input and returns a prediction. It’s live from the moment you create the function — you don’t wait for training, you don’t deploy anything. For classification functions the endpoint can return useful results before you’ve added a single labeled example.
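As a rough sketch of what calling such an endpoint could look like from Python, the snippet below builds (but does not send) a POST request using only the standard library. The host, URL pattern, `data` payload field, bearer-token header, and function ID are all placeholders assumed for illustration; the API reference is the authority on the real request shape.

```python
import json
import urllib.request

# Hypothetical values: the host, URL pattern, payload field, and auth
# scheme are assumptions for illustration, not taken from this page.
BASE_URL = "https://api.example.com/v1/functions"


def build_invoke_request(function_id: str, text: str, token: str) -> urllib.request.Request:
    """Build a POST request that asks a function to classify `text`."""
    body = json.dumps({"data": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{function_id}/invoke",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_invoke_request("my-function-id", "Win a free cruise today!", "my-token")
# To send it for real, pass `request` to urllib.request.urlopen and
# JSON-decode the response body to read the prediction.
```

Building the request separately from sending it keeps the sketch runnable offline and makes the assumed request shape easy to inspect.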
2. The review queue
Every prediction the endpoint makes is captured in the review queue. You can open it in the console, confirm the predictions that are right, and fix the ones that are wrong. Each action does two things at once: it scores the function (was that prediction right?) and creates a new training sample. The queue is how production usage feeds back into both testing and training.
3. Testing and accuracy metrics
As your reviews accumulate, Nyckel scores the function against them and surfaces accuracy directly in the console — overall and per label. You don’t have to build a separate eval set or wire up a benchmarking script. The number you see is the function’s real-world performance on the predictions it has actually been asked to make, and it updates as you review more.
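To make "overall and per label" concrete, here is a minimal sketch of how accuracy could be derived from a list of reviews. It assumes each review is a `(predicted_label, correct_label)` pair and that per-label accuracy is grouped by the reviewer's label; the exact definition Nyckel uses is not stated here, and the function name is hypothetical.

```python
from collections import defaultdict


def accuracy_from_reviews(reviews):
    """Return (overall_accuracy, per_label_accuracy) from reviewed predictions.

    `reviews` is a list of (predicted_label, correct_label) pairs.
    Per-label accuracy is grouped by the correct label (an assumption).
    """
    if not reviews:
        raise ValueError("need at least one review to compute accuracy")

    hits = defaultdict(int)    # correct predictions per true label
    totals = defaultdict(int)  # reviewed predictions per true label
    for predicted, correct in reviews:
        totals[correct] += 1
        if predicted == correct:
            hits[correct] += 1

    overall = sum(hits.values()) / len(reviews)
    per_label = {label: hits[label] / totals[label] for label in totals}
    return overall, per_label


reviews = [
    ("spam", "spam"),          # confirmed
    ("spam", "not_spam"),      # corrected
    ("not_spam", "not_spam"),  # confirmed
    ("not_spam", "not_spam"),  # confirmed
]
overall, per_label = accuracy_from_reviews(reviews)
```

Because the inputs are just the reviews themselves, the numbers shift every time another prediction is confirmed or corrected.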
4. The samples store
Samples are labeled examples: an input paired with the correct output. They are the source of truth for what the function should learn, and most of them come from reviews and corrections. Samples reach the store in three ways:
- Uploading samples in bulk (CSV or JSONL)
- Adding them one by one in the console
- Confirming or correcting predictions in the review queue (every annotation becomes a new sample)
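For the bulk route, a small script can turn an existing CSV of labeled examples into JSONL. The sketch below assumes a CSV with `data` and `label` columns and emits one JSON object per line with the same two keys; those field names are hypothetical, so match them to whatever the upload format actually expects.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text: str) -> str:
    """Convert CSV rows with `data` and `label` columns into JSONL text.

    The column and key names are assumptions for illustration.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = [
        json.dumps({"data": row["data"], "label": row["label"]})
        for row in reader
    ]
    return "\n".join(lines)


csv_text = "data,label\nWin a free cruise,spam\nLunch at noon?,not_spam\n"
jsonl = csv_to_jsonl(csv_text)
```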
5. The model pipeline
The model pipeline is the part you never have to think about. Nyckel watches your sample count, retrains when there is enough new data that a new model might do better, benchmarks the new model against the current one on held-out data, and swaps the endpoint over to the new model only if it actually scores better. You don't trigger any of this; it happens automatically.
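The "swap only if better" step can be pictured as a simple promotion gate. The sketch below is an illustration of that idea, not Nyckel's implementation: models are plain Python callables, the comparison metric is accuracy on a held-out list, and all names are hypothetical.

```python
def accuracy(predict, held_out):
    """Fraction of held-out (input, label) pairs that `predict` gets right."""
    correct = sum(1 for text, label in held_out if predict(text) == label)
    return correct / len(held_out)


def choose_model(current, candidate, held_out):
    """Return `candidate` only if it is strictly more accurate than `current`."""
    if accuracy(candidate, held_out) > accuracy(current, held_out):
        return candidate
    return current


held_out = [
    ("free money", "spam"),
    ("see you at 5", "not_spam"),
    ("claim your prize", "spam"),
]


def current_model(text):
    # Toy rule standing in for the model currently serving traffic.
    return "spam" if "free" in text else "not_spam"


def candidate_model(text):
    # Toy rule standing in for a freshly retrained model.
    return "spam" if ("free" in text or "prize" in text) else "not_spam"


serving = choose_model(current_model, candidate_model, held_out)
```

Requiring a strict improvement means a tie leaves the current model in place, so the endpoint never changes behavior without a measured gain.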
The loop
1. Application invokes endpoint
2. Endpoint returns prediction + confidence
3. Application acts
4. Reviewer confirms or corrects
5. Accuracy updates + new sample is added
6. Nyckel retrains and benchmarks
7. Better model is swapped in
This loop is how classification functions work today, across every input type — text, images, and structured data. What differs from one classification function to the next is the input format, the set of labels, and how often you review.
What makes this useful
Three consequences of this design matter for how you build with Nyckel:
You start with predictions, not with training data. The endpoint is live immediately. For classification you get a zero-shot model out of the box; for prebuilt functions you get a model trained on millions of examples. This means you can prototype end-to-end before you’ve gathered a single label.
Production usage is testing data. Every review is also an evaluation, so you always know how well the function is performing on real inputs — not on a frozen benchmark from six months ago. No separate eval harness to build or maintain.
Production usage is training data. Every invocation you review also becomes a sample. A function that handles 1,000 predictions a day, with 5% of them reviewed, gains 50 new training examples a day without anyone running a labeling project.
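The arithmetic behind that last example is easy to adapt to your own traffic. This sketch assumes every reviewed prediction yields exactly one sample, as described above; the function name and the 30-day month are illustrative choices.

```python
def new_samples_per_day(predictions_per_day: int, review_rate: float) -> int:
    """Estimate daily new samples, assuming one sample per reviewed prediction."""
    return round(predictions_per_day * review_rate)


per_day = new_samples_per_day(1_000, 0.05)  # the example from the text
per_month = per_day * 30                    # assuming a 30-day month
```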
Next
- Function Types — pick the right type for the decision you want to make.
- How Functions Improve from Feedback — what’s happening inside the model pipeline.
- Build and Improve a Spam Classifier — see the loop in action end-to-end.