Training & Accuracy

How Nyckel turns annotated samples into a working model, how it reports accuracy back to you, what to feed it, and the input-size limits to know about.

Accuracy

While a function is training, its accuracy is shown in the left navigation panel of the Nyckel console.

The top bar reflects the overall accuracy — the number of correctly predicted samples divided by the total number of samples.

Below it are class-level accuracy bars. Each bar shows, for one class, the share of that class's samples that the function predicted correctly. Per-class bars surface where the model is strong and where it needs more or better training data.
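As a rough illustration of the two kinds of numbers described above, the following sketch computes an overall accuracy and a per-class accuracy from a list of annotated labels and a list of predicted labels. The function names and the sample data are hypothetical and are not part of the Nyckel API; this only mirrors the definitions given in the text.

```python
from collections import Counter

def overall_accuracy(true_labels, predicted_labels):
    """Correctly predicted samples divided by the total number of samples."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one sample")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

def per_class_accuracy(true_labels, predicted_labels):
    """For each annotated class, the share of its samples predicted correctly."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical annotations and predictions, for illustration only.
truth = ["cat", "cat", "cat", "dog", "dog", "bird"]
preds = ["cat", "cat", "dog", "dog", "dog", "cat"]

print(overall_accuracy(truth, preds))
print(per_class_accuracy(truth, preds))
```

In this made-up example the overall number looks reasonable while the "bird" class scores zero, which is the kind of gap the per-class bars are meant to expose.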

Cross-validation

To estimate function accuracy honestly, Nyckel uses cross-validation.

Cross-validation means training multiple models, each on most — but not all — of the samples. Each model then predicts the labels for the samples that were held out of its training set. Stitched together, this gives a fair prediction for every annotated sample without ever training and evaluating on the same data. The aggregate of those held-out predictions is what Nyckel reports as your function’s accuracy.

After cross-validation runs, Nyckel trains a single final model using all annotated data. That final model is what serves predictions for new invokes. Cross-validation is purely for the accuracy estimate; production traffic always hits the full-data model.
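The procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Nyckel's implementation: the number of folds, the way samples are assigned to folds, and the toy "nearest mean" model are all hypothetical stand-ins chosen to keep the example self-contained.

```python
def train_nearest_mean(samples):
    """Toy stand-in for training: remember the mean feature value per label."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    means = {label: sums[label] / counts[label] for label in sums}
    # The "model" predicts the label whose mean is closest to the input value.
    return lambda value: min(means, key=lambda label: abs(means[label] - value))

def cross_validated_predictions(samples, num_folds=5):
    """Predict every sample with a model that never saw it during training."""
    predictions = [None] * len(samples)
    for fold in range(num_folds):
        held_out = [i for i in range(len(samples)) if i % num_folds == fold]
        training = [s for i, s in enumerate(samples) if i % num_folds != fold]
        model = train_nearest_mean(training)
        for i in held_out:
            predictions[i] = model(samples[i][0])
    return predictions

# Hypothetical annotated samples: (feature value, label).
samples = [(1.0, "low"), (9.0, "high"), (2.0, "low"), (8.0, "high"),
           (1.5, "low"), (9.5, "high"), (2.5, "low"), (7.5, "high"),
           (3.0, "low"), (7.0, "high")]

# Accuracy estimate: aggregate of the held-out predictions.
held_out_preds = cross_validated_predictions(samples, num_folds=5)
accuracy = sum(p == label for p, (_, label) in zip(held_out_preds, samples)) / len(samples)

# Final model: trained on all annotated data, used for new inputs.
final_model = train_nearest_mean(samples)
print(accuracy, final_model(2.2))
```

Note that the per-fold models are discarded after scoring; only `final_model` would serve predictions.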

Training data

Nyckel trains your function from the data you import and annotate. To get the best performance, follow these guidelines when choosing what to import:

Context length

Nyckel’s Text and Tabular models rely on several large language models (LLMs) to read sample text. Every LLM has a context length — the maximum amount of text it can ingest (and therefore learn from) in a single sample. Nyckel’s LLMs currently have a context length of 512 tokens.

A “token” is a word, a piece of punctuation, or other language fragment. There is no clean 1-to-1 mapping between tokens and words or characters; for example, a long or uncommon word may be split into several tokens.

In practice, across a wide range of text, the Nyckel LLMs can ingest 300–500 words before hitting the limit. If your samples are routinely longer than that, you have two options:

  1. Use the full text anyway. Only the first 300–500 words are read, but they often carry enough signal for Nyckel to learn a good model.
  2. Shorten the input in pre-processing by splitting long samples into several shorter ones, each annotated with the same label.
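The second option can be sketched as a small pre-processing step. This is a minimal sketch under stated assumptions: the function name is hypothetical, and it counts whitespace-separated words as a rough proxy for tokens, using 300 words (the conservative end of the range above) as the default chunk size. A real tokenizer may count differently.

```python
def split_sample(text, label, max_words=300):
    """Split one long annotated sample into shorter ones that share its label.

    Assumption: words are used as a stand-in for tokens, so max_words should
    stay at the low end of the 300-500 word range to leave some headroom.
    """
    words = text.split()
    chunks = [words[i:i + max_words] for i in range(0, len(words), max_words)]
    return [(" ".join(chunk), label) for chunk in chunks]

# Hypothetical 700-word document annotated as "invoice".
long_text = " ".join(f"w{i}" for i in range(700))
parts = split_sample(long_text, "invoice")
print(len(parts), [len(text.split()) for text, _ in parts])
```

Each resulting piece can then be imported as its own sample. Splitting on sentence or paragraph boundaries instead of a fixed word count may keep the pieces more coherent, at the cost of a slightly more involved splitter.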