Training & Accuracy
How Nyckel turns annotated samples into a working model, how it reports accuracy back to you, what to feed it, and the input-size limits to know about.
Accuracy
While a function is training, its accuracy is shown in the left navigation panel of the Nyckel console.
The top bar reflects the overall accuracy — the number of correctly predicted samples divided by the total number of samples.
Below it are class-level accuracy bars. Each bar shows, for one class, the share of that class's samples the function predicted correctly. Per-class bars surface where the model is strong and where it needs more or better training data.
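The two kinds of accuracy described above can be reproduced offline from any list of true and predicted labels. The following is a minimal sketch, not Nyckel's implementation; the function name `accuracy_report` and the example labels are made up for illustration.

```python
from collections import Counter


def accuracy_report(true_labels, predicted_labels):
    """Return (overall_accuracy, per_class_accuracy) for paired label lists."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one sample")

    # Number of samples belonging to each true class.
    totals = Counter(true_labels)
    # Number of correctly predicted samples, keyed by true class.
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)

    # Overall accuracy: correct predictions divided by all samples.
    overall = sum(correct.values()) / len(true_labels)
    # Per-class accuracy: correct predictions within a class divided by that class's size.
    per_class = {label: correct[label] / totals[label] for label in totals}
    return overall, per_class


# Hypothetical example data: one "cat" sample is mispredicted as "dog".
truth = ["cat", "cat", "cat", "dog", "dog"]
preds = ["cat", "cat", "dog", "dog", "dog"]
overall, per_class = accuracy_report(truth, preds)
```

Here the overall accuracy is 4 of 5 samples, while the per-class view shows that every error falls in the "cat" class, which is the kind of gap the class-level bars are meant to expose.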
Cross-validation
To estimate function accuracy honestly, Nyckel uses cross-validation.
Cross-validation means training multiple models, each on most — but not all — of the samples. Each model then predicts the labels for the samples that were held out of its training set. Stitched together, this gives a fair prediction for every annotated sample without ever training and evaluating on the same data. The aggregate of those held-out predictions is what Nyckel reports as your function’s accuracy.
After cross-validation runs, Nyckel trains a single final model using all annotated data. That final model is what serves predictions for new invokes. Cross-validation is purely for the accuracy estimate; production traffic always hits the full-data model.
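The procedure above can be sketched in a few lines. This is an illustration under stated assumptions, not Nyckel's actual pipeline: the number of folds, the way samples are assigned to folds, and the toy nearest-neighbor model are all placeholders chosen to keep the example self-contained.

```python
def k_fold_predictions(samples, labels, train_fn, k=5):
    """Predict every sample with a model that never saw it during training."""
    n = len(samples)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")

    held_out = [None] * n
    for fold in range(k):
        # Hold out every k-th sample, offset by the fold number.
        test_idx = set(range(fold, n, k))
        train_x = [samples[i] for i in range(n) if i not in test_idx]
        train_y = [labels[i] for i in range(n) if i not in test_idx]
        model = train_fn(train_x, train_y)
        for i in test_idx:
            held_out[i] = model(samples[i])
    return held_out


def train_nearest_neighbor(xs, ys):
    """Toy stand-in model (an assumption): return the label of the closest training value."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return predict


# Hypothetical annotated data: one numeric feature, two classes.
samples = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
labels = ["low", "low", "low", "low", "high", "high", "high", "high"]

# Step 1: the accuracy estimate comes only from held-out predictions.
cv_predictions = k_fold_predictions(samples, labels, train_nearest_neighbor, k=4)
cv_accuracy = sum(p == t for p, t in zip(cv_predictions, labels)) / len(labels)

# Step 2: the model that serves new inputs is trained on all annotated data.
final_model = train_nearest_neighbor(samples, labels)
```

The fold models exist only to produce `cv_predictions`; they are discarded once the accuracy estimate is computed, and `final_model` handles everything afterwards.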
Training data
Nyckel trains your function from the data you import and annotate. To get the best performance, follow these guidelines when choosing what to import:
- Provide data similar to what your function will encounter in production. If possible, draw samples from the same system the function will be deployed in. A model trained on screenshots from one tool will be weaker on screenshots from a different tool, even when the task is “the same.”
- Provide balanced data. Include roughly the same number of samples per class. A function with 10,000 examples of one class and 50 of another will learn the dominant class well and the rare class poorly.
- Provide more data. More annotated samples beat almost every other lever. If you have to choose between cleaning a marginal sample and adding a new one, add a new one.
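Before importing, it can help to check how balanced a labeled dataset is. The sketch below counts samples per class and flags classes that are badly outnumbered by the largest one. The helper name `class_balance` and the `max_ratio` threshold of 3 are arbitrary choices for illustration, not Nyckel recommendations.

```python
from collections import Counter


def class_balance(labels, max_ratio=3.0):
    """Return (counts, flagged), where flagged holds underrepresented classes.

    A class is flagged when the largest class has more than `max_ratio`
    times as many samples. The threshold is a hypothetical default.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("need at least one label")
    largest = max(counts.values())
    flagged = {label: n for label, n in counts.items() if largest / n > max_ratio}
    return counts, flagged


# Hypothetical dataset mirroring the 10,000 vs. 50 example above.
labels = ["spam"] * 10000 + ["ham"] * 50
counts, flagged = class_balance(labels)
```

In this example `flagged` contains only the "ham" class, which is the class to collect more samples for first.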
Context length
Nyckel’s Text and Tabular models rely on several large language models (LLMs) to read sample text. Every LLM has a context length — the maximum amount of text it can ingest (and therefore learn from) in a single sample. Nyckel’s LLMs currently have a context length of 512 tokens.
A “token” is a word, a piece of punctuation, or other language fragment. There is no clean 1-to-1 mapping between tokens and words or characters:
- Most tokenizers use one token for the word “cat”.
- The same tokenizer typically uses two tokens for “cat's”.
- The exact mapping depends on the tokenizer, which depends on the LLM.
In practice, across a wide range of text, the Nyckel LLMs can ingest 300–500 words before hitting the limit. If your samples are routinely longer than that, you have two options:
- Use the full text anyway. Text beyond the context length is cut off, but the first 300–500 words often carry enough signal for Nyckel to learn a good model; the truncated tail is rarely the discriminating part.
- Shorten the input in pre-processing by splitting long samples into several shorter ones, each annotated with the same label.
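The splitting option can be sketched as a small pre-processing step. This example counts whitespace-separated words as a rough proxy for tokens, since the exact tokenizer is not specified here; the helper name `split_long_sample` and the default of 300 words (the low end of the range above) are assumptions for illustration.

```python
def split_long_sample(text, label, max_words=300):
    """Split one long sample into (chunk_text, label) pairs of at most max_words words.

    Word counts only approximate token counts, so max_words is kept
    conservative; adjust it for your own data.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        (" ".join(words[start:start + max_words]), label)
        for start in range(0, len(words), max_words)
    ]


# Hypothetical 700-word sample with a single label.
long_text = " ".join(f"word{i}" for i in range(700))
chunks = split_long_sample(long_text, "invoice")
```

The 700-word sample becomes three shorter samples of 300, 300, and 100 words, each carrying the original label. This works best when the label applies to every part of the text; a chunk that no longer shows the labeled property adds noise rather than signal.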