In this guide we'll improve an already trained function. If you don't have a trained function, take a look at our Quick start first.
Trained machine learning functions do better with more data, so the first thing to do is import more data. This section gives some guidance on which data to import. Once you have identified it, use the UI or API to add the data to your function. For API details, refer to the Create text sample or Create image sample endpoints as appropriate.
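As a rough sketch of the API route, the snippet below builds a POST request for one text sample. Note that the base URL, path, and payload field names here are assumptions for illustration; check the Create text sample endpoint reference for the exact request shape.

```python
import json
import urllib.request

API_BASE = "https://www.nyckel.com/v1"   # assumed base URL; verify in the API docs
FUNCTION_ID = "your-function-id"         # placeholder
ACCESS_TOKEN = "your-access-token"       # placeholder

def build_text_sample_request(text):
    """Build a POST request for one text sample (payload shape assumed)."""
    url = f"{API_BASE}/functions/{FUNCTION_ID}/samples"
    body = json.dumps({"data": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {ACCESS_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send the sample:
# urllib.request.urlopen(build_text_sample_request("some production text"))
```

Loop over your harvested production samples and send one request per sample.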
The samples you use for training should be similar to the samples you expect to run your model on. For text, "similar" can be the types of word used, the tone and the text length. For images, "similar" can be things like lighting condition, orientation and distance to the camera. The best way to achieve this is to use data that you harvest from your production database / users. If possible, avoid making up your own train data and also avoid using public data.
If some of your classes are under-represented in your training data, try finding more samples for them. If your function is deployed, look for highly confident production predictions for the rare classes. If not, try browsing your database or running text-based searches.
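If you log your invoke results, mining them for rare-class candidates is a simple filter. This is a minimal sketch; the dictionary keys (`"text"`, `"labelName"`, `"confidence"`) are assumed names for whatever your prediction log actually stores.

```python
def rare_class_candidates(predictions, rare_class, min_confidence=0.9):
    """Pick logged predictions that confidently hit a rare class.

    `predictions` is a list of dicts with assumed keys "text",
    "labelName", and "confidence", as captured from your invoke calls.
    """
    return [
        p for p in predictions
        if p["labelName"] == rare_class and p["confidence"] >= min_confidence
    ]
```

Review the returned candidates, confirm their labels, and add them as annotated training samples.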
The most useful samples to add are those where your Nyckel function is wrong. These are generally difficult to identify since you may not know the correct class (if you did, you wouldn't need Nyckel!). However, on occasion the correct class can be inferred, for example from user interactions on your site.
Difficult samples are more informative to train on than trivial ones. If your function is in production, look for low-confidence predictions. If not, try using the invoke panel in the UI to see which types of samples your function is not confident on.
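Triage for low-confidence predictions can be sketched as a filter-and-sort over your logged results. As above, the `"confidence"` key is an assumed name for whatever field your log uses.

```python
def low_confidence_first(predictions, threshold=0.7):
    """Return predictions below the confidence threshold, least confident first.

    The least confident samples are typically the most informative
    ones to annotate next.
    """
    flagged = [p for p in predictions if p["confidence"] < threshold]
    return sorted(flagged, key=lambda p: p["confidence"])
```

Annotate the samples at the top of the returned list first.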
It is also useful to add randomly selected data. This ensures your function is not overfitting to the type of data you have already uploaded. Ideally, draw your samples from the data your function encounters, or will encounter, in production.
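Drawing an unbiased batch from your production data is a one-liner with the standard library. A minimal sketch, assuming you have a list of candidate sample IDs from your database:

```python
import random

def random_batch(candidate_ids, batch_size, seed=None):
    """Draw an unbiased batch of production samples to annotate.

    Uses sampling without replacement so no sample appears twice.
    """
    rng = random.Random(seed)
    return rng.sample(candidate_ids, min(batch_size, len(candidate_ids)))
```

Pass a `seed` when you want a reproducible draw, e.g. so two annotators can review the same batch.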
Once you have imported ample data, it is time to add annotations.
Annotation errors are problematic since they give Nyckel the wrong information. They can occur due to misclicks or changes to the class definitions. To find potential annotation errors, filter on "Disagrees" in the "Function output" pulldown and sort by "Most Confident Prediction". This returns samples where the function is confident but disagrees with the annotation. Browse the first few pages and fix any annotation errors you find.
If there are annotations available outside Nyckel, upload these to create more training data. Such annotations could be drawn, for example, from users interacting with your site, from third-party annotation services, or from a heuristic like running a regex on your text samples. Import can be done in the import panel of the UI or using the update-annotation API endpoint.
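A regex heuristic for provisional labels can be as simple as the sketch below. The pattern and the class names (`"urgent"` / `"not_urgent"`) are made-up examples; substitute your own classes and upload the resulting labels via the update-annotation endpoint.

```python
import re

# Toy heuristic: case-insensitive match on a few "urgency" phrases.
URGENT_PATTERN = re.compile(r"(?i)\b(asap|urgent|immediately)\b")

def heuristic_annotation(text):
    """Assign a provisional label from a regex match.

    Treat these labels as noisy: spot-check a sample of them in the
    annotation UI before relying on them for training.
    """
    return "urgent" if URGENT_PATTERN.search(text) else "not_urgent"
```
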
The Nyckel annotation UI is optimized for high annotation throughput. Get a cup of coffee, turn on some music and annotate some data.
As a general rule, use as few labels as possible. If you are unsure which label to use when annotating, consider merging classes or reorganizing the class list altogether. You can use the API to create versions of your function with different sets of classes and see which is more accurate.
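Before creating a new function version, you can estimate locally whether merging classes would help, by rescoring held-out (predicted, true) label pairs under a merged scheme. A minimal sketch with hypothetical class names:

```python
def accuracy(pairs):
    """Fraction of (predicted, true) label pairs that agree."""
    return sum(pred == true for pred, true in pairs) / len(pairs)

def merge_classes(pairs, merge_map):
    """Remap labels on both sides, e.g. collapsing similar classes into one."""
    return [(merge_map.get(p, p), merge_map.get(t, t)) for p, t in pairs]

# Hypothetical example: "angry" and "sad" are often confused, so score the
# same pairs with both collapsed into a single "negative" class.
pairs = [("angry", "sad"), ("sad", "sad"), ("happy", "happy"), ("angry", "angry")]
merged = merge_classes(pairs, {"angry": "negative", "sad": "negative"})
```

If accuracy under the merged scheme is meaningfully higher, the distinction between the merged classes may be too fine-grained for your data.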
Nyckel uses a sophisticated AutoML engine to find the best accuracy for each function. However, if you are not happy with your function, let us know and we will try to improve the system for you.
We are constantly adding the latest advances in machine learning to our AutoML engine. This means that your function will get better over time, without any effort from your end!