Introducing Invoke Capture -- Integrated Active Learning for Classification
A machine learning model is only as good as the data it is trained on. At Nyckel, our goal is to give you the tools to keep your data fresh, correct, and up to date.
To that end, we are excited to introduce invoke capture: integrated active learning for our classification functions.
With our invoke capture feature, Nyckel automatically samples data from your deployed models (including random samples, low-confidence predictions, and rare classes) for your team to review and annotate in Nyckel’s dashboard. Nyckel then uses this newly annotated training data to retrain and redeploy your improved model.
Invoke capture systems overview
Invoke capture is a key element of our end-to-end ML offering: a built-in data engine powered by our active learning system. Here is how it works:
- Call your trained model using the standard
- Nyckel automatically checks each data sample (image, text, etc.).
- Samples that look “informative” are added to a staging area. You can find this staging area in the “Capture” tab on Nyckel’s dashboard.
- Users annotate samples in the staging area at their own convenience.
- Annotated samples are automatically added to the training data.
- Nyckel re-trains and re-deploys the improved model.
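To make the steps above concrete, here is a minimal sketch of the loop in Python. It is not Nyckel's implementation or API: `CaptureLoop`, `Prediction`, `toy_model`, and the 0.6 confidence cutoff are all hypothetical names and values chosen for illustration, and "informative" is reduced to a simple low-confidence check.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Prediction:
    label: str
    confidence: float  # assumed to be in [0, 1]


@dataclass
class CaptureLoop:
    """Toy stand-in for the invoke -> capture -> annotate flow (hypothetical)."""
    threshold: float = 0.6                             # assumed cutoff for "informative"
    staging: dict = field(default_factory=dict)        # sample -> Prediction awaiting review
    training_data: list = field(default_factory=list)  # (sample, label) pairs

    def invoke(self, sample: str, model: Callable[[str], Prediction]) -> Prediction:
        """Run the model and stage the sample if the prediction looks uncertain."""
        pred = model(sample)
        if pred.confidence < self.threshold:
            self.staging[sample] = pred
        return pred

    def annotate(self, sample: str, label: str) -> None:
        """Move a reviewed sample from the staging area into the training data."""
        self.staging.pop(sample)
        self.training_data.append((sample, label))
        # A real system would trigger retraining and redeployment from here.


def toy_model(text: str) -> Prediction:
    """Hypothetical classifier: confident about 'free', unsure about everything else."""
    if "free" in text:
        return Prediction("spam", 0.95)
    return Prediction("ham", 0.55)


loop = CaptureLoop()
loop.invoke("free money", toy_model)   # confident, so not captured
loop.invoke("hello there", toy_model)  # uncertain, so added to staging
loop.annotate("hello there", "ham")    # reviewer confirms the label
```

The point of the sketch is the data flow: the caller only ever calls `invoke`, and capture happens as a side effect of normal use.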
How do we decide which samples to capture?
Identifying which samples to capture for annotation is not trivial. (Refer to our deep dive on the various methods you can use to capture informative data.) For Nyckel’s automated invoke capture, we use several strategies for capturing informative data, including:
|Sample type|Why we capture it|
|---|---|
|Low-confidence samples|These are samples where the model is uncertain about the prediction.|
|Random samples|Useful to avoid data drift and to get an unbiased measure of accuracy.|
|Samples from rare classes|Improve performance of rare classes. Help balance out the training data.|
|…|We’ll continue to add new sample types over time.|
Each strategy is assigned a quota in the capture buffer (the staging area). For example, we may assign 300 slots to low-confidence samples. These slots are filled by continuously stack-ranking predictions by confidence and keeping the lowest-confidence ones in the buffer. A further set of slots is assigned to randomly selected samples. Including random samples may sound counterintuitive, but balancing the training data with randomly sampled data is important to ensure the model generalizes to new types of data. Other buffer quotas are assigned to samples that are likely to belong to rare classes, classes that are problematic, and so on.
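As a rough illustration of per-strategy quotas, the sketch below keeps the N lowest-confidence samples with a heap and fills the random slots with reservoir sampling. This is one plausible way to do it, not a description of Nyckel's internals: the `CaptureBuffer` class, its method names, and the default quota of 50 random slots are assumptions (only the 300-slot figure comes from the example above).

```python
import heapq
import random


class CaptureBuffer:
    """Toy capture buffer with one quota per strategy (hypothetical sketch)."""

    def __init__(self, low_conf_quota=300, random_quota=50, seed=None):
        self.low_conf_quota = low_conf_quota
        self.random_quota = random_quota
        self._rng = random.Random(seed)
        self._heap = []       # entries are (-confidence, sample_id)
        self._reservoir = []  # uniformly sampled sample ids
        self._seen = 0        # number of predictions offered so far

    def offer(self, sample_id, confidence):
        """Consider one prediction for capture under each strategy."""
        self._seen += 1

        # Low-confidence slots: the heap root is the MOST confident kept sample,
        # so a new sample only gets in if it is less confident than the root.
        entry = (-confidence, sample_id)
        if len(self._heap) < self.low_conf_quota:
            heapq.heappush(self._heap, entry)
        elif confidence < -self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)

        # Random slots: reservoir sampling keeps a uniform sample of the stream.
        if len(self._reservoir) < self.random_quota:
            self._reservoir.append(sample_id)
        else:
            j = self._rng.randrange(self._seen)
            if j < self.random_quota:
                self._reservoir[j] = sample_id

    def low_confidence(self):
        """Sample ids in the low-confidence slots, least confident first."""
        return [sid for _, sid in sorted((-neg, sid) for neg, sid in self._heap)]

    def random_samples(self):
        """Sample ids currently held in the random slots."""
        return list(self._reservoir)


buffer = CaptureBuffer(low_conf_quota=3, random_quota=2, seed=0)
for i, conf in enumerate([0.9, 0.2, 0.75, 0.1, 0.6, 0.95, 0.05, 0.5]):
    buffer.offer(f"s{i}", conf)
print(buffer.low_confidence())
print(buffer.random_samples())
```

Both strategies run in a single pass over the prediction stream and use memory proportional to their quota, which suits a buffer that is filled continuously as the model is invoked. A rare-class quota could be added along the same lines, for example by keeping a small reservoir per predicted class.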
Over time, we will improve our strategies and add new sample types.
Why is invoke capture important?
A machine learning function exposed to real-world data is never fully trained. Data tends to change over time, and new corner cases will always pop up. Continually selecting and annotating more data (also called “active learning”) is therefore critical for a healthy ML production system.
However, most data also tends to be very boring. So if you rely simply on annotating randomly here and there, you will never discover the corner cases and data issues. This is where “active learning” shines. It helps you focus your valuable annotation time on the samples that matter the most.
Have any questions about our new invoke capture feature? Reach out to us at any time.