Production Integration Patterns
This page covers practical patterns for running Nyckel prediction endpoints reliably in production.
Synchronous vs. asynchronous invocation
Synchronous — Call /invoke inline with the user request. Works well when:
- The prediction is shown to the user immediately.
- The added latency is acceptable on the request path (typically under 200 ms for text, slightly more for images).
Asynchronous — Queue the input, invoke in a background worker, and store the result. Works well when:
- The prediction is used internally (routing, tagging, filtering).
- You are processing batches of inputs.
- You want to decouple prediction latency from user response time.
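The asynchronous pattern can be sketched with the standard library alone. This is a minimal, in-process illustration, not a recommended production queue: `start_worker`, the `invoke` callable, and the `results` dictionary are hypothetical names, and a real deployment would more likely use a durable queue and a database.

```python
import queue
import threading

def start_worker(jobs, results, invoke):
    """Start a daemon thread that drains `jobs` and stores predictions in `results`.

    `invoke` is any callable that takes the input data and returns a prediction
    (hypothetical; in practice it would wrap the call to /invoke).
    """
    def run():
        while True:
            item = jobs.get()
            if item is None:  # sentinel value: stop the worker
                jobs.task_done()
                break
            input_id, data = item
            try:
                results[input_id] = invoke(data)
            except Exception as exc:  # keep the worker alive if one call fails
                results[input_id] = {"error": str(exc)}
            finally:
                jobs.task_done()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

The request handler only calls `jobs.put((input_id, data))` and returns immediately, so prediction latency never reaches the user; `jobs.join()` blocks until every queued input has been processed.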
Retries and error handling
Always implement retry logic for 429 (rate limit) and 5xx (server error) responses.
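import time
import requests

def invoke_with_retry(function_id, data, token, retries=3):
    """Invoke a function, retrying on 429 and 5xx responses."""
    url = f"https://www.nyckel.com/v1/functions/{function_id}/invoke"
    headers = {"Authorization": f"Bearer {token}"}
    for attempt in range(retries):
        # A timeout keeps a stalled connection from blocking the caller forever.
        resp = requests.post(url, json={"data": data}, headers=headers, timeout=10)
        if resp.status_code == 200:
            return resp.json()
        if resp.status_code == 429 or resp.status_code >= 500:
            # Retryable: exponential backoff, skipped after the final attempt.
            if attempt < retries - 1:
                time.sleep(2 ** attempt)
            continue
        # Other errors (for example 400 or 401) will not succeed on retry.
        resp.raise_for_status()
    raise RuntimeError("Max retries exceeded")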
Store prediction results locally
Do not rely solely on Nyckel as your record of predictions. Store the sampleId, predicted label, confidence, and timestamp in your own database. This lets you:
- Audit predictions later.
- Submit annotations when outcomes are known.
- Analyze model performance over time.
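One way to keep such a local record is sketched below with SQLite. The table layout and the helper names `open_store` and `record_prediction` are illustrative assumptions, not part of any Nyckel client; the columns mirror the fields listed above, plus the function ID so that rows from several functions can share a table.

```python
import sqlite3
from datetime import datetime, timezone

def open_store(path=":memory:"):
    """Open (or create) a SQLite database with a predictions table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS predictions (
               sample_id TEXT PRIMARY KEY,
               function_id TEXT NOT NULL,
               label TEXT NOT NULL,
               confidence REAL NOT NULL,
               predicted_at TEXT NOT NULL
           )"""
    )
    return conn

def record_prediction(conn, function_id, sample_id, label, confidence):
    """Store one prediction with a UTC timestamp and return that timestamp."""
    predicted_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO predictions VALUES (?, ?, ?, ?, ?)",
            (sample_id, function_id, label, confidence, predicted_at),
        )
    return predicted_at
```

Keying the table on the sample ID makes it straightforward to look a prediction up again when the true outcome is known and an annotation can be submitted.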
Managing multiple functions
If your application uses more than one Nyckel function (for example, an image classifier and a text classifier), keep function IDs in configuration rather than hardcoded in your application code.
NYCKEL_FUNCTIONS = {
    "image_moderation": "fn_abc123",
    "ticket_routing": "fn_xyz789",
}
This makes it easy to swap or update functions without code changes.
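As a sketch of how the mapping could be driven by deployment configuration, the helper below reads each function ID from an environment variable and falls back to a default. The `NYCKEL_FUNCTION_<NAME>` variable naming and the `load_function_ids` helper are assumptions made for illustration; any configuration system would serve the same purpose.

```python
import os

# Placeholder defaults, reusing the example IDs from this page.
DEFAULT_FUNCTIONS = {
    "image_moderation": "fn_abc123",
    "ticket_routing": "fn_xyz789",
}

def load_function_ids(env=None, defaults=DEFAULT_FUNCTIONS):
    """Return a name -> function ID mapping, letting environment variables
    such as NYCKEL_FUNCTION_TICKET_ROUTING override the defaults."""
    env = os.environ if env is None else env
    return {
        name: env.get(f"NYCKEL_FUNCTION_{name.upper()}", default)
        for name, default in defaults.items()
    }
```

With this in place, pointing ticket routing at a retrained function only requires setting `NYCKEL_FUNCTION_TICKET_ROUTING` in the deployment environment and restarting the service.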
Rate limits
Nyckel enforces per-account rate limits. If you expect high-volume traffic, contact Nyckel support to discuss your needs. For bursty workloads, consider queuing requests rather than calling /invoke directly at peak.
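To smooth bursts on the client side, a worker can space out its calls before each request. The sketch below is one simple way to do that; the `Throttle` class is hypothetical, and the rate you pass in is your own self-imposed ceiling, since the actual per-account limits are not stated here.

```python
import threading
import time

class Throttle:
    """Space calls at least 1/max_per_second apart (a client-side limit only)."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed; return the seconds waited."""
        with self._lock:
            now = self._clock()
            delay = max(0.0, self._next_allowed - now)
            # Reserve the next slot so concurrent callers queue up behind it.
            self._next_allowed = max(now, self._next_allowed) + self._interval
        if delay > 0:
            self._sleep(delay)
        return delay
```

A background worker would call `throttle.wait()` immediately before each request to /invoke, so a sudden burst of queued inputs is sent at a steady pace instead of all at once.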