The shape
When it’s the right fit
- Prototype or early-stage — you’re still validating the business
- Low-to-moderate volume — a few hundred webhooks per minute is easily handled by a single handler
- Simple per-event work — update a row, send an email, notify a channel
- You control latency of downstream actions — if your DB write takes 30 seconds, this pattern breaks
Reference handler (Node + Express + Postgres)
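What follows is a minimal sketch of what such a handler could look like, not an official listing. The core is kept separate from Express so it can be exercised with a fake database. Several details are assumptions made for illustration: the signature is taken to be a hex HMAC-SHA256 of the raw body in an `X-Sly-Signature` header (Verify signatures has the real scheme), the dedupe table is called `webhook_events`, `db` is a single checked-out pg client (anything with a `query(text, params)` method), and `handlers` is a `Map` from `event.type` to an async function.

```javascript
const crypto = require("node:crypto");

// ASSUMPTION: hex HMAC-SHA256 over the raw body. Replace with the scheme
// documented under "Verify signatures".
function verifySignature(rawBody, signatureHex, secret) {
  const expected = crypto.createHmac("sha256", secret).update(rawBody).digest();
  const given = Buffer.from(String(signatureHex || ""), "hex");
  // timingSafeEqual throws on unequal lengths, so check the length first.
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}

// Returns the HTTP status the route should send back to Sly.
async function handleWebhook({ rawBody, headers }, { db, secret, handlers }) {
  if (!verifySignature(rawBody, headers["x-sly-signature"], secret)) return 400;

  let event;
  try {
    event = JSON.parse(rawBody.toString("utf8"));
  } catch {
    return 200; // signed but unparseable: a retry cannot help, so ack and log
  }

  const handler = event && handlers.get(event.type);
  if (!handler) return 200; // unknown event type: ack and move on

  try {
    await db.query("BEGIN");
    // The unique constraint on event_id turns a redelivery into a zero-row insert.
    const inserted = await db.query(
      "INSERT INTO webhook_events (event_id, type) VALUES ($1, $2) " +
        "ON CONFLICT (event_id) DO NOTHING",
      [headers["x-sly-event-id"], event.type]
    );
    if (inserted.rowCount === 0) {
      await db.query("ROLLBACK");
      return 200; // duplicate delivery: already handled
    }
    await handler(event, db);
    await db.query("COMMIT");
    return 200;
  } catch (err) {
    // Roll back so the dedupe row disappears and the retry is processed again.
    await db.query("ROLLBACK").catch(() => {});
    return 500; // treated as transient in this sketch; Sly retries with backoff
  }
}
```

In Express, one way to wire this up is to mount the route with `express.raw({ type: "application/json" })` so that `req.body` is the raw Buffer the signature was computed over, then call `res.sendStatus(await handleWebhook({ rawBody: req.body, headers: req.headers }, deps))`. Node lowercases incoming header names, which is why the sketch reads `x-sly-event-id`.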
Handling retry-vs-ack decisions
Sly retries on non-2xx responses. Your handler decides whether a given failure is transient (worth retrying) or permanent (ack and log).

| Failure | Handler response | Sly behavior |
|---|---|---|
| Signature verification fails | 400 | No retry — assume bad actor |
| Unknown event type | 200 (ack) | No retry — log and move on |
| Transient DB connection failure | 500 | Retry with backoff |
| Downstream API timeout | 500 | Retry with backoff |
| Application bug (deserialization, null ref) | 200 (ack) + page ops | Log the bug; don’t retry into the same bug |
| Dead-letter (5 attempts exhausted) | N/A | Moves to DLQ; subscribe to webhook.dlq |
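The handler-side rows of the table can be expressed as a small classification function. This is a sketch only: the error class names are hypothetical and not part of any Sly SDK, and your code would need to throw them at the appropriate places.

```javascript
// Hypothetical error classes, named for illustration only.
class SignatureError extends Error {}
class UnknownEventTypeError extends Error {}
class TransientError extends Error {} // DB connection drop, downstream timeout

// Map a thrown error to the response status and whether ops should be paged.
function decideResponse(err) {
  if (err instanceof SignatureError) return { status: 400, page: false };
  if (err instanceof UnknownEventTypeError) return { status: 200, page: false };
  if (err instanceof TransientError) return { status: 500, page: false };
  // Anything unclassified is presumed to be an application bug: ack so Sly
  // does not retry into the same bug, and flag it for ops.
  return { status: 200, page: true };
}
```

Defaulting unclassified errors to "ack and page" follows the table, but it means a transient failure that was never wrapped in `TransientError` is acknowledged and not retried. Defaulting to 500 instead is the more conservative choice if losing an event is worse than a few wasted retries.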
Dedupe strategies
Pick one based on scale:
- Small volume (<1k events/day): in-memory LRU cache, 48h TTL. Simple, fast, no DB roundtrip.
- Medium volume: Postgres table with unique constraint on event_id. The INSERT ... ON CONFLICT DO NOTHING pattern shown above works.
- High volume: Redis with SETNX + TTL. Faster than Postgres for read-mostly access.
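For the small-volume option, an in-memory cache with LRU eviction and a TTL can be built on a plain `Map`. The sketch below is one possible shape. The class name, the `maxEntries` default, and the injectable clock are assumptions for illustration; only the 48h TTL comes from the guidance above.

```javascript
// In-memory dedupe with LRU eviction and a TTL. A Map iterates in insertion
// order, so its first key is always the least recently used one.
class DedupeCache {
  constructor({ maxEntries = 10000, ttlMs = 48 * 60 * 60 * 1000, now = Date.now } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, useful in tests
    this.entries = new Map(); // eventId -> expiry timestamp in ms
  }

  // Returns true the first time an event ID is seen, false for a duplicate.
  firstSeen(eventId) {
    const t = this.now();
    const expiry = this.entries.get(eventId);
    const duplicate = expiry !== undefined && expiry > t;
    // Delete and re-insert so the key moves to the most-recently-used end.
    this.entries.delete(eventId);
    this.entries.set(eventId, duplicate ? expiry : t + this.ttlMs);
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
    return !duplicate;
  }
}
```

A handler would call `cache.firstSeen(eventId)` before doing any work and ack immediately when it returns false. The cache is lost on restart, so a redelivery that arrives after a deploy is processed again; handlers should stay idempotent where that matters.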
Don’t block the ack path
Anything slow moves async. For a quick email or notification, in-handler is fine. For heavy work (re-render a PDF, call an LLM, hit 3 external APIs), either:
- Enqueue a background job and ack immediately
- Upgrade to queue-backed workers
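The first option, enqueue and ack immediately, can be illustrated with a tiny in-process queue. This is a sketch under a loud assumption: jobs held in memory are lost if the process dies, so a real deployment would more likely use a durable job queue. `BackgroundQueue` and `handleHeavyEvent` are hypothetical names.

```javascript
// Minimal in-process job queue: jobs run one at a time, after the ack is sent.
// ASSUMPTION: illustration only. Jobs are not persisted anywhere.
class BackgroundQueue {
  constructor() {
    this.jobs = [];
    this.running = false;
  }

  enqueue(job) {
    this.jobs.push(job);
    if (!this.running) {
      this.running = true;
      setImmediate(() => this.drain()); // start after the current tick
    }
  }

  async drain() {
    while (this.jobs.length > 0) {
      const job = this.jobs.shift();
      try {
        await job();
      } catch (err) {
        console.error("background job failed:", err); // keep the loop alive
      }
    }
    this.running = false;
  }
}

// Hypothetical handler: hand the slow work to the queue, then ack without
// waiting for it to finish.
function handleHeavyEvent(event, queue, slowWork) {
  queue.enqueue(() => slowWork(event));
  return 200;
}
```

Because the ack is sent before the work runs, a failure in `slowWork` can no longer be turned into a 500 for Sly to retry. The job itself needs its own retry and alerting.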
Multi-instance deployments
If you run multiple replicas of your webhook handler behind a load balancer, deduplication must be shared across replicas: the in-memory LRU won’t catch events handled by a sibling. Use the Postgres or Redis pattern above.
Testing locally
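A unit test needs two pieces: a helper that signs an event the way Sly would, and a POST to the handler. The sketch below uses only Node built-ins (global `fetch` needs Node 18 or later). The signing scheme (hex HMAC-SHA256 of the body in `X-Sly-Signature`) and the `id` field on the event are assumptions; substitute whatever Verify signatures specifies.

```javascript
const crypto = require("node:crypto");

// ASSUMPTION: hex HMAC-SHA256 of the exact body bytes that are sent.
function signEvent(event, secret) {
  const body = JSON.stringify(event);
  const signature = crypto.createHmac("sha256", secret).update(body).digest("hex");
  return { body, signature };
}

// POST a signed event to a handler URL and resolve with the HTTP status.
async function postSignedEvent(url, event, secret) {
  const { body, signature } = signEvent(event, secret);
  const res = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-Sly-Signature": signature,
      "X-Sly-Event-Id": event.id, // assumes the event carries its own ID
    },
    body,
  });
  return res.status;
}
```

A test might start the handler on a local port, call `postSignedEvent("http://127.0.0.1:3000/webhooks", { id: "evt_test_1", type: "test.event" }, secret)`, and assert that it resolves to 200. The URL, path, and event type are placeholders. Posting the same event twice is a cheap way to check the dedupe path.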
See local testing for the tunnel setup. Unit tests need no tunnel: construct a signed event and POST it to your handler.
Observability
Log these fields on every webhook receipt:
- X-Sly-Event-Id
- X-Sly-Delivery-Id
- event.type
- Handler outcome (ack / retry-requested / error)
- Time-to-ack in ms
Track these metrics:
- Ack latency — p50/p95/p99 time from request start to 2xx
- Retry rate by event type — identifies which handlers are flaky
- Delivery lag — time from event to webhook receipt (rarely > 2s; growing = Sly-side issue)
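One way to capture the log fields and derive the ack-latency percentiles is sketched below. The record's key names and the nearest-rank percentile method are choices made for illustration; a metrics library with histograms would normally replace the hand-rolled `percentile`.

```javascript
// Build one structured log record per webhook receipt.
// `outcome` is one of "ack", "retry-requested", or "error".
function buildReceiptLog({ headers, event, outcome, startedAtMs, nowMs = Date.now() }) {
  return {
    eventId: headers["x-sly-event-id"],
    deliveryId: headers["x-sly-delivery-id"],
    eventType: event.type,
    outcome,
    timeToAckMs: nowMs - startedAtMs,
  };
}

// Nearest-rank percentile over a window of time-to-ack samples (in ms).
function percentile(samples, p) {
  if (samples.length === 0) return undefined;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}
```

Emitting the record with `console.log(JSON.stringify(record))` just before the response is sent keeps `timeToAckMs` close to what Sly observes. Feeding the same values into `percentile(samples, 95)` over a rolling window gives a rough p95 to compare against the thresholds under "When to outgrow this pattern".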
When to outgrow this pattern
Signals to move to queue-backed workers:
- Ack p95 approaches 1s
- Per-event work takes more than a few hundred ms
- Any per-event external API call
- Fan-out to multiple downstreams per event
See also
- Webhooks overview — delivery contract
- Event catalog — what can fire
- Verify signatures — security detail
- Recipes: Webhooks — multi-language verification + dedupe