Symptoms
- Sentry: spike in `LiveblocksError` or `WebSocket connection failed` from the frontend
- Multi-user workflow co-editing does not work (cursors do not sync, broadcasts are dropped)
- `POST /api/v1/liveblocks/auth` returns 500 or 502
- Liveblocks dashboard (https://liveblocks.io) shows a status issue or an elevated failure rate
Severity & escalation
- PAGE 24/7: real-time collaboration is down. A degraded mode is still available: REST and versions keep working (Phase 12 MD editing), just without presence or cursors
- Ack window: 15 min
- Escalate to the engineering lead after 30 min
- Long outage (>2h): tell users about the degraded mode via a UI banner
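If these thresholds ever feed an alerting rule, a minimal sketch might look like the code below. It reads "escalate after 30 min" as "30 minutes after the page, incident still open". `dueActions` and the action names are hypothetical and not part of any existing tooling.

```typescript
// Hypothetical helper encoding the thresholds from "Severity & escalation".
type EscalationAction = "ack-overdue" | "escalate-to-lead" | "show-degraded-banner";

const ACK_WINDOW_MIN = 15;
const ESCALATE_AFTER_MIN = 30;
const BANNER_AFTER_MIN = 120; // "Long outage (>2h)"

// Actions due `minutesSincePage` minutes after the page fired,
// assuming the incident is still open.
function dueActions(minutesSincePage: number, acked: boolean): EscalationAction[] {
  const actions: EscalationAction[] = [];
  // Ack window is 15 min: only overdue once it has fully elapsed.
  if (!acked && minutesSincePage > ACK_WINDOW_MIN) actions.push("ack-overdue");
  if (minutesSincePage >= ESCALATE_AFTER_MIN) actions.push("escalate-to-lead");
  if (minutesSincePage > BANNER_AFTER_MIN) actions.push("show-degraded-banner");
  return actions;
}
```

For example, 45 minutes in with the page already acked, only `escalate-to-lead` is due.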
Immediate actions (< 5 min)
- Check Liveblocks status: https://status.liveblocks.io/ (or the dashboard top bar)
- Check our auth endpoint with a valid JWT; does it return an error?
  `curl -X POST arno-api.vadimpianof.workers.dev/api/v1/liveblocks/auth ...`
- Check the Liveblocks dashboard:
  - https://liveblocks.io → project `arno` → connection status / error rate / rate limit usage
- Verify the secret: `wrangler secret list --config wrangler.toml | grep LIVEBLOCKS` should show `LIVEBLOCKS_SECRET_KEY` as present
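The auth-endpoint check can also be scripted so that its result points straight at a diagnosis branch. The sketch below makes several assumptions, and none of them come from our codebase:

- The status-to-branch mapping reflects this runbook, not an exhaustive rule.
- Our handler accepts a JSON body `{ room }` and a `Bearer` JWT.
- `probeAuth` and `classifyAuthStatus` are hypothetical names.

```typescript
// Hypothetical mapping from the auth probe's HTTP status to a runbook branch.
type Branch = "healthy" | "check-jwt" | "A-liveblocks-outage" | "B-our-endpoint" | "C-rate-limit";

function classifyAuthStatus(status: number, liveblocksIncident: boolean): Branch {
  if (status === 200) return "healthy";
  // 401/403 most likely means the probe's own JWT is invalid or expired.
  if (status === 401 || status === 403) return "check-jwt";
  if (status === 429) return "C-rate-limit";
  // 5xx: blame Liveblocks only if their status page shows an incident.
  if (liveblocksIncident) return "A-liveblocks-outage";
  return "B-our-endpoint";
}

// Hypothetical probe. The `{ room }` body and Bearer header are assumptions
// about what our handler expects; adjust to match liveblocks.ts.
async function probeAuth(url: string, jwt: string, room: string): Promise<number> {
  const res = await fetch(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${jwt}`, "Content-Type": "application/json" },
    body: JSON.stringify({ room }),
  });
  return res.status;
}
```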
Diagnosis (5-20 min)
Branch A: Liveblocks-side outage
- Status page shows an incident → wait
- Switch the frontend to the degraded-mode banner:
  - "Real-time collaboration is temporarily unavailable. Changes are saved and will be visible after a reload."
- MD editor (Phase 12) keeps working via REST
- Workflow canvas: read-only mode (users cannot edit nodes/edges while Liveblocks is down, because the workflow's primary storage lives there)
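The per-surface behaviour in Branch A can be expressed as a small mode switch. In this sketch, `Surface`, `Mode`, `modeFor` and `bannerText` are hypothetical names, not existing frontend code. It also assumes the frontend learns whether Liveblocks is up from some health signal, which is not shown.

```typescript
// Hypothetical per-surface mode switch for Liveblocks-side outages.
type Surface = "md-editor" | "workflow-canvas";
type Mode = "realtime" | "rest-only" | "read-only";

const DEGRADED_BANNER =
  "Real-time collaboration is temporarily unavailable. Changes are saved and will be visible after a reload.";

function modeFor(surface: Surface, liveblocksUp: boolean): Mode {
  if (liveblocksUp) return "realtime";
  // MD editor (Phase 12) persists through REST, so it stays editable.
  if (surface === "md-editor") return "rest-only";
  // Workflow canvas keeps its primary storage in Liveblocks: block edits.
  return "read-only";
}

function bannerText(liveblocksUp: boolean): string | null {
  return liveblocksUp ? null : DEGRADED_BANNER;
}
```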
Branch B: Our auth endpoint broken
- Tail the Worker logs (`wrangler tail`) and find the exception in `liveblocks.ts::POST /api/v1/liveblocks/auth`
- Common causes:
  - `LIVEBLOCKS_SECRET_KEY` env var missing or rotated incorrectly
  - Hardcoded project ID mismatch (see master spec §I.2.2, `project:${id}` room naming)
  - User ownership check failing silently
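Each of these three causes can be turned into a loud, specific error instead of a generic 500 or a silent denial. This is a minimal sketch under several assumptions:

- The secret reaches the Worker as `env.LIVEBLOCKS_SECRET_KEY`.
- Valid keys start with `sk_`, matching the `sk_dev_...` form in the Recovery table.
- Rooms are named `project:${id}`.
- `AuthError`, `requireSecret`, `projectIdFromRoom` and `assertOwner` are hypothetical, and the owner lookup itself is not shown.

```typescript
// Hypothetical guards for the common Branch B causes.
interface Env { LIVEBLOCKS_SECRET_KEY?: string }

class AuthError extends Error {
  constructor(public status: number, message: string) { super(message); }
}

// Cause 1: secret missing or rotated to something that is not a secret key.
function requireSecret(env: Env): string {
  const key = env.LIVEBLOCKS_SECRET_KEY;
  if (!key) throw new AuthError(500, "LIVEBLOCKS_SECRET_KEY is not set");
  if (!key.startsWith("sk_")) throw new AuthError(500, "LIVEBLOCKS_SECRET_KEY does not look like a secret key");
  return key;
}

// Cause 2: room name does not follow the project:${id} convention.
function projectIdFromRoom(room: string): string {
  const id = /^project:(.+)$/.exec(room)?.[1];
  if (!id) throw new AuthError(400, `room "${room}" does not match project:\${id}`);
  return id;
}

// Cause 3: ownership failure should be reported, not swallowed.
function assertOwner(userId: string, ownerId: string | undefined, projectId: string): void {
  if (ownerId === undefined) throw new AuthError(404, `project ${projectId} not found`);
  if (ownerId !== userId) throw new AuthError(403, `user ${userId} does not own project ${projectId}`);
}
```

Logging `err.message` from these before responding makes the cause visible directly in `wrangler tail`.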
Branch C: Rate limit hit
- Liveblocks Free: 100 concurrent connections/mo. If we cross that, auth returns 429
- Check dashboard → Usage
- Mitigation: upgrade to Liveblocks Pro ($99/mo, per the cost ladder), or apply startup credits (parked in master spec §III.6)
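To see the limit coming before auth starts returning 429, the dashboard usage number can be compared against the plan cap. The 100 limit comes from the note above. The 80% warning ratio and `usageState` are illustrative assumptions, and how the usage figure is fetched is left open.

```typescript
// Hypothetical early-warning check for Branch C.
const FREE_PLAN_CONNECTION_LIMIT = 100;
const WARN_RATIO = 0.8; // assumed threshold, not from the plan terms

type UsageState = "ok" | "warn" | "at-limit";

function usageState(connections: number, limit = FREE_PLAN_CONNECTION_LIMIT): UsageState {
  if (connections >= limit) return "at-limit"; // further connections should expect 429
  if (connections >= limit * WARN_RATIO) return "warn"; // time to plan the upgrade or credits
  return "ok";
}
```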
Recovery
| Issue | Action |
|---|---|
| Liveblocks-side outage | Banner degraded mode; wait. MD editor still works. |
| Secret missing/wrong | `echo -n "sk_dev_..." \| wrangler secret put LIVEBLOCKS_SECRET_KEY --config wrangler.toml` |
| Rate limit | Apply to the Liveblocks Startup program or upgrade to Pro |
| Auth endpoint code bug | Roll back the last deploy → fix → redeploy |
Verification
- `POST /api/v1/liveblocks/auth` with a valid JWT returns 200 with a session token
- Browser test: open a project; the cursor from a second window is visible
- Sentry `LiveblocksError` rate < 0.1%
- Liveblocks dashboard shows "Healthy" status
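The 0.1% gate is a plain ratio. Which denominator to use is an assumption here: this sketch uses sessions over the same time window. An empty window is treated as "not verified" rather than passing. `passesVerification` is a hypothetical name, and pulling the counts from Sentry is not shown.

```typescript
// Hypothetical verification gate for the Sentry LiveblocksError rate.
const MAX_ERROR_RATE = 0.001; // 0.1%

function errorRate(errorEvents: number, totalSessions: number): number {
  // No traffic in the window: the rate is undefined, not zero.
  return totalSessions > 0 ? errorEvents / totalSessions : NaN;
}

function passesVerification(errorEvents: number, totalSessions: number): boolean {
  // NaN < x is false, so an empty window never passes.
  return errorRate(errorEvents, totalSessions) < MAX_ERROR_RATE;
}
```

For example, 5 errors over 10,000 sessions is 0.05% and passes, while 20 over 10,000 (0.2%) does not.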
Aftermath
- Post-mortem trigger: downtime > 30 min (collab is harder to recover from than a DB read-only incident)
- Document: degraded-mode duration and which users were impacted
- If the rate limit was hit: review usage trends and plan the upgrade
Known false positives
- Liveblocks Yjs Storage REST API throttling on bulk fetches: periodic 429s on individual room queries, not an actual outage. Do not PAGE. Add backoff in our code.
- WebSocket reconnect storms after a client-side network blip: they show up as many errors but self-heal
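For the bulk-fetch throttling case, a retry wrapper with backoff keeps periodic 429s from surfacing as errors. The sketch uses full-jitter exponential backoff and honours a numeric `Retry-After` header (in seconds) when one is present. Further assumptions:

- The Liveblocks Storage REST call is wrapped as a function returning a `Response`-like object; the call itself is not shown.
- `withBackoff` and `backoffDelayMs` are hypothetical names.
- The defaults are illustrative, not tuned.

```typescript
// Hypothetical retry wrapper for 429s from bulk Storage REST fetches.
interface ResponseLike {
  status: number;
  headers: { get(name: string): string | null };
}

// Full jitter: a random delay in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(attempt: number, baseMs: number, capMs: number, rand: () => number = Math.random): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(rand() * ceiling);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withBackoff<T extends ResponseLike>(
  call: () => Promise<T>,
  { maxRetries = 5, baseMs = 200, capMs = 5_000 } = {},
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const res = await call();
    // Return anything that is not a 429, or the last 429 once retries run out.
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const retryAfter = Number(res.headers.get("Retry-After"));
    const delay = Number.isFinite(retryAfter) && retryAfter > 0
      ? retryAfter * 1000
      : backoffDelayMs(attempt, baseMs, capMs);
    await sleep(delay);
  }
}
```

Jitter spreads retries from many rooms fetched in the same batch, so they do not all hit the API again at the same moment.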