Symptoms
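- Grafana alert: rate(url_import_ssrf_rejections_total[5m]) > 10/min (rate() returns a per-second value, so 10/min corresponds to about 0.167 in the expression)
- Sentry: SSRFError exceptions spike above the 7-day average baseline
- API logs: many 400 responses from the extractor with reason safe_fetch_failed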
Severity & escalation
- INVESTIGATE (not PAGE: SSRF rejections mean the protection is behaving correctly)
- Ack window: 4 hours during business hours; next day off-hours
- Escalate if sustained > 1 hour OR the pattern indicates infrastructure compromise
Immediate actions (< 5 min)
- Check the distribution of rejection reasons:
    # Query Sentry / the log aggregator for SSRFError reasons
    # Common reasons: scheme_not_allowed, no_safe_ips, dns_resolve_failed
- Sample failing URLs (anonymized):
  - If all resolve to private IP ranges → normal external attack/scan
  - If they are valid domains → DNS issue OR config drift
- Cross-reference with user_ids:
  - Single user spamming? → anti-abuse (see user_id rate limit)
  - Distributed across users? → external scan against the extraction endpoint
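The reason distribution and the user_id cross-reference are both counts over the same rejection records. A minimal TypeScript sketch, assuming the records can be exported from Sentry or the log aggregator as objects carrying a reason and a user_id; the RejectionRecord shape and the 0.8 single-user cutoff are hypothetical, not part of ARNO:

```typescript
// Hypothetical export shape for one SSRFError rejection; adjust to the real log schema.
interface RejectionRecord {
  reason: string; // e.g. "scheme_not_allowed", "no_safe_ips", "dns_resolve_failed"
  userId: string;
}

interface TriageSummary {
  total: number;
  byReason: Record<string, number>; // rejection count per reason
  distinctUsers: number;
  topUserShare: number; // fraction of all rejections coming from the busiest user_id
}

function summarize(records: RejectionRecord[]): TriageSummary {
  const byReason: Record<string, number> = {};
  const byUser: Record<string, number> = {};
  for (const r of records) {
    byReason[r.reason] = (byReason[r.reason] ?? 0) + 1;
    byUser[r.userId] = (byUser[r.userId] ?? 0) + 1;
  }
  const users = Object.keys(byUser);
  let topCount = 0;
  for (const user of users) {
    topCount = Math.max(topCount, byUser[user]);
  }
  return {
    total: records.length,
    byReason,
    distinctUsers: users.length,
    topUserShare: records.length === 0 ? 0 : topCount / records.length,
  };
}

// Hypothetical cutoff: treat the burst as single-user spam if one user_id dominates.
const SINGLE_USER_SHARE = 0.8;

function looksLikeSingleUser(summary: TriageSummary): boolean {
  return summary.total > 0 && summary.topUserShare >= SINGLE_USER_SHARE;
}
```

If looksLikeSingleUser returns true, the anti-abuse path applies; a burst spread across many users matches the external-scan case. byReason suggests which Diagnosis branch to open first (for example, a dns_resolve_failed majority points toward Branch C).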
Diagnosis (5-20 min)
Branch A: All rejections target private IP ranges
This is normal SSRF guard behavior. Causes:
- An external scan attempting SSRF against ARNO
- A user entered a URL that resolves to 127.0.0.1 (typo, dev URL)
- DNS rebinding attempt
Action: check whether rate limits on user_id are already in place. If a single user is responsible, consider an anti-abuse review.
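To confirm that a burst really belongs in Branch A, the resolved addresses behind a sample of rejections can be checked against the well-known private and loopback ranges. A minimal sketch, assuming the rejected addresses are available as IPv4 strings; the range list is the standard RFC 1918, loopback, link-local and "this network" set, not necessarily what BLOCKED_NETWORKS contains, and IPv6 is not handled:

```typescript
// Parse dotted-quad IPv4 into a non-negative integer; null if malformed.
function parseIPv4(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet; // plain arithmetic avoids 32-bit sign issues of bitwise ops
  }
  return value;
}

// Well-known non-public IPv4 ranges (illustrative subset, not ARNO's BLOCKED_NETWORKS).
const PRIVATE_RANGES: Array<[string, number]> = [
  ["0.0.0.0", 8],      // "this network"
  ["10.0.0.0", 8],     // RFC 1918
  ["127.0.0.0", 8],    // loopback
  ["169.254.0.0", 16], // link-local, includes the cloud metadata address 169.254.169.254
  ["172.16.0.0", 12],  // RFC 1918
  ["192.168.0.0", 16], // RFC 1918
];

function inRange(ip: number, base: string, prefix: number): boolean {
  const baseValue = parseIPv4(base);
  if (baseValue === null) return false;
  const blockSize = Math.pow(2, 32 - prefix);
  return Math.floor(ip / blockSize) === Math.floor(baseValue / blockSize);
}

function isPrivateIPv4(ip: string): boolean {
  const value = parseIPv4(ip);
  if (value === null) return false;
  return PRIVATE_RANGES.some(([base, prefix]) => inRange(value, base, prefix));
}

// Branch A applies when every sampled rejection resolved to a private address.
function allPrivate(ips: string[]): boolean {
  return ips.length > 0 && ips.every(isPrivateIPv4);
}
```

If allPrivate returns false for the sample, the burst more likely belongs to Branch B.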
Branch B: Rejections for valid public domains
Possible causes:
- DNS resolver issues (Cloudflare DNS proxy outage?)
- BLOCKED_NETWORKS list misconfigured (e.g. a public range added by accident)
- IPv6 false positives (a new IPv6 prefix not represented correctly in the block list)
Action:
- Verify resolution from the Worker context:
    wrangler tail | grep "dns_resolve_failed"
- Compare with public DNS resolution:
    dig +short example.com  # replace with the failing hostname
- Check the BLOCKED_NETWORKS list in packages/url-import-extractor/src/safe-fetch.ts and verify that no public range was added recently
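For the BLOCKED_NETWORKS check, two questions help: which entry matched a failing public IP, and which entries are not inside any reserved range at all. A minimal IPv4-only sketch, assuming entries are CIDR strings such as "10.0.0.0/8" (the actual format in safe-fetch.ts may differ). The reserved list is a partial selection of well-known non-public IPv4 ranges (RFC 1918, RFC 6598, RFC 5737 test nets, loopback, link-local, benchmarking, multicast, reserved), and entries this sketch cannot parse, including IPv6, are flagged for manual review:

```typescript
interface Cidr {
  base: number;   // network address as an integer, host bits cleared
  prefix: number; // prefix length, 0..32
}

function parseIPv4(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part) || Number(part) > 255) return null;
    value = value * 256 + Number(part);
  }
  return value;
}

function parseCidr(text: string): Cidr | null {
  const parts = text.split("/");
  if (parts.length !== 2 || !/^\d{1,2}$/.test(parts[1])) return null;
  const prefix = Number(parts[1]);
  const addr = parseIPv4(parts[0]);
  if (addr === null || prefix > 32) return null;
  const blockSize = Math.pow(2, 32 - prefix);
  return { base: Math.floor(addr / blockSize) * blockSize, prefix };
}

function contains(net: Cidr, ip: number): boolean {
  const blockSize = Math.pow(2, 32 - net.prefix);
  return Math.floor(ip / blockSize) * blockSize === net.base;
}

// Aligned CIDR blocks nest, so inner lies inside outer iff it is at least as
// specific and its network address falls inside outer.
function isSubnetOf(inner: Cidr, outer: Cidr): boolean {
  return inner.prefix >= outer.prefix && contains(outer, inner.base);
}

const RESERVED_IPV4 = [
  "0.0.0.0/8", "10.0.0.0/8", "100.64.0.0/10", "127.0.0.0/8", "169.254.0.0/16",
  "172.16.0.0/12", "192.0.2.0/24", "192.168.0.0/16", "198.18.0.0/15",
  "198.51.100.0/24", "203.0.113.0/24", "224.0.0.0/4", "240.0.0.0/4",
];

// Blocklist entries that match a given address (explains a specific false positive).
function entriesMatching(ip: string, blocked: string[]): string[] {
  const addr = parseIPv4(ip);
  if (addr === null) return [];
  return blocked.filter((entry) => {
    const net = parseCidr(entry);
    return net !== null && contains(net, addr);
  });
}

// Entries outside every reserved range: candidates for an accidentally added public range.
function entriesOutsideReserved(blocked: string[]): string[] {
  const reserved: Cidr[] = [];
  for (const r of RESERVED_IPV4) {
    const net = parseCidr(r);
    if (net !== null) reserved.push(net);
  }
  return blocked.filter((entry) => {
    const net = parseCidr(entry);
    return net === null || !reserved.some((r) => isSubnetOf(net, r));
  });
}
```

Running entriesMatching with the IP that dig returned for a failing domain shows which entry to fix; entriesOutsideReserved gives a quick audit of the whole list after any recent change.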
Branch C: dns_resolve_failed bursts
Indicates an external DNS-layer issue:
- Cloudflare DNS outage
- Network egress from Workers blocked
- A specific TLD is not resolvable
Action: check the Cloudflare status page. If it is a CF issue, wait and document the outage.
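To separate "a specific TLD is not resolvable" from a broader resolver or egress problem, it can help to group the failing hostnames by TLD. A minimal sketch, assuming the hostnames behind the dns_resolve_failed rejections can be exported as plain strings; the 0.8 dominance cutoff is a hypothetical choice:

```typescript
// Count failing hostnames per TLD (the last dot-separated label).
function countByTld(hostnames: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const host of hostnames) {
    // Lowercase and drop a trailing root dot ("example.com." -> "example.com").
    const labels = host.toLowerCase().replace(/\.$/, "").split(".");
    const tld = labels[labels.length - 1];
    if (!tld) continue; // skip empty input
    counts[tld] = (counts[tld] ?? 0) + 1;
  }
  return counts;
}

// Hypothetical cutoff; keep it above 0.5 so at most one TLD can qualify.
const DOMINANT_SHARE = 0.8;

// Returns the TLD holding at least `share` of all failures, or null if none does.
function dominantTld(hostnames: string[], share: number = DOMINANT_SHARE): string | null {
  const counts = countByTld(hostnames);
  const tlds = Object.keys(counts);
  let total = 0;
  for (const tld of tlds) total += counts[tld];
  if (total === 0) return null;
  for (const tld of tlds) {
    if (counts[tld] / total >= share) return tld;
  }
  return null;
}
```

A single dominant TLD points to the TLD case; failures spread across many TLDs fit a Cloudflare DNS outage or blocked Worker egress better.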
Recovery
If the SSRF protection is working as intended (Branch A):
- No action: the system is working as designed
- Optionally: send an anti-abuse notification to the offending user
- Verify rate limits on user_id are tight enough
If false positives (Branch B):
- Hotfix the BLOCKED_NETWORKS list in safe-fetch.ts
- Deploy the backend update
- Re-test failing URLs
If DNS layer issue (Branch C):
- Cannot be fixed at the ARNO layer; wait for CF/upstream resolution
- Communicate with users (status page) if sustained
Aftermath
- Post-mortem if sustained > 30 min OR it caused customer-visible failures
- Backfill SSRFError metrics in Grafana if any were missed
- Tune the alert threshold if it is too noisy (currently 10/min; it may need a higher floor for high-traffic periods)
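One way to add a higher floor for high-traffic periods is to scale the threshold with import traffic while never dropping below the current 10/min. A minimal sketch; the 5% rejection ratio and the function names are hypothetical, and the per-second conversion is there because PromQL's rate() returns per-second values:

```typescript
// Current static threshold from this runbook.
const FLOOR_PER_MIN = 10;
// Hypothetical: share of import requests that may be rejected before alerting.
const MAX_REJECTION_RATIO = 0.05;

// Threshold in rejections per minute, given the recent import request rate.
function rejectionThresholdPerMin(importRequestsPerMin: number): number {
  return Math.max(FLOOR_PER_MIN, MAX_REJECTION_RATIO * importRequestsPerMin);
}

// rate(...[5m]) is per-second, so divide by 60 before using the value in the alert expression.
function perMinToPerSec(perMin: number): number {
  return perMin / 60;
}
```

Expressing this in the Grafana alert would also need a metric for total import requests; this runbook does not name one.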
Known false positives
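- Tranco-listed domain with a regional CDN: DNS resolves to a geo-specific IP that occasionally falls in a filtered range. Document the specific case if it recurs.
- IPv6 addresses associated with shared CGNAT (carrier-grade NAT) may look private. The CGNAT shared address space itself is IPv4 (100.64.0.0/10, RFC 6598), so verify that this prefix, and any IPv4-mapped IPv6 form of it, is entered correctly in BLOCKED_NETWORKS.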
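The CGNAT shared address space is the IPv4 block 100.64.0.0/10 (RFC 6598). One way it can reach an IPv6-aware check is as an IPv4-mapped IPv6 address: ::ffff:100.64.0.1, or the equivalent hex form ::ffff:6440:1, which is how URL parsers following the WHATWG URL Standard serialize such hosts. A minimal sketch for unwrapping these before comparing against BLOCKED_NETWORKS; the function names are hypothetical, and only the compressed ::ffff: forms are handled:

```typescript
// Return the embedded IPv4 from an IPv4-mapped IPv6 literal, or null if it is not one.
// Handles "::ffff:a.b.c.d" and "::ffff:XXXX:XXXX", with or without URL brackets.
// The fully expanded form (0:0:0:0:0:ffff:...) is not handled by this sketch.
function mappedIPv4(ipv6: string): string | null {
  const host = ipv6.replace(/^\[|\]$/g, "").toLowerCase();
  const dotted = /^::ffff:(\d{1,3}(?:\.\d{1,3}){3})$/.exec(host);
  if (dotted) return dotted[1];
  const hex = /^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/.exec(host);
  if (hex) {
    const hi = parseInt(hex[1], 16); // first two octets
    const lo = parseInt(hex[2], 16); // last two octets
    return [hi >> 8, hi & 0xff, lo >> 8, lo & 0xff].join(".");
  }
  return null;
}

// True for addresses in the CGNAT shared address space 100.64.0.0/10.
function isSharedAddressSpace(ipv4: string): boolean {
  const parts = ipv4.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)) {
    return false;
  }
  const first = Number(parts[0]);
  const second = Number(parts[1]);
  // A /10 under 100.0.0.0 covers second octets 64..127.
  return first === 100 && second >= 64 && second <= 127;
}
```

An address that unwraps into 100.64.0.0/10 is shared CGNAT space rather than RFC 1918 private space, so whether it belongs in BLOCKED_NETWORKS is a policy choice to confirm against safe-fetch.ts.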
References
- safe-fetch.ts: implementation
- safe-fetch.test.ts: 24 tests covering corner cases
- ADR 0007 — code-first design implies SSRF guard
- url_import_spec.md § XI.1 — full spec