Master spec

Consolidation: ARNO Design v6 + Observability v3 + Tech Stack v2 + audit fixes + URL-import. Status: implementation-ready after week 1 prototyping verification. Single source of truth for design, observability, and tech decisions.

Changelog

  • v1.3 (2026-05-22): Unparked the §V row "Small-company path / ARNO Studio" → first feature: URL-import onboarding. Full spec in docs/url_import_spec.md. 12 ADRs (0007-0018) document the pivots. 3 rounds of /au audits closed 18 P0 + 44 P1 at integration seams. Reasoning: small-biz mass onboarding without git-account friction; staging area in V1, GitHub App PR in V2.
  • v1.2 (2026-05-20): Added Implementation_Workflow.md, a vertical-slice execution plan (16 phases with demo-driven milestones). It complements §VII.1 of the master spec: the phases there show dependency order, while Implementation_Workflow gives the execution sequence with visible artifacts per phase.
  • v1.1 (2026-05-20): Applied master-audit P0s (retention matrix, a11y, push mutex) + selected P1s (crypto choices, error contract, governance, glossary, launch checklist, multi-vendor outage matrix, source docs reference, version header).
  • v1.0 (2026-05-20): Initial consolidation of 3 individual specs.

Contents

  • Part 0. Vision, principles, governance
  • Part I. ARNO Product Design
  • Part II. Observability
  • Part III. Tech Stack
  • Part IV. Unified MVP Scope
  • Part V. Unified Parking Lot
  • Part VI. Open Questions / Week 1 Prototyping
  • Part VII. Next Steps + Launch Readiness

Part 0

0.1 Vision

ARNO is a cloud editor for product design on top of existing repositories. It brings teams together (Maker = design+product, Frontend, Editor): each sees its own slice and does not break anyone else's.

Killer feature: a change to a component shows up instantly on every screen in every project.

MVP scenario: a large company with its own repos. Small business is in the parking lot.

Sustainability: $5/mo MVP floor, scales to 100K users through config-tier upgrades without a code rewrite.

0.2 Source documents (historical trail)

The master spec consolidates three converged individual specs:

Source | Audit cycles to convergence | Status
ARNO Product Design v6 | 5 (v1 → v2 → ... → v6, 0 P0 reached) | Architectural baseline
Observability v3 | 3 (v1 → v2 → v3, plateau at 2 P0) | Operational baseline
Tech Stack v2 | 2 (v1 → v2, 3 P0 → 0 after audit fixes) | Implementation baseline

The master spec is a condensed reference. Detailed reasoning, rejected alternatives, and audit findings are preserved in chat history. The master spec is canonical; the individual specs serve as a historical trace.

0.3 Architectural principles (36 cross-cutting)

  1. ARNO builds ONLY the editor. Approval / CI / audit / hosting / build / auth: leverage GitHub, Yjs/Liveblocks, Cloudflare, Grafana, Sentry.
  2. Do not duplicate truth. Code+TSX: git. MD: git (created in ARNO). Workflow: Yjs. MD-edit live state: our DB.
  3. Identity ≠ name. UUIDs everywhere. Refactor-safe.
  4. Pluggable providers: GitProvider, auth, render-adapter, bundle-hosting, queue, cache, email.
  5. Live propagation + activity-based snapshot.
  6. MVP config + scale-ready schema only where migration is honestly cheap.
  7. Yjs primary live, git async snapshot for workflow.
  8. ARNO = coordination tool, not a code rewriter. MD↔TSX drift is detected.
  9. Impact analysis runs synchronously with the edit action.
  10. CRDT where co-editing is real. Workflow: Yjs. MD spec: REST + versions.
  11. Mechanism > feature claim. Every feature has an explicit implementation strategy.
  12. Never silently drop user data. Conflicts surface an explicit choice.
  13. Compiled output ≠ source. Bundle proxy for private repos is an acceptable exception.
  14. Liveblocks: Storage / Awareness / Broadcast are three separate channels.
  15. Progressive load > full sync for large projects.
  16. Same-session vs different-session: different conflict semantics.
  17. Content-addressable URLs solve replica lag.
  18. Edge cache > backend optimization for immutable resources.
  19. DB trigger > application cron for on-write cleanup.
  20. Leverage existing > build new. Managed services.
  21. TypeScript end-to-end. Shared types FE-BE.
  22. Runtime-portable. Hono runs on Workers AND Node.
  23. Vendor-agnostic instrumentation. OpenTelemetry.
  24. Interface abstractions for swappable layers.
  25. Cost-tier ladder explicit. Predictable upgrades.
  26. Module boundaries enforce future extraction.
  27. Stateless services discipline.
  28. Free auth forever via Auth.js. No MAU billing.
  29. Operational SPOF mitigation from day 1.
  30. IaC from day 1.
  31. Prototype critical compatibilities in week 1.
  32. Backups for vendor-held data are in MVP scope.
  33. Migration matrices must be honest.
  34. Invariants need probes. Claimed correctness without active measurement = silent failure.
  35. Tail-based sampling > head-based for balancing cost and debug coverage.
  36. Async trace propagation must be explicit.

0.4 Data retention matrix

Data type | Retention | Anonymization on user delete | Storage
audit_log | 180d (configurable per project) | user_id → deleted_user_<hash> | Postgres
Observability logs (Loki) | 90d | TTL-based delete accepted MVP | Grafana Cloud
Error events (Sentry) | 90d per project default | scrub PII via beforeSend hook | Sentry SaaS
Distributed traces (Tempo) | 50GB rolling | user_id anonymization post-MVP | Grafana Cloud
MD versions | 20 per file (count-based, Postgres trigger) | scrub on user delete (post-MVP) | Postgres
Yjs backups | 90d | anonymize on backup export (post-MVP, parking) | Cloudflare R2
component_md_raw | indefinite until project delete | scrub on user delete | Postgres
project_share_link | until manual revoke | no PII typically | Postgres
project (archived state) | indefinite until user delete | full delete on user request | Postgres
onboarding_session | 30d unused → auto-expire | full delete | Postgres
Redis caches (all) | TTL-based (varied 60s-58min) | no PII stored | Cloudflare KV
pushed_by_us dedup | 5min TTL | n/a | Cloudflare KV
JWT revocation list | TTL = remaining token expiry | n/a | Cloudflare KV

GDPR compliance: "Right to erasure" outputs:

  • Immediate: scrub users row, anonymize audit_log, drop personal MD content
  • Within 90d: observability logs expire naturally
  • Within 90d: Yjs backups expire naturally (no active scrub MVP)
  • Post-MVP: parallel scrub jobs for logs/traces/backups

0.5 Crypto choices

Use case | Algorithm | Notes
JWT signing | HS256 (MVP) | Symmetric, simple. Migrate to RS256 when multi-issuer
JWT_SECRET minimum strength | 256-bit random | Generated via crypto.randomBytes(32)
user_id_hash for deleted_users tracking | HMAC-SHA-256 with pepper | Pepper in env var. Prevents rainbow tables.
Content hashing (MD, bundles) | SHA-256 | Content-addressable URLs
Webhook signature verification | HMAC-SHA-256 | GitHub default, explicit
Device flow user_code generation | 8-char base32 (40 bits entropy) | Sufficient for 5min TTL
Session ID generation | crypto.randomUUID() | v4 UUID
Password hashing | n/a (no password auth, GitHub OAuth only) | Post-MVP non-GitHub providers → Argon2id
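As an illustration of two rows in the table, the sketch below derives a user_id_hash with HMAC-SHA-256 and generates a 256-bit secret. It assumes a Node-compatible crypto module; the function names hashDeletedUserId and generateJwtSecret are hypothetical and not part of the spec.

```typescript
import { createHmac, randomBytes } from 'node:crypto'

// Hypothetical helper: HMAC-SHA-256 of the user id, keyed with the secret pepper.
// Without the pepper the hash cannot be recomputed from a list of known user ids.
export function hashDeletedUserId(userId: string, pepper: string): string {
  return createHmac('sha256', pepper).update(userId).digest('hex')
}

// Hypothetical helper: 32 random bytes = 256 bits, hex-encoded.
export function generateJwtSecret(): string {
  return randomBytes(32).toString('hex')
}
```

The same user id always maps to the same hash under one pepper, which is what lets deleted_users detect a returning account without storing the id itself.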

0.6 Decision authority hierarchy

When sections conflict:

  1. Master spec wins over individual specs (this document is canonical)
  2. ARNO Product Design wins over Tech Stack on product behavior
  3. Tech Stack wins over ARNO Product Design on infrastructure mechanics
  4. Observability wins over both on what is monitored/alerted
  5. Conflicts are surfaced and resolved in the next master spec version, not worked around

0.7 Spec governance

  • Changes via PR to the master spec
  • Reviewer: founder + 1 co-admin (solo dev currently; process grows with the future team)
  • Major architecture changes: an ADR (Architecture Decision Record) is created first in docs/adr/
  • Minor clarifications: directly in the master spec, version bump (1.1 → 1.2)
  • Major rewrites: version bump (1.x → 2.0)
  • Changelog maintained in section 0 (top of doc)

0.8 Glossary

Term / Acronym | Definition
Maker | Combined Design+Product role. Primary editor of workflow and MD specs.
Workflow | Graph of screens + edges representing product flow
Screen | Composition of component instances; node in the workflow graph
Edge | Connection between screens triggered by an interactive element
Component spec | MD file describing the component contract (props, events, structural sections)
Component instance | Use of a component within a screen with specific prop values
Render adapter | Protocol+bundle that renders real React components in the ARNO preview iframe
Session-branch | Per-Maker git branch (arno/{user-handle}) for in-flight edits
CRDT | Conflict-free Replicated Data Type; Yjs-based real-time collab
RBAC | Role-Based Access Control
SPOF | Single Point of Failure
MAU | Monthly Active Users (billing metric for Liveblocks, Clerk, etc.)
MVP | Minimum Viable Product
SLO | Service Level Objective
KPI | Key Performance Indicator
IaC | Infrastructure as Code (Terraform)
PITR | Point-In-Time Recovery
TTL | Time To Live
OAuth | OAuth 2.0 authorization protocol
JWT | JSON Web Token
OIDC | OpenID Connect
DAG | Directed Acyclic Graph
OTel | OpenTelemetry
OTLP | OpenTelemetry Protocol
CSP | Content Security Policy
DPA | Data Processing Addendum
TOCTOU | Time-Of-Check-Time-Of-Use (race condition class)
DO | Cloudflare Durable Objects
TTI | Time To Interactive
LCP | Largest Contentful Paint
WAF | Web Application Firewall
GraphQL | Query language for APIs (used for GitHub bulk fetch)
tRPC | Typed RPC framework for TypeScript
Frontmatter | YAML metadata block at the top of an MD file
Fenced block | MD comment-delimited structural section (<!-- arno:props v1 -->)
textEditable prop | Spec marker allowing the Editor role to modify a prop value
Drift | Mismatch between MD spec and TSX code
Bundle proxy | ARNO backend serving private repo bundles with auth
Pre-edit impact analysis | Synchronous UI confirmation before a breaking change
Branch-aware view | Maker sees own branch changes; others see main
Soft snapshot | Component versions cached during active screen edit

Part I. ARNO Product Design

I.1 Roles

Role | Workflow | Composition | Instances | MD spec | Code | Settings
Maker (Design+Product) | 👁
Frontend | 👁 | 👁 | 👁 | 💬 | ✅ (in IDE) | 👁
Editor (UX-writer) | 👁 | 👁 | ✏️ textEditable props | 💬 | 👁
Viewer | 👁 | 👁 | 👁 | 👁 optional
Backend | OUT of MVP (parking lot)

I.1.1 GitHub permission mapping

GitHub permission | admin/maintain | write | triage | read | none
ARNO role | Maker | Maker | Maker | Viewer | no access

I.1.2 Permission enforcement

  • UI-level: disabled controls with tooltip
  • Workflow (CRDT): Liveblocks server-side mutation validation, path-allowlist:
    • Maker: any path
    • Editor: only screens.*.instances.*.props.[id].value where spec.props[id].textEditable=true
    • Viewer: reject
  • MD-edit (REST): server checks role + path
  • Audit log: rejected mutations with user_id, intent, timestamp
  • Fine-grained roles: via .arno/config.json references to GitHub Teams
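The path-allowlist rule for workflow mutations can be sketched as a pure function. This is a minimal illustration, assuming dot-separated mutation paths and a lookup that says whether a prop id is marked textEditable in the spec; canMutate and the lookup type are hypothetical names, and the real validation would run inside the server-side mutation hook.

```typescript
type Role = 'maker' | 'editor' | 'viewer'

// Hypothetical lookup: true when spec.props[propId].textEditable is set.
type TextEditableLookup = (propId: string) => boolean

// Matches screens.<screenId>.instances.<instanceId>.props.<propId>.value
const EDITOR_PATH = /^screens\.[^.]+\.instances\.[^.]+\.props\.([^.]+)\.value$/

export function canMutate(role: Role, path: string, isTextEditable: TextEditableLookup): boolean {
  if (role === 'maker') return true // Maker: any path
  if (role === 'viewer') return false // Viewer: always rejected
  // Editor: only the value of a prop that the spec marks as textEditable.
  const match = EDITOR_PATH.exec(path)
  return match !== null && isTextEditable(match[1] ?? '')
}
```

A rejected call would then be written to the audit log with user_id, intent, and timestamp, as the list above requires.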

I.2 Data model

I.2.1 Postgres

user                   { id, provider_user_id, prefs, is_admin }
project                { id, name, owner_id, visibility, state, current_size_bytes }
                       // state: 'pending_setup' | 'active' | 'archived'
connected_repo         { project_id, provider, url, paths, sync_mode, last_synced_sha,
                         bundle_hosting, config_path, last_visibility_check, last_visibility_status }
project_member         { project_id, user_id, role, source }
project_share_link     { id, project_id, scope, token, ... }
component_md_raw       { project_id, file_path, raw_md_text, frontmatter_id, content_sha,
                         current_version, updated_by, updated_at }
component_md_versions  { id, project_id, file_path, version, content, content_sha,
                         parent_version, saved_by, saved_at, session_id, label }
                       // AFTER INSERT trigger purges versions OFFSET 20
project_id_conflicts   { project_id, conflict_id_value, files[], detected_at, resolution_lock_holder }
webhook_job_queue      { id, project_id, branch, commit_sha, parent_sha, retries, next_attempt_at, status }
onboarding_session     { id, user_id, attempt_id, project_draft_id, current_step,
                         completed_steps, last_error, persisted_state }
audit_log              { id, user_id, action, target, intent, timestamp }
deleted_users          { user_id_hash, deleted_at }
// Auth.js Drizzle adapter tables:
accounts, sessions, verificationTokens

// Cloudflare KV - cached state:
pushed_by_us              // SET, TTL 5min, webhook dedup
viewer_snapshot           // viewer:project:{id} TTL 60s
bundle_proxy_cache        // bundle:{sha} TTL 1h
project_snapshot          // project_snapshot:{id} TTL 5min
gh_installation_token     // gh_token:{installation_id} TTL 58min
gh_bucket                 // rate limiter state per installation
id_conflict_lock          // id_conflict:{repo_id}:{conflict_id_value} NX EX 300
push_lock                 // push_lock:{project_id}:{branch} NX EX 60 (mutex)
revoked                   // revoked:{jti} JWT revocation, TTL = remaining token expiry
device_code               // device_code:{code} CLI device flow, TTL 10min
device_polls              // rate limit device flow polls

I.2.2 Liveblocks (three channels)

Storage (CRDT, persistent — workflow):

project (Y.Doc) {
  screens: Y.Map<screenId, Screen>
  edges:   Y.Array<Edge>
}
Screen { id, name, presentation, root: ComponentInstance }   // structure: parked
Edge   { id, from: {screenId, instanceId, eventId}, to: {screenId}, action: 'navigate' }

Awareness: presence, current screen, cursors (ephemeral).

Broadcast:

  • md_saved { componentId, newVersion, contentSha, content?, by, timestamp } — content inline if <10KB
  • drift_detected { componentId, mismatches, status }
  • external_change { path, by }
  • subscription_changed { user_id }
  • workflow_updated { regions }

Dual rooms: project:${id} (workflow + presence) + repo:${id} (cross-project fan-out).

I.2.3 Client git repo

.arno/config.json                       // version: 1, paths, pairing, bundleHosting, arnoChanges
.github/workflows/arno-build.yml        // bundle CI
.github/workflows/arno-check.yml        // UUID lint, parse, drift check
arno.entry.tsx                          // render-adapter entry
Design_system/*.md                        // frontmatter id + fenced structural blocks v1
src/components/*.tsx                      // code (ARNO does not edit it)

config.json:

{
  "version": 1,
  "componentsMd": "Design_system/",
  "componentsCode": "src/components/",
  "pairing": {
    "default": "[Name].md ↔ [Name].tsx",
    "overrides": { "Button.md": "src/components/Button.tsx#Button" }
  },
  "bundleHosting": { "type": "gh-pages | github-packages", "config": {} },
  "arnoChanges": { "backfillPolicy": "require-review" }
}

I.3 Architectural decisions

I.3.1 Source of truth

Entity | Where it lives | Who writes
Code (TSX) | company repo | FE in IDE
MD specs (canonical) | company repo | Maker via ARNO (REST → push), FE optionally
MD-edit live state + versions | our DB | Maker REST auto-save
Workflow (screens + edges) | Liveblocks Yjs Storage | Maker CRDT
Component data cache | our DB (raw_md) | sync-handler webhook

I.3.2 MD format + smart editor

  • Canonical = free-form markdown
  • Fenced structural blocks with a version (<!-- arno:props v1 -->)
  • textEditable: true marker for string props the Editor role may change
  • Persistence: REST + versions (never a silent drop)
  • Conflict UI with user choice
  • Real-time awareness via broadcast
  • Multi-tab coordination via the BroadcastChannel API

Save flow:

POST /api/md/{project}/{path} with {content, base_version, session_id}

Server:
  if current_version == base_version → save
  elif conflict.author == request.user AND conflict.session_id == request.session_id
       → auto-rebase silent (same-session fast-forward)
  else → 409 conflict response

Conflict UI options: view diff / save mine / discard mine (saved as backup) / merge manually.
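The three server branches of the save flow can be sketched as a pure decision function. This is an illustration under assumptions: the stored record is taken to carry the author and session of the latest saved version, and the names StoredMd, SaveRequest, and decideSave are hypothetical.

```typescript
// Hypothetical view of the latest stored version of one MD file.
interface StoredMd { currentVersion: number; savedBy: string; sessionId: string }
interface SaveRequest { baseVersion: number; userId: string; sessionId: string }

type SaveDecision =
  | { kind: 'save'; newVersion: number }
  | { kind: 'auto_rebase'; newVersion: number }
  | { kind: 'conflict'; status: 409; currentVersion: number }

export function decideSave(stored: StoredMd, req: SaveRequest): SaveDecision {
  // Client edited on top of the latest version: plain save.
  if (stored.currentVersion === req.baseVersion) {
    return { kind: 'save', newVersion: stored.currentVersion + 1 }
  }
  // Newer version was written by the same user in the same session:
  // treat it as a fast-forward and rebase silently.
  if (stored.savedBy === req.userId && stored.sessionId === req.sessionId) {
    return { kind: 'auto_rebase', newVersion: stored.currentVersion + 1 }
  }
  // Anyone else, or another session of the same user: surface the conflict.
  return { kind: 'conflict', status: 409, currentVersion: stored.currentVersion }
}
```

Keeping the decision free of I/O makes the same-session versus different-session rule easy to unit test; the handler would wrap it with the actual read and write.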

A Postgres trigger purges every version beyond the newest 20 per file (OFFSET 20).

I.3.3 Discovery, sessions, write-back

  • .arno/config.json (versioned "version": 1)
  • Session = git branch arno/{user-handle}
  • Debounced push (~30s idle) to the session-branch
  • First push → auto-PR; subsequent pushes → update
  • Approval = GitHub (CODEOWNERS, branch protection, CI)

Pre-push HEAD check + concurrency mutex:

1. Acquire mutex (Redis SETNX):
   key = `push_lock:${project_id}:${branch}`
   if not redis.set(key, instance_id, NX=True, EX=60):
     throw 'concurrent_push_in_progress' — retry later
2. Pull latest HEAD via API
3. If HEAD == last_known_push_sha → push, update last_known
4. If diverged → NO auto-push → UI:
   "External commits on your session branch from IDE/git.
    [Open in GitHub] [Discard mine, pull theirs]"
5. Always release mutex in finally:
   redis.delete(`push_lock:${project_id}:${branch}`)
6. Never force-push automatically

The mutex prevents two concurrent ARNO instances from pushing to the same branch (TOCTOU race between steps 2 and 3).
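A minimal sketch of the mutex wrapper, assuming a store with an atomic set-if-absent plus TTL (Redis SET NX EX semantics). The LockStore interface, the in-memory implementation, and withPushLock are hypothetical names used only for illustration; the get-then-delete release shown here is not atomic and a production version would need a compare-and-delete primitive.

```typescript
// Hypothetical abstraction over a store with atomic set-if-absent + TTL.
interface LockStore {
  setIfAbsent(key: string, value: string, ttlSeconds: number): Promise<boolean>
  get(key: string): Promise<string | null>
  delete(key: string): Promise<void>
}

// In-memory stand-in so the sketch runs without external services.
class InMemoryLockStore implements LockStore {
  private entries = new Map<string, { value: string; expiresAt: number }>()
  async setIfAbsent(key: string, value: string, ttlSeconds: number): Promise<boolean> {
    const existing = this.entries.get(key)
    if (existing && existing.expiresAt > Date.now()) return false
    this.entries.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 })
    return true
  }
  async get(key: string): Promise<string | null> {
    const entry = this.entries.get(key)
    return entry && entry.expiresAt > Date.now() ? entry.value : null
  }
  async delete(key: string): Promise<void> {
    this.entries.delete(key)
  }
}

export async function withPushLock<T>(
  store: LockStore, projectId: string, branch: string, instanceId: string,
  push: () => Promise<T>,
): Promise<T> {
  const key = `push_lock:${projectId}:${branch}`
  // Step 1: acquire, or fail fast so the caller can retry later.
  if (!(await store.setIfAbsent(key, instanceId, 60))) {
    throw new Error('concurrent_push_in_progress')
  }
  try {
    return await push() // steps 2-4: HEAD check and push happen inside
  } finally {
    // Step 5: release only if we still own the lock (it may have expired).
    if ((await store.get(key)) === instanceId) await store.delete(key)
  }
}
```

Checking ownership before deleting avoids releasing a lock that expired and was re-acquired by another instance while a slow push was still running.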

Pre-edit impact analysis: at edit-action time (not push), this-project scope, gated by sync completion.

I.3.4 Render

Edit mode: schematic boxes (zero setup).

Preview mode — tiered:

Change type | Behavior
Default value of a prop | Override at render-time. Instant
Prop name renamed (id stable) | Warning "Spec renamed, code not yet updated"
New prop in MD | Warning, ignored in render
New component / TSX change | Build session-branch, ~1-2 min

Drift detection (CI-based via react-docgen): 5 component statuses (green / red / yellow / purple / gray) + per-component opt-out.

Bundle hosting: gh-pages (public) / github-packages backend proxy (private), streaming pass-through.

I.3.5 Identity

  • Component / Prop UUIDs in MD frontmatter / fenced sections
  • Instance / Screen / Edge UUIDs in Yjs
  • arno backfill-ids utility
  • Collision handling: UI + default heuristic + Redis cross-project lock
  • arno lint blocks merging a PR that contains duplicates

I.3.6 Interactivity

  • Instance-level
  • Events come from the fenced events section of the MD
  • Edge: {from: {screen, instance, event}, to: {screen}, action: 'navigate'}
  • Edit FROM the screen (DevPanel "+"); view ON the workflow canvas (read-only)

I.3.7 Real-time collab

  • Yjs only for workflow
  • Liveblocks transport + Storage + Awareness + Broadcast
  • Yjs ↔ git invariant: git_head_commits ⊆ yjs_serialized_state
  • Canonical serializer partial (passthrough prose, deterministic fenced)
  • Reverse sync (FE-push) → 3-way merge UI is parked; MVP falls back to GitHub

I.3.8 Sync with the repo

GitProvider abstraction:

core: listFiles, getFileContent, createBranch, commit, getDiff, verifyWebhookSignature
extensions: github.createPullRequest, github.resolveCodeowners

GitHub App required permissions: Contents R/W, Pull requests W, Metadata R, Webhooks W, Workflows W (scope .github/workflows/arno-*.yml), Packages R/W (for github-packages bundle hosting).

Optional: Members R, Checks W.

Rate limiter (KV token bucket per installation, 12,500/hour budget).
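A token bucket with the stated 12,500/hour budget can be sketched as a pure state transition. This assumes the bucket state is loaded from and written back to the cache by the caller; BucketState and takeToken are hypothetical names, and the read-modify-write against an eventually consistent store is only approximate under concurrency.

```typescript
// Hypothetical persisted state for one GitHub App installation.
interface BucketState { tokens: number; updatedAtMs: number }

const CAPACITY = 12_500 // requests per hour budget
const REFILL_PER_MS = CAPACITY / 3_600_000 // tokens regained per millisecond

export function takeToken(
  state: BucketState | null, nowMs: number, cost = 1,
): { allowed: boolean; state: BucketState; retryAfterMs: number } {
  // A missing state means a fresh installation: start with a full bucket.
  const prev = state ?? { tokens: CAPACITY, updatedAtMs: nowMs }
  const elapsedMs = Math.max(0, nowMs - prev.updatedAtMs)
  const tokens = Math.min(CAPACITY, prev.tokens + elapsedMs * REFILL_PER_MS)
  if (tokens >= cost) {
    return { allowed: true, state: { tokens: tokens - cost, updatedAtMs: nowMs }, retryAfterMs: 0 }
  }
  // Not enough budget: report how long until the missing tokens refill.
  const retryAfterMs = Math.ceil((cost - tokens) / REFILL_PER_MS)
  return { allowed: false, state: { tokens, updatedAtMs: nowMs }, retryAfterMs }
}
```

At this rate one token refills roughly every 288 ms, so retryAfterMs maps naturally onto the retry_after_ms field of a rate_limited error.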

Token refresh middleware with a SETNX lock (thundering herd protection).

Initial scan via GraphQL bulk (~20 queries for 2000 MDs).

Webhook handling: dedup (SHA + author=arno[bot]), fan-out via repo:{id} room, DAG ordering per-project, exponential backoff, persisted queue.
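Before dedup and fan-out, each delivery has to pass signature verification. The sketch below checks GitHub's X-Hub-Signature-256 header, which carries sha256= followed by the hex HMAC-SHA-256 of the raw body. It assumes a Node-compatible crypto module; the function name mirrors the GitProvider method but the signature shown here is an assumption.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Returns true only when the header matches HMAC-SHA-256(secret, rawBody).
// rawBody must be the exact bytes received, before any JSON parsing.
export function verifyWebhookSignature(
  rawBody: string, signatureHeader: string | undefined, secret: string,
): boolean {
  if (!signatureHeader || !signatureHeader.startsWith('sha256=')) return false
  const expected = Buffer.from(
    'sha256=' + createHmac('sha256', secret).update(rawBody).digest('hex'),
  )
  const received = Buffer.from(signatureHeader)
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received)
}
```

A failed check would feed the "webhook signature spike" alert rather than enter the job queue.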

Auto-detect staleness: cron every 10min for active projects.

I.3.9 Live propagation + edit stability + progressive load

  • Push to main → broadcast → clients update
  • Branch-aware: a Maker with changes sees their own version; others see main
  • Soft snapshot activity-based (notification + apply/ignore)
  • Idle 30min → "auto-refresh in 60s"
  • Server snapshot cron 60s → Redis for progressive load
  • Open project: T=0 read-only (instant) → T=2-15s edit unlock

I.3.10 Sharing

  • Visibility: private | unlisted (token) | public
  • Scope: project | screen (frozen single)
  • Anonymous viewer; require_login flag (UI post-MVP)
  • Manual revoke, multiple tokens. Expiry is not in MVP.
  • Viewers do NOT connect to Liveblocks (server snapshot + Redis cache TTL 60s)
  • Zero MAU billing for viewers

I.3.11 Onboarding flow

Sequential + post-merge parallel tracks:

  1. Sign up: Auth.js + GitHub OAuth (see §III.2.6)
  2. Install ARNO GitHub App
  3. Create project (list repos)
  4. arno init wizard (paths, bundleHosting, config_path, state persisted per-attempt)
  5. User merges PR
  6. Parallel:
    • Track A (GraphQL bulk scan ~10-30s): edit-mode unlocked
    • Track B (bundle CI ~60-120s): preview-mode unlocked
  7. Empty repo: hint pointing to "ARNO Studio" parking (small-biz path)
  8. Land in the workflow editor

Failure modes: PR not merged → email reminders (Day 1, 7), archive Day 30.

Project lifecycle: pending_setup → active → archived (90d inactivity by state-changing mutation; viewer access not counted).

I.4 Accessibility requirements

Target: WCAG 2.1 Level AA compliance for MVP.

Specific commitments:

  • Color contrast ≥4.5:1 for normal text, ≥3:1 for large text
  • All interactive elements keyboard-navigable (Tab, Enter, Esc, arrow keys)
  • Workflow canvas keyboard navigation:
    • Tab: move between screens
    • Arrow keys: pan canvas / select adjacent screen
    • Enter: drill into screen
    • Esc: back to canvas
  • ARIA labels on all non-text interactive controls
  • Focus visible indicators (explicit ring, not browser defaults)
  • Screen reader testing as MVP gate (NVDA + VoiceOver)
  • No reliance on color alone (status indicators have text/icons)
  • Text resizable up to 200% without horizontal scroll

CI enforcement:

  • axe-core in the Playwright E2E suite: fail on violations
  • Lighthouse Accessibility score >90 per page (CI gate)

Out of MVP:

  • Full screen reader optimization for workflow canvas (complex, post-MVP)
  • High-contrast theme (system theme support only)
  • Keyboard-only prototype walkthrough (Tab through edges)

Documentation: docs/accessibility.md covers tested scenarios + known limitations.


Part II. Observability

II.1 Stack

Layer | Tool | Cost MVP | Migration
Errors | Sentry SaaS | $0 (5K events/mo) | Team $26
Logs | Grafana Loki | $0 (50GB/mo) | Paid scaling
Metrics | Grafana Mimir | $0 (10K series) | Paid scaling
Traces | Grafana Tempo | $0 (50GB/mo) | Paid scaling
OTel Collector | n/a MVP (direct OTLP push) | $0 | Add when traces >40GB/mo
Dashboards | Grafana | $0 | Same
Alerting | Grafana Alerting → Slack/PagerDuty | $0 free tiers | PagerDuty paid
Dead-man-switch | Healthchecks.io | $0 | $5-20/mo
Status page | Manual MVP | $0 | Statuspage post-MVP

II.2 Logs

Schema (structured JSON to stdout):

{
  "timestamp": "ISO8601",
  "level": "debug | info | warn | error",
  "service": "api | worker | webhook | cron",
  "trace_id": "uuid",
  "request_id": "uuid",
  "span_id": "uuid | null",
  "user_id": "id | null",
  "project_id": "id | null",
  "session_id": "id | null",
  "event_type": "md.save | webhook.received | ...",
  "duration_ms": "number | null",
  "context": { "..." : "event-specific" },
  "error": { "type": "...", "message": "...", "stack": "..." }
}

request_id fallback middleware — always populated.

Sampling (trace-aware via baggage):

  • error / warn: 100% always
  • info: 100% MVP, 10% при scale
  • debug: 0% prod

PII blocklist: GitHub tokens, OAuth tokens, user emails (user_id only), MD content, TSX code, cookies, auth headers. Enforcement: lint rule + middleware redaction + code review.

II.3 Metrics

Counters / Histograms / Gauges — key metrics:

# Business
md_save_total{project_id, result}
workflow_mutation_total{project_id, kind}
webhook_received_total{event_type, dedup_skipped}
github_api_call_total{endpoint_class, status_code}
liveblocks_api_call_total{operation, status}
invariant_drift_detected_total{type, severity}
cron_dispatch_total{operation, result}

# Latency (histograms)
http_request_duration_ms{path, method, status_code}
md_save_duration_ms
webhook_processing_duration_ms
bundle_proxy_duration_ms{cache_layer}
snapshot_fetch_duration_ms{source}

# State (gauges)
active_projects_1h
liveblocks_active_connections{room_type}
project_size_bytes{project_id=<top-100|_other>}
degraded_mode_active{soft_dep}
otel_collector_buffer_utilization_pct

Cardinality budget:

  • user_id NEVER in labels
  • project_id → top-100 active, sticky for 24h; everything else is _other
  • Status codes bucketed
  • Mimir max_series_per_tenant: 50000
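The label rules above can be sketched as two small helpers. This is an illustration only: projectLabel and statusBucket are hypothetical names, the top-100 set is assumed to be maintained elsewhere, and the "2xx"-style bucket format is an assumption since the spec only says status codes are bucketed.

```typescript
// Keeps the project_id label bounded: ids outside the active top-N set
// collapse into a single '_other' series.
export function projectLabel(projectId: string, topActive: ReadonlySet<string>): string {
  return topActive.has(projectId) ? projectId : '_other'
}

// Buckets a status code by class, e.g. 404 becomes '4xx'.
export function statusBucket(statusCode: number): string {
  return `${Math.floor(statusCode / 100)}xx`
}

// Picks the N most active project ids from a map of activity counts.
export function topN(activity: ReadonlyMap<string, number>, n: number): Set<string> {
  const sorted = [...activity.entries()].sort((a, b) => b[1] - a[1])
  return new Set(sorted.slice(0, n).map(([id]) => id))
}
```

With at most 100 project ids plus _other, the series count per metric stays a fixed multiple of the remaining label values, which is what keeps the tenant under the series limit.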

II.4 Tracing

  • OpenTelemetry SDK auto-instrumentation
  • Manual wrapper for Liveblocks SDK, octokit
  • OTel Collector with tail-based sampling: post-MVP
  • HA Collector deployment + buffer monitoring + degraded sampling fallback

Async trace propagation helper (mandatory):

  • W3C TraceContext в queue/cron job payloads
  • safe_extract_trace_context with try/catch for corrupted carriers
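A sketch of the extraction helper, assuming the job payload carries a traceparent string in the W3C Trace Context version 00 layout (version, 32-hex trace id, 16-hex parent id, 2-hex flags). The camelCase name and the TraceContext shape are illustrative; a real implementation would more likely delegate to the OpenTelemetry propagation API.

```typescript
interface TraceContext { traceId: string; parentSpanId: string; sampled: boolean }

// version-traceid-parentid-flags, lowercase hex only.
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/

// Never throws: a corrupted or missing carrier yields null, and the job
// then starts a fresh trace instead of failing.
export function safeExtractTraceContext(carrier: unknown): TraceContext | null {
  try {
    const header = (carrier as { traceparent?: unknown } | null)?.traceparent
    if (typeof header !== 'string') return null
    const match = TRACEPARENT.exec(header)
    if (!match) return null
    const [, version = '', traceId = '', parentSpanId = '', flags = '00'] = match
    if (version === 'ff') return null // reserved, invalid version
    // All-zero ids are explicitly invalid in the Trace Context format.
    if (/^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) return null
    return { traceId, parentSpanId, sampled: (parseInt(flags, 16) & 1) === 1 }
  } catch {
    return null
  }
}
```

Returning null rather than throwing keeps a malformed queue payload from turning a tracing problem into a job failure.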

Frontend ↔ backend propagation via Sentry tracingOrigins (named tracePropagationTargets in newer Sentry SDK versions).

II.5 Errors

  • Separate Sentry projects: arno-frontend, arno-backend
  • Frontend SDK lazy-loaded (after first interaction)
  • PII scrubbing via beforeSend
  • Release tagging deterministic: <service>@<git_sha[:8]>
  • Source maps uploaded in CI
  • Frontend hangs detection via Performance API

II.6 Alerting + SLOs

SLOs:

  • API availability: 99.5%
  • API read p95 latency: <500ms
  • API write p95 latency: <1000ms
  • Webhook processing p95: <30s
  • Bundle proxy p95: <500ms (95% edge hit)
  • Data correctness: 100% (verified by invariant probes)
  • Webhook delivery success: >99%
  • Drift detection coverage: >95% components

Alert tiers:

  • PAGE (24/7): health endpoint down, error rate >5%, infrastructure unreachable, invariant drift detected, OTel Collector buffer >95%, cardinality budget exceeded, webhook signature spike
  • NOTIFY (Slack + daily email): error rate >1%, latency degradation, rate limit hit, queue depth, cache hit drop, single soft dep degraded
  • TICKET (weekly review): DAU/WAU trends, table size growth, long-tail latency

Progressive threshold tuning: weeks 1-2 absolute → weeks 3-4 baseline collection → week 5+ baseline-relative.

Escalation: PAGE ack 15min → second on-call → engineering lead.

II.7 Invariant probes

Tiered strategy:

  • Fast probe (hourly, cheap): stored-state comparison (last_pushed_sha vs git HEAD)
  • Deep probe (weekly, expensive): full Yjs serialize → canonical compare with git HEAD

Probe types:

  1. Yjs ↔ git invariant
  2. MD content_sha verification
  3. Workflow canonicalization determinism
  4. ID uniqueness

Coverage rotation (activity-tier based): Tier 1 (hot) daily, Tier 2 (warm) weekly, Tier 3 (cold) monthly.

Independent execution: async parallel, per-probe-type metrics, one failure doesn't cascade.

Dual emit: metric (for alerts) + structured log (for historical queries).
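Independent execution and dual emit can be sketched with Promise.allSettled: every probe runs in parallel, a crash in one becomes a failed outcome for that probe only, and each outcome goes to both sinks. The Probe, Sinks, and runProbes names are hypothetical; the sinks stand in for the metric and log emitters.

```typescript
interface ProbeOutcome { probe: string; ok: boolean; detail: string }
interface Probe { name: string; run: () => Promise<{ ok: boolean; detail: string }> }
interface Sinks {
  metric: (outcome: ProbeOutcome) => void // feeds alerting
  log: (outcome: ProbeOutcome) => void // feeds historical queries
}

export async function runProbes(probes: Probe[], sinks: Sinks): Promise<ProbeOutcome[]> {
  // allSettled never rejects, so one failing probe cannot cascade.
  const settled = await Promise.allSettled(
    probes.map((p) => Promise.resolve().then(() => p.run())),
  )
  return settled.map((result, i) => {
    const probe = probes[i]?.name ?? 'unknown'
    const outcome: ProbeOutcome =
      result.status === 'fulfilled'
        ? { probe, ok: result.value.ok, detail: result.value.detail }
        : { probe, ok: false, detail: `probe crashed: ${String(result.reason)}` }
    sinks.metric(outcome)
    sinks.log(outcome)
    return outcome
  })
}
```

A crashed probe is reported as not ok rather than dropped, in line with the principle that claimed correctness without measurement is a silent failure.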

II.8 Healthchecks

  • /health — liveness, always 200 if responding
  • /ready — readiness, distinguishes hard (DB, Redis) vs soft (Liveblocks, GitHub) deps; returns degraded flag
  • /metrics — internal network only
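The hard versus soft distinction in /ready can be sketched as below. The names checkReady and DepCheck are hypothetical, and returning 503 when a hard dependency fails is an assumption; the spec only states that the endpoint distinguishes the two classes and returns a degraded flag.

```typescript
type DepCheck = () => Promise<boolean>

interface ReadyReport {
  httpStatus: 200 | 503
  degraded: boolean
  failedHard: string[]
  failedSoft: string[]
}

// Runs all checks in parallel; a check that throws counts as failed.
async function failing(deps: Record<string, DepCheck>): Promise<string[]> {
  const results = await Promise.all(
    Object.entries(deps).map(([name, check]) =>
      Promise.resolve().then(check).then((ok) => (ok ? null : name), () => name),
    ),
  )
  return results.filter((name): name is string => name !== null)
}

export async function checkReady(
  hard: Record<string, DepCheck>, soft: Record<string, DepCheck>,
): Promise<ReadyReport> {
  const [failedHard, failedSoft] = await Promise.all([failing(hard), failing(soft)])
  return {
    // Assumption: any hard dependency down means not ready.
    httpStatus: failedHard.length > 0 ? 503 : 200,
    // Soft dependency down: still ready, but flagged as degraded.
    degraded: failedSoft.length > 0,
    failedHard,
    failedSoft,
  }
}
```

The degraded flag is what the degraded_mode_active{soft_dep} gauge would reflect.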

Rate limits: /health, /ready 60 req/min per IP (Cloudflare edge). /metrics not exposed externally.

Multi Healthchecks.io URLs: main /health + probe runner heartbeat + each critical cron.

II.9 Runbooks

  • Repo: arno-runbooks
  • PR template requires runbook_url для alerts
  • CI check validates alert YAML → runbook file exists
  • Quarterly review
  • Daily orphan check

II.10 Customer support correlation

Grafana dashboard "User Debug View" with inputs (user_id, time_range, optional project_id) and panels (activity timeline, HTTP requests, Sentry errors, audit trail, traces, Liveblocks sessions). Access: support team only.

II.11 Compliance

See §0.4 Data retention matrix for all data types and retention policies.


Part III. Tech Stack

III.1 Stack overview

Layer | MVP ($5/mo floor) | Scale (paid tiers) | Migration
Language | TypeScript | Same | -
Frontend framework | Next.js 14+ (App Router) | Same | -
Backend framework | Hono + tRPC + Zod | Same Hono on Node | xs
Frontend hosting | Cloudflare Pages | Vercel Pro | xs
Backend runtime | Cloudflare Workers Paid | Fly.io / AWS ECS | xs
Background jobs | Cloudflare Queues | BullMQ + Redis | sm
Cron | CF Cron Triggers (2 Workers + dispatcher) | Node cron | xs
Database | Neon Postgres free | Neon Launch+ | xs
ORM | Drizzle (dual: HTTP runtime, Pool migrations) | Same | -
Cache | Cloudflare KV | + Upstash Redis | sm
Object storage | Cloudflare R2 | Same | -
CDN/WAF | Cloudflare | Same | -
Auth | Auth.js v5 (JWT mode) + Lucia fallback | Same | -
Real-time CRDT | Liveblocks free | Pro / self-hosted Yjs (Path A DO / Path B external) | sm-md / md-lg
Email | Resend + abstraction layer | + SendGrid fallback | xs
Errors | Sentry free | Sentry Team+ | xs
Logs/Metrics/Traces | Grafana Cloud free | Paid tiers | sm
CI/CD | GitHub Actions | Same | -
Healthcheck | Healthchecks.io free | Same | -
IaC | Terraform Cloud free | Same | -
Domain | *.pages.dev → Porkbun separate registrar | Same | -

III.2 Per-layer key decisions

III.2.1 Backend module structure

apps/api/src/
  routes/        # Hono routes (HTTP entry)
  rpc/           # tRPC procedures
  services/      # Business logic (framework-agnostic)
  repositories/  # Data access via Drizzle
  middleware/    # Auth, rate limit, observability, CORS

CORS middleware environment-aware allowlist.

API versioning: /api/v1/... namespace, 6-month deprecation overlap.

III.2.2 Error contract

Standardized error response format for all API endpoints:

interface ApiError {
  code: string         // 'unauthorized' | 'forbidden' | 'rate_limited' | 'conflict' | 'validation' | 'not_found' | 'internal' | 'bundle_fetch_failed' | 'service_unavailable'
  message: string      // user-facing
  details?: unknown    // for debugging
  retry_after_ms?: number  // for rate_limited
  trace_id?: string    // for support correlation
}

HTTP status code mapping:

  • 400 → validation
  • 401 → unauthorized
  • 403 → forbidden
  • 404 → not_found
  • 409 → conflict (with details containing conflict info)
  • 429 → rate_limited (with retry_after_ms)
  • 500 → internal
  • 502 → bundle_fetch_failed (specific external dep failure)
  • 503 → service_unavailable (degraded mode)

tRPC error mapping: tRPC errors are converted to the ApiError shape via global middleware.

Frontend handling: unified ApiErrorBoundary component renders based on code.
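The code-to-status mapping can be kept in one table so that handlers and the tRPC converter cannot disagree. A minimal sketch, with ApiErrorCode, STATUS_BY_CODE, and toErrorResponse as hypothetical names; the code values and statuses are the ones listed above.

```typescript
type ApiErrorCode =
  | 'validation' | 'unauthorized' | 'forbidden' | 'not_found' | 'conflict'
  | 'rate_limited' | 'internal' | 'bundle_fetch_failed' | 'service_unavailable'

interface ApiError {
  code: ApiErrorCode
  message: string
  details?: unknown
  retry_after_ms?: number
  trace_id?: string
}

// Single source for the HTTP status of each error code.
const STATUS_BY_CODE: Record<ApiErrorCode, number> = {
  validation: 400,
  unauthorized: 401,
  forbidden: 403,
  not_found: 404,
  conflict: 409,
  rate_limited: 429,
  internal: 500,
  bundle_fetch_failed: 502,
  service_unavailable: 503,
}

export function toErrorResponse(
  code: ApiErrorCode, message: string, extra: Omit<ApiError, 'code' | 'message'> = {},
): { status: number; body: ApiError } {
  return { status: STATUS_BY_CODE[code], body: { code, message, ...extra } }
}
```

Typing the table as Record<ApiErrorCode, number> makes the compiler flag any new code that is added without a status.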

III.2.3 Workers Paid configuration

  • Bundle size CI measurement, warn at 4MB
  • Tree-shaking: octokit sub-packages, OTel selective, lazy-load rare features
  • GitHub App rate limiter (KV token bucket, 12,500/hour budget)

III.2.4 Cron — 2 Workers + dispatcher

Worker 1 — apps/cron/frequent/ (4 crons): snapshot refresh, top-N refresh, reconciliation, hourly token-check + probe-fast combined.

Worker 2 — apps/cron/scheduled/ (1 cron, dispatcher): hourly trigger with try/catch isolation per operation. Tasks: onboarding reminders, probe deep, repo visibility check, DB backup, Yjs backup, usage report, revocations cleanup.

III.2.5 Database — Drizzle dual driver

Runtime (Workers): @neondatabase/serverless HTTP driver via drizzle-orm/neon-http.

Migrations (CI/Node): @neondatabase/serverless Pool driver via drizzle-orm/neon-serverless. Supports multi-statement transactions.

Lint rule: tools/migrate/* cannot be imported into apps/*.

Migration workflow: GHA with a production-migration environment (manual approval, prevent self-review).

Expand-contract pattern for zero-downtime changes.

Backup: Neon PITR 7 days + monthly snapshot to R2.

III.2.6 Auth — Auth.js v5 JWT mode

Frontend (Pages): Auth.js v5 with JWT session strategy (7-day expiry). Drizzle adapter. GitHub provider.

Backend (Workers): verifies the JWT against versioned secrets (CURRENT + PREVIOUS):

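import type { Context, Next } from 'hono'
import { jwtVerify, type JWTPayload } from 'jose'

// Worker bindings and request variables this middleware relies on.
type AppEnv = {
  Bindings: { JWT_SECRET_CURRENT: string; JWT_SECRET_PREVIOUS?: string; KV: KVNamespace }
  Variables: { user: { id: string; scope: unknown } }
}

// Returns the token from an "Authorization: Bearer <token>" header, if present.
const extractBearer = (header?: string) =>
  header?.startsWith('Bearer ') ? header.slice('Bearer '.length) : undefined

async function authMiddleware(c: Context<AppEnv>, next: Next) {
  const token = extractBearer(c.req.header('Authorization'))
  if (!token) return c.json({ code: 'unauthorized', message: 'Missing token' }, 401)
  let payload: JWTPayload | undefined
  // Try CURRENT first, then PREVIOUS, so tokens signed before a rotation still verify.
  for (const secret of [c.env.JWT_SECRET_CURRENT, c.env.JWT_SECRET_PREVIOUS]) {
    if (!secret) continue
    try {
      ;({ payload } = await jwtVerify(token, new TextEncoder().encode(secret), { algorithms: ['HS256'] }))
      break
    } catch { continue }
  }
  if (!payload?.sub) return c.json({ code: 'unauthorized', message: 'Invalid token' }, 401)
  // Revocation entries live in KV under revoked:{jti} (see §I.2.1).
  const revoked = await c.env.KV.get(`revoked:${payload.jti}`)
  if (revoked) return c.json({ code: 'unauthorized', message: 'Token revoked' }, 401)
  c.set('user', { id: payload.sub, scope: payload.scope })
  await next()
}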

JWT_SECRET rotation: quarterly via versioned dual-secret pattern (7-day overlap window).

Three OAuth Apps: dev/staging/prod separate.

Fallback: Lucia Auth (Edge-native) if the Auth.js prototype fails.

III.2.7 ARNO CLI — OAuth Device Flow

Standard OAuth 2.0 Device Authorization Grant (RFC 8628). Cross-platform credential storage (~/.arno/credentials.json, Windows %APPDATA%\ARNO\). Rate limit max 12 polls per device_code.

III.2.8 IaC — Terraform Cloud

Managed: Cloudflare resources, Neon databases, GitHub repository settings.

NOT managed: secrets (via wrangler secret / dashboard).

Sensitive variables: sensitive = true + lifecycle.ignore_changes.

III.2.9 SPOF mitigation

  • Multi-owner Cloudflare account (2+ admins, hardware key 2FA)
  • DNS independence: Porkbun/Namecheap registrar, switchable nameservers
  • DNS TTL 300s pre-set for critical records
  • Backup admin email external (Gmail/ProtonMail)
  • Scoped API tokens (never root)
  • Secrets backup: 1Password + Bitwarden
  • Disaster recovery playbook: docs/runbooks/cloudflare_account_loss.md

III.2.10 Multi-vendor outage matrix

| Cloudflare | Liveblocks | Neon | Sentry | Recovery |
|---|---|---|---|---|
| Down | Up | Up | * | Migrate frontend + backend to Fly.io/Vercel within 48h. Yjs data intact via Liveblocks. |
| Up | Down | Up | * | Workflow editing degraded (no real-time). MD-edit via REST works. Viewers OK. |
| Down | Down | Up | * | Restore from Yjs backups in R2. Frontend rebuild. Manual recovery ~1 week. |
| Up | Up | Down | * | API read-only mode (snapshot fallback). No writes. Viewers OK. |
| Down | Down | Down | * | Catastrophic: full restore from backups, est. 1-2 weeks. Detailed playbook required. |
| Up | Up | Up | Down | Observability blind period. No alerting. Manual checking until recovery. |

(* = any status)

Playbooks for each scenario are documented in docs/runbooks/multi_vendor_outage.md.
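If the matrix is ever consumed by a healthcheck or status tool, it could be encoded as data and looked up by current vendor status. This is only a sketch: the type names and recoveryFor are hypothetical, and the recovery strings are shortened from the Recovery column.

```typescript
type Status = "up" | "down";
type Match = Status | "*"; // "*" matches either status, as in the matrix

interface VendorStatus {
  cloudflare: Status;
  liveblocks: Status;
  neon: Status;
  sentry: Status;
}

interface OutageRow {
  when: Record<keyof VendorStatus, Match>;
  recovery: string;
}

const OUTAGE_MATRIX: OutageRow[] = [
  { when: { cloudflare: "down", liveblocks: "up", neon: "up", sentry: "*" },
    recovery: "Migrate frontend + backend to Fly.io/Vercel" },
  { when: { cloudflare: "up", liveblocks: "down", neon: "up", sentry: "*" },
    recovery: "Workflow editing degraded; MD-edit via REST works" },
  { when: { cloudflare: "down", liveblocks: "down", neon: "up", sentry: "*" },
    recovery: "Restore from Yjs backups in R2; rebuild frontend" },
  { when: { cloudflare: "up", liveblocks: "up", neon: "down", sentry: "*" },
    recovery: "API read-only mode (snapshot fallback)" },
  { when: { cloudflare: "down", liveblocks: "down", neon: "down", sentry: "*" },
    recovery: "Catastrophic: full restore from backups" },
  { when: { cloudflare: "up", liveblocks: "up", neon: "up", sentry: "down" },
    recovery: "Observability blind period; manual checking" },
];

const VENDORS: (keyof VendorStatus)[] = ["cloudflare", "liveblocks", "neon", "sentry"];

// Returns the matching recovery note, or undefined when the matrix has no row
// for this combination (including "everything up").
function recoveryFor(status: VendorStatus): string | undefined {
  const row = OUTAGE_MATRIX.find((candidate) =>
    VENDORS.every((v) => candidate.when[v] === "*" || candidate.when[v] === status[v]),
  );
  return row?.recovery;
}
```

Combinations the matrix does not list (for example Cloudflare and Neon down with Liveblocks up) return undefined, which makes the uncovered cases visible rather than silently mapped to a neighbouring row.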

III.3 Monorepo structure

arno/
├── apps/
│   ├── web/                      # Next.js → Cloudflare Pages
│   ├── api/                      # Hono → Workers
│   ├── workers/                  # Queue consumers
│   ├── cron/
│   │   ├── frequent/             # 4 high-freq crons
│   │   └── scheduled/            # 1 hourly + dispatcher
│   └── cli/                      # @arno/cli
├── packages/
│   ├── shared/                   # Domain types, Zod schemas
│   ├── trpc/                     # tRPC routes
│   ├── db/                       # Drizzle schema + Auth.js adapter tables
│   ├── ui/                       # React components
│   ├── editor/                   # MD editor + workflow canvas
│   ├── observability/            # OTel + Sentry + log helpers
│   ├── git-provider/             # GitProvider interface + GitHub impl + rate limiter
│   ├── render-adapter/           # Adapter protocol
│   ├── auth/                     # Auth.js + JWT helpers + Device Flow
│   ├── queue/                    # Queue interface + CF + BullMQ impls
│   ├── cache/                    # Cache interface + KV + Redis impls
│   └── email/                    # Email interface + Resend + SendGrid
├── infra/
│   ├── terraform/                # IaC: Cloudflare + Neon + GitHub
│   └── neon/                     # Migration scripts
├── tools/
│   ├── migrate/                  # Drizzle Pool driver runner (CI only)
│   ├── eslint-config/
│   ├── tsconfig/
│   └── load-tests/               # k6 scenarios
├── docs/
│   ├── adr/                      # Architecture Decision Records
│   ├── runbooks/                 # Operational playbooks
│   ├── accessibility.md          # WCAG compliance documentation
│   └── development.md
├── turbo.json
├── pnpm-workspace.yaml
└── package.json

III.4 Testing

| Layer | Tool | Cost |
|---|---|---|
| Unit/Integration | Vitest | OSS |
| E2E | Playwright | OSS |
| API mocking | MSW | OSS |
| DB integration | Testcontainers | OSS |
| Component isolation | Storybook | OSS |
| Load testing | k6 (GHA weekly + local) | OSS |
| Visual regression | Chromatic free | $0 |
| Accessibility | axe-core in Playwright | OSS |
| Security scan | Snyk free | $0 |

Coverage: critical paths 90%+, new code 80%+.

III.5 CI/CD

GitHub Actions with manual approval gates for the production-migration environment (prevents self-review). Bundle size measurement step. Sentry release tagging. Post-deploy smoke tests. 90-day secret rotation cadence.

III.6 Cost ladder honest

| Stage | Users | $/mo | Drivers |
|---|---|---|---|
| Closed alpha | <20 | $5 | Workers Paid |
| Open beta | 20-100 | $5-105 | + Liveblocks Pro $99 likely (or startup credits) |
| Growing beta | 100-500 | $105-250 | + Sentry Team $26 |
| First paying | 500-2K | $250-600 | + Neon Launch $19 |
| Validated | 2K-10K | $600-2K | + Grafana paid + read replicas |
| Scale | 10K-100K | $3K-12K | All paid tiers |

Liveblocks decision tree: apply to the startup program in week 2. If approved, use the free credits. If denied, choose one of: Pro at $99/mo, Path A self-hosted DO at ~$10-30/mo, or cap the beta at 50 users.


Part IV. Unified MVP Scope

| Category | MVP |
|---|---|
| Vision | Cloud design editor on top of existing repos, multi-team collab |
| UI | Sidebar (components/workflow), workflow-canvas read-only, screen edit-mode, MD smart-editor with fenced blocks + versions list, multi-tab coordination, share-link toggle, WCAG 2.1 AA compliance |
| Render | Edit-mode schematic + preview iframe-bridge + custom-entry adapter + arno init + tiered preview + drift CI (5 states) + bundleHosting (gh-pages/github-packages with backend proxy) + periodic visibility check |
| Storage | Postgres (full schema §I.2.1, versions trigger-purge, GDPR retention §0.4). Liveblocks 3 channels + dual rooms. Cloudflare KV (10 caches incl. push_lock mutex). R2 (bundles + backups) |
| Sync | GitHub App + token refresh middleware + ETag + rate-limiter + webhook dedup + fan-out + DAG ordering + backoff + persisted queue + reconciliation + manual refresh + pre-push HEAD check with mutex + GraphQL bulk initial scan |
| Collab | Yjs (workflow) + Liveblocks 3 channels, presence, cursors, activity-based snapshot, canonical serializer (fenced), server-side mutation validation, subscription lifecycle. MD-edit REST + versions (no silent drop) + same-session fast-forward + multi-tab coordination + conflict UI + real-time awareness |
| Auth | Auth.js v5 JWT mode (HS256) + Bearer Authorization + KV revocation + versioned JWT_SECRET (256-bit) + three OAuth Apps + Lucia fallback ready |
| Identity | UUID frontmatter + UUID props (fenced v1) + backfill CLI + collision detection + cross-project Redis lock + arno-check Action |
| Workflow | Screens + edges, action='navigate', edit FROM screen |
| Sharing | Private/unlisted + token, project & screen scope, anonymous viewer without Liveblocks (server snapshot + Redis) |
| Author gate | Pre-edit impact (this-project), gated by sync completion |
| Onboarding | Split flow + per-attempt session + lifecycle states + email reminders |
| Limits | Byte-primary tracking + tiered warnings (80%) + hard cap (100%) |
| Observability | OTel SDK + Sentry + Grafana Cloud free + structured logs + metrics + traces + invariant probes (tiered) + 3-tier alerting + healthchecks + customer debug dashboard |
| Tech | TypeScript + Hono + Next.js + Cloudflare Workers Paid + Pages + KV + R2 + Queues + Cron + Neon Postgres + Drizzle (dual driver) + Auth.js + Liveblocks + Resend + Turborepo + GitHub Actions + Terraform |
| Operational | Multi-owner Cloudflare account + separate registrar + DNS TTL 300s + disaster recovery + secrets backup + 90-day rotation |
| Error contract | Unified ApiError shape, HTTP status mapping (§III.2.2) |
| Crypto | HS256 JWT, SHA-256 content/webhooks, HMAC-SHA-256 hashing with pepper (§0.5) |
| Compliance | GDPR retention matrix §0.4, accessibility WCAG 2.1 AA §I.4 |
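The Limits row (byte-primary tracking, warning at 80%, hard cap at 100%) can be sketched as two pure functions. The names limitState and canAccept are hypothetical, and only the single 80% warning tier named in the table is modelled.

```typescript
type LimitState = "ok" | "warning" | "blocked";

// Thresholds from the Limits row: warn at 80% of the byte quota, block at 100%.
const WARNING_PERCENT = 80;

function limitState(usedBytes: number, quotaBytes: number): LimitState {
  if (quotaBytes <= 0) throw new RangeError("quotaBytes must be positive");
  if (usedBytes >= quotaBytes) return "blocked";
  // Integer comparison avoids floating-point drift right at the threshold.
  if (usedBytes * 100 >= quotaBytes * WARNING_PERCENT) return "warning";
  return "ok";
}

// A write is accepted only if it would not push usage past the hard cap.
function canAccept(usedBytes: number, incomingBytes: number, quotaBytes: number): boolean {
  return usedBytes + incomingBytes <= quotaBytes;
}
```

Checking canAccept before a write, rather than limitState after it, keeps stored usage from ever exceeding the quota.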

Part V. Unified Parking Lot

| 📌 | Topic | Return trigger |
|---|---|---|
| 📌 | Screen composition structure (tree, slots, depth, overrides, allowlist) | Development of screen edit-mode |
| 📌 | Sharing model details (subgraph scope, require_login UI, public discovery, comments) | Development of the share feature |
| 📌 | Backend API binding (edge.action=apiCall, mock, conditional, OpenAPI) | After MVP validation |
| 📌 | Component author gate full (staged rollout, visual regression) | V2 after pilot |
| 📌 | Workflow-canvas layout rules | Waiting for the rules |
| 📌 | Project lifecycle / retention details | Development of settings / delete |
| 📌 | Notifications (multi-team events) | After core MVP |
| | Small-company path / ARNO Studio | UNPARKED v1.3 → URL-import onboarding feature, see docs/url_import_spec.md. Original trigger "Big-company MVP validated OR primary market shift" satisfied by product decision. |
| 📌 | GitLab / Bitbucket providers | After GitHub validation |
| 📌 | Multi-repo per project | If the business case is critical |
| 📌 | ARNO-side 3-way merge UI | GitHub-fallback friction becomes critical |
| 📌 | Yjs MD co-edit (vs current REST+versions) | Demand for real-time MD spec collab |
| 📌 | Real-time drift detection | CI-based latency becomes a problem |
| 📌 | Cross-project impact index | Single-project impact is insufficient |
| 📌 | Multi Y.Doc permission split | Path-allowlist enforcement is insufficient |
| 📌 | Customer-hosted CDN (S3, etc.) | On request |
| 📌 | ARNO-managed S3 for private bundle | Enterprise alternative |
| 📌 | On-premise / enterprise hosting | Enterprise demand |
| 📌 | MD version history fancy UI | Post-MVP |
| 📌 | OTel Collector + tail sampling | Traces approach 40GB/mo |
| 📌 | Microservice extraction | Monolith blocks team velocity |
| 📌 | Multi-region deployment | Latency complaints |
| 📌 | Read replicas Postgres | Read load >70% capacity |
| 📌 | Postgres sharding | Single instance write limit |
| 📌 | Self-hosted Yjs (Path A or B) | Liveblocks bill >$200/mo OR enterprise on-prem |
| 📌 | Mobile apps | Customer demand |
| 📌 | Docs site (Docusaurus/Mintlify) | User-facing docs >10 pages |
| 📌 | Status page (Statuspage.io) | After first customer-facing incident |
| 📌 | SOC 2 compliance audit | Enterprise customer requires |
| 📌 | Bug bounty program | After SOC 2 maturity |
| 📌 | Open source release | Business decision |
| 📌 | i18n implementation | Second language demanded |
| 📌 | CLI binary distribution | Non-Node users complain |
| 📌 | Render adapter pen test | Before enterprise customers |
| 📌 | Email auto-failover | After first email outage |
| 📌 | Yjs anonymization in backups | Enterprise / longer retention |
| 📌 | Access/refresh JWT split | Post-MVP optimization |
| 📌 | Canvas library alternatives (vs react-flow) | If bundle impact too large |
| 📌 | Full a11y screen reader optimization for workflow canvas | Post-MVP |
| 📌 | High-contrast theme | Post-MVP |
| ⏸️ | OTel SDK version policy | Document during implementation |

Part VI. Open Questions / Week 1 Prototyping

Must verify empirically (week 1):

  1. Auth.js v5 on Cloudflare Pages Edge Runtime + Drizzle adapter + GitHub provider: full sign-in/sign-out/refresh. Fallback: Lucia.
  2. @hono/trpc-server adapter production readiness: batching, error handling. Fallback: direct Hono routes OR Fastify migration.
  3. Liveblocks Yjs Storage REST API access: /v2/rooms/{roomId}/storage returns usable Y.Doc state. Fallback tiers: webhook-driven OR client-side periodic export.
  4. Bundle size measurement: actual size with all deps. Tree-shake if >4MB.
  5. Workflow-canvas layout rules (waiting on the user)

Operational (parallel, not a blocker):
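For item 4, the measurement can feed a simple CI gate. The sketch below assumes 4MB means 4 × 1024 × 1024 bytes and that the caller supplies the built bundle size (for example from fs.statSync); checkBundle and its report shape are hypothetical.

```typescript
// Assumption: "4MB" is interpreted as 4 MiB.
const BUNDLE_LIMIT_BYTES = 4 * 1024 * 1024;

interface BundleReport {
  ok: boolean;           // false means tree-shaking is needed
  sizeBytes: number;
  headroomBytes: number; // negative when over the limit
}

function checkBundle(sizeBytes: number, limitBytes: number = BUNDLE_LIMIT_BYTES): BundleReport {
  if (sizeBytes < 0) throw new RangeError("sizeBytes must not be negative");
  return {
    ok: sizeBytes <= limitBytes,
    sizeBytes,
    headroomBytes: limitBytes - sizeBytes,
  };
}
```

Reporting headroom alongside the pass/fail flag lets the CI step show how close each build is to the threshold before it actually fails.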

  • Runbook authoring (18+ runbooks)
  • k6 load test scenarios
  • .env.example covering all env vars
  • Terraform initial setup
  • Liveblocks startup program application (week 2)
  • Domain registration + DNS TTL 300s
  • Three GitHub OAuth Apps creation
  • Sentry projects (FE+BE separate)
  • Cloudflare multi-owner setup
  • Healthchecks.io org-level accounts
  • 2 password manager backups

Part VII. Next Steps + Launch Readiness

VII.1 Sequence

Full execution plan: see Implementation_Workflow.md (16 phases with demo-driven milestones, acceptance criteria per phase, throwaway-or-keep notes).

Short sequence:

  1. Week 1: empirical prototyping (3 critical verifications + bundle measurement), partially covered by Phase 9 (Auth.js Edge), Phase 8 (tRPC+Hono), Phase 11 (Liveblocks REST)
  2. Week 2: operational setup (parallel: Liveblocks application, Cloudflare account, domain, OAuth Apps, vendors)
  3. Week 3+: implementation phases per Implementation_Workflow.md
  4. Implementation phases (rough order):
    • Foundation: monorepo, CI/CD, basic infrastructure
    • Auth: Auth.js + JWT + Drizzle adapter
    • Data layer: Drizzle schema, migrations
    • GitHub integration: GitProvider, App, webhook handler
    • Liveblocks integration: workflow Y.Doc, broadcasts
    • MD editor: smart editor, versions, conflict UI
    • Workflow canvas: screens, edges, interactivity (accessibility built-in)
    • Render adapter: edit mode → preview mode, bundle proxy
    • Sharing: viewer mode (no Liveblocks), tokens
    • Onboarding: wizard, lifecycle, reminders
    • Observability: OTel, Sentry, Grafana, probes
    • CLI: arno init, backfill-ids, lint, device flow
    • Pre-launch: load testing, runbooks, status page, a11y audit

VII.2 Launch Readiness Checklist

Code:

  • All MVP scope §IV implemented
  • Test coverage critical paths >90%, new code >80%
  • axe-core CI passing on all pages
  • Lighthouse Accessibility score >90 per page
  • Lighthouse Performance score >85 (editor route)
  • Bundle size <4MB (Workers Paid limit with buffer)
  • All P0 audit fixes applied across 3 specs

Verification:

  • Week 1 prototype results: Auth.js Edge, tRPC+Hono, Liveblocks REST, bundle — all confirmed
  • Load test passes — 100K simulated users
  • Invariant probes running without triggering false positives
  • Drift detection working on all components
  • Disaster recovery playbook tested

Compliance:

  • ToS published (auto-generated template + custom for ARNO)
  • Privacy Policy published (covers GDPR retention §0.4)
  • Cookie consent banner implemented
  • DPAs signed with all vendors (Cloudflare, Neon, Liveblocks, Sentry, Grafana, Resend)
  • GDPR data export endpoint built
  • GDPR data delete endpoint built (anonymization per §0.4)
  • Accessibility statement published

Operations:

  • Multi-owner Cloudflare account configured (2+ admins, hardware key 2FA)
  • DNS records with TTL 300s
  • All secrets backed up in 2 password managers
  • All runbooks for PAGE alerts written
  • On-call rotation defined
  • Status page operational (manual MVP OK)
  • Healthchecks.io configured (main + probe heartbeat + critical crons)
  • 90-day secret rotation reminders scheduled

Customer-facing:

  • First alpha customer onboarded successfully
  • Support email / channel operational
  • Bug report mechanism (Sentry user feedback)

Spec maintenance:

  • Master spec persisted to files ✅ (done)
  • arno-runbooks repo created
  • ADRs initialized
  • Decision log started

All checkboxes are required for public launch. A soft launch (private beta) may skip some (status page, public docs).


Master spec status: Implementation-ready after week 1 prototyping verification. Source of truth: this document. Individual specs (ARNO v6, Observability v3, Tech Stack v2) are preserved in chat history for historical trace.