Status: Accepted (after 3 rounds of /au). Master spec: unparks §V "Small-company path / ARNO Studio" → first feature: URL-import onboarding. Implementation gate: see the § XV blockers before launch (4 pre-V1 items).
URL-import: at registration, a new user provides a URL → within ~30s, React TSX components with all states, themes, and viewports appear in the ARNO staging area. They are ready for editing in the ARNO editor; V2 adds push to git.
Cross-references: ADRs 0007-0018 contain the rationale for each pivot. This document is the canonical "what exists now".
§ 0. Decision history
| # | Pivot | From → To | Why | ADR |
|---|---|---|---|---|
| 1 | Approach | vision-first → code-first | A URL yields source code, not a picture | 0007 |
| 2 | Hybrid stack | monolith → schema-driven priority chains + reactive vision | User specified: "if it doesn't fill to 100%, bring in screenshots" | 0008 |
| 3 | Acceptance gate | weighted scores → 3 empirical booleans | "The criteria are not viable" ×3 | 0009 |
| 4 | Completeness (separate from acceptance) | weighted → matrix (state × viewport × theme) | User explicitly separated the two concepts | 0010 |
| 5 | Vision activation | predictive → reactive | 0 ложных активаций дорогого пути | 0011 |
| 6 | Uniqueness | algorithmic → top-K → user | 0 ложных reuse | 0012 |
| 7 | Cost reduction | "the stack is cheap" → 3 independent multipliers | User requirement: "the longer we live, the less we pay" | 0013 |
| 8 | Distillation | Anthropic → Apache 2.0 only | ToS Feb 2026 + conflict of interest | 0014, 0015 |
| 9 | Shadow data UX | checkbox → ToS-based disclosure | Industry pattern, flywheel works | 0017 |
| 10 | Position | parked §V → unparked V1 onboarding | Master spec v1.2 → v1.3 | — |
| 11 | Atom embeddings | DINOv2 visual → E5-small text | Atoms are comparable semantically, not visually | 0018 |
| 12 | V1 integration | direct git → staging area | 70% of small businesses have no git; drop-off risk | 0016 |
Three rounds of /au architecture fixes closed 18 P0 + 44 P1 issues at the seams. Further iteration in chat is unproductive; PoC code is needed (Task #6, atom decomposition validation).
§ I. Concept
| Trigger | Registration → URL + copyright checkbox + license_type |
| Outcome V1 | ~30s → N TSX components in ARNO staging (Modal Volume + B2 async) |
| In-scope V1 | React, HTML, static Vue. Public URLs. Light + dark theme. 4 viewports. Staging without git auth |
| Out V1 | Auth-gated sites, canvas-rendered UI, Svelte/SolidJS, mobile-only, multi-page crawl |
| V2 | GitHub App + direct PR, sitemap crawl |
| Parked | HAR upload for auth, native mobile import |
§ II. 13 principles
- Code-first, vision-reactive. A URL yields source code.
- Schema-driven per-field priority chains. A partial failure of one layer ≠ a pipeline failure.
- Empirical acceptance (3 booleans): tsc + render + pixelmatch < 0.30.
- Reactive activation. Vision runs only after the acceptance gate fails.
- Provenance everywhere. source + confidence + model_version for every field.
- The user decides uniqueness. Top-K neighbors → the user's decision.
- Completeness = matrix of combinations. 90% must pass pixelmatch < 0.15.
- Cost decreases over time. 3 independent multipliers (cache, flywheel, tier routing).
- Legal-clean distillation. Teachers are Apache 2.0/MIT only.
- Shadow mode = training data. ToS-disclosed, anonymized.
- Self-host where possible, paid where critical. The floor cost is minimal.
- Atomization A1-A8. A catalog for reuse.
- Explicit failure modes. "Why it broke", not "something went wrong".
§ III. Stack
3.1 In use
| Layer | Tool | License | Price |
|---|---|---|---|
| Browser automation | Playwright | Apache 2.0 | free |
| Anti-bot fallback | Hyperbrowser | commercial | $0.01/req, ≤5% URLs |
| UI detection (Tier 2+) | OmniParser v2.0 | MIT | self-host Modal keep-warm $288/mo |
| Text LLM + VLM | Gemini 2.5 Flash-Lite | API | $0.10 / $0.40 per 1M |
| Code gen | Cerebras Cloud | API | 1M tok/day free, $0.60/1M after |
| Custom VLM (Q2+) | Qwen3-VL-32B-Instruct + LoRA | Apache 2.0 | self-host Modal |
| Distillation teacher | Qwen3-VL-235B-A22B-Instruct | Apache 2.0 | $0.50/1M via Together |
| GPU compute | Modal Labs | commercial | $0.40/hr A10G, $1.10/hr A100 |
| Visual similarity (Phase 9) | DINOv2 ViT-S/14 ONNX CPU (384 dim) | Apache 2.0 | self-host, ~$10/mo at 100k URLs |
| Atom embeddings | E5-small multilingual | MIT (Xenova) | CPU inference, 80MB model |
| Vector storage | pgvector in Postgres | PostgreSQL | Supabase free → $25 |
| Persistent queue | Modal Volumes + SQLite | commercial | $0.30/GB-mo |
| Storage | Backblaze B2 | commercial | $0.006/GB-mo |
| TSX sanitization | ts-morph (AST, whitelist) + postcss-safe-parser | MIT | free |
| Domain reputation | Tranco top 15k + Bloom filter | CC-BY 4.0 | free (attribution req.) |
| Image dedup | pHash | BSD | free |
| Pixel diff | pixelmatch | ISC | free |
| Status page | Cachet on Cloudflare Workers | open | free |
3.2 NOT in use
| Tool | Why |
|---|---|
| Claude API | Conflict of interest + Anthropic ToS distillation (Feb 2026) |
| OpenAI | 5-10× more expensive than Gemini Flash-Lite + ToS distillation risk |
| Vercel v0 / Builder.io / Locofy | Closed source, vendor lock-in, $20-30+/mo min |
| Browserless | More expensive than Hyperbrowser for comparable anti-bot |
| DOMPurify for JSX | Does not understand JSX expressions; ts-morph AST is used instead |
3.3 Tier escalation (realistic)
Tier 0 ($15-25/mo): Playwright + OmniParser skip + Cerebras free→paid
+ Gemini paid + Modal spot
→ ~1-2k URLs/mo
Tier 1 ($50-100/mo): + Modal A10G keep-warm for OmniParser
→ 5-10k URLs/mo
Tier 2 ($300-600/mo): + Hyperbrowser anti-bot + Supabase $25
→ 50-100k URLs/mo
Tier 3 ($2-5k/mo): + A100 LoRA + dedicated pgvector
→ 500k+ URLs/mo
3.4 CSS-in-JS handling
styled-components / emotion → runtime-injected styles, which computed_style captures. CSS variables cannot be extracted (there is no :root block). Mitigation: detect CSS-in-JS via a JS bundle pattern → tokens are created programmatically from observed values. Real-world bad CSS → lenient PostCSS (safe-parser) with error capture.
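Creating tokens from observed values could look roughly like the sketch below. It is an illustration only: the ObservedStyle shape, the deriveColorTokens and toCss helpers, the --color-N naming scheme, and the minOccurrences cutoff are all assumptions, not part of the spec.

```typescript
// Hypothetical input: computed style values observed on captured elements.
type ObservedStyle = { property: string; value: string };
type Token = { name: string; value: string; occurrences: number };

// Promote color values that repeat at least `minOccurrences` times to tokens.
function deriveColorTokens(observed: ObservedStyle[], minOccurrences = 2): Token[] {
  const counts = new Map<string, number>();
  for (const { property, value } of observed) {
    if (!/color/i.test(property)) continue; // only color-like properties
    const key = value.trim().toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, n]) => n >= minOccurrences)
    // Most frequent first; ties broken alphabetically for stable output.
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([value, occurrences], i) => ({ name: `--color-${i + 1}`, value, occurrences }));
}

// Emit the synthetic :root block that the original site never had.
function toCss(tokens: Token[]): string {
  const body = tokens.map(t => `  ${t.name}: ${t.value};`).join("\n");
  return `:root {\n${body}\n}`;
}
```

Frequency-based promotion is only one possible heuristic; a real implementation would likely also cluster near-identical values.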
§ IV. Pipeline (10 phases)
Phase 1 — Code extraction
async function phase1(url: string): Promise<RawCapture> {
const { ip } = await safeFetchGuard(url); // SSRF + random pick from safe IPs
const parsed = new URL(url);
let browser;
try {
browser = await playwright.chromium.launch({
args: [`--host-resolver-rules=MAP ${parsed.hostname} ${ip}`]
});
const page = await browser.newPage({
userAgent: 'ARNOBot/1.0',
viewport: { width: 1440, height: 900 }
});
const responses: NetworkResponse[] = [];
page.on('response', r => responses.push(r));
try {
await page.goto(url, { waitUntil: 'networkidle', timeout: 30_000 });
} catch (e) {
if (isAntiBotSignal(e)) return phase1ViaHyperbrowser(url);
throw new ExtractionError('navigation_failed', e);
}
const { ast: cssAST, errors: cssErrors } = await parseCSSGracefully(responses);
return {
dom: await page.content(),
cssAST,
cssParseErrors: cssErrors.length,
cssInJsDetected: detectCssInJs(responses),
computedStyles: await captureComputedStyles(page),
sourceMaps: await tryExtractSourceMaps(responses), // jackpot on 30-50% of sites
framework: detectFramework(responses),
jsBundle: await captureJsBundle(responses)
};
} finally {
await browser?.close(); // P1 leak fix
}
}
async function parseCSSGracefully(responses) {
  const errors: string[] = [];
  const cssTexts = responses.filter(r => r.contentType?.includes('css')).map(r => r.body);
  const asts = cssTexts.map(css => {
    try { return postcss.parse(css); }
    catch (e) {
      // Strict parse failed: record the error, then fall back to postcss-safe-parser
      errors.push(e instanceof Error ? e.message : String(e));
      return safeParser(css);
    }
  });
  return { ast: asts, errors };
}
Source maps jackpot: 30-50% of sites expose them → the original TSX/JSX + propTypes + component tree become available.
Anti-bot signals: 403/429/CAPTCHA HTML / Cloudflare challenge / unusual TTFB → Hyperbrowser fallback (1 retry).
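The signal list above could be turned into a classifier along these lines. This is a hedged sketch: the NavigationOutcome shape, the looksLikeAntiBot name, the marker regexes, and the 10s TTFB limit are assumptions chosen for illustration; the spec only names the signal categories.

```typescript
// Hypothetical summary of one navigation attempt.
type NavigationOutcome = { status: number; html: string; ttfbMs: number };

// Assumed markers for CAPTCHA pages and Cloudflare challenges.
const CHALLENGE_MARKERS: RegExp[] = [/captcha/i, /cf-challenge/i, /just a moment/i];

function looksLikeAntiBot(outcome: NavigationOutcome, ttfbLimitMs = 10_000): boolean {
  // 403 / 429 are the explicit status signals named in the spec.
  if (outcome.status === 403 || outcome.status === 429) return true;
  // CAPTCHA HTML or a challenge interstitial served with a 200.
  if (CHALLENGE_MARKERS.some(re => re.test(outcome.html))) return true;
  // "Unusual TTFB": the threshold is an assumption.
  return outcome.ttfbMs > ttfbLimitMs;
}
```

A positive result would route the URL to the Hyperbrowser fallback once, matching the single-retry rule.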
Phase 2 — Themes + States + Viewports
Theme detection cascade:
- @media (prefers-color-scheme: dark) in the CSS AST
- [data-theme] / .dark selectors in the DOM
- localStorage.theme pattern in the JS bundle
- setTheme / toggleTheme pattern in the JS bundle
- CSS property color-scheme: dark on :root
- Vision fallback (3-sample median): emulateMedia({colorScheme}) light vs dark, wait 500ms, pixelmatch > 0.20 → a dark theme is also present
States capture via a Playwright CDP session (CSS.forcePseudoState): default, hover, focus, active, disabled, custom (open/closed for dropdowns, loading/error for async).
Viewports: 320 (mobile) / 768 (tablet) / 1024 (laptop) / 1440 (desktop).
Output: CaptureMatrix: up to 40 screenshots per component (2 themes × 4 viewports × ~5 states).
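The 40-screenshot upper bound follows from enumerating the cross product of themes, viewports, and states. A minimal sketch, where the Combo type and captureCombos helper are illustrative names rather than the spec's API:

```typescript
const THEMES = ["light", "dark"] as const;
const VIEWPORTS = [320, 768, 1024, 1440] as const;
const BASE_STATES = ["default", "hover", "focus", "active", "disabled"] as const;

type Combo = { theme: string; viewport: number; state: string };

// Enumerate every (theme, viewport, state) cell of the capture matrix.
function captureCombos(
  themes: readonly string[],
  viewports: readonly number[],
  states: readonly string[],
): Combo[] {
  const combos: Combo[] = [];
  for (const theme of themes) {
    for (const viewport of viewports) {
      for (const state of states) {
        combos.push({ theme, viewport, state });
      }
    }
  }
  return combos;
}
```

With the five base states this yields 2 × 4 × 5 = 40 cells; a light-only site halves that, and custom states (open/closed, loading/error) would push a component above 40.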
Phase 3 — Coverage gate
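function coverageGate(spec: PartialComponentSpec): 'pass' | 'enrich' {
  // REQUIRED_FIELDS holds dotted paths ('meta.name'), so resolve them segment by segment
  const resolve = (path: string) =>
    path.split('.').reduce<any>((node, key) => node?.[key], spec);
  const filled = REQUIRED_FIELDS.filter(f => {
    const field = resolve(f);
    // Assumption: confidence sits on the field or its provenance; fields without one count as filled
    const confidence = field?.provenance?.confidence ?? field?.confidence ?? 1;
    return field !== undefined && confidence > 0.6;
  });
  return (filled.length / REQUIRED_FIELDS.length) >= 0.90 ? 'pass' : 'enrich';
}
Phase 4 — Text LLM enrichment (conditional)
Runs only if coverage < 0.90. No screenshots (text-only Gemini Flash-Lite), ~$0.0005/component.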
Phase 5 — TSX generation
async function generateTSX(spec): Promise<GeneratedCode> {
let raw: GeneratedCode;
try {
if (isAtomicLevel(spec)) raw = templateGenerate(spec);
else raw = await cerebras.generate({ model: 'llama-3.1-70b', prompt: buildTsxPrompt(spec) });
} catch {
raw = await gemini.generate({ model: 'gemini-2.5-flash-lite', prompt: buildTsxPrompt(spec) });
}
raw.tsx = sanitizeTsx(raw.tsx).tsx; // see § XI.8
return raw;
}
Output: index.tsx + types.ts + tokens.css (CSS variables, not hardcoded values!) + stories.tsx.
Phase 5.5 — Sample props inference (cycle-safe)
function inferSampleProps(spec, visited = new Set(), depth = 0): { props, unknownTypes } {
if (depth > 3) return { props: {}, unknownTypes: ['__max_depth__'] };
const props: Record<string, any> = {};
const unknownTypes: string[] = [];
for (const [name, def] of Object.entries(spec.props)) {
if (def.default !== undefined) { props[name] = def.default; continue; }
// String literal union: '"primary" | "secondary"'
const literals = (String(def.type)).match(/"([^"]+)"/g);
if (literals) { props[name] = literals[0].replace(/"/g, ''); continue; }
// Primitives
const primitives = { string: name, number: 0, boolean: false,
ReactNode: 'Sample', function: () => {}, array: [] };
if (def.type in primitives) { props[name] = primitives[def.type]; continue; }
// TS union array → first
if (Array.isArray(def.type)) { props[name] = def.type[0]; continue; }
// Object recursive (cycle-safe)
if (def.type?.fields) {
const key = JSON.stringify(def.type);
if (visited.has(key)) {
props[name] = null;
unknownTypes.push(`${name}__cycle__`);
continue;
}
visited.add(key);
const nested = inferSampleProps({ props: def.type.fields }, visited, depth + 1);
props[name] = nested.props;
continue;
}
// Unknown — placeholder + flag
props[name] = null;
unknownTypes.push(name);
}
return { props, unknownTypes };
}
// forwardRef detection + generic instantiation
const forwardRefMatch = tsx.match(/(?:React\.)?forwardRef<[^,>]+,\s*([^>]+)>/);
if (forwardRefMatch) spec.props = extractPropsFromType(forwardRefMatch[1]);
// Generic T → string substitution
for (const def of Object.values(spec.props)) {
if (typeof def.type === 'string') def.type = def.type.replace(/\bT\b/g, 'string');
}
Phase 6 — Acceptance gate (explicit configuration)
Renders ONE configuration: default state, 1440 viewport, light theme. Full matrix verification happens in Phase 8.
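The visual check in this gate compares a difference ratio against 0.30. The pixelmatch library itself returns a count of mismatched pixels, so a wrapper has to normalize by image area first. A minimal sketch of that normalization, with diffRatio and passesVisualCheck as hypothetical helper names:

```typescript
// Fraction of pixels that differ, given a mismatch count from a pixel-diff library.
function diffRatio(mismatchedPixels: number, width: number, height: number): number {
  const total = width * height;
  if (total <= 0) throw new RangeError("image must have a positive area");
  return mismatchedPixels / total;
}

const ACCEPTANCE_MAX_DIFF = 0.30; // Phase 6 threshold from the spec

// Mirrors the gate's rule: fail only when the ratio exceeds the threshold.
function passesVisualCheck(
  mismatchedPixels: number,
  width: number,
  height: number,
  maxDiff: number = ACCEPTANCE_MAX_DIFF,
): boolean {
  return diffRatio(mismatchedPixels, width, height) <= maxDiff;
}
```

At the 1440 × 900 acceptance viewport this allows up to 388,800 differing pixels out of 1,296,000.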
async function acceptanceGate(tsx, original, spec): Promise<TestResult> {
// Test 1: TypeScript compiles
const tsc = await typescript.compile(tsx);
if (tsc.errors.length > 0) return { ok: false, reason: `tsc: ${tsc.errors[0].message}` };
// Test 2: Renders without triggering the ErrorBoundary
const { props } = inferSampleProps(spec);
const wrapped = wrapWithErrorBoundary(tsx);
let rendered;
try { rendered = await playwrightRender(wrapped, props); }
catch (e) { return { ok: false, reason: `render: ${e.message}` }; }
if (rendered.includes('data-render-error="true"')) {
return { ok: false, reason: 'ErrorBoundary triggered', failedFields: ['props_inference'] };
}
// Test 3: Visually close
const diff = await pixelmatch(rendered, original);
if (diff > 0.30) {
return { ok: false, reason: `visual diff ${diff.toFixed(2)}`,
failedFields: identifyMismatchRegions(rendered, original) };
}
return { ok: true, sampleProps: props };
}
Phase 7 — Vision fallback (reactive, partial-failure tolerant)
async function visionEnrich(spec, failedFields, screenshot) {
// Tier 2+: OmniParser pre-filter (×5-20 token savings)
// Tier 0-1: skip OmniParser, send full screenshot to Gemini VLM
const targetRegions = await getTargetRegions(failedFields, screenshot);
for (const field of failedFields) {
try {
const cropped = cropScreenshot(screenshot, targetRegions[field].bbox);
const response = await gemini.vlm({
model: 'gemini-2.5-flash-lite',
image: cropped,
prompt: `Extract ${field}. Context: ${JSON.stringify(spec.meta)}.`,
timeout: 15_000
});
spec[field] = {
value: response.value,
source: 'vision',
confidence: response.confidence,
model_version: response.modelVersion ?? 'gemini-2.5-flash-lite-2026-02'
};
} catch (e) {
// Partial failure: continue, do not abort
spec[field] = { value: null, source: 'vision', confidence: 0, error: e.message };
}
}
return spec;
}
Phase 8 — Completeness verification (full matrix)
async function completenessCheck(component, captureMatrix): Promise<CompletenessReport> {
const combinations = generateCombinations({
themes: Object.keys(captureMatrix),
viewports: [320, 768, 1024, 1440],
states: detectAllStates(component)
});
const results = await Promise.all(combinations.map(async combo => {
const original = captureMatrix[combo.theme][combo.viewport][combo.state];
const generated = await renderGenerated(component.tsx, combo);
const diff = await pixelmatch(original, generated, {
ignoreText: true, // the user will insert their own content
ignoreImages: true
});
return { combo, diff, pass: diff < 0.15 };
}));
return {
complete: results.filter(r => r.pass).length / results.length >= 0.90,
coverage: results.filter(r => r.pass).length / results.length,
failedCombos: results.filter(r => !r.pass).map(r => r.combo)
};
}
Phase 9 — Uniqueness check (graceful degradation)
async function uniquenessCheck(component): Promise<UniquenessResult> {
try {
const embedding = await dinov2.embedONNXCPU(component.screenshot);
const neighbors = await pgvector.query({
table: 'component_embeddings_visual',
vector: embedding,
limit: 5,
distance: 'cosine'
});
if (neighbors.length === 0 || neighbors[0].distance > 0.4) {
return { decision: 'new', auto: true };
}
return { decision: 'pending', neighbors }; // UI shows neighbors → the user chooses
} catch (e) {
return { decision: 'new', auto: true, flag: 'uniqueness_check_skipped' };
}
}
Phase 10 — ARNO integration (Modal Volume persistent + B2 async)
const QUEUE_VOLUME = '/mnt/arno-queue'; // Modal Volume — persistent across restarts
async function integrate(component, userDecision, manifest): Promise<void> {
const arnoId = uuidv7();
const localPath = `${QUEUE_VOLUME}/${arnoId}/`;
// 1. Persistent local storage (survives worker restart)
await fs.writeFiles(localPath, component.files);
// 2. Queue entry in local SQLite (on the same Volume)
await queueDb.insert({
arno_id: arnoId,
user_id: user.id,
local_path: localPath,
target_b2_path: `staging/${user.id}/${arnoId}/`,
created_at: now(),
next_retry_at: now(), // eligible immediately; the uploader filters on next_retry_at <= now()
attempts: 0,
status: 'pending'
});
// 3. DB record — serving from local until uploaded
await db.components.create({
arno_id: arnoId,
user_id: user.id,
status: 'queued',
serving_from: 'local',
manifest
});
// NB: Yjs init is lazy, only on editor open (see openComponentEditor)
}
// Background uploader worker
async function b2UploaderWorker() {
while (true) {
const pending = await queueDb.where({
status: 'pending',
next_retry_at: lte(now())
}).limit(10);
for (const entry of pending) {
try {
await b2.uploadDir(entry.local_path, entry.target_b2_path);
await queueDb.update(entry.id, { status: 'uploaded' });
await db.components.update({ arno_id: entry.arno_id }, {
status: 'staged',
serving_from: 'b2'
});
await fs.rm(entry.local_path);
} catch (e) {
const backoff = [60, 300, 900, 3600][entry.attempts] ?? 3600;
await queueDb.update(entry.id, {
attempts: entry.attempts + 1,
next_retry_at: now() + backoff * 1000
});
if (entry.attempts > 3) alert.send('B2 outage, queue depth growing');
}
}
await sleep(60_000);
}
}
// ARNO editor — Yjs lazy init
async function openComponentEditor(arnoId: string) {
let yjsDoc = await yjs.getDocument(arnoId);
if (!yjsDoc) {
const component = await loadFromStaging(arnoId); // local OR b2
yjsDoc = await yjs.initialize(arnoId, component);
}
return yjsDoc;
}
// V2: the user connects git, push from staging
async function pushStagedToGit(userId: string, githubToken: string) {
const staged = await db.components.where({ user_id: userId, status: 'staged' });
for (const c of staged) {
const branch = `import/${sanitizeBranchName(c.name)}-${Date.now()}-${randomBytes(2).toString('hex')}`;
await git.createBranch(user.repo, branch, githubToken);
await git.commitFiles(user.repo, branch, await b2.fetch(c.staging_path));
await git.createPR(user.repo, branch, { title: `Import: ${c.name}` });
await db.components.update(c.id, { status: 'pushed' });
}
}
function sanitizeBranchName(name: string): string {
return name.toLowerCase()
.replace(/[^a-z0-9-]/g, '-')
.replace(/-+/g, '-')
.slice(0, 60); // GitHub branch ~250 limit, 60 + ts + rand4 = ~85
}
Re-import same URL: detect via sha256(url) → UI prompt 3 options (update existing / new version / cancel). Manifest parent_import_id + version_number++.
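Re-import detection and the manifest version bump might be sketched as below. The ImportRecord shape and helper names are hypothetical, and hashing the normalized href (rather than the raw string) is an assumption made so that trivially different spellings of a URL collide.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record of a previous import.
type ImportRecord = { importId: string; urlHash: string; versionNumber: number };

// sha256 over the WHATWG-normalized URL (normalization is an assumption).
function urlHash(url: string): string {
  return createHash("sha256").update(new URL(url).href).digest("hex");
}

// Latest earlier import of the same URL, if any.
function findPreviousImport(url: string, history: ImportRecord[]): ImportRecord | undefined {
  const hash = urlHash(url);
  return history
    .filter(r => r.urlHash === hash)
    .sort((a, b) => b.versionNumber - a.versionNumber)[0];
}

// Manifest fields for the "new version" choice.
function nextVersionFields(prev: ImportRecord): { parent_import_id: string; version_number: number } {
  return { parent_import_id: prev.importId, version_number: prev.versionNumber + 1 };
}
```

When findPreviousImport returns a record, the UI would show the three-option prompt; only the "new version" branch is modeled here.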
§ V. Schemas
5.1 ComponentSpec
type ComponentSpec = {
meta: {
name: string;
arno_id: string; // UUIDv7 (timestamp-sortable)
type: 'atomic' | 'molecule' | 'organism';
origin_url: string;
origin_selector: string; // CSS path
extraction_timestamp: ISO8601; // CANONICAL timestamp
extraction_mode: ExtractionMode;
};
props: {
[name: string]: {
type: TypeScriptType;
required: boolean;
default?: any;
provenance: Provenance;
}
};
variants: Array<{
name: string; // 'primary' | 'secondary' | etc
when: Predicate;
overrides: Partial<ComponentSpec>;
provenance: Provenance;
}>;
states: {
[name: string]: {
// default | hover | focus | active | disabled | custom
style_overrides: CSSProperties;
attribute_overrides?: { [attr: string]: string };
provenance: Provenance;
}
};
tokens: {
colors: { [name: string]: { value: string; provenance: Provenance } };
spacing: { [name: string]: { value: string; provenance: Provenance } };
typography: { [name: string]: TypographyToken & { provenance: Provenance } };
shadows: { [name: string]: { value: string; provenance: Provenance } };
radii: { [name: string]: { value: string; provenance: Provenance } };
transitions: { [name: string]: { value: string; provenance: Provenance } };
};
responsive: {
[breakpoint: number]: {
style_overrides: CSSProperties;
provenance: Provenance;
}
};
accessibility: {
aria: { [attr: string]: string };
role: string;
keyboard: KeyboardSpec;
contrast_ratio_target: 4.5; // WCAG AA
contrast_ratio_actual?: number;
provenance: Provenance;
};
composition?: {
atoms: Array<{
type: AtomType; // A1-A8 (see url_import_atoms_a1_a8.md)
instance_id: string;
props_override: any;
}>;
};
};
const REQUIRED_FIELDS = [
'meta.name', 'meta.type',
'tokens.colors.primary',
'states.default',
'accessibility.role'
];
5.2 Provenance (model_version validated)
type Provenance = {
source: 'source_map' | 'css_variable' | 'computed_style' | 'dom' | 'aria'
| 'llm_inference' | 'vision';
layer: 1 | 2 | 3 | 4 | 5 | 6 | 7;
confidence: number; // 0-1
raw_value?: any;
extracted_at: ISO8601;
model_version?: string; // REQUIRED if source in ['llm_inference', 'vision']
model_canary_checksum?: string; // daily-computed drift detection
};
function validateProvenance(p: Provenance) {
if (['llm_inference', 'vision'].includes(p.source) && !p.model_version) {
throw new ValidationError(`model_version required when source=${p.source}`);
}
}
// Read-side migration for pre-v4 data
function readProvenance(raw): Provenance {
if (['llm_inference', 'vision'].includes(raw.source) && !raw.model_version) {
raw.model_version = 'legacy_pre_v4';
metrics.increment('provenance.legacy_read');
}
return raw;
}
// Daily model canary cron — drift detection
async function dailyModelCanary() {
const canary = 'Reply exactly: "arno-canary-2026"';
const response = await gemini.generate({ contents: canary });
const checksum = sha256(response.text);
const prev = await db.modelCanaries.previous('gemini-2.5-flash-lite');
if (prev && prev.checksum !== checksum) {
alert.send('Model weights drifted without a version change');
}
await db.modelCanaries.create({ model: 'gemini-2.5-flash-lite', checksum });
}
5.3 Manifest (.arno/manifest.json)
{
"arno_id": "01923f8a-...-7b2c",
"arno_version": "1.3",
"version_number": 1,
"parent_import_id": null,
"imported_from": "https://example.com/products",
"imported_at": "2026-05-22T10:30:00Z",
"extraction_mode": "code+vision",
"user_attestation": {
"ownership_confirmed": true,
"confirmed_at": "2026-05-22T10:29:55Z",
"user_id": "uuid",
"license_type": "owned",
"domain_reputation_check": "passed"
},
"completeness": {
"coverage": 0.94,
"failed_combinations": [{ "theme": "dark", "viewport": 320, "state": "hover" }]
},
"provenance_summary": {
"source_map_fields": 12,
"dom_fields": 23,
"vision_fields": 4,
"llm_fields": 2,
"model_versions_used": {
"vision": "gemini-2.5-flash-lite-2026-02",
"llm_inference": "gemini-2.5-flash-lite-2026-02"
}
},
"uniqueness_decision": "new",
"atoms": ["A1:Surface", "A2:Label", "A4:InteractionState"],
"cost_actual_usd": 0.0024,
"sanitization": {
"rejected_expressions": [],
"dangerous_attributes": 0,
"sanitizer": "ts-morph-whitelist-3.0.0"
},
"css_parse_errors": 0,
"sample_props_used": {
"variant": "primary",
"label": "Sample",
"_unknown_types": []
},
"integration_status": "staged",
"staging_path": "staging/user-uuid/01923f8a/.../",
"serving_from": "b2"
}
§ VI. Mode taxonomy
| Mode | When | Cost/component | Bootstrap M1-3 | Steady M6+ |
|---|---|---|---|---|
| code-only | Phase 6 ✅ first try | $0.001 | 33% | 60% |
| code+vision | Phase 7 vision enrich → ✅ | $0.005 | 38% | 30% |
| vision-only | Full VLM pass → ✅ | $0.020 | 16% | 8% |
| code-only-degraded | LLM enrich failed, partial spec | varies | 5% | 1% |
| failed | Full vision ❌, manual review | $0.025 | 8% | 1% |
§ VII. Atom decomposition (A1-A8)
Full implementation → url_import_atoms_a1_a8.md. This section is only a brief reference.
| ID | Atom | Describes |
|---|---|---|
| A1 | Surface | Background, border, shadow, radius |
| A2 | Label | Text + typography token |
| A3 | Icon | SVG/icon + size + color |
| A4 | InteractionState | Hover/focus/active/disabled visuals |
| A5 | Spacing | Margin/padding system |
| A6 | Layout | Flex/grid container |
| A7 | Media | Image/video container |
| A8 | FormField | Input/textarea/select primitive |
Embedding: E5-small multilingual (384 dim), CPU inference, batched across all atoms of a component (~300ms per URL). Storage: pgvector table atom_embeddings_text (separate from component_embeddings_visual, 384 dim DINOv2 ViT-S/14; the dimensions match by coincidence, the embedding spaces differ).
Seeding: pre-load shadcn/ui (MIT, ~50 components → ~200 atoms) → bootstrap L3 hit immediately 10-15%.
Lifecycle: atoms not referenced for > 6 months → deprecated → > 12 months → physical delete (anonymized atoms preserved).
PoC pending (Task #6): validation on 20-30 real components before wide rollout.
§ VIII. Caching (3 levels)
| Level | Mechanism | Hit signal | Latency |
|---|---|---|---|
| L1 | pHash exact | Identical bytes (re-imports, SaaS templates) | ~0ms |
| L2 | DINOv2 ViT-S/14 ONNX CPU (384 dim) + pgvector (cosine > 0.95) | Visual similarity | ~50ms |
| L3 | E5-small atoms + pgvector (cosine > 0.85) | Semantic composition | ~100ms |
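The three levels form a cascade: the cheapest check runs first and a lookup stops at the first hit. A sketch of that decision, assuming similarity scores have already been fetched; CacheSignals, resolveCacheLevel, and cosineSimilarity are illustrative names, and in production the cosine scores would come from pgvector rather than application code.

```typescript
type CacheSignals = {
  phashMatch: boolean;          // L1: exact pHash hit
  visualCosine: number | null;  // L2: best DINOv2 neighbor similarity, if any
  atomCosine: number | null;    // L3: best E5 atom neighbor similarity, if any
};
type CacheHit = "L1" | "L2" | "L3" | "miss";

// Thresholds are taken from the table above.
function resolveCacheLevel(s: CacheSignals): CacheHit {
  if (s.phashMatch) return "L1";
  if (s.visualCosine !== null && s.visualCosine > 0.95) return "L2";
  if (s.atomCosine !== null && s.atomCosine > 0.85) return "L3";
  return "miss";
}

// Reference cosine similarity, shown only to make the score explicit.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new RangeError("vectors must have equal, non-zero length");
  }
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}
```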
Conservative trajectory (L3 pending E5 PoC):
| Volume | L1 | L2 | L3 | Total |
|---|---|---|---|---|
| 100 URLs | 5% | 8% | 1% | 14% |
| 1k | 12% | 18% | 4% | 34% |
| 10k | 20% | 25% | 9% | 54% |
| 100k | 25% | 30% | 13% | 68% |
Atom merging cron (weekly, transactional):
async function mergeAtom(canonical: Atom, duplicate: Atom) {
await db.transaction(async tx => {
const dup = await tx.atoms.findOne(
{ id: duplicate.id },
{ lockMode: 'pessimistic_write' }
);
if (!dup || dup.merged_into) return;
await tx.atoms.update({ id: duplicate.id }, {
merged_into: canonical.id, merged_at: now()
});
await tx.componentAtomRefs.update(
{ atom_id: duplicate.id }, { atom_id: canonical.id }
);
});
}
// Reduces vector count ~30%, which pushes back the Supabase paid-tier breakpoint
§ IX. Cost trajectory
9.0 Bootstrap reality (M1-3)
LoRA is not yet trained, the atom catalog is still growing, and the cache is empty → the distribution is skewed toward expensive modes.
| Period | $/URL | Distribution (code-only/code+vision/vision-only/degraded/failed) |
|---|---|---|
| M1-3 Bootstrap | $0.06-0.10 | 33/38/16/5/8 |
| M4-6 Ramping | $0.02-0.04 | 50/35/12/2/1 |
| M6-12 Steady | $0.005-0.01 | 60/30/8/1/1 |
| M12+ Mature | $0.002-0.005 | 75/18/5/1/1 |
Unit economics check: small-biz LTV $300-600 vs bootstrap onboarding cost $0.10 × ~3 imports = $0.30, which is trivial. The business case holds.
9.1 Per-mode breakdown (steady)
| Mode | Share | $/component | Weighted |
|---|---|---|---|
| code-only | 60% | $0.001 | $0.0006 |
| code+vision | 30% | $0.005 | $0.0015 |
| vision-only | 8% | $0.020 | $0.0016 |
| code-only-degraded | 1% | $0.0008 | ~0 |
| failed | 1% | $0.025 | $0.00025 |
| Avg/component | | | $0.0040 |
URL ≈ 8 components → $0.032 steady, $0.07 bootstrap.
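The steady-state average is a share-weighted sum over modes. The sketch below reproduces the arithmetic from the table; the ModeShare type and helper names are illustrative.

```typescript
type ModeShare = { mode: string; share: number; costUsd: number };

// Steady-state mix and per-component costs from the table above.
const STEADY_MIX: ModeShare[] = [
  { mode: "code-only", share: 0.60, costUsd: 0.001 },
  { mode: "code+vision", share: 0.30, costUsd: 0.005 },
  { mode: "vision-only", share: 0.08, costUsd: 0.020 },
  { mode: "code-only-degraded", share: 0.01, costUsd: 0.0008 },
  { mode: "failed", share: 0.01, costUsd: 0.025 },
];

// Expected cost of one component: sum of share × cost over all modes.
function avgCostPerComponent(mix: ModeShare[]): number {
  return mix.reduce((sum, m) => sum + m.share * m.costUsd, 0);
}

// Expected cost of one URL, assuming ~8 components per URL as stated above.
function costPerUrl(mix: ModeShare[], componentsPerUrl = 8): number {
  return avgCostPerComponent(mix) * componentsPerUrl;
}
```

The sum comes to about $0.00396 per component, which rounds to the $0.0040 in the table, and about $0.032 per URL.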
9.2 Hidden infrastructure (at 100k users scale)
| Source | Cost/mo |
|---|---|
| Staging hot (90d active) | $30 |
| Modal Volume persistent | $150 |
| DINOv2 ONNX CPU | $10 |
| OmniParser keep-warm (Tier 2+) | $288 |
| Shadow dataset (14.5% sampled) | $0.70 |
| pgvector dual tables | $50 |
| Total hidden | ~$530/mo at 100k users |
9.3 Monthly compute at volume
| URLs/mo | M1-3 Bootstrap | M6 Steady | M12 Mature |
|---|---|---|---|
| 100 | $7 | $0.80 | $0.30 |
| 1k | $70 | $8 | $3 |
| 10k | $700 | $80 | $30 |
| 100k | $7000 | $800 | $300 |
§ X. Data flywheel
10.1 Shadow mode logging (ToS-disclosed background)
Disclosure in the ToS + Privacy Policy at registration (see url_import_tos_clause_draft.md). Opt-out toggle in Settings → Privacy (default ON).
Sampling rules:
- 100% gold labels (user corrections)
- 10% uncorrected production (random sample)
- Stratified by segment priority (see below)
Storage: ~14.5% of all extractions → ~12GB/mo at 100k URLs/mo = ~$0.07/mo B2.
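The 14.5% figure is consistent with the two sampling rules if roughly 5% of extractions receive user corrections. That correction rate is not stated in this document; it is an assumption that happens to reproduce the number. A small sketch of the arithmetic, with hypothetical helper names:

```typescript
// Overall sampled fraction: all corrected extractions plus a random
// sample of the uncorrected remainder.
function shadowSampleRate(correctionRate: number, uncorrectedSampleRate = 0.10): number {
  return correctionRate + (1 - correctionRate) * uncorrectedSampleRate;
}

// Monthly B2 cost for a given stored volume, at the $0.006/GB-mo price from § III.
function monthlyStorageCostUsd(storedGb: number, usdPerGbMonth = 0.006): number {
  return storedGb * usdPerGbMonth;
}
```

With a 5% correction rate the sampled share is 0.05 + 0.95 × 0.10 = 0.145, and 12 GB costs about $0.07 per month.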
10.2 Stratified sampling (segment priority)
const SEGMENT_PRIORITY = [
'e-commerce', // strongest signal /shop|store|cart|product/
'dashboard-app', // /app|dashboard|admin/
'tech-blog', // /blog|medium|substack/
'news-media', // /news|times|post/
'marketing-landing' // /landing|home|about/ — weakest, catch-all
];
function detectSegment(url: string): string {
for (const segment of SEGMENT_PRIORITY) {
if (SEGMENTS[segment].test(url)) return segment;
}
return 'other';
}
async function buildTrainingDataset() {
const all = await db.shadowDataset.all();
const bySegment = groupBy(all, e => e.segment); // assumption: segment is stored at ingest via detectSegment(url), since a url_hash cannot be pattern-matched
const maxPerSegment = Math.floor(all.length * 0.25); // cap 25% per segment
const balanced = [];
for (const items of Object.values(bySegment)) {
balanced.push(...sample(items, Math.min(items.length, maxPerSegment)));
}
return shuffle(balanced);
}
10.3 Quarterly training
- Dataset: ~10k gold + augmented uncorrected (the Qwen3-VL-235B teacher generates labels on uncorrected samples)
- Student: Qwen3-VL-32B + LoRA (rank 16, alpha 32, lr 1e-4)
- Hardware: A100 on Modal Labs, ~$110-300 per training run
- Per-quarter cap: $1500. Cumulative hard cap: $5000
10.4 Pareto-front deployment criteria
ALL must pass on the holdout set:
- cost per URL ≤ 110% baseline
- completeness coverage ≥ baseline
- p95 latency ≤ 110% baseline
- acceptance_rate ≥ baseline − 2%
Ramp safety: 5% shadow A/B for 7 days → degradation > 5% any metric → auto rollback.
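The four holdout criteria can be expressed as a single gate that reports which checks failed. In this sketch the HoldoutMetrics shape and deploymentGate name are hypothetical, and "baseline − 2%" is read as two absolute percentage points, which is an interpretation rather than something the spec states.

```typescript
type HoldoutMetrics = {
  costPerUrl: number;      // USD
  completeness: number;    // coverage, 0-1
  p95LatencyMs: number;
  acceptanceRate: number;  // 0-1
};

function deploymentGate(
  candidate: HoldoutMetrics,
  baseline: HoldoutMetrics,
): { ok: boolean; failed: string[] } {
  const failed: string[] = [];
  if (candidate.costPerUrl > baseline.costPerUrl * 1.10) failed.push("cost");
  if (candidate.completeness < baseline.completeness) failed.push("completeness");
  if (candidate.p95LatencyMs > baseline.p95LatencyMs * 1.10) failed.push("p95_latency");
  // Assumption: the 2% tolerance is absolute (0.02), not relative.
  if (candidate.acceptanceRate < baseline.acceptanceRate - 0.02) failed.push("acceptance_rate");
  return { ok: failed.length === 0, failed };
}
```

Returning the failed criteria, not just a boolean, fits the "explicit failure modes" principle and gives the escape valve something concrete to relax.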
Escape valve (if 3 consecutive quarters fail):
- Q4 fail → relax (cost 120%, latency 115%)
- Q5 fail → switch teacher (Qwen ↔ DeepSeek)
- Q6 fail → suspend training, focus cache + atoms (2/3 multipliers still work)
10.5 Teacher selection
| Teacher | License | Quality vs Claude 4.6 | Cost |
|---|---|---|---|
| Qwen3-VL-235B-A22B-Instruct (Sep 2025) | Apache 2.0 | ~92% | $0.50/1M |
| DeepSeek-V3 | DeepSeek License (commercial OK) | ~88% | $0.27/1M |
| Llama 3.3 70B | Llama Community | ~85% | $0.59/1M |
Primary: Qwen3-VL-235B-A22B-Instruct. Native 256K context, MoE with 22B active params. Its visual coding capabilities (Draw.io/HTML/CSS/JS generation) are directly applicable to the URL-import use case. See ADR 0014, ADR 0015.
§ XI. Safety / P0 fixes
11.1 SSRF + DNS random pick
const BLOCKED_NETWORKS = [
'0.0.0.0/8', '10.0.0.0/8', '127.0.0.0/8',
'169.254.0.0/16', '172.16.0.0/12', '192.168.0.0/16',
'::1/128', 'fc00::/7', 'fe80::/10'
];
async function safeFetchGuard(url: string): Promise<{ ip: string }> {
const parsed = new URL(url);
if (!['http:', 'https:'].includes(parsed.protocol)) throw new SSRFError('scheme');
const ips = await dns.resolve(parsed.hostname);
const safeIps = ips.filter(ip => !isInBlockedNetwork(ip));
if (safeIps.length === 0) throw new SSRFError('no_safe_ips');
await rateLimiter.check(userId, { free: '10/hour', paid: '100/hour' });
// Random pick from safe IPs — aggregate behavior = load-balanced across extractions
return { ip: safeIps[Math.floor(Math.random() * safeIps.length)] };
}
Pinning via the Chromium flag --host-resolver-rules=MAP hostname ip, passed through Playwright launch args (see Phase 1). This closes the DNS rebinding window.
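The guard relies on an isInBlockedNetwork helper that is not shown. For the IPv4 ranges in BLOCKED_NETWORKS it could be sketched as below; the function names are hypothetical, and IPv6 ranges (::1, fc00::/7, fe80::/10) would need separate handling that this sketch omits.

```typescript
// IPv4 subset of the blocked list from 11.1.
const BLOCKED_V4 = [
  "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8",
  "169.254.0.0/16", "172.16.0.0/12", "192.168.0.0/16",
];

// Convert dotted-quad text to an unsigned 32-bit integer value.
function ipv4ToInt(ip: string): number {
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(ip)) throw new Error(`not an IPv4 address: ${ip}`);
  const octets = ip.split(".").map(Number);
  if (octets.some(o => o > 255)) throw new Error(`octet out of range: ${ip}`);
  return octets.reduce((acc, o) => acc * 256 + o, 0);
}

// True when `ip` falls inside the CIDR block.
function inCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  const base = ipv4ToInt(cidr.slice(0, slash));
  const prefixBits = Number(cidr.slice(slash + 1));
  const blockSize = 2 ** (32 - prefixBits);
  // Two addresses share a block when they fall in the same blockSize-aligned bucket.
  return Math.floor(ipv4ToInt(ip) / blockSize) === Math.floor(base / blockSize);
}

function isBlockedIPv4(ip: string): boolean {
  return BLOCKED_V4.some(cidr => inCidr(ip, cidr));
}
```

Plain arithmetic is used instead of bit shifts to avoid signed 32-bit overflow on addresses above 127.255.255.255.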
11.2 ARNO identity (UUIDv7)
Timestamp-embedded, lexicographically sortable. DB constraint UNIQUE NOT NULL, retry on collision.
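For illustration, a UUIDv7 can be assembled from a 48-bit millisecond timestamp followed by random bits, with the version and variant fields set per RFC 9562. This is a sketch, not the project's uuidv7() implementation; the optional nowMs parameter exists only to make the ordering property easy to demonstrate.

```typescript
import { randomBytes } from "node:crypto";

function uuidv7(nowMs: number = Date.now()): string {
  const bytes = randomBytes(16);
  // Bytes 0-5: big-endian Unix timestamp in milliseconds (48 bits).
  bytes.writeUIntBE(nowMs, 0, 6);
  // Byte 6 high nibble: version 7.
  bytes.writeUInt8((bytes.readUInt8(6) & 0x0f) | 0x70, 6);
  // Byte 8 top two bits: RFC variant "10".
  bytes.writeUInt8((bytes.readUInt8(8) & 0x3f) | 0x80, 8);
  const hex = bytes.toString("hex");
  return [
    hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
    hex.slice(16, 20), hex.slice(20),
  ].join("-");
}
```

Because the timestamp occupies the leading hex digits, string comparison orders IDs by creation time at millisecond granularity; IDs minted within the same millisecond are ordered randomly in this sketch.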
11.3 Copyright/IP + ToS-based shadow disclosure
Registration UI (copyright checkbox only, NO shadow opt-in):
URL: [_______________________]
☑ I own the rights to this URL or hold a license
License: ⊙ Owned ○ Licensed ○ Public domain
⚠️ [for Tranco top 15k commercial sites, an extra checkbox is required]
[By accepting the Terms of Service, you agree to the use of
anonymized data. See the Privacy Policy.]
[Import]
Settings → Privacy (opt-out path):
Privacy Settings
☑ Anonymous data contribution (default ON)
[Learn more] [Delete past contributions]
Domain reputation check:
const commercialDomains = new BloomFilter(loadTrancoTop15k()); // buffer zone vs top 10k
function checkDomainReputation(url): 'normal' | 'requires_extra_confirmation' {
return commercialDomains.has(new URL(url).hostname)
? 'requires_extra_confirmation' : 'normal';
}
// Daily cron rebuild Bloom from Tranco
async function refreshDomainReputation() {
const trancoList = await fetch('https://tranco-list.eu/top-1m.csv.zip');
const top15k = parseAndExtract(trancoList, 15000);
const newBloom = BloomFilter.from(top15k, { errorRate: 0.01, size: 150_000 });
await b2.upload('shared/tranco-bloom.bin', newBloom.serialize());
await broadcast.send('reload-bloom-filter');
}
Tranco attribution goes in the Privacy Policy footer (CC-BY 4.0 requirement).
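The Bloom filter parameters used in the daily rebuild can be sanity-checked with the standard sizing formulas, m = −n·ln(p) / (ln 2)² bits and k = (m/n)·ln 2 hash functions. The sketch below assumes that size: 150_000 is expressed in bits, which the source does not state; bloomSizing is a hypothetical helper.

```typescript
// Standard Bloom filter sizing for n items at false-positive rate p.
function bloomSizing(n: number, p: number): { bits: number; hashes: number } {
  if (n <= 0 || p <= 0 || p >= 1) throw new RangeError("need n > 0 and 0 < p < 1");
  const bits = Math.ceil((-n * Math.log(p)) / (Math.LN2 * Math.LN2));
  const hashes = Math.max(1, Math.round((bits / n) * Math.LN2));
  return { bits, hashes };
}

// Tranco top 15k at a 1% error rate.
const trancoSizing = bloomSizing(15_000, 0.01);
```

For 15,000 domains at 1% this gives roughly 144k bits and 7 hash functions, so a 150,000-bit filter has a small margin under that assumption.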
11.4 GDPR retention matrix
| Data | Retention | Reason |
|---|---|---|
| TSX, spec | User lifetime | User content |
| Original screenshots | 90 days | Debug |
| → pHash + DINOv2 embedding (binary, not recoverable) | 90-365 days | Cache |
| → Metadata only | After 365 days | Audit |
| Staging active | < 90 days since last_activity | Working set |
| Staging notified | 90-120 days + email "30 days until deletion" | Decision window |
| Staging deleted | 120 days (physical delete) | Final cleanup |
| Shadow url_hash | 2 years | Training |
| Shadow corrections | Anonymized after 90 days | GDPR |
| user_attestation | 7 years | Legal audit |
11.4.1 GDPR cascade deletion (saga pattern)
B2 deletes run first (idempotent retry), THEN the DB transaction. A background sweeper handles orphan B2 files.
async function deleteUserData(userId: string) {
try {
// 1. B2 deletes (idempotent retry)
await retry(async () => {
await b2.deletePrefix(`staging/${userId}/`);
await b2.deletePrefix(`shadow/${userId}/`);
await b2.deletePrefix(`screenshots/${userId}/`);
}, { attempts: 3, backoff: 'exponential' });
// 2. Marker — B2 cleaned
await db.gdprDeletions.create({
user_id_hash: sha256(userId),
b2_deleted_at: now()
});
// 3. DB cascade transaction
await db.transaction(async tx => {
await tx.components.where({ user_id: userId }).delete();
await tx.shadowDataset.where({ user_id: userId }).delete();
await yjs.deleteUserDocuments(userId);
await tx.atomCatalog.where({ contributed_by: userId })
.update({ contributed_by: null, provenance: 'anonymized' });
await tx.componentEmbeddings.where({ user_id: userId }).delete();
await tx.users.delete(userId);
});
await db.gdprDeletions.update(
{ user_id_hash: sha256(userId) },
{ completed_at: now() }
);
} catch (e) {
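// `e` is `unknown` under strict TypeScript, so narrow it before reading .message
await db.failedDeletions.create({ user_id_hash: sha256(userId), error: e instanceof Error ? e.message : String(e) });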
throw e;
}
}
// Background sweeper for orphan B2 files
async function sweepOrphanB2Files() {
const b2Prefixes = await b2.listTopLevelPrefixes();
for (const prefix of b2Prefixes) {
const userId = prefix.split('/')[1];
if (!await db.users.exists(userId)) {
await b2.deletePrefix(prefix);
}
}
}
11.5 Latency SLO
| Metric | Target | Hard limit |
|---|---|---|
| p50 code-only | < 10s (including E5 batch ~300ms) | - |
| p50 code+vision | < 20s | - |
| p95 worst case | < 30s (Tier 2+ keep-warm OmniParser) | - |
| Hard timeout | - | 60s |
| Anti-bot retry | - | 1 attempt (Hyperbrowser) |
| LLM call | - | 10s timeout |
| Vision call | - | 15s timeout |
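The per-call timeouts only make sense inside the 60s hard limit, so a call started late in the import should get a shorter budget. A minimal sketch of that idea follows; `callBudgetMs` and `withTimeout` are hypothetical names, and capping each call by the remaining import budget is an assumption, not something the table states.

```typescript
// Limits copied from the SLO table; the names are hypothetical.
const HARD_TIMEOUT_MS = 60_000;
const CALL_TIMEOUT_MS = { llm: 10_000, vision: 15_000 } as const;
type CallKind = keyof typeof CALL_TIMEOUT_MS;

// Per-call timeout, capped by what is left of the 60s import budget.
function callBudgetMs(kind: CallKind, elapsedMs: number): number {
  const remaining = Math.max(0, HARD_TIMEOUT_MS - elapsedMs);
  return Math.min(CALL_TIMEOUT_MS[kind], remaining);
}

// Rejects if `work` does not settle within `ms`; the timer is cleared
// in both outcomes so it never keeps the process alive.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}
```

A caller would combine the two, for example `withTimeout(callVision(input), callBudgetMs('vision', elapsed), 'vision')`, where `callVision` stands in for whatever client the pipeline uses.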
11.6 Anti-bot fallback + surgical rate-limit
async function checkHyperbrowserBudget() {
const window = await metrics.window('hyperbrowser_usage', { minutes: 10 });
// Per-domain check
const byDomain = groupBy(window.events, e => new URL(e.url).hostname);
for (const [domain, events] of Object.entries(byDomain)) {
if (events.length / window.total > 0.20) {
await domainRateLimit.set(domain, 0.1);
await alert.send(`Domain ${domain} > 20% Hyperbrowser usage`);
}
}
// Per-user check
const byUser = groupBy(window.events, e => e.user_id);
for (const [userId, events] of Object.entries(byUser)) {
if (events.length / window.total > 0.10) {
await userRateLimit.set(userId, 0.5);
}
}
// Global only if true distributed attack
if (window.unique_users > 100 && window.ratio > 0.15) {
await globalRateLimit.set(0.5);
await alert.pageOnCall('Distributed attack pattern');
}
}
setInterval(checkHyperbrowserBudget, 60_000);
11.7 ARNO integration safety
- V1: Modal Volume persistent + B2 async (no git auth required)
- V2: GitHub App + direct PR
- Branch naming collision-safe (see Phase 10)
- Never write to main automatically; always open a PR
11.8 XSS sanitization (AST whitelist with a safe-callee check)
import { Project, Node, SyntaxKind } from 'ts-morph';
const ALLOWED_JSX_VALUE_KINDS = [
SyntaxKind.StringLiteral, SyntaxKind.NumericLiteral,
SyntaxKind.TrueKeyword, SyntaxKind.FalseKeyword, SyntaxKind.NullKeyword,
SyntaxKind.PropertyAccessExpression,
SyntaxKind.Identifier,
SyntaxKind.ConditionalExpression,
SyntaxKind.BinaryExpression,
SyntaxKind.TemplateExpression,
SyntaxKind.ArrowFunction, // event handlers
SyntaxKind.FunctionExpression
];
const DANGEROUS_CALLEES = [
'fetch', 'eval', 'Function', 'setTimeout', 'setInterval',
'XMLHttpRequest', 'window', 'document', 'globalThis', 'self'
];
function isCallExpressionSafe(node): boolean {
const callee = node.getExpression();
if (Node.isIdentifier(callee)) return !DANGEROUS_CALLEES.includes(callee.getText());
if (Node.isPropertyAccessExpression(callee)) {
return !DANGEROUS_CALLEES.includes(getRootIdentifier(callee));
}
return false; // computed access (foo['fetch']) — blocked
}
function sanitizeTsx(tsx: string) {
const project = new Project({ useInMemoryFileSystem: true });
const sf = project.createSourceFile('temp.tsx', tsx);
const report = { rejected_expressions: [], dangerous_attributes: 0 };
sf.forEachDescendant(node => {
if (Node.isJsxAttribute(node) && node.getName() === 'dangerouslySetInnerHTML') {
node.remove();
report.dangerous_attributes++;
}
if (Node.isJsxExpression(node)) {
const expr = node.getExpression();
if (!expr) return;
const kind = expr.getKind();
if (ALLOWED_JSX_VALUE_KINDS.includes(kind)) return;
if (kind === SyntaxKind.CallExpression && isCallExpressionSafe(expr)) return;
node.replaceWithText('{null}');
report.rejected_expressions.push(expr.getKindName());
}
});
return { tsx: sf.getFullText(), removed: report };
}
11.9 Disaster recovery
| Outage | Behavior |
|---|---|
| Gemini API down | Queue + retry 5min → fallback Cerebras |
| Cerebras down | Fallback to Gemini (cost spike, alert) |
| Modal Labs down | Pause LoRA; production continues via Gemini |
| Postgres / Supabase down | Read-only mode; new imports blocked |
| B2 down | Modal Volume queue + async retry (backoff 1m/5m/15m/1h) |
| Hyperbrowser down | Anti-bot URLs fail immediately |
| DNS down | SSRF guard fails safe (deny) |
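The B2 row gives a fixed backoff schedule, and the Phase 10 branch of the decision tree in § XII adds two escalation points (alert after 1h queued, user notice at 24h). A minimal sketch combining both is shown here; the function names are hypothetical, and staying at the 1h step after the schedule is exhausted is an assumption.

```typescript
// Backoff steps from the B2 row: 1m, 5m, 15m, 1h.
const B2_BACKOFF_MS = [60_000, 300_000, 900_000, 3_600_000];
const HOUR_MS = 3_600_000;

// Delay before retry number `attempt` (0-based).
// ASSUMPTION: once the listed steps run out, retries continue hourly.
function b2RetryDelayMs(attempt: number): number {
  const index = Math.min(Math.max(attempt, 0), B2_BACKOFF_MS.length - 1);
  return B2_BACKOFF_MS[index];
}

type QueueEscalation = 'none' | 'alert' | 'notify_user';

// Escalation for an upload that has been queued for `queuedMs`.
function queueEscalation(queuedMs: number): QueueEscalation {
  if (queuedMs >= 24 * HOUR_MS) return 'notify_user'; // "delayed sync" notice
  if (queuedMs > HOUR_MS) return 'alert';
  return 'none';
}
```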
Status page: self-hosted Cachet on Cloudflare Workers (free CF Workers tier, 100k req/day). URL: status.arno.app.
11.10 Cost monitoring (relative thresholds)
async function checkCostAnomaly() {
const todayCost = await metrics.dailyCost('gemini');
const sevenDayAvg = await metrics.avg('gemini.daily_cost', { days: 7 });
if (todayCost > sevenDayAvg * 1.5 && todayCost > 5) {
await alert.send(`Gemini cost ${todayCost.toFixed(2)} > 150% of 7-day avg`);
}
}
| Alert | Trigger |
|---|---|
| Gemini/Modal cost | > 150% of 7-day avg AND > $5 floor |
| Hyperbrowser usage | > 5% over 10 min → surgical rate-limit |
| Failure rate | > 10% over 1 hour |
| p95 latency | > 45s sustained |
| Single-user URL volume | > 1000/day → anti-abuse review |
| Vision activation | > 150% of baseline rate |
| Staging growth | > 120% of 7-day avg |
| B2 queue depth | > 1 hour pending uploads |
11.11 Staging area specification (V1)
V1 flow:
Registration → URL submit → extract → Modal Volume local + async B2
↓
Result UI: "10 components ready"
[Edit in ARNO] [Connect Git to push]
↓
Later: GitHub OAuth → push from staging
Storage math: 50MB × 10k users + 90d lifecycle = ~$3/month hot. At 100k users = ~$30/month.
Why staging: 70%+ of small-business users have no git (Webflow, Squarespace). Forcing git at registration causes drop-off. Staging acts as a "try it" step, so conversion is higher.
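The storage estimate in this section can be reproduced with a few lines of arithmetic. This sketch assumes a flat price of $0.006 per GB-month, a rate back-derived from the ~$3/month figure rather than a quoted price, and it ignores the 90-day lifecycle by treating every user as holding the full 50 MB; `stagingStorageUsdPerMonth` is a hypothetical name.

```typescript
// ASSUMPTION: flat $0.006 per GB-month, inferred from the ~$3/month estimate.
const ASSUMED_USD_PER_GB_MONTH = 0.006;
const MB_PER_USER = 50;

// Monthly hot-storage cost if every user holds the full 50 MB (decimal units).
function stagingStorageUsdPerMonth(users: number): number {
  const totalGb = (users * MB_PER_USER) / 1000;
  return totalGb * ASSUMED_USD_PER_GB_MONTH;
}
```

Under that assumed rate, 10k users come out at about $3/month and 100k users at about $30/month, so the cost grows linearly with the user count.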
§ XII. Failure modes
12.1-12.3 What each layer can NOT do
Code path:
- Auth-gated content (login required): HAR upload parked
- Canvas-rendered UI (Figma embed, charts): vision-only, low confidence
- WebGL/Three.js: out of scope
- Cross-origin iframes: often blocked by CSP
- Web Components Shadow DOM: V2
- GSAP imperative animations: static snapshots only
- :has() selectors: partial, browser-dependent
- Server Components / React Streaming: degraded fidelity
- CSS-in-JS runtime-generated class names: computed_style fallback
Vision path:
- Exact design tokens (colors are approximated only)
- Nested semantics (ARIA correctness)
- Dynamic behavior (transitions, durations)
- Micro-interactions
LLM enrichment:
- Compute values (it only fills fields in)
- Invent missing fields (returns null + confidence 0)
- Guess props without HTML context
12.4 Decision tree (extended)
Phase 1 fail:
→ anti-bot signal? Hyperbrowser (1 retry)
→ network error? abort 'extraction_blocked'
→ timeout 30s? abort 'site_too_slow'
Phase 2 fail (capture matrix incomplete):
→ continue with partial matrix, flag missing combos
Phase 3 fail (coverage < 0.90):
→ Phase 4 LLM enrichment
Phase 4 fail (LLM timeout):
→ skip enrichment, Phase 5 with partial spec
→ mode 'code-only-degraded'
Phase 5 fail (generation error):
→ retry with alternative model (Cerebras → Gemini fallback)
→ still fails → abort 'generation_failed'
Phase 6 fail:
→ Phase 7
Phase 7 fail after full vision:
→ mode 'failed', save partial, manual review
Phase 8 fail (completeness < 0.5):
→ save with warning, flag manual review
Phase 9 fail (DINOv2/pgvector down):
→ skip uniqueness, default 'new'
Phase 10 fail (B2 down):
→ Modal Volume queue + async retry
→ > 1h queued → alert
→ 24h queued → notify user "delayed sync"
12.5 Error UX
✅ Button (3 variants, 2 themes): code-only
✅ Card: code+vision (4 fields via vision)
⚠️ Modal: partial (vision-only, 78%), hover on close was not extracted
❌ CustomChart: not imported (canvas-rendered)
12.6 V2 backlog edge cases
| Case | V1 behavior | V2 plan |
|---|---|---|
| GSAP/Framer Motion | Static snapshots | JS animation parser |
| :has() selectors | Captured, may differ | Feature detection |
| Web Components | Skipped | Shadow DOM traversal |
| Streaming SSR | Final HTML only | React DevTools integration |
| Mobile touch interactions | Not captured | Touch simulation |
| Container queries | Limited | Polyfill |
| Multi-page sitemap | Single URL | Crawler |
| Auth-gated sites | Blocked | HAR upload |
| Compound components (Tabs.Item) | Detection + recurse | Full support |
| forwardRef + generics | Basic substitution | AST-aware resolution |
§ XIII. UX
Registration (clean, no shadow checkbox)
1. Email + password
2. Accept Terms of Service [← discloses shadow data usage]
3. "Tell us about your company" (optional)
4. "Do you have a website? We'll import its components"
URL: [_______________________]
☑ I hold the rights to this URL
License: ⊙ Owned ○ Licensed ○ Public domain
⚠️ [warning for commercial sites from Tranco top 15k]
[Import] [Skip]
Progress (parallel steps marked)
Importing example.com...
✅ HTML/CSS extracted 2s
✅ 12 components detected 5s
🔄 Theme analysis (light + dark) ║parallel║ 8s
🔄 State capture (5 states) ║parallel║ 12s
🔄 Viewport capture (4 sizes) ║parallel║ 15s
🔄 Generating components (8/12)... 22s
🔄 Verifying fidelity (10/12)... 27s
✅ Done 30s
Result
Imported: 10 of 12 components
✅ Button (3 variants) code-only [Open]
✅ Card code+vision [Open]
✅ Navbar code-only [Open]
...
⚠️ Pricing (vision-only, 78%) [Re-extract] [Delete]
❌ CustomChart (canvas) [Create manually]
[Open editor] [Import another URL]
[Connect Git to push to repo] ← V1 only
Settings → Privacy (opt-out path)
Privacy Settings
☑ Anonymous data contribution
Help improve ARNO by allowing anonymized usage data.
[What we collect] [Delete past contributions]
Account
[Download my data] [Delete my account]
§ XIV. Thresholds + calibration plan
14.1 Defaults
| Threshold | Default | Note |
|---|---|---|
| Coverage gate (Phase 3) | 0.90 | calibrate after 1k extractions OR 30 days |
| Acceptance pixel-diff (Phase 6) | 0.30 | |
| Completeness diff (Phase 8) | 0.15 | ignoreText, ignoreImages |
| Completeness coverage (Phase 8) | 0.90 | |
| Uniqueness K (Phase 9) | 5 | top-K neighbors |
| Uniqueness distance | 0.4 | cosine cutoff |
| Cache L2 (DINOv2) | 0.95 | |
| Cache L3 (E5 atoms) | 0.85 | |
| Hard timeout | 60s | |
| Hyperbrowser usage cap | 5% | auto rate-limit |
| Vision activation baseline | 40% | |
| WCAG contrast target | 4.5 | AA, flag manifest if below |
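To illustrate how the two uniqueness defaults interact, here is a minimal in-memory sketch. It assumes the 0.4 cutoff is a cosine distance (1 minus cosine similarity) and that the surviving top-K candidates are then offered to the user for a reuse decision; both readings are assumptions, and `reuseCandidates` is a hypothetical name. The real lookup would presumably run as a pgvector query rather than in application code.

```typescript
interface Candidate { id: string; embedding: number[]; }

// Cosine distance = 1 - cosine similarity; zero vectors count as unrelated.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 1;
  return 1 - dot / Math.sqrt(normA * normB);
}

// Up to `k` nearest catalog entries within `maxDistance`, closest first.
// An empty result would correspond to the default 'new' outcome.
function reuseCandidates(
  query: number[],
  catalog: Candidate[],
  k = 5,
  maxDistance = 0.4,
): { id: string; distance: number }[] {
  return catalog
    .map((c) => ({ id: c.id, distance: cosineDistance(query, c.embedding) }))
    .filter((c) => c.distance <= maxDistance)
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}
```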
14.2 Calibration plan
Trigger: calibrate at the earliest of:
- 1000+ extractions accumulated, OR
- 30 days since launch
→ analyze distribution → adjust thresholds at natural breakpoints
target: 95% of "fully extractable" cases pass the coverage gate
Quarterly re-evaluation with at least 1k new extractions.
If volume has not reached 1k/quarter → keep current thresholds, document staleness.
All thresholds are tunable via config. No "empirical" values without data.
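The calibration rules reduce to two small predicates. The sketch below is one possible encoding, with hypothetical names (`shouldCalibrate`, `quarterlyAction`) and the assumption that "1000+" and "30 days" are inclusive bounds.

```typescript
interface CalibrationState {
  extractions: number;     // extractions accumulated since launch
  daysSinceLaunch: number; // whole days since launch
}

// Initial calibration fires at whichever condition is met first.
function shouldCalibrate(state: CalibrationState): boolean {
  return state.extractions >= 1000 || state.daysSinceLaunch >= 30;
}

type QuarterlyAction = 'recalibrate' | 'keep_and_document_staleness';

// Quarterly re-evaluation needs at least 1k new extractions;
// below that, thresholds stay as they are and staleness is recorded.
function quarterlyAction(newExtractions: number): QuarterlyAction {
  return newExtractions >= 1000 ? 'recalibrate' : 'keep_and_document_staleness';
}
```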
§ XV. Open questions
Pre-V1 launch blockers
| # | Question | Owner |
|---|---|---|
| ✅ | DINOv2 ViT-S/14 (384 dim) chosen: UI screenshots are a constrained domain; saves 50% storage vs ViT-B/14; post-launch reject-rate monitoring, upgrade triggered if > 20% | Done |
| 5 | Copyright UI + ToS legal review (~$500-1000 with privacy lawyer); brief prepared, awaiting external review | Legal counsel + Vadim send-out |
| 10 | Atom catalog seeding (confirmed: pre-load shadcn/ui) | Done |
| 16 | Atom decomposition PoC validation (Task #6) | Claude |
| ✅ | XSS corpus 24/24 pass: script-tag, event-handlers, dangerouslySetInnerHTML, js: URLs, alias bypass, computed access, new expressions. Implementation in packages/atom-poc/src/sanitize.ts + xss-corpus/. Run pnpm xss | Done |
| ✅ | Qwen model verified: Qwen3-VL-32B-Instruct + Qwen3-VL-235B-A22B-Instruct (Apache 2.0) | Done |
Pre-scale blockers (V1 OK без)
| # | Question | Needed by |
|---|---|---|
| 2 | pgvector index strategy (IVFFlat vs HNSW) | Pre-50k URLs/month |
| 3 | LoRA training infra (Modal/RunPod/Lambda) | Q2 deployment |
| 6 | Hyperbrowser cost cap (per-user vs global) | After data collected |
V2+ scope
| # | Question |
|---|---|
| 7 | Vue/Svelte priority |
| 8 | Auth-gated sites HAR upload |
| 9 | Multi-page sitemap crawl |
| 11 | Multi-tenancy caches/atoms (privacy review) |
| 12 | Component versioning UI |
| 13 | Cross-component dependencies |
| 14 | i18n / RTL handling |
| 15 | 3+ theme variants (high-contrast) |
§ XVI. Where this lives in ARNO docs
Post-launch (after real data):
- adr/runbooks/url_import_failures.md
- adr/runbooks/cost_alerts.md
- adr/runbooks/b2_outage.md