ADR 0013: "The longer we live, the less we pay" as a hard requirement

Date: 2026-05-22
Status: Accepted
Feature: URL-import
Affects: url_import_spec.md § IX (cost trajectory), § VIII (caching), § X (flywheel)

Context

While discussing the stack and the cost trajectory, the user stated a product requirement:

"We need a forward-looking solution: the longer we live, the less we pay. This is not a request; it is a requirement for the feature."

This is not a nice-to-have; it is a hard requirement. URL-import must not have a constant cost: it must actively get cheaper as usage scales and as time passes.

Three independent cost-reduction multipliers (if one stalls, the others keep working):

Three levels of caching:

Conservative trajectory:

Volume	L1	L2	L3	Total
100 URLs	5%	8%	1%	14%
1k	12%	18%	4%	34%
10k	20%	25%	9%	54%
100k	25%	30%	13%	68%
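The arithmetic behind the table can be checked with a short sketch. It assumes that the L1, L2, and L3 columns are per-level cache hit rates, that Total is their sum, and that a cache hit costs nothing while a miss costs the full price. None of these is stated explicitly in this ADR, and the names `TRAJECTORY`, `total_hit_pct`, and `reduction_factor` are illustrative only.

```python
# Conservative trajectory from the table above: (L1, L2, L3) percentages per volume.
TRAJECTORY = {
    100: (5, 8, 1),
    1_000: (12, 18, 4),
    10_000: (20, 25, 9),
    100_000: (25, 30, 13),
}


def total_hit_pct(levels):
    """Sum of the per-level percentages (the Total column)."""
    return sum(levels)


def reduction_factor(levels):
    """Cost reduction factor.

    ASSUMPTION: a hit costs nothing and a miss costs the full price,
    so the remaining cost share is (100 - total) percent.
    """
    return 100 / (100 - total_hit_pct(levels))


for volume, levels in TRAJECTORY.items():
    total = total_hit_pct(levels)
    factor = reduction_factor(levels)
    print(f"{volume:>7} URLs: total {total}%, about {factor:.2f}x cheaper")
```

Under these assumptions the 100k row corresponds to a bit more than a 3× reduction from caching alone.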

Catalog grows → more hits → each subsequent URL is cheaper.

Every extraction in production writes to shadow_dataset (ToS-disclosed, see ADR 0017). User corrections = gold labels.

Quarterly LoRA fine-tune of Qwen3-VL-32B (Apache 2.0, see ADR 0014) on this dataset → the in-house model replaces paid Gemini calls.

Cheap operations route through the free tier; expensive ones are used only when necessary.

If each multiplier depended on the others, a catastrophe in one would sink everything. For example:

LoRA training fails (data quality issue) → if cost reduction depended only on LoRA → cost does not fall
Cache invalidation bug → if it depended only on cache → cost does not fall

Independent design: cache works even without LoRA, tier routing works even without cache, LoRA works even without a seeded atoms catalog.
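A minimal sketch of why independence matters, assuming each working multiplier removes its share of the remaining cost on its own. The 50% and 30% figures are the approximate cache and LoRA savings quoted in this ADR; tier routing is left out because the ADR gives no number for it. The function name `remaining_cost` and the multiplicative model are assumptions for illustration, not the spec's cost model.

```python
from math import prod

# Approximate savings quoted in this ADR. Tier routing would be one more entry,
# but the ADR does not quantify it, so it is omitted here.
SAVINGS = {"cache": 0.50, "lora": 0.30}


def remaining_cost(savings, failed=()):
    """Fraction of baseline cost left after all working multipliers are applied.

    ASSUMPTION: multipliers compose multiplicatively and a failed multiplier
    simply contributes no saving, without affecting the others.
    """
    return prod(1 - s for name, s in savings.items() if name not in failed)


print(f"all working:  {remaining_cost(SAVINGS):.2f} of baseline")
print(f"LoRA failed:  {remaining_cost(SAVINGS, failed=('lora',)):.2f} of baseline")
print(f"cache failed: {remaining_cost(SAVINGS, failed=('cache',)):.2f} of baseline")
```

In this model a failure of any single multiplier raises the cost but never returns it to the baseline, which is the property the independent design is meant to guarantee.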

3-30× total reduction over 12 months. Hard requirement satisfied.

Pros:

Long-term competitive cost advantage
LTV economics scale (small-biz LTV $300-600 vs cost $0.002 = excellent unit economics)
Marketing message: "ARNO gets cheaper the more you use it"

Cons:

Acceptable because the user explicitly prefers this trade-off.

Risk	Mitigation
LoRA training never reaches pareto deployment	Escape valve: relax criteria, switch teacher, then suspend (§ X.4)
Cache hit rate plateaus low (users import very diverse sites)	Atom-level cache (L3) catches composition reuse even when visuals are diverse
Catalog gets polluted by seeded atoms	Pre-load shadcn only (MIT, high quality), not all of npm

❌ Static cost: may be competitive today, but in 2 years competitors with a flywheel will be cheaper
❌ Does not satisfy the hard requirement

❌ Single point of failure: if the optimization is broken, there is no fallback
❌ Lower total reduction (cache → ~50% saving, LoRA → ~30% saving, combined > 70%)