Animaya v2
Modular AI Agent Platform
Overview: Context
Animaya currently deploys per-client OpenClaw (Node.js) containers on a VPS behind Traefik. Each client gets a Telegram bot, a workspace with flat .md files for memory, and bash scripts for management.
Problems Driving This Rebuild
- Updating clients requires restarting containers or sending CLI messages to each one
- No skill/tool marketplace — everything is baked into config
- No billing or usage tracking — all API calls use our keys with no limits
- Memory is stored in flat, unstructured .md files — no indexing, no search, no automatic learning
- No testing infrastructure at all
- Skills/tools can't be controlled per-client (no access tiers)
Goal
Build a modular agent runtime from scratch where every component (model, memory, tools, skills, triggers, responses) is pluggable, testable, and independently updatable.
Overview: Key Decisions
- Rust core + Python tools — Rust for the agent loop, memory, triggers, responses. Python for tools executed as subprocesses via JSON protocol
- Container per client — maintains Docker isolation, ~50MB image per container
- Multi-provider LLM with billing — token balance system + BYOK (bring your own key) option
- Tree memory with librarian — hierarchical files, a cheap model periodically organizes and indexes
- Shared read-only volumes — update tools/skills/prompts once, all clients see changes instantly. No restart
- TDD from the start — unit, integration, behavioral, and E2E tests before features
Overview: Component Model
Every component is defined as a Rust trait, making it pluggable. New variants can be added without changing the core.
Component assets live either on 🔒 shared (read-only) volumes used by all clients or in 📝 per-client (read-write) storage.
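As an illustration of the trait-per-component idea, here is a minimal sketch of what a pluggable Tool trait could look like. The names and signatures are assumptions for illustration, not the actual animaya-core API.

```rust
// Hypothetical sketch of the pluggable-component pattern; not the real
// animaya-core definitions. Each component category (Model, Memory, Tool,
// Trigger, Response) would get a trait in this spirit.
use std::collections::HashMap;

/// A tool invocation requested by the model during an agent run.
pub struct ToolCall {
    pub name: String,
    pub arguments: serde_json::Value,
}

/// Result handed back to the agent loop and appended to the model context.
pub struct ToolOutput {
    pub success: bool,
    pub output: String,
}

/// New tool variants are added by implementing this trait; the core loop
/// only ever talks to `dyn Tool`.
pub trait Tool: Send + Sync {
    /// Name advertised to the model in the tool schema.
    fn name(&self) -> &str;
    /// JSON Schema for the parameters (loaded from manifest.json).
    fn parameters_schema(&self) -> serde_json::Value;
    /// Execute the call; the runtime enforces the manifest's timeout.
    fn execute(&self, call: &ToolCall, secrets: &HashMap<String, String>) -> ToolOutput;
}
```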
Section 1: Agent Loop
The central state machine. A trigger arrives, context is assembled, the model is called. The model either produces a final response or requests tool calls. Tool results feed back into the loop until the model responds with text or a safety limit is reached.
Loop stages:
- Trigger sources: Telegram, Cron, Webhook, CLI
- Context assembly: system prompt + identity + memory TOC + skills + tools + history
- Model: LLM generates response (or requests tool calls)
- Tool execution: Python subprocess
- Response channels: Telegram / Owner / File
- Billing: record usage, deduct credits
Safety Limits
- Max 10 tool-call turns per run (configurable)
- Model API failures: retry 3x with exponential backoff
- Tool failures: error reported back to model as tool result
- If turn limit reached: force final response via injected system message
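A minimal, self-contained sketch of this control flow under the limits above. All types here are simplified stand-ins, not the real animaya-runtime API.

```rust
// Illustrative control flow of the agent loop under the safety limits above.
enum ModelReply {
    Text(String),           // final answer
    ToolCalls(Vec<String>), // requested tool invocations (simplified to names)
}

struct Context {
    messages: Vec<String>,
}

const MAX_TURNS: usize = 10; // configurable in the real runtime

fn agent_run(
    mut context: Context,
    mut call_model: impl FnMut(&Context) -> ModelReply, // retries/backoff live inside
    mut run_tool: impl FnMut(&str) -> String,           // errors come back as strings
) -> String {
    for _turn in 0..MAX_TURNS {
        match call_model(&context) {
            ModelReply::Text(text) => return text, // model answered: loop ends
            ModelReply::ToolCalls(calls) => {
                for call in calls {
                    // Tool failures are not fatal; the error text is fed back
                    // to the model as a tool result so it can react.
                    let result = run_tool(&call);
                    context.messages.push(format!("tool:{call} -> {result}"));
                }
            }
        }
    }
    // Turn limit reached: inject a system message and force a final answer.
    context
        .messages
        .push("system: turn limit reached, answer the user now without tools".to_string());
    match call_model(&context) {
        ModelReply::Text(text) => text,
        ModelReply::ToolCalls(_) => "I could not complete that request.".to_string(),
    }
}
```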
Section 2: Memory Tree
Memory is a file tree on disk. The model doesn't see the full tree — it gets a table of contents and can request specific files via tools.
Memory Tools
| Tool | Description |
|---|---|
| memory_read(path) | Read a memory file |
| memory_write(path, content, summary) | Write/update + auto-update TOC entry |
| memory_list(prefix?) | List entries by path prefix |
| memory_search(query) | Keyword search against summaries |
| memory_delete(path) | Delete a memory file |
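To make the memory_write plus TOC contract concrete, here is a rough sketch of the write path. The struct layout, field names, and the TOC dump format are assumptions for illustration, not the real animaya-memory implementation.

```rust
// Rough sketch of memory_write: persist the file and update its TOC entry in
// the same operation.
use std::collections::BTreeMap;
use std::fs;
use std::path::PathBuf;

#[derive(Default)]
struct MemoryStore {
    root: PathBuf,
    // path relative to the memory root -> one-line summary used by memory_search
    toc: BTreeMap<String, String>,
}

impl MemoryStore {
    /// memory_write(path, content, summary)
    fn write(&mut self, rel_path: &str, content: &str, summary: &str) -> std::io::Result<()> {
        let full = self.root.join(rel_path);
        if let Some(parent) = full.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::write(&full, content)?;
        // TOC is updated immediately, not deferred to the librarian.
        self.toc.insert(rel_path.to_string(), summary.to_string());
        self.persist_toc()
    }

    fn persist_toc(&self) -> std::io::Result<()> {
        // The real store writes _toc.json; a plain-text dump keeps the sketch short.
        let dump: String = self.toc.iter().map(|(p, s)| format!("{p}: {s}\n")).collect();
        fs::write(self.root.join("_toc.txt"), dump)
    }
}
```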
The Librarian
A background process inside each container that maintains memory quality. It uses a cheap, fast model (GLM-4.7-FlashX, at ~1/20th the cost of the primary model).
When it runs
After every 5 agent runs, OR when total memory exceeds 50K tokens, OR daily at 3 AM
Cost per run
~2,000 credits (~667 primary tokens). Nearly free.
Librarian Operations (5 steps)
- Rebuild TOC — scan all files, regenerate _toc.json with summaries and token counts
- Merge duplicates — if two learned/ files cover the same topic, merge them
- Prune conversations — summarize logs older than 7 days, move summaries to clients/{user}.md, delete raw logs
- Compress large files — if any file exceeds 2000 tokens, compress while preserving all facts
- Promote scratch — if a scratch file has been referenced 3+ times, move it to knowledge/
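A tiny sketch of the "When it runs" rule above. The field names and the exact daily-run check are assumptions.

```rust
// Sketch of the librarian wake-up rule: any one condition is enough.
struct LibrarianStats {
    agent_runs_since_last_pass: u32,
    total_memory_tokens: u64,
    hours_since_last_pass: u32,
}

fn should_run_librarian(stats: &LibrarianStats, local_hour: u8) -> bool {
    stats.agent_runs_since_last_pass >= 5         // after every 5 agent runs
        || stats.total_memory_tokens > 50_000     // memory grew past 50K tokens
        || (local_hour == 3 && stats.hours_since_last_pass >= 24) // daily 3 AM pass
}
```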
Section 3: Tool Protocol
Tools are Python scripts that communicate with the Rust runtime over stdin/stdout JSON-RPC.
📦 Tool Layout
tools/web_search/
manifest.json — schema
main.py — implementation
requirements.txt — deps
🛡 Sandboxing
Timeout enforced by Rust (kills the whole process group). No environment variable inheritance. No access to the memory filesystem. Runs as a non-root user.
Manifest Example (web_search)
{
"name": "web_search",
"description": "Search the web using Brave Search API",
"parameters": {
"type": "object",
"properties": {
"query": { "type": "string", "description": "Search query" }
},
"required": ["query"]
},
"timeout_seconds": 30,
"requires_secrets": ["BRAVE_API_KEY"],
"tier": "standard"
}
Protocol: Request & Response
Rust → Python (stdin)
{
"method": "execute",
"id": "call_abc123",
"params": {
"arguments": { "query": "weather in Moscow" },
"secrets": { "BRAVE_API_KEY": "..." }
}
}
Python → Rust (stdout)
{
"id": "call_abc123",
"result": { "status": "success", "output": "Results for: weather..." }
}
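For orientation, a simplified sketch of the Rust side of this exchange: spawn the tool, send the request on stdin, read one response line from stdout, and enforce the timeout. It uses only the standard library and is an illustration under assumed names; the real executor additionally kills the whole process group and applies the sandboxing rules above.

```rust
// Simplified tool executor sketch (std-only).
use std::io::{BufRead, BufReader, Write};
use std::process::{Command, Stdio};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn execute_tool(tool_dir: &str, request_json: &str, timeout: Duration) -> Result<String, String> {
    let mut child = Command::new("python3")
        .arg(format!("{tool_dir}/main.py"))
        .env_clear() // no environment variable inheritance
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
        .map_err(|e| e.to_string())?;

    // Write the JSON-RPC request; dropping ChildStdin closes the pipe (EOF).
    child
        .stdin
        .take()
        .expect("stdin was piped")
        .write_all(request_json.as_bytes())
        .map_err(|e| e.to_string())?;

    // Read the single-line response on a helper thread so we can time out.
    let stdout = child.stdout.take().expect("stdout was piped");
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut line = String::new();
        let _ = BufReader::new(stdout).read_line(&mut line);
        let _ = tx.send(line);
    });

    match rx.recv_timeout(timeout) {
        Ok(response) => {
            let _ = child.wait();
            Ok(response)
        }
        Err(_) => {
            let _ = child.kill(); // sketch kills only the child; the spec requires the process group
            Err(format!("tool timed out after {timeout:?}"))
        }
    }
}
```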
Section 4: Skill System
Skills are markdown instruction files that tell the model when and how to use tools. Unlike tools (code), skills are pure prompt engineering.
Activation Modes
| Mode | Behavior |
|---|---|
| auto | Instructions always included in context. Model decides when to follow them. |
| manual | Listed in TOC but not loaded. Model requests via skill_load tool. |
| disabled | Turned off for this client. |
Skill Example: Appointment Scheduling
```
# Appointment Scheduling

## When to Activate
- Client asks to book/schedule an appointment
- Client asks about availability

## Steps
1. Read knowledge/services.md to check available services
2. Read knowledge/schedule.md to check working hours
3. Ask client which service they need
4. Ask for preferred date/time
5. Notify owner via owner_notify with requires_approval: true
6. Tell client: "I've sent your request to [owner]."

## Required Tools
- memory_read
- owner_notify
```
Per-Client Access Control
Each client has skills.json with overrides. Admins control which skills each client can access — this is the monetization lever: basic skills included, premium skills require a higher tier.
{
"overrides": {
"appointment_scheduling": { "activation": "auto" },
"crm_integration": { "activation": "disabled" }
}
}
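A small sketch of how these overrides could be resolved against the shared registry defaults; the types and function names are illustrative assumptions.

```rust
// Override resolution: the shared registry supplies a default activation and
// the client's skills.json wins when present.
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Activation {
    Auto,
    Manual,
    Disabled,
}

fn effective_activation(
    skill: &str,
    registry_default: Activation,
    client_overrides: &HashMap<String, Activation>,
) -> Activation {
    client_overrides.get(skill).copied().unwrap_or(registry_default)
}
```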
Section 5: Billing & Model Router
Each client has a credit balance. 1 credit = 1 token at base model price. Better models cost more credits.
| Model | Input (credits/token) | Output (credits/token) |
|---|---|---|
| GLM-4.7 (primary) | 1.0 | 3.0 |
| GLM-4.7-FlashX (cheap) | 0.1 | 0.3 |
| GLM-4.6V Flash (vision) | 1.5 | 3.0 |
| Whisper/Groq (audio) | 0.5 credits/second | — |
BYOK (Bring Your Own Key)
Power users provide their own API key. When active: their key is used, no credits charged, usage still logged for analytics, key stored encrypted in .env.
When Balance Runs Out
- Before each call, estimate cost. If insufficient → block
- User sees: "I've reached my usage limit. Contact [owner] to add credits."
- Owner gets notification: "Your assistant's balance is exhausted."
- Low-balance warnings at 10% threshold
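As a worked example of the credit arithmetic and the pre-call check described above (function names and the ceiling-based rounding are assumptions):

```rust
// Credit estimate + pre-call check, using rates from the pricing table above.
fn estimated_cost(input_tokens: u64, output_tokens: u64, in_rate: f64, out_rate: f64) -> u64 {
    (input_tokens as f64 * in_rate + output_tokens as f64 * out_rate).ceil() as u64
}

fn allow_call(balance_credits: u64, byok_active: bool, estimate: u64) -> bool {
    // BYOK clients are never blocked on credits; their usage is only logged.
    byok_active || balance_credits >= estimate
}

// Example: a GLM-4.7 call with 8,000 input and 500 output tokens costs
// 8,000 * 1.0 + 500 * 3.0 = 9,500 credits.
```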
Section 6: Owner Communication
"Texting the owner" is a first-class response channel — distinct from regular Telegram messages. It supports structured notifications and approval workflows.
| Type | Description | Example |
|---|---|---|
| Info | No response needed | "Client @maria asked about pricing" |
| Approval | Owner taps Confirm/Decline | "@maria wants Tue 15:00 — [Confirm] [Decline]" |
| Alert | System events | "Token balance below 10%" |
Approval Flow
- The agent calls owner_notify with requires_approval: true
- The owner receives a message with [Confirm] [Decline] buttons
owner_notify is a built-in tool (implemented in Rust, not Python) because it needs direct Telegram API access. Pending approvals time out after 12h, falling back to a configurable default action.
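A sketch of the pending-approval bookkeeping implied above; the record shape and resolution logic are assumptions.

```rust
// Pending approvals expire after 12 hours and fall back to a configurable
// default action. Struct and enum names are illustrative.
use std::time::{Duration, SystemTime};

#[derive(Clone, Copy)]
enum ApprovalOutcome {
    Confirmed,
    Declined,
}

struct PendingApproval {
    created_at: SystemTime,
    default_action: ApprovalOutcome, // used if the owner never responds
}

const APPROVAL_TTL: Duration = Duration::from_secs(12 * 60 * 60);

fn resolve_if_expired(p: &PendingApproval, now: SystemTime) -> Option<ApprovalOutcome> {
    match now.duration_since(p.created_at) {
        Ok(age) if age >= APPROVAL_TTL => Some(p.default_action),
        _ => None, // still waiting for the owner to tap Confirm or Decline
    }
}
```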
Section 11: Parallel Task Execution
The agent handles multiple tasks concurrently. A user can send a long-running request, then immediately send another. Tasks are queued, executed in parallel, and their progress is visible in real time.
Concurrency
Max 3 parallel agent runs per client (configurable). Prevents runaway costs.
Queue Limit
Max 5 queued tasks. If full: "I'm busy, try again in a moment."
Priority
Owner messages > Client messages > System triggers (heartbeat, cron).
Cancellation
Owner can cancel via /cancel in Telegram or the Web UI.
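The admission rules above can be summarized in a small sketch; the constants mirror the stated limits, while the types and names are illustrative assumptions.

```rust
// Admission sketch: at most 3 concurrent runs, at most 5 queued tasks,
// owner > client > system priority.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Priority {
    System, // heartbeat, cron
    Client,
    Owner,  // derived Ord: later variants rank higher, so Owner wins
}

const MAX_RUNNING: usize = 3;
const MAX_QUEUED: usize = 5;

enum Admission {
    RunNow,
    Queued,
    Rejected, // user sees: "I'm busy, try again in a moment."
}

fn admit(running: usize, queue: &mut Vec<Priority>, incoming: Priority) -> Admission {
    if running < MAX_RUNNING {
        Admission::RunNow
    } else if queue.len() < MAX_QUEUED {
        queue.push(incoming);
        // Keep the queue sorted so the highest-priority task is dequeued first.
        queue.sort_by(|a, b| b.cmp(a));
        Admission::Queued
    } else {
        Admission::Rejected
    }
}
```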
Telegram Status Messages
Status messages are edited in-place as the task progresses:
1. ClinicA — full-service...
2. TherapyPro — online...
Section 12: Web UI
A web dashboard for owners to monitor their assistant. Served from {slug}.animaya.me behind the existing Telegram auth.
| Feature | Description |
|---|---|
| Live Tasks | Queued/running/completed tasks in real time |
| Agent Thinking | Expandable view: prompt assembled, model responses, tool calls, tool results |
| Conversations | Full conversation history with client metadata |
| Memory Browser | Navigate the memory tree, read/edit files, see TOC |
| Skill Manager | Toggle skills on/off, request new skills |
| Usage Dashboard | Token usage charts, credit balance, cost breakdown |
| Settings | Model preferences, language, notification prefs |
Architecture
Browser ──WebSocket──> animaya-server (Rust)
│
├── /ws/tasks → live task updates
├── /ws/thinking → agent thinking stream
│
├── /api/conversations
├── /api/memory
├── /api/skills
├── /api/usage
└── /api/settings
Agent Thinking Stream (WebSocket events)
```
// Context assembled
{ "event": "context_assembled",
  "data": { "skills_loaded": ["scheduling", "greeting"], "total_tokens": 8500 } }

// Model response with tool call
{ "event": "model_response",
  "data": { "tool_calls": [{ "name": "web_search", "args": { "query": "..." } }] } }

// Tool result
{ "event": "tool_result",
  "data": { "tool": "web_search", "status": "success", "duration_ms": 1200 } }
```
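On the Rust side, these events could be modeled as a serde-tagged enum. The field names mirror the JSON above, but the enum itself is an illustrative assumption, not the actual animaya-server type.

```rust
// Illustrative model of the thinking-stream events.
use serde::Serialize;

#[derive(Serialize)]
struct ToolCallInfo {
    name: String,
    args: serde_json::Value,
}

#[derive(Serialize)]
#[serde(tag = "event", content = "data", rename_all = "snake_case")]
enum ThinkingEvent {
    ContextAssembled { skills_loaded: Vec<String>, total_tokens: u32 },
    ModelResponse { tool_calls: Vec<ToolCallInfo> },
    ToolResult { tool: String, status: String, duration_ms: u64 },
}
```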
The static SPA is built at Docker image build time (Svelte/Preact/vanilla JS). No Node.js runtime needed in the container. Auth reuses existing Telegram Login Widget — same domain, same cookie.
Section 8: Testing Strategy
Test pyramid:
- Unit: ~500 tests, pure functions
- Integration: ~100 tests, real subprocess
- Behavioral: ~50 tests, mocked model
- E2E: ~5 tests, real Telegram
Unit (Rust)
Token counting, context assembly, memory TOC parsing, billing, config, skill activation
Integration
Tool protocol end-to-end, memory read/write, tool timeout handling
Behavioral
Mocked model. Verify prompts, tool calls, response routing, owner notifications
E2E
Real Telegram API. Send /start, text, voice. Trigger booking, verify owner notification
Behavioral Test Example (Rust)
```rust
#[test]
fn test_appointment_triggers_owner_notification() {
    let mock = MockModelRouter::with_responses(vec![
        response_with_tool_call("owner_notify", ...),
        response_text("I've notified the doctor!"),
    ]);
    let trigger = telegram_message("@maria", "Book me for Tue 3pm");
    let result = runtime.run(trigger);
    assert!(result.tool_calls[0].name == "owner_notify");
}
```
Section 9: Project Structure
```
animaya-v2/
  crates/
    animaya-core/       # Traits, types, interfaces
    animaya-runtime/    # Agent loop + context assembly
    animaya-memory/     # Filesystem memory + TOC + librarian
    animaya-tools/      # Subprocess tool executor
    animaya-models/     # Model router + billing
    animaya-triggers/   # Telegram (teloxide), cron, webhook
    animaya-responses/  # Telegram, owner notifications, file
    animaya-server/     # Main binary
  tools/                # Python tools (shared volume)
    web_search/
    audio_transcribe/
    image_analyze/
    ...
  skills/               # Markdown skills (shared volume)
    appointment_scheduling/
    greeting/
    reminder_setting/
    ...
  shared/               # System prompts, guardrails, pricing
  docker/
    Dockerfile                 # Multi-stage: Rust build → Python slim
    docker-compose.yml         # Traefik + auth + onboarding
    docker-compose.client.yml
  scripts/
  tests/
  auth/
  onboarding/
```
Docker Image (Multi-stage)
rust:1.77-slim → cargo build --release → python:3.12-slim + binary + tool deps
Result: ~50MB image (vs ~200MB for Python-only)
Section 10: Client Configuration
{
"slug": "drsmith",
"agent": {
"model": "zai/glm-4.7",
"max_turns": 10,
"temperature": 0.7,
"language": "ru"
},
"owner": {
"telegram_username": "@drivanov",
"telegram_chat_id": 123456789
},
"billing": {
"mode": "prepaid",
"balance_credits": 500000
},
"heartbeat": { "interval_minutes": 30 },
"librarian": {
"enabled": true,
"run_every_n_turns": 5,
"model": "zai/glm-4.7-flashx"
}
}
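For reference, the config above maps naturally onto serde structs. The layout below mirrors the JSON shown, but the type names are assumptions rather than the actual animaya types.

```rust
// Illustrative serde mapping of config.json.
use serde::Deserialize;

#[derive(Deserialize)]
struct ClientConfig {
    slug: String,
    agent: AgentConfig,
    owner: OwnerConfig,
    billing: BillingConfig,
    heartbeat: HeartbeatConfig,
    librarian: LibrarianConfig,
}

#[derive(Deserialize)]
struct AgentConfig {
    model: String,
    max_turns: u32,
    temperature: f32,
    language: String,
}

#[derive(Deserialize)]
struct OwnerConfig {
    telegram_username: String,
    telegram_chat_id: i64,
}

#[derive(Deserialize)]
struct BillingConfig {
    mode: String,
    balance_credits: u64,
}

#[derive(Deserialize)]
struct HeartbeatConfig {
    interval_minutes: u32,
}

#[derive(Deserialize)]
struct LibrarianConfig {
    enabled: bool,
    run_every_n_turns: u32,
    model: String,
}
```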
Rollout: Migration from v1
| v1 (OpenClaw) | v2 (Animaya) |
|---|---|
| workspace/SOUL.md | memory/identity/soul.md |
| workspace/OWNER.md | memory/identity/owner.md |
| workspace/BOOTSTRAP.md | Replaced by onboarding skill |
| workspace/HEARTBEAT.md | Replaced by built-in cron |
| workspace/*.md | memory/learned/{name}.md |
| openclaw.json | config.json (simpler) |
Migration Phases
- Build v2 with feature parity to current OpenClaw setup
- Run v2 in parallel for one test client, compare behavior
- Migrate existing clients (automated script)
- Remove OpenClaw dependency
Rollout: Implementation Timeline
Total: ~14.5 weeks for one developer. Phases 2-4 can run in parallel with multiple devs.
| Phase | Scope | Duration |
|---|---|---|
| Phase 1 — animaya-core | All traits and types (Trigger, Response, Memory, Tool, Skill, Model, Billing) | 1 week |
| Phase 2 — animaya-memory | Filesystem store + TOC generation. Can parallelize with phases 3-4. | 1 week |
| Phase 3 — animaya-tools | Subprocess executor + manifest loader. Can parallelize. | 1 week |
| Phase 4 — animaya-models | OpenAI-compat client + billing ledger. Can parallelize. | 1 week |
| Phase 5 — animaya-runtime | Agent loop + context assembly + task queue | 2 weeks |
| Phase 6 — animaya-triggers | Telegram (teloxide), cron scheduler | 1 week |
| Phase 7 — animaya-responses | Telegram (with status message editing), owner notifications | 1 week |
| Phase 8 — animaya-server | Wire everything + Docker + HTTP/WebSocket API | 1.5 weeks |
| Phase 9 — Skills system | Skill registry, activation modes, per-client access control | 1 week |
| Phase 10 — Librarian | Background memory organization process | 1 week |
| Phase 11 — Web UI | SPA dashboard with WebSocket live updates | 2 weeks |
| Phase 12 — Migration | Migration tooling + deploy scripts | 1 week |

Rollout: Edge Cases & Risks
⚠ Race Conditions
Heartbeat + Telegram arriving simultaneously. Solution: per-agent mutex, one trigger at a time.
⚠ Tool Hangs
Python subprocess stuck. Solution: kill process group (not just process) after timeout.
⚠ TOC Staleness
Solution: memory_write updates the TOC immediately instead of waiting for the librarian.
⚠ Context Overflow
Too many auto-skills. Solution: sort by priority, 3K token budget, rest on-demand.
⚠ Telegram Rate Limits
~20 msg/min/chat. Solution: response channel implements queue with rate limiting.
⚠ Task Cost Explosion
3 parallel tasks burning credits. Solution: per-task token budget.
⚠ Parallel Memory Writes
Concurrent tasks writing the same file. Solution: file-level locks in the memory store (see the sketch after this list).
⚠ BYOK Validation
Invalid API key. Solution: test call before accepting.
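As referenced under "Parallel Memory Writes", a minimal sketch of per-path locking inside the memory store; this is an illustration, not the real implementation.

```rust
// One mutex per memory path: concurrent tasks serialize only when they touch
// the same file.
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct FileLocks {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl FileLocks {
    /// Return (creating if needed) the lock guarding one memory path.
    fn lock_for(&self, path: &str) -> Arc<Mutex<()>> {
        let mut map = self.locks.lock().unwrap();
        map.entry(path.to_string())
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone()
    }
}

// Usage: hold the guard for the duration of the write.
//   let lock = file_locks.lock_for("learned/pricing.md");
//   let _guard = lock.lock().unwrap();
//   // ...write the file while _guard is held...
```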