Animaya v2

Modular AI Agent Platform

Rust Core · Python Tools · Docker Isolated · Multi-Provider LLM · Token Billing

Overview: Context

Animaya currently deploys per-client OpenClaw (Node.js) containers on a VPS behind Traefik. Each client gets a Telegram bot, a workspace with flat .md files for memory, and bash scripts for management.

Problems Driving This Rebuild

Goal

Build a modular agent runtime from scratch where every component (model, memory, tools, skills, triggers, responses) is pluggable, testable, and independently updatable.

Overview: Key Decisions

Overview: Component Model

Every component is defined as a Rust trait, making it pluggable. New variants can be added without changing the core.

⚡ Triggers: Telegram, Cron, Webhook
🧠 AI Model: Multi-provider Router
📚 Memory: Tree + Librarian
🔧 Tools: Python Subprocess
🎯 Skills: Markdown Prompts
💬 Responses: Telegram, Owner, File
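The pluggable component model above can be illustrated with structural interfaces. The runtime itself defines these as Rust traits; the following is a minimal Python analogue using `typing.Protocol`, and all names here (`Trigger`, `Responder`, `poll`, `send`) are illustrative assumptions, not the real trait names.

```python
from typing import Optional, Protocol

# Illustrative analogue of the Rust component traits: any type with the
# right shape plugs in, and new variants need no changes to the core.

class Trigger(Protocol):
    def poll(self) -> Optional[dict]: ...

class Responder(Protocol):
    def send(self, message: str) -> None: ...

class CronTrigger:
    """A trivial trigger variant that fires its event exactly once."""
    def __init__(self, event: dict):
        self._event = event
        self._fired = False

    def poll(self) -> Optional[dict]:
        if self._fired:
            return None
        self._fired = True
        return self._event

class FileResponder:
    """Collects responses in memory (stand-in for file/Telegram channels)."""
    def __init__(self):
        self.sent = []

    def send(self, message: str) -> None:
        self.sent.append(message)

def run_once(trigger: Trigger, responder: Responder) -> None:
    # The core only depends on the interfaces, never on concrete variants.
    event = trigger.poll()
    if event is not None:
        responder.send(f"handled: {event['kind']}")
```

Swapping `CronTrigger` for a Telegram or webhook variant leaves `run_once` untouched, which is the point of the trait-per-component design.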
Container Architecture (per client)

🔒 Shared (read-only)

system_prompt.md
guardrails.md
skills/
tools/
pricing.json
version.txt

📝 Per-Client (read-write)

config.json
skills.json
memory/
billing/
.env (secrets)

Section 1: Agent Loop

The central state machine. A trigger arrives, context is assembled, the model is called. The model either produces a final response or requests tool calls. Tool results feed back into the loop until the model responds with text or a safety limit is reached.

⚡ Trigger (Telegram, Cron, Webhook, CLI)
    ↓
Context Assembler (system prompt + identity + memory TOC + skills + tools + history)
    ↓
Model Call (LLM generates response)
    ↓
Has tool calls?
    ├─ No → 💬 Response Router (Telegram / Owner / File) → Billing Ledger (record usage, deduct credits)
    └─ Yes → Execute Tools (Python subprocess) ↻ append results, loop back to Model Call
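The loop above can be sketched in a few lines. This is a simplified stand-in, not the real runtime: the `model` and `tools` callables and the message format are assumptions for illustration.

```python
# Minimal sketch of the agent loop: call the model, execute any requested
# tools, feed results back, and stop on a text reply or the turn limit.

def agent_loop(model, tools, messages, max_turns=10):
    for _ in range(max_turns):
        reply = model(messages)                  # Model Call
        calls = reply.get("tool_calls")
        if not calls:                            # no tool calls: final answer
            return reply["text"]
        for call in calls:                       # Execute Tools
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool",
                             "name": call["name"],
                             "content": result})  # append results, loop back
    return "[aborted: max turns reached]"         # safety limit
```

A scripted model makes the control flow easy to see: one reply requesting a tool, then one final text reply ends the loop.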

Safety Limits

Section 2: Memory Tree

Memory is a file tree on disk. The model doesn't see the full tree — it gets a table of contents and can request specific files via tools.

memory/
  _toc.json              # auto-generated
  identity/
    soul.md              # always-load
    rules.md             # always-load
    owner.md             # always-load
  knowledge/
    services.md
    pricing.md
    faq.md
    schedule.md
  clients/
    {username}.md
  conversations/
    {session_id}.jsonl
  learned/
    {topic}.md
  scratch/

Memory Tools

| Tool | Description |
| --- | --- |
| memory_read(path) | Read a memory file |
| memory_write(path, content, summary) | Write/update a file and auto-update its TOC entry |
| memory_list(prefix?) | List entries by path prefix |
| memory_search(query) | Keyword search against summaries |
| memory_delete(path) | Delete a memory file |
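A sketch of how memory_write could keep _toc.json in sync with every write, as the table describes. The TOC entry schema (summary plus a rough token estimate) is an assumption for illustration.

```python
import json
from pathlib import Path

# Sketch: write a memory file and immediately update its _toc.json entry,
# so the TOC never lags behind the tree. Schema here is assumed.

def memory_write(root: Path, rel_path: str, content: str, summary: str) -> None:
    path = root / rel_path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")

    toc_path = root / "_toc.json"
    toc = json.loads(toc_path.read_text()) if toc_path.exists() else {}
    toc[rel_path] = {
        "summary": summary,
        "tokens": max(1, len(content) // 4),  # rough chars/4 token estimate
    }
    toc_path.write_text(json.dumps(toc, indent=2), encoding="utf-8")
```

Updating the TOC inside the write path (rather than leaving it to the librarian) is what keeps memory_search and memory_list consistent between librarian runs.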

The Librarian

A background process inside each container that maintains memory quality. Uses a cheap, fast model (GLM-4.7-FlashX, roughly a tenth of the primary model's per-token price per the pricing table in Section 5).

When it runs

After every 5 agent runs, OR when total memory exceeds 50K tokens, OR daily at 3 AM

Cost per run

~2,000 credits (~667 primary tokens). Nearly free.

Librarian Operations (5 steps)
  1. Rebuild TOC — scan all files, regenerate _toc.json with summaries and token counts
  2. Merge duplicates — if two learned/ files cover the same topic, merge them
  3. Prune conversations — summarize logs older than 7 days, move summaries to clients/{user}.md, delete raw logs
  4. Compress large files — if any file exceeds 2000 tokens, compress while preserving all facts
  5. Promote scratch — if a scratch file has been referenced 3+ times, move to knowledge/
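The "when it runs" rule above is a simple disjunction of three conditions. A sketch, with the function signature and inputs assumed for illustration:

```python
# Sketch of the librarian scheduling rule: run after 5 agent runs, OR
# when memory exceeds 50K tokens, OR once per day at 3 AM.

def librarian_due(runs_since_last: int, memory_tokens: int,
                  hour_now: int, ran_today: bool) -> bool:
    if runs_since_last >= 5:
        return True
    if memory_tokens > 50_000:
        return True
    if hour_now == 3 and not ran_today:  # daily 3 AM window
        return True
    return False
```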

Section 3: Tool Protocol

Tools are Python scripts that communicate with the Rust runtime over stdin/stdout JSON-RPC.

📦 Tool Layout

tools/web_search/
  manifest.json — schema
  main.py — implementation
  requirements.txt — deps

🛡 Sandboxing

Timeout enforced by Rust (kills the whole process group). No environment-variable inheritance, no access to the memory/ tree, and the process runs as a non-root user.

Manifest Example (web_search)
{
  "name": "web_search",
  "description": "Search the web using Brave Search API",
  "parameters": {
    "type": "object",
    "properties": {
      "query": { "type": "string", "description": "Search query" }
    },
    "required": ["query"]
  },
  "timeout_seconds": 30,
  "requires_secrets": ["BRAVE_API_KEY"],
  "tier": "standard"
}
Protocol: Request & Response

Rust → Python (stdin)

{
  "method": "execute",
  "id": "call_abc123",
  "params": {
    "arguments": { "query": "weather in Moscow" },
    "secrets": { "BRAVE_API_KEY": "..." }
  }
}

Python → Rust (stdout)

{
  "id": "call_abc123",
  "result": { "status": "success", "output": "Results for: weather..." }
}
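A minimal main.py skeleton for the request/response exchange shown above. The actual Brave Search call is replaced by a stub; everything else follows the protocol messages as written.

```python
import json
import sys

# Skeleton of a tool's main.py: read one JSON-RPC request from stdin,
# handle it, write one JSON response to stdout.

def handle(request: dict) -> dict:
    args = request["params"]["arguments"]
    secrets = request["params"].get("secrets", {})  # e.g. BRAVE_API_KEY
    # A real tool would call the Brave Search API here; stubbed for the sketch.
    output = f"Results for: {args['query']}"
    return {"id": request["id"],
            "result": {"status": "success", "output": output}}

def main() -> None:
    # Entry point when the Rust runtime spawns the subprocess:
    # one request per invocation, echoed back on stdout.
    request = json.loads(sys.stdin.read())
    sys.stdout.write(json.dumps(handle(request)))
```

Keeping the protocol handling in a pure `handle()` function makes the tool testable without spawning a subprocess.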

Section 4: Skill System

Skills are markdown instruction files that tell the model when and how to use tools. Unlike tools (code), skills are pure prompt engineering.

Activation Modes

| Mode | Behavior |
| --- | --- |
| auto | Instructions always included in context. Model decides when to follow them. |
| manual | Listed in TOC but not loaded. Model requests via skill_load tool. |
| disabled | Turned off for this client. |
Skill Example: Appointment Scheduling
# Appointment Scheduling

## When to Activate
- Client asks to book/schedule an appointment
- Client asks about availability

## Steps
1. Read knowledge/services.md to check available services
2. Read knowledge/schedule.md to check working hours
3. Ask client which service they need
4. Ask for preferred date/time
5. Notify owner via owner_notify with requires_approval: true
6. Tell client: "I've sent your request to [owner]."

## Required Tools
- memory_read
- owner_notify

Per-Client Access Control

Each client has skills.json with overrides. Admins control which skills each client can access — this is the monetization lever: basic skills included, premium skills require a higher tier.

{
  "overrides": {
    "appointment_scheduling": { "activation": "auto" },
    "crm_integration": { "activation": "disabled" }
  }
}
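Resolving a skill's effective mode is a small merge of shared defaults with the per-client overrides file above. A sketch; the fallback to "manual" for skills without a default is an assumption.

```python
# Sketch: per-client skills.json overrides win over shared defaults.

def effective_activation(defaults: dict, client_skills: dict, skill: str) -> str:
    override = client_skills.get("overrides", {}).get(skill)
    if override:
        return override["activation"]
    return defaults.get(skill, "manual")  # assumed fallback mode
```

This is also where the tier gating would hook in: an admin setting a premium skill to "disabled" in skills.json is exactly this override path.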

Section 5: Billing & Model Router

Agent Loop → ModelRouter → BalanceChecker → LLM API → BillingLedger

Each client has a credit balance. 1 credit = 1 token at base model price. Better models cost more credits.

| Model | Input (credits/token) | Output (credits/token) |
| --- | --- | --- |
| GLM-4.7 (primary) | 1.0 | 3.0 |
| GLM-4.7-FlashX (cheap) | 0.1 | 0.3 |
| GLM-4.6V Flash (vision) | 1.5 | 3.0 |
| Whisper/Groq (audio) | 0.5 credits/second | n/a |

BYOK (Bring Your Own Key)

Power users provide their own API key. When active: their key is used, no credits charged, usage still logged for analytics, key stored encrypted in .env.

When Balance Runs Out

  1. Before each call, estimate cost. If insufficient → block
  2. User sees: "I've reached my usage limit. Contact [owner] to add credits."
  3. Owner gets notification: "Your assistant's balance is exhausted."
  4. Low-balance warnings at 10% threshold
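The pre-call estimate and the 10% warning are straightforward arithmetic over the pricing table. A sketch; the rate table mirrors the credits-per-token figures above, while the function signatures and the "initial balance" baseline for the 10% threshold are assumptions.

```python
# Sketch of the pre-call balance check: estimate cost from the pricing
# table, block if insufficient, warn when the remainder drops under 10%.

RATES = {  # (input credits/token, output credits/token), per the table
    "glm-4.7":        (1.0, 3.0),
    "glm-4.7-flashx": (0.1, 0.3),
    "glm-4.6v-flash": (1.5, 3.0),
}

def estimate_credits(model: str, in_tokens: int, out_tokens: int) -> float:
    rate_in, rate_out = RATES[model]
    return in_tokens * rate_in + out_tokens * rate_out

def check_balance(balance: float, initial: float, cost: float):
    """Return (allowed, low_warning)."""
    if cost > balance:
        return False, True            # block the call, owner gets alerted
    remaining = balance - cost
    return True, remaining < 0.10 * initial  # low-balance warning threshold
```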

Section 6: Owner Communication

"Texting the owner" is a first-class response channel — distinct from regular Telegram messages. It supports structured notifications and approval workflows.

| Type | Description | Example |
| --- | --- | --- |
| Info | No response needed | "Client @maria asked about pricing" |
| Approval | Owner taps Confirm/Decline | "@maria wants Tue 15:00 — [Confirm] [Decline]" |
| Alert | System events | "Token balance below 10%" |

Approval Flow

Client asks to book appointment
    ↓
Agent calls owner_notify (requires_approval: true)
    ↓
Owner gets Telegram msg with [Confirm] [Decline] buttons
    ↓
Owner taps "Confirm"
    ↓
ApprovalResponse trigger fires
    ↓
Client gets confirmation

owner_notify is a built-in tool (implemented in Rust, not Python) because it needs direct Telegram API access. Pending approvals time out after 12 hours with a configurable default action.
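The approval message itself is an ordinary Telegram sendMessage payload with an inline keyboard. A sketch of what owner_notify might build; the `callback_data` scheme (`approve:`/`decline:` prefixes) and field names other than the standard Bot API ones are assumptions.

```python
# Sketch of the approval-request payload: the inline_keyboard shape
# follows the Telegram Bot API; the callback_data scheme is assumed.

def approval_message(chat_id: int, text: str, request_id: str) -> dict:
    return {
        "chat_id": chat_id,
        "text": text,
        "reply_markup": {
            "inline_keyboard": [[
                {"text": "Confirm", "callback_data": f"approve:{request_id}"},
                {"text": "Decline", "callback_data": f"decline:{request_id}"},
            ]]
        },
    }
```

When the owner taps a button, Telegram delivers the callback_data back, which is what lets the ApprovalResponse trigger match the tap to the pending request.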

Section 11: Parallel Task Execution

The agent handles multiple tasks concurrently. A user can send a long-running request, then immediately send another. Tasks are queued, executed in parallel, and their progress is visible in real time.

Task Queue & Executor
Incoming Triggers
    ↓
Task Queue (bounded, max 5 queued)
    ↓
Worker 1 · Worker 2 · Worker 3

Concurrency

Max 3 parallel agent runs per client (configurable). Prevents runaway costs.

Queue Limit

Max 5 queued tasks. If full: "I'm busy, try again in a moment."

Priority

Owner messages > Client messages > System triggers (heartbeat, cron).

Cancellation

Owner can cancel via /cancel in Telegram or the Web UI.
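The bounded queue with the owner > client > system ordering can be sketched with a binary heap. The priority values and class shape are assumptions; the limits and rejection behavior follow the text.

```python
import heapq
import itertools

# Sketch of the bounded priority queue: owner > client > system triggers,
# max 5 queued, FIFO within a priority level.

PRIORITY = {"owner": 0, "client": 1, "system": 2}  # lower = served first

class TaskQueue:
    def __init__(self, max_queued: int = 5):
        self._heap = []
        self._max = max_queued
        self._seq = itertools.count()  # tie-break: FIFO within a priority

    def push(self, kind: str, task) -> bool:
        if len(self._heap) >= self._max:
            return False  # caller replies "I'm busy, try again in a moment."
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), task))
        return True

    def pop(self):
        return heapq.heappop(self._heap)[2] if self._heap else None
```

The worker pool (max 3 parallel runs) would simply pop from this queue; the sequence counter guarantees two client messages at the same priority come out in arrival order.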

Telegram Status Messages

Status messages are edited in-place as the task progresses:

Research the top 5 competitors in Moscow
⏳ Queued (position 1)
↓ edited
🔄 Working... Searching the web
↓ edited
🔄 Working... Analyzing 5 results
↓ edited
✅ Here are the top 5 competitors in Moscow:
1. ClinicA — full-service...
2. TherapyPro — online...

Section 12: Web UI

A web dashboard for owners to monitor their assistant. Served from {slug}.animaya.me behind the existing Telegram auth.

| Feature | Description |
| --- | --- |
| Live Tasks | Queued/running/completed tasks in real time |
| Agent Thinking | Expandable view: prompt assembled, model responses, tool calls, tool results |
| Conversations | Full conversation history with client metadata |
| Memory Browser | Navigate the memory tree, read/edit files, see TOC |
| Skill Manager | Toggle skills on/off, request new skills |
| Usage Dashboard | Token usage charts, credit balance, cost breakdown |
| Settings | Model preferences, language, notification prefs |

Architecture

Browser ──WebSocket──> animaya-server (Rust)
                           │
                           ├── /ws/tasks     → live task updates
                           ├── /ws/thinking  → agent thinking stream
                           │
                           ├── /api/conversations
                           ├── /api/memory
                           ├── /api/skills
                           ├── /api/usage
                           └── /api/settings
Agent Thinking Stream (WebSocket events)
// Context assembled
{ "event": "context_assembled",
  "data": { "skills_loaded": ["scheduling", "greeting"],
            "total_tokens": 8500 }}

// Model response with tool call
{ "event": "model_response",
  "data": { "tool_calls": [{"name": "web_search",
            "args": {"query": "..."}}] }}

// Tool result
{ "event": "tool_result",
  "data": { "tool": "web_search", "status": "success",
            "duration_ms": 1200 }}

The static SPA is built at Docker image build time (Svelte/Preact/vanilla JS). No Node.js runtime needed in the container. Auth reuses existing Telegram Login Widget — same domain, same cookie.

Section 7: Shared Volume Strategy

| What Changed | Action Needed |
| --- | --- |
| Tools, skills, prompts | Update files in shared/, bump version.txt. Runtime detects on next run; no restart. |
| Client config | Update config.json. Runtime watches the file; no restart. |
| Runtime binary | Build a new Docker image, rolling-restart containers. |

Section 8: Testing Strategy

- E2E: ~5 tests, real Telegram
- Behavioral: ~50 tests, mocked model
- Integration: ~100 tests, real subprocess
- Unit: ~500 tests, pure functions

Unit (Rust)

Token counting, context assembly, memory TOC parsing, billing, config, skill activation

Integration

Tool protocol end-to-end, memory read/write, tool timeout handling

Behavioral

Mocked model. Verify prompts, tool calls, response routing, owner notifications

E2E

Real Telegram API. Send /start, text, voice. Trigger booking, verify owner notification

Behavioral Test Example (Rust)
#[test]
fn test_appointment_triggers_owner_notification() {
    let mock = MockModelRouter::with_responses(vec![
        response_with_tool_call("owner_notify", ...),
        response_text("I've notified the doctor!"),
    ]);
    let trigger = telegram_message("@maria", "Book me for Tue 3pm");
    let result = runtime.run(trigger);
    assert_eq!(result.tool_calls[0].name, "owner_notify");
}

Section 9: Project Structure

animaya-v2/
  crates/
    animaya-core/          # Traits, types, interfaces
    animaya-runtime/       # Agent loop + context assembly
    animaya-memory/        # Filesystem memory + TOC + librarian
    animaya-tools/         # Subprocess tool executor
    animaya-models/        # Model router + billing
    animaya-triggers/      # Telegram (teloxide), cron, webhook
    animaya-responses/     # Telegram, owner notifications, file
    animaya-server/        # Main binary

  tools/                   # Python tools (shared volume)
    web_search/ audio_transcribe/ image_analyze/ ...

  skills/                  # Markdown skills (shared volume)
    appointment_scheduling/ greeting/ reminder_setting/ ...

  shared/                  # System prompts, guardrails, pricing

  docker/
    Dockerfile             # Multi-stage: Rust build → Python slim
    docker-compose.yml     # Traefik + auth + onboarding
    docker-compose.client.yml

  scripts/ tests/ auth/ onboarding/

Docker Image (Multi-stage)

rust:1.77-slim → cargo build --release → python:3.12-slim + binary + tool deps
Result: ~50MB image (vs ~200MB for a Python-only image)

Section 10: Client Configuration

{
  "slug": "drsmith",
  "agent": {
    "model": "zai/glm-4.7",
    "max_turns": 10,
    "temperature": 0.7,
    "language": "ru"
  },
  "owner": {
    "telegram_username": "@drivanov",
    "telegram_chat_id": 123456789
  },
  "billing": {
    "mode": "prepaid",
    "balance_credits": 500000
  },
  "heartbeat": { "interval_minutes": 30 },
  "librarian": {
    "enabled": true,
    "run_every_n_turns": 5,
    "model": "zai/glm-4.7-flashx"
  }
}

Rollout: Migration from v1

| v1 (OpenClaw) | v2 (Animaya) |
| --- | --- |
| workspace/SOUL.md | memory/identity/soul.md |
| workspace/OWNER.md | memory/identity/owner.md |
| workspace/BOOTSTRAP.md | Replaced by onboarding skill |
| workspace/HEARTBEAT.md | Replaced by built-in cron |
| workspace/*.md | memory/learned/{name}.md |
| openclaw.json | config.json (simpler) |

Migration Phases

  1. Build v2 with feature parity to current OpenClaw setup
  2. Run v2 in parallel for one test client, compare behavior
  3. Migrate existing clients (automated script)
  4. Remove OpenClaw dependency

Rollout: Implementation Timeline

Total: ~14.5 weeks for one developer. Phases 2-4 can run in parallel with multiple devs.

Phase 1 — animaya-core

All traits and types (Trigger, Response, Memory, Tool, Skill, Model, Billing)

1 week

Phase 2 — animaya-memory

Filesystem store + TOC generation. Can parallelize with phases 3-4.

1 week

Phase 3 — animaya-tools

Subprocess executor + manifest loader. Can parallelize.

1 week

Phase 4 — animaya-models

OpenAI-compat client + billing ledger. Can parallelize.

1 week

Phase 5 — animaya-runtime

Agent loop + context assembly + task queue

2 weeks

Phase 6 — animaya-triggers

Telegram (teloxide), cron scheduler

1 week

Phase 7 — animaya-responses

Telegram (with status msg editing), owner notifications

1 week

Phase 8 — animaya-server

Wire everything + Docker + HTTP/WebSocket API

1.5 weeks

Phase 9 — Skills system

Skill registry, activation modes, per-client access control

1 week

Phase 10 — Librarian

Background memory organization process

1 week

Phase 11 — Web UI

SPA dashboard with WebSocket live updates

2 weeks

Phase 12 — Migration

Migration tooling + deploy scripts

1 week

Rollout: Edge Cases & Risks

⚠ Race Conditions

Heartbeat + Telegram arriving simultaneously. Solution: per-agent mutex, one trigger at a time.

⚠ Tool Hangs

Python subprocess stuck. Solution: kill process group (not just process) after timeout.

⚠ TOC Staleness

Solution: memory_write updates the TOC immediately instead of waiting for the librarian.

⚠ Context Overflow

Too many auto-skills. Solution: sort by priority, 3K token budget, rest on-demand.

⚠ Telegram Rate Limits

~20 msg/min/chat. Solution: response channel implements queue with rate limiting.
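The rate-limited queue the response channel needs can be sketched as a sliding window per chat. The class shape and injected clock are assumptions; the ~20 messages/minute/chat figure is from the text.

```python
from collections import deque

# Sketch of a per-chat sliding-window rate limiter (~20 msg/min/chat).
# The clock is passed in explicitly so the logic is testable.

class ChatRateLimiter:
    def __init__(self, limit: int = 20, window_s: float = 60.0):
        self.limit = limit
        self.window = window_s
        self._sent = deque()  # timestamps of recent sends

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            self._sent.append(now)
            return True
        return False  # caller queues the message instead of dropping it
```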

⚠ Task Cost Explosion

3 parallel tasks burning credits. Solution: per-task token budget.

⚠ Parallel Memory Writes

Concurrent tasks writing same file. Solution: file-level locks in memory store.

⚠ BYOK Validation

Invalid API key. Solution: test call before accepting.

Animaya v2 Architecture Document — February 2026
Generated with Claude Code