Animaya v2

Modular AI Agent Platform

Rust Core · Python Tools · Docker Isolated · Multi-Provider LLM · Token Billing

Overview: Context

Animaya currently deploys per-client OpenClaw (Node.js) containers on a VPS behind Traefik. Each client gets a Telegram bot, a workspace with flat .md files for memory, and bash scripts for management.

Problems Driving This Rebuild

Goal

Build a modular agent runtime from scratch where every component (model, memory, tools, skills, triggers, responses) is pluggable, testable, and independently updatable.

Overview: Key Decisions

Overview: Component Model

Every component is defined as a Rust trait, making it pluggable. New variants can be added without changing the core.

⚡ Triggers: Telegram, Cron, Webhook
🧠 AI Model: Multi-provider Router
📚 Memory: Tree + Librarian
🔧 Tools: Python Subprocess
🎯 Skills: Markdown Prompts
💬 Responses: Telegram, Owner, File
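The pluggable component model above can be illustrated with structural interfaces. The runtime itself defines these as Rust traits; the following is a minimal Python analogue using `typing.Protocol`, and all names here (`Trigger`, `Responder`, `poll`, `send`) are illustrative assumptions, not the real trait names.

```python
from typing import Optional, Protocol

# Illustrative analogue of the Rust component traits: any type with the
# right shape plugs in, and new variants need no changes to the core.

class Trigger(Protocol):
    def poll(self) -> Optional[dict]: ...

class Responder(Protocol):
    def send(self, message: str) -> None: ...

class CronTrigger:
    """A trivial trigger variant that fires its event exactly once."""
    def __init__(self, event: dict):
        self._event = event
        self._fired = False

    def poll(self) -> Optional[dict]:
        if self._fired:
            return None
        self._fired = True
        return self._event

class FileResponder:
    """Collects responses in memory (stand-in for file/Telegram channels)."""
    def __init__(self):
        self.sent = []

    def send(self, message: str) -> None:
        self.sent.append(message)

def run_once(trigger: Trigger, responder: Responder) -> None:
    # The core only depends on the interfaces, never on concrete variants.
    event = trigger.poll()
    if event is not None:
        responder.send(f"handled: {event['kind']}")
```

Swapping `CronTrigger` for a Telegram or webhook variant leaves `run_once` untouched, which is the point of the trait-per-component design.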
Container Architecture (per client)

🔒 Shared (read-only)

system_prompt.md
guardrails.md
skills/
tools/
pricing.json
version.txt

📝 Per-Client (read-write)

config.json
skills.json
memory/
billing/
.env (secrets)

Section 1: Agent Loop

The central state machine. A trigger arrives, context is assembled, the model is called. The model either produces a final response or requests tool calls. Tool results feed back into the loop until the model responds with text or a safety limit is reached.

⚡ Trigger (Telegram, Cron, Webhook, CLI)
    ↓
Context Assembler (system prompt + identity + memory TOC + skills + tools + history)
    ↓
Model Call (LLM generates response)
    ↓
Has tool calls?
    ├─ No → 💬 Response Router (Telegram / Owner / File) → Billing Ledger (record usage, deduct credits)
    └─ Yes → Execute Tools (Python subprocess) ↻ append results, loop back to Model Call
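The loop above can be sketched in a few lines. This is a simplified stand-in, not the real runtime: the `model` and `tools` callables and the message format are assumptions for illustration.

```python
# Minimal sketch of the agent loop: call the model, execute any requested
# tools, feed results back, and stop on a text reply or the turn limit.

def agent_loop(model, tools, messages, max_turns=10):
    for _ in range(max_turns):
        reply = model(messages)                  # Model Call
        calls = reply.get("tool_calls")
        if not calls:                            # no tool calls: final answer
            return reply["text"]
        for call in calls:                       # Execute Tools
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool",
                             "name": call["name"],
                             "content": result})  # append results, loop back
    return "[aborted: max turns reached]"         # safety limit
```

A scripted model makes the control flow easy to see: one reply requesting a tool, then one final text reply ends the loop.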

Safety Limits

Section 2: Memory Tree

Memory is a file tree on disk. The model doesn't see the full tree — it gets a table of contents and can request specific files via tools.

memory/
  _toc.json              # auto-generated
  identity/
    soul.md              # always-load
    rules.md             # always-load
    owner.md             # always-load
  knowledge/
    services.md
    pricing.md
    faq.md
    schedule.md
  clients/
    {username}.md
  conversations/
    {session_id}.jsonl
  learned/
    {topic}.md
  scratch/

Memory Tools

| Tool | Description |
| --- | --- |
| memory_read(path) | Read a memory file |
| memory_write(path, content, summary) | Write/update a file and auto-update its TOC entry |
| memory_list(prefix?) | List entries by path prefix |
| memory_search(query) | Keyword search against summaries |
| memory_delete(path) | Delete a memory file |
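A sketch of how memory_write could keep _toc.json in sync with every write, as the table describes. The TOC entry schema (summary plus a rough token estimate) is an assumption for illustration.

```python
import json
from pathlib import Path

# Sketch: write a memory file and immediately update its _toc.json entry,
# so the TOC never lags behind the tree. Schema here is assumed.

def memory_write(root: Path, rel_path: str, content: str, summary: str) -> None:
    path = root / rel_path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")

    toc_path = root / "_toc.json"
    toc = json.loads(toc_path.read_text()) if toc_path.exists() else {}
    toc[rel_path] = {
        "summary": summary,
        "tokens": max(1, len(content) // 4),  # rough chars/4 token estimate
    }
    toc_path.write_text(json.dumps(toc, indent=2), encoding="utf-8")
```

Updating the TOC inside the write path (rather than leaving it to the librarian) is what keeps memory_search and memory_list consistent between librarian runs.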

The Librarian

A background process inside each container that maintains memory quality. Uses a cheap, fast model (GLM-4.7-FlashX, roughly a tenth of the primary model's per-token price per the pricing table in Section 5).

When it runs

After every 5 agent runs, OR when total memory exceeds 50K tokens, OR daily at 3 AM

Cost per run

~2,000 credits (~667 primary tokens). Nearly free.

Librarian Operations (5 steps)
  1. Rebuild TOC — scan all files, regenerate _toc.json with summaries and token counts
  2. Merge duplicates — if two learned/ files cover the same topic, merge them
  3. Prune conversations — summarize logs older than 7 days, move summaries to clients/{user}.md, delete raw logs
  4. Compress large files — if any file exceeds 2000 tokens, compress while preserving all facts
  5. Promote scratch — if a scratch file has been referenced 3+ times, move to knowledge/
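The "when it runs" rule above is a simple disjunction of three conditions. A sketch, with the function signature and inputs assumed for illustration:

```python
# Sketch of the librarian scheduling rule: run after 5 agent runs, OR
# when memory exceeds 50K tokens, OR once per day at 3 AM.

def librarian_due(runs_since_last: int, memory_tokens: int,
                  hour_now: int, ran_today: bool) -> bool:
    if runs_since_last >= 5:
        return True
    if memory_tokens > 50_000:
        return True
    if hour_now == 3 and not ran_today:  # daily 3 AM window
        return True
    return False
```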

Section 3: Tool Protocol

Tools are Python scripts that communicate with the Rust runtime over stdin/stdout JSON-RPC.

📦 Tool Layout

tools/web_search/
  manifest.json — schema
  main.py — implementation
  requirements.txt — deps

🛡 Sandboxing

Timeout enforced by Rust (kills the whole process group). No environment-variable inheritance, no access to the memory/ tree, and the process runs as a non-root user.

Manifest Example (web_search)
{
  "name": "web_search",
  "description": "Search the web using Brave Search API",
  "parameters": {
    "type": "object",
    "properties": {
      "query": { "type": "string", "description": "Search query" }
    },
    "required": ["query"]
  },
  "timeout_seconds": 30,
  "requires_secrets": ["BRAVE_API_KEY"],
  "tier": "standard"
}
Protocol: Request & Response

Rust → Python (stdin)

{
  "method": "execute",
  "id": "call_abc123",
  "params": {
    "arguments": { "query": "weather in Moscow" },
    "secrets": { "BRAVE_API_KEY": "..." }
  }
}

Python → Rust (stdout)

{
  "id": "call_abc123",
  "result": { "status": "success", "output": "Results for: weather..." }
}
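A minimal main.py skeleton for the request/response exchange shown above. The actual Brave Search call is replaced by a stub; everything else follows the protocol messages as written.

```python
import json
import sys

# Skeleton of a tool's main.py: read one JSON-RPC request from stdin,
# handle it, write one JSON response to stdout.

def handle(request: dict) -> dict:
    args = request["params"]["arguments"]
    secrets = request["params"].get("secrets", {})  # e.g. BRAVE_API_KEY
    # A real tool would call the Brave Search API here; stubbed for the sketch.
    output = f"Results for: {args['query']}"
    return {"id": request["id"],
            "result": {"status": "success", "output": output}}

def main() -> None:
    # Entry point when the Rust runtime spawns the subprocess:
    # one request per invocation, echoed back on stdout.
    request = json.loads(sys.stdin.read())
    sys.stdout.write(json.dumps(handle(request)))
```

Keeping the protocol handling in a pure `handle()` function makes the tool testable without spawning a subprocess.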

Section 4: Skill System

Skills are markdown instruction files that tell the model when and how to use tools. Unlike tools (code), skills are pure prompt engineering.

Activation Modes

| Mode | Behavior |
| --- | --- |
| auto | Instructions always included in context. Model decides when to follow them. |
| manual | Listed in TOC but not loaded. Model requests via skill_load tool. |
| disabled | Turned off for this client. |
Skill Example: Appointment Scheduling
# Appointment Scheduling

## When to Activate
- Client asks to book/schedule an appointment
- Client asks about availability

## Steps
1. Read knowledge/services.md to check available services
2. Read knowledge/schedule.md to check working hours
3. Ask client which service they need
4. Ask for preferred date/time
5. Notify owner via owner_notify with requires_approval: true
6. Tell client: "I've sent your request to [owner]."

## Required Tools
- memory_read
- owner_notify

Per-Client Access Control

Each client has skills.json with overrides. Admins control which skills each client can access — this is the monetization lever: basic skills included, premium skills require a higher tier.

{
  "overrides": {
    "appointment_scheduling": { "activation": "auto" },
    "crm_integration": { "activation": "disabled" }
  }
}
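Resolving a skill's effective mode is a small merge of shared defaults with the per-client overrides file above. A sketch; the fallback to "manual" for skills without a default is an assumption.

```python
# Sketch: per-client skills.json overrides win over shared defaults.

def effective_activation(defaults: dict, client_skills: dict, skill: str) -> str:
    override = client_skills.get("overrides", {}).get(skill)
    if override:
        return override["activation"]
    return defaults.get(skill, "manual")  # assumed fallback mode
```

This is also where the tier gating would hook in: an admin setting a premium skill to "disabled" in skills.json is exactly this override path.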

Section 5: Billing & Model Router

Agent Loop → ModelRouter → BalanceChecker → LLM API → BillingLedger

Each client has a credit balance. 1 credit = 1 token at base model price. Better models cost more credits.

| Model | Input (credits/token) | Output (credits/token) |
| --- | --- | --- |
| GLM-4.7 (primary) | 1.0 | 3.0 |
| GLM-4.7-FlashX (cheap) | 0.1 | 0.3 |
| GLM-4.6V Flash (vision) | 1.5 | 3.0 |
| Whisper/Groq (audio) | 0.5 credits/second | n/a |

BYOK (Bring Your Own Key)

Power users provide their own API key. When active: their key is used, no credits charged, usage still logged for analytics, key stored encrypted in .env.

When Balance Runs Out

  1. Before each call, estimate cost. If insufficient → block
  2. User sees: "I've reached my usage limit. Contact [owner] to add credits."
  3. Owner gets notification: "Your assistant's balance is exhausted."
  4. Low-balance warnings at 10% threshold
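The pre-call estimate and the 10% warning are straightforward arithmetic over the pricing table. A sketch; the rate table mirrors the credits-per-token figures above, while the function signatures and the "initial balance" baseline for the 10% threshold are assumptions.

```python
# Sketch of the pre-call balance check: estimate cost from the pricing
# table, block if insufficient, warn when the remainder drops under 10%.

RATES = {  # (input credits/token, output credits/token), per the table
    "glm-4.7":        (1.0, 3.0),
    "glm-4.7-flashx": (0.1, 0.3),
    "glm-4.6v-flash": (1.5, 3.0),
}

def estimate_credits(model: str, in_tokens: int, out_tokens: int) -> float:
    rate_in, rate_out = RATES[model]
    return in_tokens * rate_in + out_tokens * rate_out

def check_balance(balance: float, initial: float, cost: float):
    """Return (allowed, low_warning)."""
    if cost > balance:
        return False, True            # block the call, owner gets alerted
    remaining = balance - cost
    return True, remaining < 0.10 * initial  # low-balance warning threshold
```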

Section 6: Owner Communication

"Texting the owner" is a first-class response channel — distinct from regular Telegram messages. It supports structured notifications and approval workflows.

| Type | Description | Example |
| --- | --- | --- |
| Info | No response needed | "Client @maria asked about pricing" |
| Approval | Owner taps Confirm/Decline | "@maria wants Tue 15:00 — [Confirm] [Decline]" |
| Alert | System events | "Token balance below 10%" |

Approval Flow

Client asks to book appointment
    ↓
Agent calls owner_notify (requires_approval: true)
    ↓
Owner gets Telegram msg with [Confirm] [Decline] buttons
    ↓
Owner taps "Confirm"
    ↓
ApprovalResponse trigger fires
    ↓
Client gets confirmation

owner_notify is a built-in tool (implemented in Rust, not Python) because it needs direct Telegram API access. Pending approvals time out after 12 hours with a configurable default action.
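The approval message itself is an ordinary Telegram sendMessage payload with an inline keyboard. A sketch of what owner_notify might build; the `callback_data` scheme (`approve:`/`decline:` prefixes) and field names other than the standard Bot API ones are assumptions.

```python
# Sketch of the approval-request payload: the inline_keyboard shape
# follows the Telegram Bot API; the callback_data scheme is assumed.

def approval_message(chat_id: int, text: str, request_id: str) -> dict:
    return {
        "chat_id": chat_id,
        "text": text,
        "reply_markup": {
            "inline_keyboard": [[
                {"text": "Confirm", "callback_data": f"approve:{request_id}"},
                {"text": "Decline", "callback_data": f"decline:{request_id}"},
            ]]
        },
    }
```

When the owner taps a button, Telegram delivers the callback_data back, which is what lets the ApprovalResponse trigger match the tap to the pending request.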

Section 11: Parallel Task Execution

The agent handles multiple tasks concurrently. A user can send a long-running request, then immediately send another. Tasks are queued, executed in parallel, and their progress is visible in real time.

Task Queue & Executor
Incoming Triggers
    ↓
Task Queue (bounded, max 5 queued)
    ↓
Worker 1 · Worker 2 · Worker 3

Concurrency

Max 3 parallel agent runs per client (configurable). Prevents runaway costs.

Queue Limit

Max 5 queued tasks. If full: "I'm busy, try again in a moment."

Priority

Owner messages > Client messages > System triggers (heartbeat, cron).

Cancellation

Owner can cancel via /cancel in Telegram or the Web UI.
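The bounded queue with the owner > client > system ordering can be sketched with a binary heap. The priority values and class shape are assumptions; the limits and rejection behavior follow the text.

```python
import heapq
import itertools

# Sketch of the bounded priority queue: owner > client > system triggers,
# max 5 queued, FIFO within a priority level.

PRIORITY = {"owner": 0, "client": 1, "system": 2}  # lower = served first

class TaskQueue:
    def __init__(self, max_queued: int = 5):
        self._heap = []
        self._max = max_queued
        self._seq = itertools.count()  # tie-break: FIFO within a priority

    def push(self, kind: str, task) -> bool:
        if len(self._heap) >= self._max:
            return False  # caller replies "I'm busy, try again in a moment."
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), task))
        return True

    def pop(self):
        return heapq.heappop(self._heap)[2] if self._heap else None
```

The worker pool (max 3 parallel runs) would simply pop from this queue; the sequence counter guarantees two client messages at the same priority come out in arrival order.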

Telegram Status Messages

Status messages are edited in-place as the task progresses:

Research the top 5 competitors in Moscow
⏳ Queued (position 1)
↓ edited
🔄 Working... Searching the web
↓ edited
🔄 Working... Analyzing 5 results
↓ edited
✅ Here are the top 5 competitors in Moscow:
1. ClinicA — full-service...
2. TherapyPro — online...

Section 12: Web UI

A web dashboard for owners to monitor their assistant. Served from {slug}.animaya.me behind the existing Telegram auth.

| Feature | Description |
| --- | --- |
| Live Tasks | Queued/running/completed tasks in real time |
| Agent Thinking | Expandable view: prompt assembled, model responses, tool calls, tool results |
| Conversations | Full conversation history with client metadata |
| Memory Browser | Navigate the memory tree, read/edit files, see TOC |
| Skill Manager | Toggle skills on/off, request new skills |
| Usage Dashboard | Token usage charts, credit balance, cost breakdown |
| Settings | Model preferences, language, notification prefs |

Architecture

Browser ──WebSocket──> animaya-server (Rust)
                           │
                           ├── /ws/tasks     → live task updates
                           ├── /ws/thinking  → agent thinking stream
                           │
                           ├── /api/conversations
                           ├── /api/memory
                           ├── /api/skills
                           ├── /api/usage
                           └── /api/settings
Agent Thinking Stream (WebSocket events)
// Context assembled
{ "event": "context_assembled",
  "data": { "skills_loaded": ["scheduling", "greeting"],
            "total_tokens": 8500 }}

// Model response with tool call
{ "event": "model_response",
  "data": { "tool_calls": [{"name": "web_search",
            "args": {"query": "..."}}] }}

// Tool result
{ "event": "tool_result",
  "data": { "tool": "web_search", "status": "success",
            "duration_ms": 1200 }}

The static SPA is built at Docker image build time (Svelte/Preact/vanilla JS). No Node.js runtime needed in the container. Auth reuses existing Telegram Login Widget — same domain, same cookie.

Section 7: Shared Volume Strategy

| What Changed | Action Needed |
| --- | --- |
| Tools, skills, prompts | Update files in shared/, bump version.txt. Runtime detects on next run; no restart. |
| Client config | Update config.json. Runtime watches the file; no restart. |
| Runtime binary | Build a new Docker image, rolling-restart containers. |

Section 8: Testing Strategy

- E2E: ~5 tests, real Telegram
- Behavioral: ~50 tests, mocked model
- Integration: ~100 tests, real subprocess
- Unit: ~500 tests, pure functions

Unit (Rust)

Token counting, context assembly, memory TOC parsing, billing, config, skill activation

Integration

Tool protocol end-to-end, memory read/write, tool timeout handling

Behavioral

Mocked model. Verify prompts, tool calls, response routing, owner notifications

E2E

Real Telegram API. Send /start, text, voice. Trigger booking, verify owner notification

Behavioral Test Example (Rust)
#[test]
fn test_appointment_triggers_owner_notification() {
    let mock = MockModelRouter::with_responses(vec![
        response_with_tool_call("owner_notify", ...),
        response_text("I've notified the doctor!"),
    ]);
    let trigger = telegram_message("@maria", "Book me for Tue 3pm");
    let result = runtime.run(trigger);
    assert_eq!(result.tool_calls[0].name, "owner_notify");
}

Section 9: Project Structure

animaya-v2/
  crates/
    animaya-core/          # Traits, types, interfaces
    animaya-runtime/       # Agent loop + context assembly
    animaya-memory/        # Filesystem memory + TOC + librarian
    animaya-tools/         # Subprocess tool executor
    animaya-models/        # Model router + billing
    animaya-triggers/      # Telegram (teloxide), cron, webhook
    animaya-responses/     # Telegram, owner notifications, file
    animaya-server/        # Main binary

  tools/                   # Python tools (shared volume)
    web_search/ audio_transcribe/ image_analyze/ ...

  skills/                  # Markdown skills (shared volume)
    appointment_scheduling/ greeting/ reminder_setting/ ...

  shared/                  # System prompts, guardrails, pricing

  docker/
    Dockerfile             # Multi-stage: Rust build → Python slim
    docker-compose.yml     # Traefik + auth + onboarding
    docker-compose.client.yml

  scripts/ tests/ auth/ onboarding/

Docker Image (Multi-stage)

rust:1.77-slim → cargo build --release → python:3.12-slim + binary + tool deps
Result: ~50MB image (vs ~200MB for a Python-only image)

Section 10: Client Configuration

{
  "slug": "drsmith",
  "agent": {
    "model": "zai/glm-4.7",
    "max_turns": 10,
    "temperature": 0.7,
    "language": "ru"
  },
  "owner": {
    "telegram_username": "@drivanov",
    "telegram_chat_id": 123456789
  },
  "billing": {
    "mode": "prepaid",
    "balance_credits": 500000
  },
  "heartbeat": { "interval_minutes": 30 },
  "librarian": {
    "enabled": true,
    "run_every_n_turns": 5,
    "model": "zai/glm-4.7-flashx"
  }
}

Rollout: Migration from v1

| v1 (OpenClaw) | v2 (Animaya) |
| --- | --- |
| workspace/SOUL.md | memory/identity/soul.md |
| workspace/OWNER.md | memory/identity/owner.md |
| workspace/BOOTSTRAP.md | Replaced by onboarding skill |
| workspace/HEARTBEAT.md | Replaced by built-in cron |
| workspace/*.md | memory/learned/{name}.md |
| openclaw.json | config.json (simpler) |

Migration Phases

  1. Build v2 with feature parity to current OpenClaw setup
  2. Run v2 in parallel for one test client, compare behavior
  3. Migrate existing clients (automated script)
  4. Remove OpenClaw dependency

Rollout: Implementation Timeline

Total: ~14.5 weeks for one developer. Phases 2-4 can run in parallel with multiple devs.

Phase 1 — animaya-core

All traits and types (Trigger, Response, Memory, Tool, Skill, Model, Billing)

1 week

Phase 2 — animaya-memory

Filesystem store + TOC generation. Can parallelize with phases 3-4.

1 week

Phase 3 — animaya-tools

Subprocess executor + manifest loader. Can parallelize.

1 week

Phase 4 — animaya-models

OpenAI-compat client + billing ledger. Can parallelize.

1 week

Phase 5 — animaya-runtime

Agent loop + context assembly + task queue

2 weeks

Phase 6 — animaya-triggers

Telegram (teloxide), cron scheduler

1 week

Phase 7 — animaya-responses

Telegram (with status msg editing), owner notifications

1 week

Phase 8 — animaya-server

Wire everything + Docker + HTTP/WebSocket API

1.5 weeks

Phase 9 — Skills system

Skill registry, activation modes, per-client access control

1 week

Phase 10 — Librarian

Background memory organization process

1 week

Phase 11 — Web UI

SPA dashboard with WebSocket live updates

2 weeks

Phase 12 — Migration

Migration tooling + deploy scripts

1 week

Rollout: Edge Cases & Risks

⚠ Race Conditions

Heartbeat + Telegram arriving simultaneously. Solution: per-agent mutex, one trigger at a time.

⚠ Tool Hangs

Python subprocess stuck. Solution: kill process group (not just process) after timeout.

⚠ TOC Staleness

Solution: memory_write updates the TOC immediately instead of waiting for the librarian.

⚠ Context Overflow

Too many auto-skills. Solution: sort by priority, 3K token budget, rest on-demand.

⚠ Telegram Rate Limits

~20 msg/min/chat. Solution: response channel implements queue with rate limiting.
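The rate-limited queue the response channel needs can be sketched as a sliding window per chat. The class shape and injected clock are assumptions; the ~20 messages/minute/chat figure is from the text.

```python
from collections import deque

# Sketch of a per-chat sliding-window rate limiter (~20 msg/min/chat).
# The clock is passed in explicitly so the logic is testable.

class ChatRateLimiter:
    def __init__(self, limit: int = 20, window_s: float = 60.0):
        self.limit = limit
        self.window = window_s
        self._sent = deque()  # timestamps of recent sends

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            self._sent.append(now)
            return True
        return False  # caller queues the message instead of dropping it
```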

⚠ Task Cost Explosion

3 parallel tasks burning credits. Solution: per-task token budget.

⚠ Parallel Memory Writes

Concurrent tasks writing same file. Solution: file-level locks in memory store.

⚠ BYOK Validation

Invalid API key. Solution: test call before accepting.

Animaya v2 Architecture Document — February 2026
Generated with Claude Code