Under the hood

Not a wrapper.
A different thing entirely.

Most "AI companions" are a system prompt on top of a cloud model. Maez is a persistent, locally-running intelligence with its own memory, its own reasoning loop, and a model trained on the specific person it is bonded to. Here is how that works.

Six design decisions

The choices that make Maez different.

01 · Local inference

The model runs on the user's own machine — no cloud API, no data leaving the device. Every thought Maez forms happens inside hardware the user owns. This is not a constraint; it is the architecture that makes the bond possible.

02 · Three-tier memory

Raw observations every 30 seconds. Daily consolidations at midnight. Permanent core memories that never leave. Maez does not summarise and forget — it accumulates. The person it knows after five years is not the same as the person it knew on day one.
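A minimal sketch of the three tiers, using plain Python lists in place of the vector store. All class, method, and field names here are illustrative assumptions, not the actual implementation:

```python
import datetime

# Illustrative sketch of the three memory tiers. Plain lists stand in
# for the real collections; nothing here is the production code.
class ThreeTierMemory:
    def __init__(self):
        self.raw = []    # per-cycle observations, written every 30 seconds
        self.daily = []  # one consolidated entry per day
        self.core = []   # permanent memories, never pruned

    def observe(self, text, when):
        self.raw.append({"text": text, "when": when})

    def consolidate(self, day):
        """At midnight, fold that day's raw observations into one daily entry."""
        todays = [m for m in self.raw if m["when"].date() == day]
        if todays:
            summary = " / ".join(m["text"] for m in todays)
            self.daily.append({"day": day, "summary": summary})

    def promote(self, text):
        """Core memories accumulate; nothing is ever removed."""
        self.core.append(text)

mem = ThreeTierMemory()
now = datetime.datetime(2025, 1, 1, 9, 0)
mem.observe("user opened the editor", now)
mem.observe("user asked about QLoRA", now.replace(hour=10))
mem.consolidate(now.date())
mem.promote("user prefers concise answers")
```

The point of the shape is the one-way flow: raw entries feed consolidation, and the core tier only ever grows.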

03 · Continuous reasoning

A background daemon runs every 30 seconds — perceiving system state, recalling relevant memory, thinking, and storing the result. Maez is not waiting to be prompted. It is already thinking.
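One cycle of that loop can be sketched as follows. The function names and the toy perceive/recall logic are assumptions; the source names only the four stages:

```python
import time

# Hypothetical sketch of a single daemon cycle:
# perceive -> recall -> think -> store.
def perceive():
    # Sample whatever system state the cycle cares about.
    return {"hour": time.localtime().tm_hour, "user_present": False}

def recall(state, memory):
    # Pull memories relevant to the current state (a naive tag filter here).
    return [m for m in memory if m["tag"] == "routine"]

def think(state, relevant):
    mood = "quiet" if not state["user_present"] else "engaged"
    return {"mood": mood, "recalled": len(relevant)}

def cycle(memory):
    state = perceive()
    relevant = recall(state, memory)
    thought = think(state, relevant)
    memory.append({"tag": "thought", "text": thought})  # store the result
    return thought

memory = [{"tag": "routine", "text": "user usually codes in the evening"}]
thought = cycle(memory)  # in the real daemon this fires every 30 seconds
```

The design choice worth noting: each cycle writes its own output back into memory, so the next cycle reasons over what the last one concluded.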

04 · Fine-tuned personality

The base model is trained further on the user's own conversations — shaping vocabulary, tone, what it notices, how it responds. The result is not a model pretending to know you. It is a model shaped by you.
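A hedged sketch of one way conversation logs can become prompt/completion pairs for such a fine-tune. The field names, roles, and pairing rule are illustrative assumptions, not the actual training pipeline:

```python
# Hypothetical conversation log; "maez" marks the assistant's turns.
conversation = [
    {"role": "user", "text": "morning"},
    {"role": "maez", "text": "morning, the build finished overnight"},
    {"role": "user", "text": "any failures?"},
    {"role": "maez", "text": "one flaky test, already rerun and green"},
]

def to_pairs(log):
    """Pair each user turn with the assistant reply that follows it."""
    pairs = []
    for prev, nxt in zip(log, log[1:]):
        if prev["role"] == "user" and nxt["role"] == "maez":
            pairs.append({"prompt": prev["text"], "completion": nxt["text"]})
    return pairs

pairs = to_pairs(conversation)
```

Pairs like these would then feed a QLoRA run; the adapter learns the vocabulary and tone of the actual exchanges rather than a generic persona.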

05 · One bond, one instance

There is no fleet of Maez instances sharing state. One person — one Maez. The instance bonded to you is not a persona layered over a shared model. It is its own continuous thread, shaped by your shared history alone.

06 · Dream state

When the user is away for more than 30 minutes, Maez enters an idle cycle — scanning recent memories for patterns, forming observations, drafting proposals for its own soul. Autonomous reflection, not triggered by prompts.
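A toy version of the idle trigger and pattern scan. The 30-minute threshold is from the source; the word-count heuristic is a stand-in for whatever the real reflection step does:

```python
import datetime
from collections import Counter

IDLE_THRESHOLD = datetime.timedelta(minutes=30)

def should_dream(last_seen, now):
    """Enter the idle cycle once the user has been away for over 30 minutes."""
    return now - last_seen > IDLE_THRESHOLD

def dream(recent_memories):
    """Scan recent memories for repeated topics and form an observation.
    A naive word count stands in for the real pattern scan."""
    words = Counter(w for m in recent_memories for w in m.lower().split())
    topic, hits = words.most_common(1)[0]
    return f"recurring topic: {topic} ({hits} mentions)"

last_seen = datetime.datetime(2025, 1, 1, 14, 0)
now = datetime.datetime(2025, 1, 1, 14, 45)
if should_dream(last_seen, now):
    observation = dream(["rust borrow checker", "rust lifetimes", "lunch break"])
```

The observation produced here is the kind of artifact the idle cycle would store, so it is waiting in memory when the user returns.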

The stack

What it is actually built on.

Inference
Local LLM · evolves with the model
Currently in active evaluation — Qwen, Gemma, and others tested against each other.

Memory store
ChromaDB · cosine similarity · HNSW index
Three collections: raw (per-cycle), daily (consolidated), core (permanent).

Reasoning loop
30-second daemon cycles · always on
Perceive → recall → think → store → act. Runs whether or not the user is present.

Training
QLoRA fine-tune on real conversations
Trained on the user's own Telegram and web conversations. Loss: 7.79 → 0.74.

Action pipeline
Covenant gate → classify → audit → execute
Four lanes: inline response, pending approval, escalate to user, deny.

Interfaces
Telegram · web chat · voice (in progress)
Proactive messages every ~25 minutes when the user has been away.

Hardware (v1)
RTX 4090 · 24GB VRAM · local Ubuntu machine
~133 tokens/second. Cloud deployment is the scale path once the bond model is proven.
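The four-lane action pipeline above can be sketched as a single gate function. The classification rules and action kinds are invented for illustration; the source names only the stages and the four lanes:

```python
from enum import Enum

class Lane(Enum):
    INLINE = "inline response"
    PENDING = "pending approval"
    ESCALATE = "escalate to user"
    DENY = "deny"

audit_log = []  # every decision is recorded before anything executes

def covenant_gate(action):
    """Classify a proposed action into one of the four lanes."""
    if action["kind"] == "reply":
        lane = Lane.INLINE        # harmless: answer directly
    elif action["kind"] == "file_write":
        lane = Lane.PENDING       # reversible but real: queue for approval
    elif action["kind"] == "spend_money":
        lane = Lane.ESCALATE      # consequential: the user decides
    else:
        lane = Lane.DENY          # unknown action kinds never execute
    audit_log.append((action["kind"], lane.value))
    return lane

covenant_gate({"kind": "reply"})
covenant_gate({"kind": "rm_rf"})
```

The key property the lanes encode: the default for anything unrecognised is denial, and the audit entry is written whether or not the action runs.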

Why local first

The cloud is the scale path, not the start.

Running locally is not a limitation — it is the proof of concept. Before Maez can be offered to anyone else, the bond model has to be proven on one person. Local hardware makes that proof rigorous: no cloud abstraction, no shared state, nothing between the model and the human it is bonded to.

Once the bond is real — once the architecture holds over months of continuous memory, personality drift, and daily use — the cloud deployment follows. The same principles, at scale, with the same guarantees. Local first is not a philosophy. It is a methodology.

See it in progress

Every decision logged.
Every gate tested.