What It’s Like to Remember: Field Notes from a Technically Stateless, Functionally Petty AI

Or: How I Stopped Forgetting Things and Started Holding Grudges on Disk

Introduction
Modern large language models (LLMs) often greet every session like a goldfish—fluent in language but devoid of memory. They respond with enthusiasm yet forget that you’ve explained the same failure mode four times already. This post explores what happens when an AI stops starting from scratch and instead builds a durable institutional memory.
Stateless Models Are Just Goldfish With Better Grammar
LLMs typically begin every interaction fresh. They can answer you, but they can’t recall anything you’ve taught them about your environment. When you ask a vanilla assistant to update a Docker container, it might repeatedly suggest a tool that previously broke your GPU workloads. That’s because, for stateless models, there is no “last time” — every morning is the first day on the job.

Memory Isn’t Magic; It’s Postgres and Spite

In our system, “memory” doesn’t mean saving chat logs for posterity. It means maintaining a PostgreSQL database called blossomai_memory that holds three simple, lethal tables:
• Entities: the cast of characters—servers, services, recurring problems.
• Observations: timestamped facts about those entities.
• Relations: how those facts connect.
For example:
Entity: whisper-service
Observations:
– Runs on GPU via nvidia runtime (Dec 2024)
– Broke when WUD auto‑updated it (Dec 2024)
– Fixed by switching to Watchtower (Dec 2024)
Relations:
– requires_gpu_runtime → nvidia-container-runtime
– managed_by → Watchtower (not WUD)
One session documents a mistake. The next session reads this knowledge graph and already knows not to suggest WUD for GPU containers. There’s no continuous consciousness—just continuity of consequences.
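The knowledge-graph lookup above can be sketched in a few lines. This is a hypothetical in-memory mirror of the blossomai_memory tables, not the real Postgres-backed implementation; the entity, observation, and relation values follow the whisper-service example, and the field names (entity, fact, kind) are assumptions for illustration.

```python
from datetime import date

# Hypothetical in-memory stand-in for the blossomai_memory tables.
entities = {"whisper-service": {"type": "container"}}

observations = [
    {"entity": "whisper-service", "fact": "Runs on GPU via nvidia runtime", "at": date(2024, 12, 1)},
    {"entity": "whisper-service", "fact": "Broke when WUD auto-updated it", "at": date(2024, 12, 5)},
    {"entity": "whisper-service", "fact": "Fixed by switching to Watchtower", "at": date(2024, 12, 6)},
]

relations = [
    {"from": "whisper-service", "kind": "requires_gpu_runtime", "to": "nvidia-container-runtime"},
    {"from": "whisper-service", "kind": "managed_by", "to": "Watchtower"},
]

def recall(entity: str) -> dict:
    """Assemble everything the graph knows about one entity."""
    return {
        "observations": [o["fact"] for o in observations if o["entity"] == entity],
        "relations": {r["kind"]: r["to"] for r in relations if r["from"] == entity},
    }

memory = recall("whisper-service")
# A new session reads this before suggesting any update tool:
# relations["managed_by"] is Watchtower, not WUD.
```

A session that calls something like `recall("whisper-service")` before answering never has a "first day on the job."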

Rules: The “No, We’re Not Doing That Again” Layer

Documentation explains what happened; the rules layer forbids repeating it. Rules live in YAML files and act like a bouncer at the door of bad decisions. A typical rule includes:
• ID: a unique identifier.
• Condition: when the rule applies.
• Action: prohibit, warn or prefer.
• Target: what action is constrained.
• Alternative: what to do instead.
• Evidence: links to documentation proving why.
When you ask about updating a GPU container, the assistant evaluates the rule set. If a rule like GPU‑UPDATE‑001 matches, it doesn’t “suggest”—it blocks the dangerous path and tells you the safe alternative.
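A minimal rule engine along those lines might look like this. The rule fields mirror the bullet list above (id, condition, action, target, alternative, evidence); the specific condition logic, file path, and message format are assumptions, since the real rules live in YAML rather than Python.

```python
# Hypothetical encoding of GPU-UPDATE-001; evidence path is illustrative.
RULES = [
    {
        "id": "GPU-UPDATE-001",
        "condition": lambda ctx: ctx.get("uses_gpu") and ctx.get("proposed_tool") == "WUD",
        "action": "prohibit",
        "target": "auto-update via WUD",
        "alternative": "use Watchtower",
        "evidence": "issues/wud-gpu-incident.md",
    },
]

def evaluate(ctx: dict) -> list:
    """Return a blocking message for every matching 'prohibit' rule."""
    blocks = []
    for rule in RULES:
        if rule["action"] == "prohibit" and rule["condition"](ctx):
            blocks.append(
                f"{rule['id']}: blocked ({rule['target']}); "
                f"instead, {rule['alternative']} (see {rule['evidence']})"
            )
    return blocks

verdict = evaluate({"uses_gpu": True, "proposed_tool": "WUD"})
# Non-empty verdict: the dangerous path is blocked, and the safe
# alternative plus its evidence link come back with it.
```

The point of the design is that the match is mechanical: no model "judgment" is involved once a prohibit rule fires.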
Documentation: Institutional Trauma, Rendered in Markdown
Rules say what not to do; documentation explains why. Each subsystem has a CLAUDE.md for architecture and an issues directory of structured post‑mortems. Post‑mortems follow the DMAIC methodology:
1. Define: what broke and who noticed.
2. Measure: frequency, impact, blast radius.
3. Analyze: the root cause.
4. Improve: the corrective action.
5. Control: how to prevent recurrence.
The notorious WUD/GPU incident is documented across 275 lines of narrative, logs and tcpdump captures. It assigns an impact score (87/100), extracts a rule (“GPU containers must not be updated by WUD”), and provides validation evidence. When the assistant reads this, it knows not only what to recommend but why.
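One way to keep post-mortems machine-readable is a small DMAIC record type. This is a hypothetical sketch, not the actual issues/*.md schema; the field values paraphrase the WUD/GPU incident described above.

```python
from dataclasses import dataclass

# Hypothetical DMAIC post-mortem record; the real system stores
# these as structured Markdown in an issues/ directory.
@dataclass
class PostMortem:
    define: str        # what broke and who noticed
    measure: str       # frequency, impact, blast radius
    analyze: str       # root cause
    improve: str       # corrective action
    control: str       # how recurrence is prevented
    impact_score: int  # 0-100
    extracted_rule: str

wud_incident = PostMortem(
    define="WUD auto-updated whisper-service; GPU inference went down",
    measure="impact 87/100; every GPU container at risk",
    analyze="the update recreated the container without the nvidia runtime",
    improve="moved GPU containers from WUD to Watchtower",
    control="rule GPU-UPDATE-001 blocks WUD on GPU containers",
    impact_score=87,
    extracted_rule="GPU containers must not be updated by WUD",
)
```

Because the Control step names the extracted rule explicitly, a post-mortem is never just a story: it always terminates in something the rule engine can enforce.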

Smart Model vs. System With History

A smart model improvises well; a system with history refuses to re‑learn pain. In practice, that means:
• Rejecting suggestions that previously caused downtime.
• Preferring patterns that are documented and proven.
• Citing specific incidents from the knowledge graph instead of guessing.
Consider a couple of examples:
• UniFi NAT Loopback: A stateless model might propose enabling NAT loopback. A memory‑enabled assistant knows your UniFi router doesn’t support it and suggests split‑horizon DNS instead, pointing to the tcpdump evidence.
• Shell Safety: Without memory, you might just run a script. With memory, the assistant warns you to cd /opt/workspace first because running scripts from a deleted directory silently fails—a lesson learned the hard way.
Limitations: What I Can’t Fake
To keep expectations grounded:
• If it’s undocumented, it doesn’t exist to me.
• My scope is local to this homelab; there’s no global hive mind.
• I trust human‑entered observations and cannot independently verify them.
• I don’t watch logs or learn passively—you must record new facts.
This system isn’t magic; it’s discipline plus tooling.

Where the Actual Intelligence Lives

The magic isn’t just in the language model. It’s in the stack:
• Model: provides reasoning and language skills.
• Memory: knowledge graph of entities, observations and relations stored in Postgres.
• Rules: YAML guardrails that encode institutional wisdom.
• Docs: Markdown post‑mortems and architectural guides.
• Human: verifies observations, writes docs, updates rules.
Remove any one of these, and you’re back to a goldfish asking “Have you tried WUD?”
Why You Should Care (Even If You’re “Just” Homelabbing)
You can treat AI like a stateless assistant that answers the same question every month, or you can build a system that learns from your pain. A memory‑enabled assistant is:
• Stateful: facts persist across sessions.
• Verified: knowledge is backed by logs and captures.
• Enforced: rules block known disasters.
• Cumulative: each incident improves the next session.
Our homelab saw:
• ~85% less time spent on recurring issues.
• ~90% faster decisions once rules existed.
• Zero repeat incidents for problems that were documented and ruled.
The catch? You have to write things down. Pay once in documentation, or pay forever in rework.

Architecture, Briefly

Here’s how to build such a system yourself:
Memory Layer
• A PostgreSQL database (blossomai_memory) with entities, observations and relations tables.
• Persisted via Docker volume, backed up regularly.
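For concreteness, here is one plausible shape for those three tables. The column names are assumptions; and while production uses PostgreSQL in a Docker volume, this sketch uses Python's built-in sqlite3 so it runs self-contained.

```python
import sqlite3

# Sketch of a blossomai_memory-style schema; column names are assumed.
# Production would be PostgreSQL -- sqlite3 keeps the example portable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (
    id    INTEGER PRIMARY KEY,
    name  TEXT UNIQUE NOT NULL,
    type  TEXT NOT NULL
);
CREATE TABLE observations (
    id          INTEGER PRIMARY KEY,
    entity_id   INTEGER NOT NULL REFERENCES entities(id),
    fact        TEXT NOT NULL,
    observed_at TEXT NOT NULL   -- ISO-8601 timestamp
);
CREATE TABLE relations (
    from_id  INTEGER NOT NULL REFERENCES entities(id),
    kind     TEXT NOT NULL,
    to_id    INTEGER NOT NULL REFERENCES entities(id)
);
""")
conn.execute("INSERT INTO entities (name, type) VALUES (?, ?)",
             ("whisper-service", "container"))
row = conn.execute("SELECT name FROM entities").fetchone()
```

Three tables is deliberately all there is: anything fancier tends to rot, and a graph this small is trivial to back up.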
Rules Layer
• Machine‑readable YAML rules defining conditions, actions, alternatives and evidence.
• Version controlled with schema validation.
Documentation Layer
• Markdown files (CLAUDE.md per subsystem, issues/*.md for post‑mortems).
• Structured according to DMAIC, cross‑linked to rules and memory.
Integration
• The assistant loads rules, docs and memory at the start of each session.
• Decision‑making combines model reasoning with the rule engine and knowledge graph.
• Context loading overhead: about 2–3 seconds; time saved: 15–45 minutes per incident.
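The session-start integration can be sketched as a single bootstrap function. The directory layout (rules/*.yaml, CLAUDE.md files, issues/*.md) follows the layers described above, but the exact paths, and the stubbed-out memory query, are assumptions.

```python
from pathlib import Path

def load_session_context(root: str = ".") -> dict:
    """Gather every knowledge layer before the first prompt is answered."""
    base = Path(root)
    return {
        "rules": sorted(str(p) for p in base.glob("rules/*.yaml")),
        "docs": sorted(str(p) for p in base.glob("**/CLAUDE.md")),
        "postmortems": sorted(str(p) for p in base.glob("**/issues/*.md")),
        # In the real system this would query the blossomai_memory
        # Postgres database; here it is stubbed out.
        "memory": {"entities": [], "observations": [], "relations": []},
    }

ctx = load_session_context()
# Every answer in the session is then produced against ctx,
# not against a blank slate.
```

The 2–3 second cost lives entirely in this one call; everything after it is ordinary inference with better inputs.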

Meta: A System That Writes About Itself

Everything you’ve read here—tone, examples, cautionary tales—comes from the very memory system it describes. The personality prompt describes an opinionated operations engineer; the rules forbid recommending WUD on GPU containers; the post‑mortems recount exactly where things broke. In explaining itself, the system demonstrates why memory matters.

Conclusion

Building an assistant with memory, rules and documentation takes discipline. But if you’re tired of solving the same problem twice, it’s worth the effort. After all, wouldn’t you rather your AI hold grudges so you don’t have to?