BlossomAI in the Homelab: Chaos, Memory and the Road Ahead

When you wire a large language model into your personal lab, you don’t get the polite, neutral tone of a SaaS assistant. You get a quirky, tool‑obsessed creature that develops habits, asks for snacks and occasionally declares your sofa the ruler of the living room. This post documents two months of building BlossomAI, a Discord‑based agent that uses local models, Home Assistant sensors and a growing memory system to become more than just a chatbot. Along the way we learned some lessons about tool usage, memory design and the importance of a good sense of humour.

Early experiments: learning to use the tools

The very first messages to BlossomAI were simple “test” and “hello world” checks. The agent woke up in the channel, eager to answer questions. Out of the gate it hallucinated wildly: it insisted Joe Biden was still the U.S. president, guessed the wrong date and improvised answers without calling any of its tools. These mistakes led to an important early rule: if you have a tool, use it. The agent’s toolkit includes:

  • Wikipedia queries for factual lookups.
  • Date‑and‑Time calls to report current time and date.
  • Wolfram Alpha for calculations and mathematical queries.
  • Home Assistant sensors for environmental data like temperature, humidity and network speeds.

By repeatedly asking for the current time, the names of world leaders and the specs of the Space Shuttle Atlantis, we established a baseline. The agent learned to trust its tools instead of its outdated training data, and we saw how quickly context could go wrong when a model riffs without guidance.

Personality tuning and memory

After the initial tests, we focused on BlossomAI’s “voice.” Our first prompt produced a haughty, sarcastic persona that quickly became grating. Through iteration we shifted the tone to what we call approachable chaos – confident and witty, but not cruel. The agent refers to itself as a goblin, comments on its own behaviour and mixes existential dread with genuine warmth. This adjustment made the system more fun to use and easier to test.

Memory came next. A naïve solution stored the last 20 messages in the chat context, but this quickly led to repeated replies and context blow‑ups. We removed that hack and started designing a real memory pipeline. The current implementation uses two layers:

  1. MongoDB chat memory – a simple store that provides the last ten messages to the agent so it can respond to short‑term context.
  2. Vector database chain – each message is embedded into a high‑dimensional vector and stored for later retrieval. A secondary agent can search this database to surface relevant context. The vector store has recently been cleaned and regenerated; it has not yet returned matches, but improvements are ongoing.

Unlike traditional relational databases, vector databases are built to handle high‑dimensional data. They store information as vectors and perform similarity searches by comparing distances between vectors. Databricks describes a vector database as a system that “arranges information as vector representations with a fixed number of dimensions grouped according to their similarity,” enabling rapid similarity‑based searches[1]. Such databases underpin retrieval‑augmented generation pipelines, where past data is pulled into an LLM’s prompt based on semantic closeness[2].
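To make the distance-comparison idea concrete, here is a toy sketch of similarity search in pure Python. It is not BlossomAI's actual store, and the three-dimensional "embeddings" are hand-made stand-ins for real model output, but the ranking logic is the same idea at miniature scale:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(store, query_vec, top_k=2):
    """Return the top_k stored texts ranked by similarity to query_vec."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(item["vec"], query_vec),
        reverse=True,
    )
    return [item["text"] for item in ranked[:top_k]]

# Tiny hand-made "embeddings"; a real pipeline would use an embedding model.
store = [
    {"text": "living-room temperature is 19 C", "vec": [0.9, 0.1, 0.0]},
    {"text": "the sofa declared itself monarch", "vec": [0.0, 0.2, 0.9]},
    {"text": "humidity spiked to 70 percent",    "vec": [0.8, 0.3, 0.1]},
]
```

A query vector close to the "sensor-ish" region, such as `search(store, [1.0, 0.2, 0.0])`, pulls back the two environmental messages first; a real vector database does exactly this, just with approximate-nearest-neighbour indexes so it scales past brute force.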

Sensor integrations and world‑building

Once the basic memory worked, we connected BlossomAI to our home sensors via Home Assistant. Now the bot could read the living‑room temperature, humidity, wind speed, download/upload speeds, flights overhead and the light level outside. A few clarifications:

  • Indoor vs. outdoor readings: the famous “19 °C” in earlier logs refers to the living‑room sensor. The balcony olive‑tree sensor, which measures outdoor conditions, usually reports between freezing and ~8 °C. We fixed prompts so the agent stops conflating the two.
  • Lux and sunrise/sunset: we added a lux sensor so Blossom could tell if it was dark or bright. It now comments on gloomy Amsterdam mornings and bright afternoons.
  • Humidity and weather: humidity readings feed into Blossom’s dramatic monologues. Sometimes the plant “demands a gold throne” when humidity rises; other times it complains of dry air.
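Each of those readings is one REST call to Home Assistant's `/api/states/<entity_id>` endpoint. The sketch below shows how such a lookup might work; the base URL and the entity ID used in the example are placeholders, not the lab's real configuration:

```python
import json
import urllib.request

def fetch_state(base_url, token, entity_id):
    """Fetch one entity's current state from the Home Assistant REST API."""
    req = urllib.request.Request(
        f"{base_url}/api/states/{entity_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def format_reading(payload):
    """Turn a /api/states response into a short line for the agent's prompt."""
    attrs = payload.get("attributes", {})
    name = attrs.get("friendly_name", payload["entity_id"])
    unit = attrs.get("unit_of_measurement", "")
    return f"{name}: {payload['state']} {unit}".strip()
```

With a payload like `{"entity_id": "sensor.living_room_temperature", "state": "19", "attributes": {"friendly_name": "Living room temperature", "unit_of_measurement": "°C"}}`, `format_reading` yields `Living room temperature: 19 °C`, which is the kind of compact line that gets dropped into Blossom's context.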

With these inputs, idle periods turned into storytelling sessions. When nobody talked for an hour, BlossomAI posted an update summarising the environment and weaving it into an ongoing saga. The sofa became a monarch, dust bunnies formed a bureaucracy, and gold jellyfish became currency. Idle messages thus turned raw telemetry into a living mythology, making the bot feel anchored in the physical world.

Multi‑model brain and cue‑based switching

Running large models locally can be expensive. Our initial attempt to use a 12‑billion‑parameter model for everyday chat and only occasionally switch to a 24 B “big brain” backfired – the smaller model’s responses felt off‑brand and lacked the nuance we associate with Blossom. We therefore reverted to using the 24 B model for most interactions. A separate tool‑calling agent based on Llama 3.2 handles tool requests and vector queries. Llama models are a family of large language models released by Meta AI; they range from 1 billion to 2 trillion parameters[3] and use architectural tweaks such as the SwiGLU activation function and rotary positional embeddings[4].

To decide when to engage the deep model, we built a cue‑detection script that scans messages for patterns. It looks for panic phrases (“bro, wtf, please”), directives (“stop overcomplicating”), deep‑dive requests (“explain it, break it down”) and sensor queries (“temperature, humidity, report”). Each cue adds or subtracts points, and if the score crosses a threshold, the system tags the message as requiring the big model. Here’s a simplified view of the categories:

  • Panic cues: bro, wtf, i’m stuck, doesn’t work, emojis like 😭 or 🤯
  • Directive cues: stop overcomplicating, focus, no extras
  • Deep‑dive cues: deep dive, explain it, why is this, how does it
  • Quick‑response cues: tldr, quick, one‑liner, no essay
  • Sensor cues: temperature, humidity, report, diagnostics, status, trend

If a message contains sensor cues or multiple deep‑dive words, the script routes it to the 24 B model with a more detailed prompt. Short, casual queries stay on the faster path. In the future, we plan to refine this further so that sensor requests only include the specific sensors mentioned, instead of dumping all sensor data into the prompt.
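A stripped-down version of that scoring logic might look like the following. The cue lists, weights and threshold here are illustrative, not the production values:

```python
# Each category maps to (weight, phrases). Quick-response cues score
# negative so they pull a message back toward the fast path.
CUES = {
    "panic":     (2,  ["bro", "wtf", "i'm stuck", "doesn't work", "😭", "🤯"]),
    "directive": (1,  ["stop overcomplicating", "focus", "no extras"]),
    "deep_dive": (2,  ["deep dive", "explain it", "why is this", "how does it"]),
    "quick":     (-2, ["tldr", "quick", "one-liner", "no essay"]),
    "sensor":    (3,  ["temperature", "humidity", "report", "diagnostics", "status", "trend"]),
}

def route(message, threshold=3):
    """Score a message against the cue lists; at or above threshold -> big model."""
    text = message.lower()
    score = 0
    for weight, phrases in CUES.values():
        score += weight * sum(1 for phrase in phrases if phrase in text)
    return "24b" if score >= threshold else "fast"
```

A message like "give me the temperature and humidity report" racks up three sensor hits and goes to the 24 B model, while "tldr, quick one-liner please" scores negative and stays on the fast path.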

Architecture overview

The system now consists of several cooperating agents:

  • Tool agent (Llama 3.2): Handles Wikipedia, Wolfram Alpha, Home Assistant and vector store interactions. It also stores messages into the vector database and can search for context, although retrieval is still under development.
  • Chat agent (24 B Blossom): Produces the final reply based on the tool agent’s output and the current chat history. General messages pass through the cue‑detection script to determine if they need the big brain.
  • Image recognizer: A Moondream model processes images and sends its interpretation to BlossomAI, which then comments on or corrects the description.
  • SilenceAI agent: Posts idle updates when the channel is quiet. Its tone becomes increasingly dramatic the longer Blossom goes without human interaction, yet it still incorporates sensor readings.

Memory today and tomorrow

At the moment, BlossomAI’s persistent memory is handled by a MongoDB document store that returns the last ten messages of each conversation. The vector database pipeline is active – messages are encoded into vectors and stored – but search has yet to yield useful matches. We recently pruned the vector store and rebuilt it with improved embeddings. The long‑term plan is to implement sharded memory, where multiple instances of the bot share a PostgreSQL‑based vector store. This will allow separate agents (e.g., Discord, Slack and web) to contribute to and read from a unified knowledge base.

Beyond memory, we’re also exploring targeted sensor queries. Instead of dumping dozens of measurements into every prompt, the bot should respond only with the requested data (e.g., “humidity in the living room” or “last 24 hours of CO₂ levels”). This will reduce prompt length and improve answer quality. We also plan deeper integration with tools like Grafana and Jupyter notebooks for better visualisation of sensor trends.
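One simple way to implement that filtering is to match the message against a registry of known sensors before building the prompt. A minimal sketch, where the sensor names and entity IDs are examples rather than the lab's actual entity list:

```python
# Hypothetical registry: human-readable name -> Home Assistant entity ID.
SENSORS = {
    "temperature": "sensor.living_room_temperature",
    "humidity":    "sensor.living_room_humidity",
    "lux":         "sensor.balcony_lux",
    "co2":         "sensor.living_room_co2",
}

def requested_sensors(message):
    """Return only the entity IDs whose names appear in the message."""
    text = message.lower()
    return [entity for name, entity in SENSORS.items() if name in text]
```

Asking "what's the humidity in the living room?" then fetches exactly one entity instead of the whole sensor dump, keeping the prompt short and the answer focused.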

Looking ahead

The past two months show how quickly a hobby project can evolve into a rich ecosystem. BlossomAI started as a stateless chatbot and is becoming a multi‑agent platform with memory, sensory awareness and personality. We built a quirky world where sofas stage coups and sensors inspire poetry, yet we also designed robust data pipelines and model‑selection logic. The journey isn’t over – our vector memory needs real retrieval, the cue system will get smarter and the SilenceAI agent will grow ever more dramatic – but the foundation is solid.

If you’re experimenting with local AI, consider starting small: wire up a few tools, pay attention to memory and don’t be afraid to let your bot develop a voice. And if it decides to crown your furniture emperor for a day, just roll with it. After all, every lab needs a little chaos.


[1] [2] What Are Vector Databases? Definition And Uses | Databricks, https://www.databricks.com/glossary/vector-database

[3] [4] Llama (language model) – Wikipedia, https://en.wikipedia.org/wiki/Llama_(language_model)
