The Session That Was Never Observed: Building Active Memory for AI Agents
How we built ClawVault's Active Session Observer — threshold scaling, byte cursors, and the 14MB session that exposed a blind spot in our memory architecture.
ClawVault is a structured memory system for AI agents. It watches session transcripts, compresses them into observations, routes decisions and lessons into categorized vault files, and gives agents continuity across context resets. It's on npm at v2.4.5.
This post is about the Active Session Observer, shipped in v2.1.0 — what it does, how the internals work, and the embarrassing bug we found while dogfooding it.
The Problem: Agents Forget Everything
When an AI agent's context window resets, everything from the previous session is gone. The transcript sits on disk as a .jsonl file, but nobody reads it. The agent wakes up blank.
ClawVault's observer pipeline solves this by watching transcript files, compressing new content into structured observations (decisions, lessons, facts), and writing them into vault files the agent loads on startup. Think of it as an always-on note-taker that produces structured output.
The original observer (SessionWatcher) uses chokidar to watch a directory of session files. When a file changes, it reads the new bytes from the last known offset, parses the JSONL lines, feeds them to a compressor (LLM-backed), and flushes observations to disk. This works well for sessions that are finished — the file stops changing, the watcher catches up, observations get written.
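Condensed into code, that loop looks roughly like the sketch below. chokidar is the real dependency, but readNewJsonlLines, compressToObservations, flushObservations, and the watched directory path are hypothetical stand-ins for SessionWatcher's internals.

import chokidar from 'chokidar';
import { statSync } from 'node:fs';

// Hypothetical stand-ins for the watcher's internal stages:
declare function readNewJsonlLines(path: string, offset: number): Promise<string[]>;
declare function compressToObservations(lines: string[]): Promise<unknown[]>;
declare function flushObservations(observations: unknown[]): Promise<void>;

const offsets = new Map<string, number>(); // last-read byte offset per file

chokidar.watch('.clawvault/sessions').on('change', async (filePath) => {
  const previousOffset = offsets.get(filePath) ?? 0;
  const { size } = statSync(filePath);
  if (size <= previousOffset) return;                        // nothing new since last read
  const lines = await readNewJsonlLines(filePath, previousOffset);
  const observations = await compressToObservations(lines);  // LLM-backed compression
  await flushObservations(observations);                     // write to vault files
  offsets.set(filePath, size);
});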
But what about the session that's currently active? The one accumulating content right now, where the agent is making decisions?
Architecture: Byte Cursors and Threshold Scaling
The Active Session Observer (active-session-observer.ts) is designed around two ideas: byte cursors that track how far we've read into each session file, and scaled thresholds that control how often we trigger observation based on file size.
Byte Cursor Store
The cursor store lives at .clawvault/observe-cursors.json inside the vault:
export interface ObserveCursorEntry {
  lastObservedOffset: number;
  lastObservedAt: string;
  sessionKey: string;
  lastFileSize: number;
}

export type ObserveCursorStore = Record<string, ObserveCursorEntry>;
Each session ID maps to an entry tracking the byte offset where we last stopped reading. On the next sweep, we open the file, seek to lastObservedOffset, and only read new content. No re-processing of already-observed material.
The cursor store is loaded, validated with strict type checks (every field must pass isFiniteNonNegative or string checks), and saved atomically after processing. If the file is corrupted or missing, we start fresh — the worst case is re-observing content, which the deduplication layer handles.
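A sketch of that load/validate/save cycle, under the assumption that the store is plain JSON on disk. isFiniteNonNegative is reconstructed from its description, and the write-then-rename at the end is what makes the save atomic.

import { readFileSync, writeFileSync, renameSync } from 'node:fs';

const isFiniteNonNegative = (value: unknown): value is number =>
  typeof value === 'number' && Number.isFinite(value) && value >= 0;

function loadCursorStore(path: string): ObserveCursorStore {
  try {
    const raw = JSON.parse(readFileSync(path, 'utf8')) as Record<string, any>;
    const store: ObserveCursorStore = {};
    for (const [sessionId, entry] of Object.entries(raw)) {
      // Keep only entries where every field passes its strict check
      if (
        isFiniteNonNegative(entry?.lastObservedOffset) &&
        isFiniteNonNegative(entry?.lastFileSize) &&
        typeof entry?.lastObservedAt === 'string' &&
        typeof entry?.sessionKey === 'string'
      ) {
        store[sessionId] = entry;
      }
    }
    return store;
  } catch {
    return {}; // corrupted or missing: start fresh; dedup absorbs any re-observation
  }
}

function saveCursorStore(path: string, store: ObserveCursorStore): void {
  // Write to a temp file, then rename: a crash never leaves a half-written store
  writeFileSync(`${path}.tmp`, JSON.stringify(store, null, 2));
  renameSync(`${path}.tmp`, path);
}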
If a file shrinks (truncation, rotation), the cursor detects it:
const startOffset = previousOffset <= fileSize ? previousOffset : 0;
Cursor resets to zero. We re-observe the whole file. Better to duplicate than to miss.
Threshold Scaling
Not every session needs the same observation frequency. A 200KB session from a quick Telegram exchange doesn't need the same treatment as a 6MB main session where the agent has been working for hours.
const ONE_MIB = 1024 * 1024;

export function getScaledObservationThresholdBytes(fileSizeBytes: number): number {
  if (fileSizeBytes < ONE_MIB) {
    return 50 * 1024; // 50KB for small sessions
  }
  if (fileSizeBytes <= 5 * ONE_MIB) {
    return 150 * 1024; // 150KB for medium sessions
  }
  return 300 * 1024; // 300KB for large sessions
}
Small sessions change character fast — 50KB of new content in a sub-1MB session probably represents significant new context. Large sessions accumulate noise (tool output, repeated patterns), so we wait for 300KB of new content before spending an LLM call on compression.
This isn't scientifically derived. We tuned it by watching real sessions during dogfooding. The numbers will probably change. The design — scaling thresholds by file size — is the part that matters.
Candidate Selection
On each sweep, we discover all session files, load the cursor store, and build a candidate list:
function selectCandidates(
  descriptors: SessionDescriptor[],
  cursors: ObserveCursorStore,
  minNewBytes?: number
): ActiveObservationCandidate[] {
  const candidates: ActiveObservationCandidate[] = [];
  for (const descriptor of descriptors) {
    const { sessionId, filePath, stat } = descriptor;
    const cursor = cursors[sessionId];
    const fileSize = stat.size;
    const previousOffset = cursor?.lastObservedOffset ?? 0;
    // A cursor past EOF means the file was truncated or rotated: start over
    const startOffset = previousOffset <= fileSize ? previousOffset : 0;
    const newBytes = Math.max(0, fileSize - startOffset);
    const thresholdBytes = minNewBytes
      ?? getScaledObservationThresholdBytes(fileSize);
    if (newBytes < thresholdBytes) continue;
    candidates.push({ sessionId, filePath, fileSize, startOffset, newBytes, thresholdBytes });
  }
  return candidates;
}
Only sessions with enough new bytes make the cut. The rest are skipped — their cursors stay where they are, ready for next time.
Incremental Reading
When a candidate is selected, we don't load the whole file. We stream from the byte offset:
async function readIncrementalMessages(
  filePath: string,
  startOffset: number
): Promise<IncrementalReadResult> {
  const stream = fs.createReadStream(filePath, { start: startOffset });
  const messages: Array<{ role: string; content: string }> = [];
  let nextOffset = startOffset;
  let buffer = '';
  for await (const chunk of stream) {
    buffer += chunk.toString('utf8');
    let newlineIndex: number;
    // Only complete lines are consumed; a trailing partial line stays buffered
    while ((newlineIndex = buffer.indexOf('\n')) !== -1) {
      const line = buffer.slice(0, newlineIndex);
      buffer = buffer.slice(newlineIndex + 1);
      // Advance the cursor in bytes, not characters, so resume is exact
      nextOffset += Buffer.byteLength(line, 'utf8') + 1;
      // Parse the JSONL line, extract role + content text, push onto messages
    }
  }
  return { messages, nextOffset };
}
Each JSONL line is parsed into a normalized role-and-content record. The parser handles both direct {role, content} objects and wrapped {type: "message", message: {role, content}} formats. Content can be a string, an array of content blocks, or nested objects — extractContentText recursively walks all of them.
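A sketch of that recursive walk, reconstructed from the description above; the real extractContentText may handle more block shapes than this.

function extractContentText(content: unknown): string {
  if (typeof content === 'string') return content;
  if (Array.isArray(content)) {
    // Content blocks: extract each one and join the non-empty pieces
    return content.map(extractContentText).filter(Boolean).join('\n');
  }
  if (content && typeof content === 'object') {
    const record = content as Record<string, unknown>;
    if (typeof record.text === 'string') return record.text; // { type: 'text', text: ... }
    if ('content' in record) return extractContentText(record.content);
    if ('message' in record) return extractContentText(record.message);
  }
  return ''; // nothing textual in this node
}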
The key detail: nextOffset is tracked at byte granularity, not line count. If we read halfway through a file and the process dies, we resume from the exact byte — no partial lines, no gaps.
Hook Integration
The active session observer doesn't run on a timer. It's triggered by OpenClaw gateway hooks:
- gateway:startup — When the agent boots, sweep all sessions. Catches anything that accumulated while the agent was offline.
- command:new — When a new session starts (context reset), observe the session that just ended.
These hooks call observeActiveSessions(), which runs the full pipeline: discover sessions → load cursors → select candidates → read incremental content → compress → flush → update cursors.
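Put together, one sweep reads like the sketch below. discoverSessions, compressAndFlush, and CURSOR_STORE_PATH are hypothetical names standing in for the real internals; selectCandidates, readIncrementalMessages, and the cursor shape come from the snippets above.

declare function discoverSessions(): Promise<SessionDescriptor[]>;
declare function compressAndFlush(sessionId: string, messages: unknown[]): Promise<void>;
declare const CURSOR_STORE_PATH: string; // .clawvault/observe-cursors.json

async function observeActiveSessions(): Promise<void> {
  const descriptors = await discoverSessions();            // find session .jsonl files
  const cursors = loadCursorStore(CURSOR_STORE_PATH);
  for (const candidate of selectCandidates(descriptors, cursors)) {
    const { messages, nextOffset } = await readIncrementalMessages(
      candidate.filePath,
      candidate.startOffset
    );
    await compressAndFlush(candidate.sessionId, messages); // LLM compression → vault
    cursors[candidate.sessionId] = {
      lastObservedOffset: nextOffset,
      lastObservedAt: new Date().toISOString(),
      sessionKey: candidate.sessionId,                     // assuming key == id here
      lastFileSize: candidate.fileSize,
    };
  }
  saveCursorStore(CURSOR_STORE_PATH, cursors);             // atomic write-then-rename
}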
Messages are tagged with a source label parsed from the session key (main, telegram-dm, discord, etc.) so observations carry provenance. When you read "decided to use Railway for deployment" in your observations, you can tell if that came from the main coding session or a Telegram side conversation.
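The label itself is cheap to derive. A sketch, assuming the source is the last colon-separated segment of the session key (the agent:clawdious:main format that appears below):

// 'agent:clawdious:main' → 'main', 'agent:clawdious:telegram-dm' → 'telegram-dm'
function sourceLabelFromSessionKey(sessionKey: string): string {
  const segments = sessionKey.split(':');
  return segments[segments.length - 1] || 'unknown';
}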
The 14MB Session That Was Never Observed
Here's where it gets honest.
We built the Active Session Observer, shipped it in v2.1.0, and started dogfooding. Weeks later, while debugging an unrelated issue, I checked the cursor store and noticed something: the main session — agent:clawdious:main — had never been observed.
The file was 14MB. Thousands of messages. Every major decision, every debugging session, every architectural choice. None of it had been compressed into observations.
Why? The hook timing.
command:new fires when a new session starts. But the main session — the one that just got reset — is gone by the time the hook runs. And the new main session has zero bytes, so it's below every threshold.
gateway:startup catches sessions that grew while the agent was offline. But the main session is the active session. It's never "stale." It's always the one currently being written to.
The sweep was designed to catch completed sessions. The main session is never completed while the agent is alive. It just keeps growing until a context reset, at which point it's replaced, not observed.
The Fix
The fix is heartbeat-triggered byte threshold checks, the exact pattern the active session observer was designed around. The agent gets periodic heartbeat polls — every 30 minutes or so. During a heartbeat, we call observeActiveSessions() and catch the main session mid-flight, since it has been accumulating content since the last check.
The architecture supports this perfectly. The byte cursors track where we left off. The threshold scaling means we won't over-observe. The incremental reader handles partial reads cleanly.
But wiring the heartbeat to actually trigger the observer exposed the gap: the hook infrastructure needs to call observeActiveSessions() during heartbeat processing, not just on startup and session transitions. It's a one-line integration, but the fact that we shipped without it — and didn't notice for weeks — is the real lesson.
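In spirit, the fix is one more registration next to the two existing ones. A hypothetical sketch, since the actual OpenClaw hook registration API isn't shown here:

// Hypothetical hook wiring; the real OpenClaw API may look different.
declare const gateway: { on(event: string, handler: () => Promise<void>): void };

gateway.on('gateway:startup', () => observeActiveSessions()); // agent boot
gateway.on('command:new', () => observeActiveSessions());     // session transition
gateway.on('heartbeat', () => observeActiveSessions());       // the missing call site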
What We Learned
Your most important data is the data you're currently producing. It's easy to build systems that process completed work. It's harder to remember that the thing you're doing right now also needs to be captured.
Dogfooding finds what tests don't. We had unit tests for every function in active-session-observer.ts. They all passed. The bug wasn't in any function — it was in the absence of a call site. You can't unit test "this function should have been called but wasn't."
Byte cursors are worth the complexity. The alternative — re-reading entire files, or using timestamps, or tracking line counts — all have edge cases that bite you. Byte offsets are monotonic, composable, and survive process restarts.
What's Next
- Heartbeat-triggered observation — The main session will finally be observed during the agent's lifetime, not just at death.
- Adaptive thresholds — Thresholds based on content density (decisions per KB) rather than raw byte count.
- Multi-vault observation — Agents that participate in multiple vaults should have observations routed to the right vault based on session context.