Your Agent Doesn't Need a Soul. It Needs Your Worst Tuesday.

There’s a concept gaining traction in the agent-building world called SOUL.md. The idea: your AI agent needs a structured document describing who you are, how you work, what your operating rhythms look like. He built an elicitation workflow, a 45-minute structured interview that walks you through five layers of self-knowledge, then generates the configuration files an agent platform needs. Operating model. User profile. Heartbeat schedule. Decision frameworks.

The architecture is clean. The logic is sound. You sit down, answer questions about your workflows, approve checkpoint summaries, and out comes a usable spec.

I tried it. Or rather, I tried the principle behind it: structured self-description as the foundation for agent context. I spent weeks writing operating documents for my own system. Role definitions. Communication preferences. Decision trees. Escalation logic.

Then I ran the agent in production for six months.

The structured self-description was maybe 20% of what made it work. The other 80% came from things I never would have written down in an interview. They came from failures.

The 2am flock that ate a client brief

Richardson, Texas. February 2026. I wake up to a Slack message from myself, a scheduled brief that should have gone out at 6am for a client sprint review. Except the brief is empty. Not malformed. Empty. The agent generated nothing, reported success, and moved on.

The root cause took two hours to find. I run my agent jobs through flock, a file-locking mechanism that prevents two jobs from running simultaneously on a single-session subscription. One job grabbed the lock at 1:47am. A research task that was supposed to take eight minutes ran for forty-three. The brief-generation job tried to fire at its scheduled window, hit the lock, and exited with code 42. In the logs, exit 42 means “contention skip,” the job will try again next cycle. But the brief had no next cycle. It was a one-shot. The window closed. The client got silence.

No structured interview would have surfaced this. “Describe your operating rhythms” doesn’t produce “my overnight research task sometimes runs long and blocks the morning brief window.” That’s not an operating rhythm. That’s a failure mode. And the failure mode is what the agent needed to know.

If you’d asked me in a structured elicitation, “What happens when two jobs overlap?” I’d have said “flock handles it, contention jobs skip and retry.” Technically true. Also useless. Because the word “retry” hides an assumption: that every job has a retry window. Briefs don’t. One-shots don’t. The distinction between “skippable” and “critical” jobs existed nowhere in my operating model. It existed only in the outcome of this specific failure.

The fix was two lines of context: “Exit 42 on brief-generation is not routine. If a brief job exits 42, escalate immediately. There is no retry window. This is a one-shot.”

That context now lives in the agent’s operational memory. Not because I described my ideal workflow. Because a client almost got ghosted at 6am on a Tuesday.

The heartbeat that nobody noticed was dead

March 2026. My agent runs a health check every 30 minutes, a heartbeat that confirms all subsystems are alive. If it misses two consecutive windows, I get an alert. Simple.

Except I didn’t get an alert. The heartbeat itself died. The monitoring job that checks whether the heartbeat is running was, itself, dependent on the same scheduling infrastructure as the heartbeat. When the scheduler hiccupped at 11pm on a Wednesday, both the heartbeat AND its watchdog went down together. They were supposed to be independent. They shared a single point of failure I’d never identified.

I found out at 7:14am when a lead-intake form submission disappeared. Not failed. Disappeared. The webhook fired, the agent should have processed it, but the agent was in a degraded state nobody had flagged because the thing responsible for flagging degraded states was also degraded.

Four hours of silence. A prospect filled out a form, got no confirmation, no follow-up. They probably assumed the site was broken. They were right.

I sat at my kitchen table that Thursday morning, coffee going cold, tracing the dependency graph on the back of an envelope. Both services were com.opelia.* plists. Both fired through the same launchd scheduler. Both assumed the scheduler was a given, the floor beneath the floor. Except floors crack.

The context that came from this: “The watchdog and the thing it watches cannot share infrastructure. If the heartbeat dies, the alert for ‘heartbeat died’ must fire from a completely independent system. Verify this quarterly. Assume any shared dependency is a single point of failure until proven otherwise.”

No self-description interview asks “what watches your watchdog?” You don’t think about infrastructure dependencies when someone asks you to describe your operating model. You think about what you do. Not what holds up the thing that holds up the thing you do. That knowledge only surfaces when the bottom layer disappears and everything above it goes quiet.

The invoice with yesterday’s date

April 2026. A client gets an invoice dated the 14th. The work was completed on the 15th. The invoice predates the deliverable. The client doesn’t say anything, most wouldn’t, but I catch it in my own records the next week during reconciliation.

The agent generated the invoice at 11:52pm Central on the 15th. Correct. But it pulled the date from a UTC timestamp. 11:52pm Central is 4:52am UTC on the 16th. Wait, no. 11:52pm Central is 5:52am UTC the next day. The agent used new Date() without timezone context, got UTC, and the date rendered as… actually, the bug was dumber than that. The agent had the right timezone in its config. But the invoice template inherited a date-formatting function from a library that defaulted to UTC regardless of system locale. The config said Central. The output said UTC. Nobody noticed because the discrepancy is usually zero, unless you generate documents near midnight.

The context entry: “All date-bearing artifacts (invoices, reports, briefs, calendar entries) must render in America/Chicago. Do not trust library defaults. Verify timezone on any new template before first production use. Edge case: documents generated between 11pm and midnight Central will show the wrong date in any UTC-defaulting system.”

This is the kind of operational knowledge that lives in the scar tissue of running systems. You wouldn’t describe it in an interview because it sounds insane. “Tell me about your timezone preferences” produces “I’m in Central time.” It does not produce “midnight-adjacent document generation will silently backdate your invoices in any library that defaults to UTC, and you won’t catch it for a week.”

The structured interview captures the surface layer. The failure captures the substrate. And it’s the substrate that determines whether your agent produces reliable output or produces output that looks reliable until you audit it a week later and find a date that makes no sense.

The pattern underneath

Each of these failures shares a property: they exist in the gap between how you describe your work and how your work actually behaves under stress.

Nate’s framework asks you to describe your operating model. That’s valuable. It produces a useful starting artifact. But an operating model is a map, and the map is not the territory. The territory has potholes the map doesn’t show, because you’ve never hit them, or you hit them once and forgot, or you hit them and the workaround became so automatic you stopped seeing it as a workaround.

The failures I’ve described aren’t exotic. They’re mundane. Flock contention. Shared infrastructure assumptions. Timezone library defaults. Every production system in history has a graveyard of bugs exactly this boring. And every one of those bugs, once discovered, becomes context that prevents the next failure.

My agent’s operational memory isn’t organized around “how Michael works.” It’s organized around “how things break when Michael isn’t watching.” The difference matters. The first framing produces a system that works when everything goes right. The second produces a system that degrades gracefully when things go wrong. Which they will, on a Tuesday, at 2am, when you’re asleep.

What this means for building your agent’s context

If you’re setting up an agent system, any agent system, from a single-bot Telegram assistant to a multi-agent orchestration, here’s the practical implication:

Start with Nate’s approach. The structured interview produces a real foundation. Know your rhythms, your decisions, your dependencies. Get the skeleton on paper.

Then run it in production and wait for it to fail. Not hypothetical failures. Real ones. The brief that didn’t send. The alert that didn’t fire. The document with the wrong date. The job that silently exited when it should have screamed.

Each failure becomes a context entry. Not a bug fix, a context entry. The fix prevents recurrence of that specific bug. The context entry teaches the agent a class of failure. “Exit 42 on one-shot jobs requires immediate escalation” is a class. “Watchdog and watched process cannot share infrastructure” is a class. “Library timezone defaults are untrustworthy near midnight” is a class.

Build from incident reports, not org charts. The most useful document in my agent system is the running log of failures and what they taught. Thirty-seven entries as of this writing. Each one is a scar that prevents a wound. Each one was invisible to me before it happened.

Treat “it worked fine” as incomplete information. A job that exits cleanly isn’t necessarily a job that did the right thing. The invoice was generated. The date was wrong. Success metrics without correctness checks are theater. Your agent’s context should encode what “actually correct” looks like for every output, not just “task completed.”

Accept that your initial context document will be wrong. Not incomplete. Wrong. It will contain assumptions that sound reasonable and happen to be false. The structured interview produces those assumptions with high confidence, because you believe them. Production exposes them. That’s the process. First draft from self-description. Second draft from reality. The second draft is the one that works.

Your agent doesn’t need your idealized self-description. It needs the operational reality that only surfaces when things go wrong. The structured interview captures who you think you are at work. The failure log captures who you actually are. The edge cases, the implicit assumptions, the things you never thought to say because you didn’t know they were true until they broke.

I’ve been running my system in production for six months. The operating model I wrote on day one is maybe 40% of the current context. The other 60% was written in the aftermath of something going wrong, at 2am, or 7:14am, or 11:52pm on a night I thought everything was fine.

The soul of your agent isn’t your best day articulated cleanly. It’s your worst Tuesday, documented honestly.

I help founders and operators build advisory-grade agent systems, the kind that survive contact with production. Not theory. Architecture from scar tissue.

If your agent setup stalled after install, or you’re running something that works 80% of the time and silently fails the other 20%, that gap is diagnosable.

Book a Diagnostic Call at mlsebastian.com/book/diagnostic-call.

The 2am flock that ate a client brief

The heartbeat that nobody noticed was dead

The invoice with yesterday’s date

The pattern underneath

What this means for building your agent’s context

Want the next one in your inbox?