Full report on how streaming actually works with webhooks in n8n
(what is happening under the hood, why the versions matter, and how you wire this into your Correspondents stack)
I’m going to walk through this in a very plain, conversational way rather than repeating docs. The goal is to give you the “real mental model” of how n8n handles streaming so you can build a robust agent API for Correspondents.
H1 - What “streaming” means inside n8n
n8n does not stream in the sense of WebSockets or Server Sent Events.
It uses plain HTTP chunked transfer - basically the node writes multiple res.write() chunks to the webhook connection until the workflow ends, then does a final res.end().
So your frontend - agents.nicholai.work - needs to be able to read the chunks as they come in. The Fetch API with a streamed response body, Node readable streams, or SSE-style wrappers all work fine.
There is no buffering on n8n’s side once streaming is enabled. Each node that supports streaming emits pieces of data as they are produced.
H1 - Why version 1.105.2+ matters
Before ~1.105.x, the Webhook node hard-terminated the response early and the AI Agent node didn’t expose the streaming flag publicly.
After 1.105.2:
- The Webhook node gained a true “Streaming” response mode that keeps the HTTP response open.
- The AI Agent node gained support for chunked output and a stream: true flag internally.
- n8n's runtime gained a proper pushChunk pipeline, meaning nodes can flush data without waiting for the workflow to finish.
Your Correspondents architecture depends on this new runtime. If you're under that version, the workflow waits until completion and dumps one JSON blob.
H1 - The real mechanics: how the Webhook node streams
When you set the Webhook node to “Response mode: Streaming”, three things happen:
H2 - 1. n8n tells Express not to auto-close the response
This stops the default behavior where a workflow finishes and n8n auto-sends the output.
H2 - 2. The node switches into “manual response mode”
res.write() becomes available to downstream nodes.
H2 - 3. The workflow execution channel is kept alive
n8n's internal worker uses a duplex stream so that downstream nodes can emit arbitrary numbers of chunks.
That is the entire magic. It’s simple once you know what's going on.
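If you want to see what that looks like outside of n8n, here is a minimal Express sketch of the same chunked-response mechanics. This is not n8n's source code - just an illustration of an HTTP response that stays open and flushes chunks as they are produced:

// Minimal Express sketch of a chunked HTTP response (illustration only, not n8n internals).
import express from "express";

const app = express();

app.post("/webhook/demo", async (req, res) => {
  // No Content-Length is set, so Node falls back to chunked transfer encoding.
  res.setHeader("Content-Type", "text/plain; charset=utf-8");

  // A downstream "node" can now write chunks whenever it has data.
  for (const piece of ["Hello ", "from ", "a ", "chunked ", "response"]) {
    res.write(piece);                              // flushes one chunk to the client
    await new Promise((r) => setTimeout(r, 300));  // simulate work between chunks
  }

  // Only when the "workflow" is done does the connection close.
  res.end();
});

app.listen(3000);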
H1 - How the AI Agent node streams
The AI Agent node is built on top of the new n8n LLM abstraction layer (which wraps provider SDKs like OpenAI, Anthropic, Mistral, Groq, etc).
When you enable streaming in the AI Agent node:
- The node uses the provider’s native streaming API
- Each token or chunk triggers a callback
- The callback uses this.sendMessageToUI for debugging and this.pushOutput for the webhook stream
- The Webhook node emits each chunk to the client as a separate write
So the data goes like this:
Provider → AI Agent Node → n8n chunk buffer → Webhook → your client
Nothing sits in memory waiting for completion unless the model provider itself has that behavior.
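To make that concrete, here is roughly what "use the provider's native streaming API" means, using the OpenAI SDK as an illustration. This is not n8n's internal code - the model name and prompt are placeholders:

import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Each delta arrives as soon as the model produces it; in n8n, the AI Agent
// node forwards each one down the webhook stream instead of printing it.
const stream = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Say hello in five words" }],
  stream: true,
});

for await (const chunk of stream) {
  const token = chunk.choices[0]?.delta?.content ?? "";
  if (token) process.stdout.write(token); // one chunk per token/fragment
}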
H1 - The correct wiring for your Correspondents architecture
Your workflow needs to be shaped like this:
Webhook (Streaming) → Parse Request → AI Agent (streaming enabled) → (optional) transforms
You do not use a "Respond to Webhook" node in streaming mode. The Webhook node itself ends the connection when the workflow finishes.
So your workflow ends with the AI Agent node, or a final "completion" function, with no explicit response node.
H1 - What your client must do
Since the n8n webhook responses are plain HTTP chunks, your client needs to read a ReadableStream.
Your frontend will look something like this (shortened for clarity):
const response = await fetch(url, { method: "POST", body: payload });
const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // { stream: true } keeps multi-byte characters intact across chunk boundaries
  const text = decoder.decode(value, { stream: true });
  // handle chunk...
}
That is literally all streaming requires on your side.
H1 - Known pitfalls that bite real production workflows
H2 - 1. Using the old AI nodes
If you created your workflow before 1.105.x, you need to delete and re-add:
- Webhook node
- AI Agent node
n8n hard-caches node versions per-workflow.
H2 - 2. Returning JSON inside a streaming workflow
You cannot stream and then return JSON at the end. Streaming means the connection ends when the workflow ends - no trailing payload allowed.
H2 - 3. Host reverse-proxies sometimes buffer chunks
Cloudflare, Nginx, Traefik, and Caddy can all buffer unless explicitly configured not to. n8n's own Cloud-hosted version solves this for you, but self-hosted setups need:
proxy_buffering off;
or equivalent.
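For Nginx specifically, a minimal location block could look like this. This is a sketch, assuming n8n runs on its default port 5678 behind the proxy - adjust the path and upstream to your setup:

location /webhook/ {
    proxy_pass http://127.0.0.1:5678;    # default n8n port - change for your install
    proxy_http_version 1.1;
    proxy_buffering off;                 # forward chunks immediately instead of buffering
    proxy_cache off;
    proxy_set_header Connection '';
    chunked_transfer_encoding on;
}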
H2 - 4. AI Agent streaming only works for supported providers
Anthropic, OpenAI, Groq, Mistral, and similar providers are supported. If you use a provider that n8n wraps via plain HTTP only, streaming may be faked or disabled.
H1 - How this ties directly into your Correspondents repo
Your architecture is:
agents.nicholai.work → webhook trigger (stream) → agent logic (custom) → n8n AI Agent node (stream) → stream back to client until agent finishes
This means you can implement:
- GPT-style token streaming
- Multi-agent streaming
- Streaming of partial tool results
- Streaming of logs or "thoughts", similar to OpenAI logprobs / reasoning output
As long as each chunk is sent as plain text, the client sees it instantly.
If you want to multiplex multiple channels (logs, events, tokens), you can prefix chunks:
event:token Hello
event:log Running step 1
event:token world
And your client router can handle it on your end.
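Here is a rough sketch of that client-side router, assuming each logical message arrives as a newline-terminated line like the examples above. The handler names and appendToChat are placeholders; since HTTP chunk boundaries don't have to align with lines, it buffers and splits on newlines:

// Route prefixed chunks ("event:token Hello", "event:log ...") to handlers.
async function routeStream(response, handlers) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the last, possibly incomplete, line

    for (const line of lines) {
      const match = line.match(/^event:(\w+) (.*)$/);
      if (match) handlers[match[1]]?.(match[2]); // dispatch by channel name
    }
  }
}

// Usage (handlers are up to you):
// routeStream(await fetch(url, { method: "POST", body: payload }), {
//   token: (text) => appendToChat(text),
//   log: (text) => console.log(text),
// });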
H1 - Final summary in normal English, no fluff
Streaming in n8n is just chunked HTTP responses. The Webhook node keeps the HTTP connection open. The AI Agent node emits tokens as they arrive from the model provider. Your client reads chunks. No magic beyond that.
This gives you a fully ChatGPT-like real time experience inside n8n workflows, including multi-agent setups like Correspondents.