# Full report on how streaming actually works with webhooks in n8n

(what is happening under the hood, why the versions matter, and how you wire this into your Correspondents stack)

I'm going to walk through this in a very plain, conversational way rather than repeating the docs. The goal is to give you the "real mental model" of how n8n handles streaming so you can build a robust agent API for Correspondents.

---
# What "streaming" means inside n8n

n8n does not stream in the sense of WebSockets or Server-Sent Events.

It uses plain HTTP chunked transfer - the node writes multiple `res.write()` chunks to the open webhook connection until the workflow ends, then does a final `res.end()`.

So your frontend - agents.nicholai.work - needs to be able to read the chunks as they come in. The Fetch API with a streamed response body (a `ReadableStream`), or an SSE-style wrapper around it, works fine.

There is no buffering on n8n's side once streaming is enabled. Each node that supports streaming emits pieces of data as they are produced.
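In runtimes where `ReadableStream` is async-iterable (Node 18+, Chromium, Firefox; older Safari needs the explicit `getReader()` loop shown later), reading those chunks is a few lines. A minimal sketch, assuming a placeholder `WEBHOOK_URL` and a JSON payload of your choosing:

```
// Minimal sketch - WEBHOOK_URL and the payload shape are placeholders, not n8n specifics.
const response = await fetch(WEBHOOK_URL, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ message: "hello" }),
});

// TextDecoderStream turns the byte stream into a text stream;
// each iteration corresponds to a flushed chunk from the workflow.
for await (const chunk of response.body.pipeThrough(new TextDecoderStream())) {
  console.log("chunk:", chunk);
}
```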
---
# Why version 1.105.2+ matters

Before ~1.105.x, the Webhook node hard-terminated the response early and the AI Agent node didn't expose the streaming flag publicly.

After 1.105.2:

* The Webhook node gained a true "Streaming" response mode that keeps the HTTP response open.
* The AI Agent node gained support for chunked output and a `stream: true` flag internally.
* n8n's runtime gained a proper `pushChunk` pipeline - meaning nodes can flush data without waiting for the workflow to finish.

Your Correspondents architecture depends on this new runtime. If you're under that version, the workflow waits until completion and dumps one JSON blob.

---
# The real mechanics: how the Webhook node streams

When you set the Webhook node to "Response mode: Streaming", three things happen:

## 1. n8n tells Express not to auto-close the response

This stops the default behavior where a workflow finishes and n8n auto-sends the output.

## 2. The node switches into "manual response mode"

`res.write()` becomes available to downstream nodes.

## 3. The workflow execution channel is kept alive

n8n's internal worker uses a duplex stream so that downstream nodes can emit arbitrary numbers of chunks.

That is the entire magic. It's simple once you know what's going on.
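To make that concrete, here is a minimal sketch of the same pattern in plain Node.js. This is not n8n's internal code - just an illustration of what "keep the response open, write chunks, end when the work is done" looks like at the HTTP level:

```
// Illustration only - the underlying HTTP pattern, not n8n's implementation.
const http = require("http");

http.createServer((req, res) => {
  // No Content-Length header, so Node falls back to chunked transfer encoding.
  res.writeHead(200, { "Content-Type": "text/plain" });

  const chunks = ["thinking...\n", "partial answer\n", "final answer\n"];
  let i = 0;

  const timer = setInterval(() => {
    res.write(chunks[i++]); // each write is flushed to the client immediately
    if (i === chunks.length) {
      clearInterval(timer);
      res.end();            // only now does the response actually finish
    }
  }, 500);
}).listen(3000);
```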
---
# How the AI Agent node streams

The AI Agent node is built on top of the new n8n LLM abstraction layer (which wraps provider SDKs like OpenAI, Anthropic, Mistral, Groq, etc.).

When you enable streaming in the AI Agent node:

* The node uses the provider's native streaming API
* Each token or chunk triggers a callback
* The callback uses `this.sendMessageToUI` for debugging and `this.pushOutput` for the webhook stream
* The Webhook node emits each chunk to the client as a separate write

So the data goes like this:

Provider → AI Agent Node → n8n chunk buffer → Webhook → your client

Nothing sits in memory waiting for completion unless the model provider itself has that behavior.
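The provider half of that pipeline is ordinary SDK streaming. As a hedged illustration (the OpenAI Node SDK is shown here only as an example; n8n's AI Agent node does the equivalent through its abstraction layer, and `forwardChunk` is a hypothetical stand-in for pushing a token downstream):

```
// Provider-native streaming, illustrated with the OpenAI Node SDK.
import OpenAI from "openai";

const openai = new OpenAI();

const stream = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Say hello" }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content ?? "";
  if (delta) forwardChunk(delta); // hypothetical: hand the token to the chunk pipeline
}
```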
---
# The correct wiring for your Correspondents architecture

Your workflow needs to be shaped like this:

```
Webhook (Streaming)
  → Parse Request
  → AI Agent (streaming enabled)
  → (optional) transforms
  → (no "Webhook Respond" node - not needed when streaming is active)
```

You **do not** use a "Webhook Respond" node in streaming mode.
The Webhook node itself ends the connection when the workflow finishes.

So your workflow ends with the AI Agent node, or a final "completion" function, but no explicit response node.

---
# What your client must do

Since the n8n webhook responses are plain HTTP chunks, your client needs to read a **ReadableStream**.

Your frontend will look something like this (shortened for clarity):

```
const response = await fetch(url, { method: "POST", body: payload });
const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // { stream: true } keeps multi-byte characters intact when they span two chunks
  const text = decoder.decode(value, { stream: true });
  // handle chunk...
}
```

That is literally all streaming requires on your side.

---
# Known pitfalls that bite real production workflows

## 1. Using the old AI nodes

If you created your workflow before 1.105.x, you need to delete and re-add:

* Webhook node
* AI Agent node

n8n hard-caches node versions per workflow.

## 2. Returning JSON inside a streaming workflow

You cannot stream and then return JSON at the end.
Streaming means the connection ends when the workflow ends - no trailing payload allowed.

## 3. Host reverse-proxies sometimes buffer chunks

Cloudflare, Nginx, Traefik, and Caddy can all buffer unless explicitly configured not to.
n8n's own Cloud-hosted version solves this for you, but self-hosted setups need:

`proxy_buffering off;`

or equivalent.
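A quick way to check whether something in the path is buffering: time the gap between chunk arrivals with a small script and watch whether they trickle in or land in one burst at the end. A sketch, assuming Node 18+ (global fetch, top-level await) and a placeholder `WEBHOOK_URL`:

```
// Buffering check - logs each chunk's arrival time relative to the request start.
const start = Date.now();
const response = await fetch(WEBHOOK_URL, { method: "POST", body: "{}" });
const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(`+${Date.now() - start}ms`, decoder.decode(value, { stream: true }));
}
```

If every chunk shows up with roughly the same timestamp, something between you and n8n is buffering the response.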
## 4. AI Agent streaming only works for supported providers

Anthropic, OpenAI, Groq, Mistral, etc.
If you use a provider that n8n wraps via HTTP only, streaming may be faked or disabled.

---
# How this ties directly into your Correspondents repo

Your architecture is:

```
agents.nicholai.work
  → webhook trigger (stream)
  → agent logic (custom)
  → n8n AI Agent node (stream)
  → stream back to client until agent finishes
```

This means you can implement:

* GPT-style token streaming
* Multi-agent streaming
* Streaming of partial tool results
* Streaming of logs or "thoughts", similar to OpenAI logprobs / reasoning traces

As long as each chunk is sent as plain text, the client sees it instantly.

If you want to multiplex multiple channels (logs, events, tokens), you can prefix chunks:

```
event:token Hello
event:log Running step 1
event:token world
```

And your client router can handle it on your end.
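A minimal client-side router for that prefix scheme might look like this (a sketch - the `event:` prefixes above are a convention you define yourself, not an n8n feature, and a production version would also buffer partial lines that get split across chunks):

```
// Routes each "event:<channel> <payload>" line to a handler for that channel.
function routeChunk(text, handlers) {
  for (const line of text.split("\n")) {
    const match = line.match(/^event:(\w+) (.*)$/);
    if (!match) continue;
    const [, channel, payload] = match;
    (handlers[channel] ?? handlers.default)?.(payload);
  }
}

// Usage: feed every decoded chunk from the reader loop into the router.
routeChunk("event:token Hello\nevent:log Running step 1", {
  token: (t) => process.stdout.write(t),       // append tokens to the visible reply
  log: (msg) => console.error("[log]", msg),   // surface agent logs separately
  default: (p) => console.log("unrouted:", p),
});
```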
---
# Final summary in normal English, no fluff

Streaming in n8n is just chunked HTTP responses. The Webhook node keeps the HTTP connection open. The AI Agent node emits tokens as they arrive from the model provider. Your client reads chunks. No magic beyond that.

This gives you a fully ChatGPT-like real-time experience inside n8n workflows, including multi-agent setups like Correspondents.