# Full report on how streaming actually works with webhooks in n8n

(what is happening under the hood, why the versions matter, and how you wire this into your Correspondents stack)

I'm going to walk through this in a very plain, conversational way rather than repeating the docs. The goal is to give you the "real mental model" of how n8n handles streaming so you can build a robust agent API for Correspondents.

---
# What "streaming" means inside n8n

n8n does not stream in the sense of WebSockets or Server-Sent Events.

It uses plain HTTP chunked transfer - the node writes multiple `res.write()` chunks to the open webhook connection until the workflow ends, then does a final `res.end()`.

So your frontend - agents.nicholai.work - needs to be able to read the chunks as they come in. The Fetch API with a streamed response body (a `ReadableStream`), or an SSE-style wrapper around it, works fine.

There is no buffering on n8n's side once streaming is enabled. Each node that supports streaming emits pieces of data as they are produced.
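In runtimes where `ReadableStream` is async-iterable (Node 18+, Chromium, Firefox; older Safari needs the explicit `getReader()` loop shown later), reading those chunks is a few lines. A minimal sketch, assuming a placeholder `WEBHOOK_URL` and a JSON payload of your choosing:

```
// Minimal sketch - WEBHOOK_URL and the payload shape are placeholders, not n8n specifics.
const response = await fetch(WEBHOOK_URL, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ message: "hello" }),
});

// TextDecoderStream turns the byte stream into a text stream;
// each iteration corresponds to a flushed chunk from the workflow.
for await (const chunk of response.body.pipeThrough(new TextDecoderStream())) {
  console.log("chunk:", chunk);
}
```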
---
# Why version 1.105.2+ matters

Before ~1.105.x, the Webhook node hard-terminated the response early and the AI Agent node didn't expose the streaming flag publicly.

After 1.105.2:

* The Webhook node gained a true "Streaming" response mode that keeps the HTTP response open.
* The AI Agent node gained support for chunked output and a `stream: true` flag internally.
* n8n's runtime gained a proper `pushChunk` pipeline - meaning nodes can flush data without waiting for the workflow to finish.

Your Correspondents architecture depends on this new runtime. If you're under that version, the workflow waits until completion and dumps one JSON blob.

---
# The real mechanics: how the Webhook node streams

When you set the Webhook node to "Response mode: Streaming", three things happen:

## 1. n8n tells Express not to auto-close the response

This stops the default behavior where a workflow finishes and n8n auto-sends the output.

## 2. The node switches into "manual response mode"

`res.write()` becomes available to downstream nodes.

## 3. The workflow execution channel is kept alive

n8n's internal worker uses a duplex stream so that downstream nodes can emit arbitrary numbers of chunks.

That is the entire magic. It's simple once you know what's going on.
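To make that concrete, here is a minimal sketch of the same pattern in plain Node.js. This is not n8n's internal code - just an illustration of what "keep the response open, write chunks, end when the work is done" looks like at the HTTP level:

```
// Illustration only - the underlying HTTP pattern, not n8n's implementation.
const http = require("http");

http.createServer((req, res) => {
  // No Content-Length header, so Node falls back to chunked transfer encoding.
  res.writeHead(200, { "Content-Type": "text/plain" });

  const chunks = ["thinking...\n", "partial answer\n", "final answer\n"];
  let i = 0;

  const timer = setInterval(() => {
    res.write(chunks[i++]); // each write is flushed to the client immediately
    if (i === chunks.length) {
      clearInterval(timer);
      res.end();            // only now does the response actually finish
    }
  }, 500);
}).listen(3000);
```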
---
# How the AI Agent node streams

The AI Agent node is built on top of the new n8n LLM abstraction layer (which wraps provider SDKs like OpenAI, Anthropic, Mistral, Groq, etc.).

When you enable streaming in the AI Agent node:

* The node uses the provider's native streaming API
* Each token or chunk triggers a callback
* The callback uses `this.sendMessageToUI` for debugging and `this.pushOutput` for the webhook stream
* The Webhook node emits each chunk to the client as a separate write

So the data goes like this:

Provider → AI Agent Node → n8n chunk buffer → Webhook → your client

Nothing sits in memory waiting for completion unless the model provider itself has that behavior.
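The provider half of that pipeline is ordinary SDK streaming. As a hedged illustration (the OpenAI Node SDK is shown here only as an example; n8n's AI Agent node does the equivalent through its abstraction layer, and `forwardChunk` is a hypothetical stand-in for pushing a token downstream):

```
// Provider-native streaming, illustrated with the OpenAI Node SDK.
import OpenAI from "openai";

const openai = new OpenAI();

const stream = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Say hello" }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content ?? "";
  if (delta) forwardChunk(delta); // hypothetical: hand the token to the chunk pipeline
}
```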
---
# The correct wiring for your Correspondents architecture

Your workflow needs to be shaped like this:

```
Webhook (Streaming)
  → Parse Request
  → AI Agent (streaming enabled)
  → (optional) transforms
  → (no "Webhook Respond" node - not needed when streaming is active)
```

You **do not** use a "Webhook Respond" node in streaming mode.
The Webhook node itself ends the connection when the workflow finishes.

So your workflow ends with the AI Agent node, or a final "completion" function, but no explicit response node.

---
# What your client must do

Since the n8n webhook responses are plain HTTP chunks, your client needs to read a **ReadableStream**.

Your frontend will look something like this (shortened for clarity):

```
const response = await fetch(url, { method: "POST", body: payload });
const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // { stream: true } keeps multi-byte characters intact when they span two chunks
  const text = decoder.decode(value, { stream: true });
  // handle chunk...
}
```

That is literally all streaming requires on your side.

---
# Known pitfalls that bite real production workflows

## 1. Using the old AI nodes

If you created your workflow before 1.105.x, you need to delete and re-add:

* Webhook node
* AI Agent node

n8n hard-caches node versions per workflow.

## 2. Returning JSON inside a streaming workflow

You cannot stream and then return JSON at the end.
Streaming means the connection ends when the workflow ends - no trailing payload allowed.

## 3. Host reverse-proxies sometimes buffer chunks

Cloudflare, Nginx, Traefik, and Caddy can all buffer unless explicitly configured not to.
n8n's own Cloud-hosted version solves this for you, but self-hosted setups need:

`proxy_buffering off;`

or equivalent.
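A quick way to check whether something in the path is buffering: time the gap between chunk arrivals with a small script and watch whether they trickle in or land in one burst at the end. A sketch, assuming Node 18+ (global fetch, top-level await) and a placeholder `WEBHOOK_URL`:

```
// Buffering check - logs each chunk's arrival time relative to the request start.
const start = Date.now();
const response = await fetch(WEBHOOK_URL, { method: "POST", body: "{}" });
const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(`+${Date.now() - start}ms`, decoder.decode(value, { stream: true }));
}
```

If every chunk shows up with roughly the same timestamp, something between you and n8n is buffering the response.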
## 4. AI Agent streaming only works for supported providers

Anthropic, OpenAI, Groq, Mistral, etc.
If you use a provider that n8n wraps via HTTP only, streaming may be faked or disabled.

---
# How this ties directly into your Correspondents repo

Your architecture is:

```
agents.nicholai.work
  → webhook trigger (stream)
  → agent logic (custom)
  → n8n AI Agent node (stream)
  → stream back to client until agent finishes
```

This means you can implement:

* GPT-style token streaming
* Multi-agent streaming
* Streaming of partial tool results
* Streaming of logs or "thoughts", similar to OpenAI logprobs / reasoning traces

As long as each chunk is sent as plain text, the client sees it instantly.

If you want to multiplex multiple channels (logs, events, tokens), you can prefix chunks:

```
event:token Hello
event:log Running step 1
event:token world
```

And your client router can handle it on your end.
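A minimal client-side router for that prefix scheme might look like this (a sketch - the `event:` prefixes above are a convention you define yourself, not an n8n feature, and a production version would also buffer partial lines that get split across chunks):

```
// Routes each "event:<channel> <payload>" line to a handler for that channel.
function routeChunk(text, handlers) {
  for (const line of text.split("\n")) {
    const match = line.match(/^event:(\w+) (.*)$/);
    if (!match) continue;
    const [, channel, payload] = match;
    (handlers[channel] ?? handlers.default)?.(payload);
  }
}

// Usage: feed every decoded chunk from the reader loop into the router.
routeChunk("event:token Hello\nevent:log Running step 1", {
  token: (t) => process.stdout.write(t),       // append tokens to the visible reply
  log: (msg) => console.error("[log]", msg),   // surface agent logs separately
  default: (p) => console.log("unrouted:", p),
});
```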
---
# Final summary in normal English, no fluff

Streaming in n8n is just chunked HTTP responses. The Webhook node keeps the HTTP connection open. The AI Agent node emits tokens as they arrive from the model provider. Your client reads chunks. No magic beyond that.

This gives you a fully ChatGPT-like real-time experience inside n8n workflows, including multi-agent setups like Correspondents.