# Streaming Tool-Call Management for Correspondents

### A technical write-up on methodology, architecture, and logic

---

## Overview

The Correspondents framework currently relies on n8n’s **AI Agent node** to generate structured messages, including `regular_message` and `tool_call` JSON objects, which are emitted as plaintext by the agents. This schema is correct and well-designed: it enforces predictability, simplicity, and easy parsing on the client side.

However, when **streaming** is enabled, the single JSON object gets fragmented across multiple chunks of the HTTP stream. Traditional parsers—especially those that assume complete messages per line—fail to reassemble these fragments, leading to broken tool-call handling and misinterpreted data.

The solution is architectural: maintain the current schema, but adjust the **stream handling logic** so that JSON tool-calls survive chunking and tool executions are surfaced to the frontend in real time.

---

## Core Logic and Methodology

### 1. Preserve the Flat JSON Contract

Your agent schema—flat, non-nested, and explicit—is ideal for deterministic streaming. It avoids complex serializations and ambiguous encodings. By keeping this contract, you ensure that all agents remain backward-compatible with both synchronous and streaming workflows.
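As a concrete illustration, the two object shapes might look like the following minimal sketch; only the `regular_message`/`tool_call` distinction comes from the framework, and every other field name is a hypothetical stand-in:

```javascript
// Hypothetical flat message shapes (field names other than "type" are
// illustrative assumptions, not the framework's actual schema).
const regularMessage = {
  type: "regular_message",
  content: "Here is the summary you asked for.",
};

const toolCall = {
  type: "tool_call",
  tool: "web_search",                    // which tool to invoke
  input: "latest n8n streaming release", // its parameters or context
};

// Flat objects need exactly one JSON.parse on the client: no nested
// envelopes or double-encoded strings to unwrap.
const decoded = JSON.parse(JSON.stringify(toolCall));
```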

The key to streaming is not changing the schema—it’s changing **how** the stream is interpreted and **when** events are emitted.

---

### 2. Introduce a Streaming JSON-Capture Layer

When responses stream from n8n or from an upstream AI model, each data chunk may represent only part of a JSON structure. To maintain structural integrity, a **state machine** (not a regex parser) must reconstruct the message before classification.

Conceptually, this layer tracks:

* Whether a JSON object is currently being captured
* The brace depth (`{` and `}`) to know when an object starts and ends
* String boundaries to avoid misinterpreting braces inside quoted text

This ensures that even if a tool-call payload is streamed over dozens of chunks, it’s recognized as a single valid object once complete.

Once a full JSON object is detected and parsed, it is dispatched as a structured event to the client (e.g., `type: "tool_call"` or `type: "regular_message"`). Any text outside a valid JSON boundary continues to stream normally as `type: "content"` events.
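A minimal sketch of such a capture layer, assuming nothing beyond the event names used above (`content`, `tool_call`, `regular_message`); a production version would batch text deltas rather than emit them per character:

```javascript
// Minimal streaming JSON-capture state machine (a sketch, not a library
// API). Tracks brace depth and string boundaries so braces inside quoted
// text never start or end a capture.
class JsonCapture {
  constructor(onEvent) {
    this.onEvent = onEvent; // receives { type, ... } events
    this.buffer = "";       // text of the object being captured
    this.depth = 0;         // current brace nesting depth
    this.inString = false;  // inside a quoted JSON string?
    this.escaped = false;   // previous character was a backslash?
  }

  feed(chunk) {
    for (const ch of chunk) {
      if (this.depth === 0) {
        if (ch === "{") {                // object starts: begin capturing
          this.depth = 1;
          this.buffer = ch;
        } else {
          // Plain text streams through (batch these in production)
          this.onEvent({ type: "content", text: ch });
        }
        continue;
      }
      this.buffer += ch;
      if (this.inString) {
        if (this.escaped) this.escaped = false;
        else if (ch === "\\") this.escaped = true;
        else if (ch === '"') this.inString = false;
      } else if (ch === '"') {
        this.inString = true;
      } else if (ch === "{") {
        this.depth++;
      } else if (ch === "}") {
        this.depth--;
        if (this.depth === 0) this.flush(); // object complete
      }
    }
  }

  flush() {
    try {
      const obj = JSON.parse(this.buffer);
      this.onEvent({ type: obj.type || "regular_message", data: obj });
    } catch {
      this.onEvent({ type: "error", raw: this.buffer }); // malformed capture
    }
    this.buffer = "";
  }
}
```

Feeding a tool-call split across multiple chunks yields a single `tool_call` event once the closing brace arrives, while surrounding text keeps streaming as `content` events.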

---

### 3. Extend n8n Workflow Emission with Synthetic Tool Events

The n8n **AI Agent node** does not natively broadcast tool-call events during streaming—it executes them internally and returns final results once complete. To make tool execution transparent, modify the workflow by adding a **Code node** (or similar hook) before and after tool execution steps.

* **Before execution:** Emit a `"tool_call"` event announcing which tool is being invoked, along with its parameters or context.
* **After execution:** Emit a `"tool_result"` event containing summarized results or confirmation that the tool completed successfully.

These intermediate signals give your frontend visibility into what the system is doing in real time. They can also be used for progress indicators, logging, or auditing.
|
||
|
||
This pattern—emitting “start” and “end” markers—is equivalent to structured tracing within streaming workflows. It allows you to observe latency, measure execution time, and handle failure gracefully (e.g., showing “tool failed” messages).
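The bracketing pattern can be sketched as a small wrapper. Here `emitEvent` and the NDJSON framing are assumptions about how your webhook writes to the open response, not an n8n API; real tools are usually async, but the bracketing is identical with `await`:

```javascript
// Sketch of the before/after emitter pattern. emitEvent() is a
// hypothetical helper that writes one NDJSON line to the open HTTP
// response; it is not part of n8n's API.
function emitEvent(res, event) {
  res.write(JSON.stringify(event) + "\n");
}

function runToolWithEvents(res, toolName, params, toolFn) {
  const startedAt = Date.now();
  // Before execution: announce the invocation and its context
  emitEvent(res, { type: "tool_call", tool: toolName, parameters: params });
  try {
    const result = toolFn(params);
    // After execution: confirm completion, with timing for observability
    emitEvent(res, {
      type: "tool_result",
      tool: toolName,
      durationMs: Date.now() - startedAt,
      result,
    });
    return result;
  } catch (err) {
    // Failure is surfaced to the client instead of silently swallowed
    emitEvent(res, { type: "tool_result", tool: toolName, error: String(err) });
    throw err;
  }
}
```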

---

### 4. Monitor n8n for Native Support

n8n’s maintainers are actively working on **native streaming for tool-calls** (see PR #20499). This will likely introduce built-in event types such as `tool_call_start` and `tool_call_end`, similar to OpenAI’s event stream semantics.

Until this functionality lands, your custom emitters act as a compatibility layer—matching what the official API will eventually provide. This forward-compatible design means you can easily migrate to native events later without rewriting your frontend.

---

### 5. Consider Stream Partitioning

Streaming everything through a single connection can create noisy event traffic—especially when mixing token-level text and tool telemetry. A clean architecture can **partition the stream** into two logical channels:

* **Main Assistant Stream:** carries incremental text deltas (`content` events, message tokens, etc.)
* **Tool Call Stream:** carries system-level events (`tool_call`, `tool_result`, errors, and logs)

This can be implemented as separate **Server-Sent Event (SSE)** endpoints or separate NDJSON channels multiplexed within the same connection.

Partitioning provides better observability and reduces frontend coupling: you can display conversation flow independently of background tool actions, yet synchronize them via IDs or timestamps.

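One way to realize the partition over a single connection is a channel discriminator on every NDJSON line; the `channel` field name below is an assumption, and any stable discriminator works:

```javascript
// Sketch: two logical channels multiplexed over one NDJSON stream.
function frame(channel, event) {
  return JSON.stringify({ channel, ...event }) + "\n";
}

// Demultiplexer: route each parsed line to the handler for its channel.
function demux(line, handlers) {
  const event = JSON.parse(line);
  if (handlers[event.channel]) handlers[event.channel](event);
}

// Frontend side: conversation text and tool telemetry stay decoupled.
const textDeltas = [];
const toolEvents = [];
const handlers = {
  assistant: (e) => textDeltas.push(e.delta),
  tool: (e) => toolEvents.push(e.type),
};

demux(frame("assistant", { type: "content", delta: "Searching" }).trim(), handlers);
demux(frame("tool", { type: "tool_call", tool: "search" }).trim(), handlers);
```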
---

### 6. Production Validation and Observability

Before deployment, stress-test the entire flow with simulated long-running tools (e.g., artificial delays or heavy operations). During these tests:

* Verify that `"tool_call"` events appear as soon as the tool starts.
* Ensure `"tool_result"` events arrive even when network latency or retries occur.
* Confirm that JSON objects remain intact regardless of chunk size or proxy buffering.
* Test browser-side resilience—if a user refreshes mid-stream, does the system resume cleanly or gracefully fail?

Integrate verbose logging at the stream layer to record:

* Timing between event types
* Average duration of tool calls
* Frequency of malformed JSON (useful for debugging agent misbehavior)

This telemetry will inform later performance optimizations and model-prompt adjustments.

---

## Technical Underpinnings

### Streaming Transport

The underlying transport is **HTTP chunked transfer encoding**. The Webhook node in n8n keeps the connection open, writing partial data with `res.write()` and closing it with `res.end()`. Each chunk arrives at the frontend as part of a continuous readable stream.

This design requires a decoder capable of handling partial UTF-8 boundaries, incomplete JSON, and asynchronous event emission.

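The UTF-8 concern is handled by the platform `TextDecoder` when used in streaming mode, as in this sketch:

```javascript
// Sketch: TextDecoder in streaming mode buffers incomplete multi-byte
// sequences instead of emitting replacement characters mid-codepoint.
// (TextDecoder is available globally in Node.js and in browsers.)
const decoder = new TextDecoder("utf-8");

// "é" is two bytes (0xC3 0xA9); simulate a chunk boundary between them.
const chunk1 = new Uint8Array([0x63, 0x61, 0x66, 0xc3]); // "caf" + first byte of "é"
const chunk2 = new Uint8Array([0xa9]);                   // second byte of "é"

const part1 = decoder.decode(chunk1, { stream: true }); // yields "caf", holds 0xC3
const part2 = decoder.decode(chunk2, { stream: true }); // completes "é"
const text = part1 + part2;
```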
### Event Representation

Most streaming ecosystems (OpenAI, Anthropic, Cohere, Vercel AI SDK) use event-based framing—either NDJSON (newline-delimited JSON) or SSE (Server-Sent Events). Both allow discrete events to be interpreted incrementally.
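For reference, here is the same event under both framings; the payload itself is illustrative:

```javascript
// Sketch: one structured event, framed as NDJSON and as SSE.
const event = { type: "tool_call", tool: "search" };

// NDJSON: exactly one JSON document per line.
const ndjsonFrame = JSON.stringify(event) + "\n";

// SSE: a "data:" field, with a blank line terminating the event.
const sseFrame = "data: " + JSON.stringify(event) + "\n\n";
```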

By framing your events this way, you gain interoperability with existing streaming libraries and avoid ambiguity between textual output and structured control messages.

### Workflow Integration

Within n8n:

* The **Webhook node** provides the live output channel.
* The **AI Agent node** generates streaming data.
* Custom **Code nodes** or **Function items** can intercept and re-emit synthetic events.

Because n8n allows direct access to the HTTP response object when using “Response mode: Streaming,” you can write custom control frames in exactly the same stream, ensuring chronological integrity between text and tool events.

### State Management and Error Handling

The capture layer functions as a **finite-state machine** managing transitions:

* `idle → capturing → emitting → idle`
* Sub-states for quoted strings, escapes, and brace balancing.

When malformed JSON or premature stream closure occurs, the system should reset state, log the anomaly, and optionally emit a `type:"error"` event for client awareness. This ensures stability under unpredictable streaming conditions.
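The reset path on premature closure can be sketched as follows, assuming the capture layer keeps the in-progress text in a `buffer` and the brace depth in `depth` (names are illustrative):

```javascript
// Sketch: resetting the capture layer when the stream closes mid-object.
function onStreamClose(capture, emit) {
  if (capture.depth > 0) {
    // Log the anomaly and tell the client, then return to idle
    console.warn("stream closed mid-capture:", capture.buffer);
    emit({ type: "error", message: "stream closed before JSON object completed" });
  }
  capture.buffer = "";
  capture.depth = 0; // back to the idle state
}
```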

---

## Future Integration and Scaling

When native n8n support for tool streaming becomes available, your current architecture will need only minor adjustments:

* Replace the custom “before/after” emitters with n8n’s built-in event hooks.
* Switch to official event types (`tool_call_start`, `tool_call_result`) without changing your frontend logic.
* Optionally drop the JSON-capture layer if n8n begins emitting well-framed event data by default.

Long term, this foundation supports parallel streams, multi-agent collaboration, and event-driven orchestration—making Correspondents extensible for real-time AI systems that mix reasoning and automation.

---

## Summary

1. **Preserve your schema.** Flat JSON remains the most reliable format for deterministic streaming.
2. **Reassemble fragmented JSON.** Introduce a lightweight capture layer to maintain structural integrity.
3. **Expose tool events manually.** Emit start/result signals around tool executions for transparency.
4. **Monitor native updates.** Track n8n’s release progress for built-in tool-call streaming.
5. **Partition streams.** Keep text and system events logically distinct for clarity and reliability.
6. **Validate continuously.** Use long-running simulations and logging to confirm real-time correctness.

This strategy balances robustness with forward compatibility—ensuring Correspondents can stream dynamic agent behavior today while seamlessly aligning with future n8n capabilities.

**Tags:** #streaming #n8n #architecture #toolcalls #Correspondents