Disclaimer: after encountering this issue I used AI (Claude Code) to find and debug it. I manually tested the workarounds (disabling streaming or disabling parallel_tool_calls) and they do seem to work. I also asked Claude Code to suggest an actual fix; however, I find it difficult to judge whether it is a good fix, so I will defer to your expertise on that.
Bug: Streaming + parallel tool calls causes infinite request loop when using the Responses API
When using openAI.chat (Responses API) with stream: true and a model that emits more than one function_call item in a single response, deep-chat enters an infinite request loop. The chat shows a mix of valid responses and repeated "Error, please try again." messages, and the original user message keeps being re-submitted.
Prerequisites
- directConnection.openAI.chat configured with tools and function_handler
- connect.stream: true
- A model/prompt combination where the model makes two or more tool calls in a single response (i.e. parallel_tool_calls is not disabled on the API side)
Steps to reproduce
Configure deep-chat with the Responses API, at least two tools, and stream: true:
directConnection = {
  openAI: {
    key: '...',
    chat: {
      tools: [
        { type: 'function', strict: true, name: 'get_user_name', ... },
        { type: 'function', strict: true, name: 'get_user_goals', ... },
      ],
      function_handler: (calls) => calls.map(({ name }) => ({ response: '...' })),
    }
  }
};
connect = { stream: true };
Send a message that causes the model to invoke two tools in the same response (e.g. "What is my name and what are my open goals?").
Observe the chat UI.
Expected behaviour
The two tool calls are resolved, a single follow-up request is made with both results, and the model returns a final text answer.
Actual behaviour
Two separate follow-up SSE streams are started concurrently — one per tool call. Each follow-up body only contains one of the two tool results (missing the other's context). The API responds to each incomplete context, potentially triggering more tool calls. The process repeats: the original question re-appears in the chat, "Error, please try again." messages accumulate, and the cycle continues until the component is destroyed or the browser tab is closed.
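To make the difference between the expected and the actual follow-up requests concrete, here is a minimal sketch. It assumes tool results are sent back as function_call_output items keyed by call_id, as in the OpenAI Responses API. The helper buildToolOutputs, the ToolResult type, and the sample values are hypothetical and are not taken from deep-chat's source.

```typescript
// Hypothetical types for illustration; deep-chat's internal types may differ.
type ToolResult = {call_id: string; response: string};
type ToolOutputItem = {type: 'function_call_output'; call_id: string; output: string};

// Maps resolved tool results to follow-up input items, one per call_id.
function buildToolOutputs(results: ToolResult[]): ToolOutputItem[] {
  return results.map(
    (result): ToolOutputItem => ({
      type: 'function_call_output',
      call_id: result.call_id,
      output: result.response,
    })
  );
}

// Sample values are made up.
const nameResult: ToolResult = {call_id: 'call_1', response: 'Alice'};
const goalsResult: ToolResult = {call_id: 'call_2', response: 'Ship v2'};

// Expected: one follow-up request that answers both calls.
const expectedFollowUps = [buildToolOutputs([nameResult, goalsResult])];

// Actual (as described in this report): two follow-up requests,
// each answering only one of the two calls.
const actualFollowUps = [buildToolOutputs([nameResult]), buildToolOutputs([goalsResult])];

console.log(expectedFollowUps.length, actualFollowUps.length); // 1 2
```

In the second case neither request carries a result for the other call_id, which is the "incomplete context" referred to above.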
Suspected Cause
handleStreamedResponsesFunctionCall calls handleResponsesFunctionCalls — and therefore makeAnotherRequest — once per response.output_item.done event:
// openAIChatIO.ts
if (result[TYPE] === `${RESPONSE}.output_item.done`) {
  this._functionStreamInProgress = false;
  if (result.item?.[TYPE] === FUNCTION_CALL) {
    return this.handleResponsesFunctionCalls([result.item], prevBody); // fires per event
  }
}
For a response containing two parallel function calls, the SSE stream delivers two output_item.done events, so makeAnotherRequest is called twice with separate, incomplete bodies. The non-streaming path in extractResult does not have this problem — it receives the complete result.output array and calls handleResponsesFunctionCalls once with all function calls.
Suggested fix
Accumulate function calls during streaming and flush them all at once when response.completed arrives, matching the non-streaming path's atomic behaviour. See the detailed fix under Further Details below.
Workaround
Set parallel_tool_calls: false on the chat config. This prevents the model from emitting more than one function_call item per response, so makeAnotherRequest is only called once per round. Disabling streaming (connect.stream: false) also avoids the problem in my testing. Neither workaround fixes the underlying library bug.
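As a rough sketch, the two workarounds mentioned in this report would look something like this in the config. It assumes parallel_tool_calls is placed directly on the chat object, as described above; the tools and function_handler entries are left out here and would stay as in the reproduction config.

```typescript
// Workaround 1: keep streaming, but ask for at most one tool call per response.
const directConnection = {
  openAI: {
    key: '...',
    chat: {
      parallel_tool_calls: false,
      // tools and function_handler unchanged from the reproduction config
    },
  },
};
const connect = {stream: true};

// Workaround 2: keep parallel tool calls, but turn streaming off.
const connectWithoutStreaming = {stream: false};
```

Only one of the two is needed at a time.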
Environment
deep-chat version: 9.0.370 (latest at time of writing)
Further Details
Root cause in the source
The bug is in handleStreamedResponsesFunctionCall (openAIChatIO.ts:218):
private async handleStreamedResponsesFunctionCall(result: OpenAIResult, prevBody?: OpenAIChat) {
  if (result[TYPE] === `${RESPONSE}.output_item.done`) {
    this._functionStreamInProgress = false;
    if (result.item?.[TYPE] === FUNCTION_CALL) {
      return this.handleResponsesFunctionCalls([result.item], prevBody); // ← fires per event
    }
  } else if (result[TYPE] === `${RESPONSE}.output_item.added`) {
    this._functionStreamInProgress = true;
  }
  return {[TEXT]: ''};
}
The SSE stream for a response with parallel tool calls looks like:
response.output_item.added (call_1)
response.function_call_arguments.delta ...
response.output_item.done (call_1) ← handleResponsesFunctionCalls fires → makeAnotherRequest #1
response.output_item.added (call_2)
response.function_call_arguments.delta ...
response.output_item.done (call_2) ← handleResponsesFunctionCalls fires → makeAnotherRequest #2
response.completed ← currently ignored / falls through to {text:''}
Two concurrent follow-up streams are started. Each only carries one tool call's context. The API responds to each partial context — possibly triggering more tool calls — and the loop begins.
The non-streaming path in extractResult does it right: it receives result.output (the complete output[] array from response.completed) and calls handleResponsesFunctionCalls(allCalls, prevBody) once. The streaming path needs to match that behaviour.
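The per-event dispatch described above can be reproduced in isolation with a small simulation. This is only a sketch: DemoEvent, DemoCall, and dispatchPerEvent are hypothetical stand-ins that model just the event names from the trace, not deep-chat's actual types or request logic.

```typescript
// Simplified, hypothetical event shapes; only the fields used here are modelled.
type DemoCall = {type: 'function_call'; call_id: string};
type DemoEvent =
  | {type: 'response.output_item.added'; item: DemoCall}
  | {type: 'response.output_item.done'; item: DemoCall}
  | {type: 'response.completed'};

// Mimics the reported behaviour: every output_item.done triggers its own follow-up.
// Each inner array lists the call_ids that one simulated follow-up request carries.
function dispatchPerEvent(events: DemoEvent[]): string[][] {
  const followUps: string[][] = [];
  for (const streamEvent of events) {
    if (streamEvent.type === 'response.output_item.done') {
      followUps.push([streamEvent.item.call_id]); // one request per finished call
    }
  }
  return followUps;
}

// Same event order as the trace above, with the argument deltas left out.
const parallelStream: DemoEvent[] = [
  {type: 'response.output_item.added', item: {type: 'function_call', call_id: 'call_1'}},
  {type: 'response.output_item.done', item: {type: 'function_call', call_id: 'call_1'}},
  {type: 'response.output_item.added', item: {type: 'function_call', call_id: 'call_2'}},
  {type: 'response.output_item.done', item: {type: 'function_call', call_id: 'call_2'}},
  {type: 'response.completed'},
];

const perEventFollowUps = dispatchPerEvent(parallelStream);
console.log(perEventFollowUps.length); // 2
```

Two follow-ups come out of a single response, each aware of only one call, and response.completed plays no part in the decision.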
Suggested fix
Three small changes to OpenAIChatIO:
1. Add a pending-calls accumulator and reset it on each new request:
// new field
private _pendingStreamedFunctionCalls: ResponsesFunctionCall[] = [];

override async callServiceAPI(messages: Messages, pMessages: MessageContentI[]) {
  this._pendingStreamedFunctionCalls = []; // ← reset; guards against interrupted streams
  this.messages ??= messages;
  // ... rest unchanged
}
2. Do not process each call as soon as output_item.done arrives; accumulate it instead:
private async handleStreamedResponsesFunctionCall(result: OpenAIResult, prevBody?: OpenAIChat) {
  if (result[TYPE] === `${RESPONSE}.output_item.done`) {
    this._functionStreamInProgress = false;
    if (result.item?.[TYPE] === FUNCTION_CALL) {
      this._pendingStreamedFunctionCalls.push(result.item as ResponsesFunctionCall); // ← collect
    }
  } else if (result[TYPE] === `${RESPONSE}.output_item.added`) {
    this._functionStreamInProgress = true;
  }
  return {[TEXT]: ''};
}
3. Flush the accumulated calls on response.completed in extractResult:
private async extractResult(result: OpenAIResult, prevBody?: OpenAIChat): Promise<ResponseI> {
  if (result[ERROR]) throw result[ERROR].message;
  if (result.status) {
    // ... non-streaming path unchanged
  }
  // NEW: flush all parallel tool calls atomically when the response stream ends
  if (result[TYPE] === `${RESPONSE}.completed` && this._pendingStreamedFunctionCalls[LENGTH] > 0) {
    const pending = this._pendingStreamedFunctionCalls;
    this._pendingStreamedFunctionCalls = [];
    return this.handleResponsesFunctionCalls(pending, prevBody) as Promise<ResponseI>;
  }
  if (result.item?.[TYPE] === FUNCTION_CALL && result[TYPE]) {
    return this.handleStreamedResponsesFunctionCall(result, prevBody);
  }
  // ... rest unchanged
}
Why this is correct
The response.completed SSE event fires before the server closes the connection. So the timing of asyncCallInProgress = true (set inside callToolFunction, called from handleResponsesFunctionCalls) is still guaranteed to happen before handleClose fires — which is exactly what it needs to suppress the original stream's close and hand off to the follow-up stream.
The non-streaming path already does this correctly: it receives the whole output[] from result.output and calls handleResponsesFunctionCalls(allCalls) once. This fix makes the streaming path identical in semantics: collect all function calls from the stream, then process them together in one makeAnotherRequest.
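The accumulate-then-flush idea can also be tried outside the library. The sketch below is a standalone stand-in, not a patch: PendingCallBuffer, BufferedCall, and BufferedEvent are hypothetical names, and the class only models the buffering, not the tool execution or the follow-up request.

```typescript
// Hypothetical, simplified shapes; only the fields used here are modelled.
type BufferedCall = {type: 'function_call'; call_id: string};
type BufferedEvent =
  | {type: 'response.output_item.done'; item: BufferedCall}
  | {type: 'response.completed'};

class PendingCallBuffer {
  private pending: BufferedCall[] = [];

  // Call at the start of each new request so an interrupted stream leaves nothing behind.
  reset(): void {
    this.pending = [];
  }

  // Collects finished calls; returns the whole batch on response.completed, otherwise null.
  handle(streamEvent: BufferedEvent): BufferedCall[] | null {
    if (streamEvent.type === 'response.output_item.done') {
      this.pending.push(streamEvent.item);
      return null;
    }
    if (this.pending.length === 0) return null; // completed, but no tool calls were made
    const batch = this.pending;
    this.pending = [];
    return batch;
  }
}

const callBuffer = new PendingCallBuffer();
const flushedBatches: BufferedCall[][] = [];
const bufferedStream: BufferedEvent[] = [
  {type: 'response.output_item.done', item: {type: 'function_call', call_id: 'call_1'}},
  {type: 'response.output_item.done', item: {type: 'function_call', call_id: 'call_2'}},
  {type: 'response.completed'},
];
for (const streamEvent of bufferedStream) {
  const batch = callBuffer.handle(streamEvent);
  if (batch) flushedBatches.push(batch); // stands in for a single follow-up request
}
console.log(flushedBatches.length); // 1
```

The single batch holds both calls, so one follow-up request would carry both results. The reset method mirrors the reset in callServiceAPI: without it, calls collected from a stream that never reached response.completed could leak into the next request.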