Bug: Streaming + parallel tool calls causes infinite request loop when using the Responses API

_Disclaimer: after encountering this issue I used AI (Claude Code) to find and debug it. I did manually test the workarounds (disabling streaming or disabling `parallel_tool_calls`) and they do seem to work. I also asked Claude Code to suggest an actual fix; however, I find it difficult to judge whether it is a good fix, so I will defer to your expertise on that._

# Bug: Streaming + parallel tool calls causes infinite request loop when using the Responses API
When using `openAI.chat` (Responses API) with `stream: true` and a model that emits more than one `function_call` item in a single response, deep-chat enters an infinite request loop. The chat shows a mix of valid responses and repeated "Error, please try again." messages, and the original user message keeps being re-submitted.

### Prerequisites
- `directConnection.openAI.chat` configured with `tools` and `function_handler`
- `connect.stream: true`
- A model/prompt combination where the model makes two or more tool calls in a single response (i.e. `parallel_tool_calls` is not disabled on the API side)

### Steps to reproduce
Configure deep-chat with the Responses API, at least two tools, and `stream: true`:

```
directConnection = {
  openAI: {
    key: '...',
    chat: {
      tools: [
        { type: 'function', strict: true, name: 'get_user_name', ... },
        { type: 'function', strict: true, name: 'get_user_goals', ... },
      ],
      function_handler: (calls) => calls.map(({ name }) => ({ response: '...' })),
    }
  }
};
connect = { stream: true };
```
Send a message that causes the model to invoke **two tools in the same response** (e.g. "_What is my name and what are my open goals?_").
Observe the chat UI.

### Expected behaviour
The two tool calls are resolved, a single follow-up request is made with both results, and the model returns a final text answer.
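
To make the expected flow concrete, here is a minimal sketch of how a single follow-up input could carry both tool results. The item shapes follow the Responses API `function_call` / `function_call_output` convention; `buildFollowUpInput`, `ToolHandler`, and the example values are hypothetical and are not deep-chat internals.

```typescript
// Hypothetical, simplified item shapes (not deep-chat's internal types).
interface FunctionCallItem {
  type: 'function_call';
  call_id: string;
  name: string;
  arguments: string; // JSON-encoded arguments
}

interface FunctionCallOutputItem {
  type: 'function_call_output';
  call_id: string;
  output: string;
}

type ToolHandler = (name: string, args: string) => string;

// Resolve every call from one response and return a single follow-up input:
// the original calls followed by one output per call.
function buildFollowUpInput(
  calls: FunctionCallItem[],
  handler: ToolHandler,
): Array<FunctionCallItem | FunctionCallOutputItem> {
  const outputs: FunctionCallOutputItem[] = calls.map((call) => ({
    type: 'function_call_output',
    call_id: call.call_id,
    output: handler(call.name, call.arguments),
  }));
  return [...calls, ...outputs];
}

const exampleCalls: FunctionCallItem[] = [
  {type: 'function_call', call_id: 'call_1', name: 'get_user_name', arguments: '{}'},
  {type: 'function_call', call_id: 'call_2', name: 'get_user_goals', arguments: '{}'},
];

// One follow-up input with four items: two calls and their two outputs.
const followUpInput = buildFollowUpInput(exampleCalls, (name) => `result of ${name}`);
```

The point of the sketch is only that both `call_id`s are answered in the same request, so the model never sees a context with one result missing.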

### Actual behaviour
Two separate follow-up SSE streams are started concurrently — one per tool call. Each follow-up body only contains one of the two tool results (missing the other's context). The API responds to each incomplete context, potentially triggering more tool calls. The process repeats: the original question re-appears in the chat, "Error, please try again." messages accumulate, and the cycle continues until the component is destroyed or the browser tab is closed.

### Suspected Cause
`handleStreamedResponsesFunctionCall` calls `handleResponsesFunctionCalls` — and therefore `makeAnotherRequest` — **once per `response.output_item.done` event**:

```
// openAIChatIO.ts
if (result[TYPE] === `${RESPONSE}.output_item.done`) {
  this._functionStreamInProgress = false;
  if (result.item?.[TYPE] === FUNCTION_CALL) {
    return this.handleResponsesFunctionCalls([result.item], prevBody); // fires per event
  }
}
```
For a response containing two parallel function calls, the SSE stream delivers two `output_item.done` events, so `makeAnotherRequest` is called twice with separate, incomplete bodies. The non-streaming path in `extractResult` does not have this problem — it receives the complete `result.output` array and calls `handleResponsesFunctionCalls` once with all function calls.

### Suggested fix
Accumulate function calls during streaming and flush them all at once when `response.completed` arrives, matching the non-streaming path's atomic behaviour. The detailed fix is described under "Further Details" below.

### Workaround
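Set `parallel_tool_calls: false` on the chat config. This prevents the model from emitting more than one `function_call` item per response, so `makeAnotherRequest` is only called once per round. Disabling streaming also avoids the loop, because the non-streaming path handles all function calls in one pass. Neither workaround fixes the underlying library bug.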
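
For reference, a sketch of the reproduction config with the `parallel_tool_calls` workaround applied. The key and handler response are placeholders, and the remaining tool fields are omitted just as they are elided in the reproduction snippet.

```typescript
// Workaround sketch: same shape as the reproduction config, plus parallel_tool_calls.
const directConnection = {
  openAI: {
    key: 'placeholder-key', // placeholder, not a real key
    chat: {
      parallel_tool_calls: false, // at most one function_call item per response
      tools: [
        // description/parameters omitted, as in the reproduction snippet
        {type: 'function', strict: true, name: 'get_user_name'},
        {type: 'function', strict: true, name: 'get_user_goals'},
      ],
      function_handler: (calls: Array<{name: string}>) =>
        calls.map(({name}) => ({response: `result of ${name}`})),
    },
  },
};
const connect = {stream: true};
```

With this setting the model has to request the two tools in separate rounds, so each streamed response contains at most one `output_item.done` event for a function call.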

### Environment
deep-chat version: 9.0.370 (latest at time of writing)

# Further Details

### Root cause in the source
The bug is in `handleStreamedResponsesFunctionCall` ([openAIChatIO.ts:218](https://github.com/OvidijusParsiunas/deep-chat/blob/main/component/src/services/openAI/openAIChatIO.ts)):

```
private async handleStreamedResponsesFunctionCall(result: OpenAIResult, prevBody?: OpenAIChat) {
  if (result[TYPE] === `${RESPONSE}.output_item.done`) {
    this._functionStreamInProgress = false;
    if (result.item?.[TYPE] === FUNCTION_CALL) {
      return this.handleResponsesFunctionCalls([result.item], prevBody);  // ← fires per event
    }
  } else if (result[TYPE] === `${RESPONSE}.output_item.added`) {
    this._functionStreamInProgress = true;
  }
  return {[TEXT]: ''};
}
```
The SSE stream for a response with parallel tool calls looks like:

```
response.output_item.added  (call_1)
response.function_call_arguments.delta  ...
response.output_item.done   (call_1)   ← handleResponsesFunctionCalls fires → makeAnotherRequest #1
response.output_item.added  (call_2)
response.function_call_arguments.delta  ...
response.output_item.done   (call_2)   ← handleResponsesFunctionCalls fires → makeAnotherRequest #2
response.completed                      ← currently ignored / falls through to {text:''}
```
Two concurrent follow-up streams are started. Each only carries one tool call's context. The API responds to each partial context — possibly triggering more tool calls — and the loop begins.
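
The effect of per-event dispatch can be reproduced in isolation. The sketch below is a simplified model, not deep-chat code: `StreamEvent`, `dispatchPerEvent`, and the `callId` field are illustrative names, and each "follow-up request" is represented only by the list of call ids it would carry.

```typescript
// Simplified model of the SSE events in the sequence above (illustrative names only).
type StreamEvent =
  | {type: 'response.output_item.added'; callId: string}
  | {type: 'response.output_item.done'; callId: string}
  | {type: 'response.completed'};

const parallelStream: StreamEvent[] = [
  {type: 'response.output_item.added', callId: 'call_1'},
  {type: 'response.output_item.done', callId: 'call_1'},
  {type: 'response.output_item.added', callId: 'call_2'},
  {type: 'response.output_item.done', callId: 'call_2'},
  {type: 'response.completed'},
];

// Per-event dispatch: every `output_item.done` immediately triggers a follow-up
// request that carries only that one call. `response.completed` is ignored.
function dispatchPerEvent(events: StreamEvent[]): string[][] {
  const followUpRequests: string[][] = [];
  for (const event of events) {
    if (event.type === 'response.output_item.done') {
      followUpRequests.push([event.callId]);
    }
  }
  return followUpRequests;
}

// Two requests, each missing the other call: [['call_1'], ['call_2']]
const buggyRequests = dispatchPerEvent(parallelStream);
```

In this model the number of follow-up requests equals the number of function calls in the response, which matches the two concurrent streams observed in the chat.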

The non-streaming path in `extractResult` does it right: it receives `result.output` (the complete `output[]` array from `response.completed`) and calls `handleResponsesFunctionCalls(allCalls, prevBody)` once. The streaming path needs to match that behaviour.

### Suggested fix
Three small changes to `OpenAIChatIO`:

**1. Add a pending-calls accumulator and reset it on each new request:**

```
// new field
private _pendingStreamedFunctionCalls: ResponsesFunctionCall[] = [];

override async callServiceAPI(messages: Messages, pMessages: MessageContentI[]) {
  this._pendingStreamedFunctionCalls = [];   // ← reset; guards against interrupted streams
  this.messages ??= messages;
  // ... rest unchanged
}
```
**2. Stop processing immediately on `output_item.done` — accumulate instead:**

```
private async handleStreamedResponsesFunctionCall(result: OpenAIResult, prevBody?: OpenAIChat) {
  if (result[TYPE] === `${RESPONSE}.output_item.done`) {
    this._functionStreamInProgress = false;
    if (result.item?.[TYPE] === FUNCTION_CALL) {
      this._pendingStreamedFunctionCalls.push(result.item as ResponsesFunctionCall); // ← collect
    }
  } else if (result[TYPE] === `${RESPONSE}.output_item.added`) {
    this._functionStreamInProgress = true;
  }
  return {[TEXT]: ''};
}
```
**3. Flush the accumulated calls on `response.completed` in `extractResult`:**

```
private async extractResult(result: OpenAIResult, prevBody?: OpenAIChat): Promise<ResponseI> {
  if (result[ERROR]) throw result[ERROR].message;

  if (result.status) {
    // ... non-streaming path unchanged
  }

  // NEW: flush all parallel tool calls atomically when the response stream ends
  if (result[TYPE] === `${RESPONSE}.completed` && this._pendingStreamedFunctionCalls[LENGTH] > 0) {
    const pending = this._pendingStreamedFunctionCalls;
    this._pendingStreamedFunctionCalls = [];
    return this.handleResponsesFunctionCalls(pending, prevBody) as Promise<ResponseI>;
  }

  if (result.item?.[TYPE] === FUNCTION_CALL && result[TYPE]) {
    return this.handleStreamedResponsesFunctionCall(result, prevBody);
  }
  // ... rest unchanged
}
```
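
The accumulate-then-flush idea can also be checked outside the library. This is a standalone sketch under the same simplifying assumptions as the rest of this report: `StreamedCallAccumulator`, `ResponseEvent`, and `callId` are hypothetical names, and a follow-up request is modelled as the list of call ids it would contain.

```typescript
// Simplified event model (illustrative; not deep-chat's OpenAIResult type).
type ResponseEvent =
  | {type: 'response.output_item.added'; callId: string}
  | {type: 'response.output_item.done'; callId: string}
  | {type: 'response.completed'};

class StreamedCallAccumulator {
  private pending: string[] = [];
  readonly followUpRequests: string[][] = [];

  // Mirrors the reset at the start of each new request.
  reset(): void {
    this.pending = [];
  }

  handle(event: ResponseEvent): void {
    if (event.type === 'response.output_item.done') {
      this.pending.push(event.callId); // collect, do not dispatch yet
    } else if (event.type === 'response.completed' && this.pending.length > 0) {
      const batch = this.pending;
      this.pending = []; // clear before dispatching so a batch is never sent twice
      this.followUpRequests.push(batch); // one follow-up carrying every call
    }
  }
}

const accumulator = new StreamedCallAccumulator();
const completedStream: ResponseEvent[] = [
  {type: 'response.output_item.added', callId: 'call_1'},
  {type: 'response.output_item.done', callId: 'call_1'},
  {type: 'response.output_item.added', callId: 'call_2'},
  {type: 'response.output_item.done', callId: 'call_2'},
  {type: 'response.completed'},
];
completedStream.forEach((event) => accumulator.handle(event));
// accumulator.followUpRequests is now [['call_1', 'call_2']]
```

Swapping the pending array for a fresh one before dispatching is a small defensive choice: if the dispatch itself ends up starting a new request, the accumulator is already empty.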

### Why this is correct
The `response.completed` SSE event fires **before** the server closes the connection. So the timing of `asyncCallInProgress = true` (set inside `callToolFunction`, called from `handleResponsesFunctionCalls`) is still guaranteed to happen before `handleClose` fires — which is exactly what it needs to suppress the original stream's close and hand off to the follow-up stream.
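
The ordering argument can be modelled as follows. This is a hypothetical sketch: `StreamHandoffModel` and its methods are invented for illustration, and it assumes the `asyncCallInProgress` flag is set synchronously while `response.completed` is being handled, which I have not verified in the library source.

```typescript
// Hypothetical model of the hand-off; names echo this report but this is not deep-chat code.
class StreamHandoffModel {
  asyncCallInProgress = false;
  finalised = false;
  private pendingCalls = 0;

  onFunctionCallDone(): void {
    this.pendingCalls += 1;
  }

  // `response.completed` arrives while the connection is still open.
  onCompleted(): void {
    if (this.pendingCalls > 0) {
      this.asyncCallInProgress = true; // assumption: set before any await
      this.pendingCalls = 0;
    }
  }

  // The close handler only finalises the stream when no follow-up is pending.
  onClose(): void {
    if (!this.asyncCallInProgress) this.finalised = true;
  }
}

const handoff = new StreamHandoffModel();
handoff.onFunctionCallDone();
handoff.onFunctionCallDone();
handoff.onCompleted(); // flag is raised here
handoff.onClose(); // close is suppressed because the flag is already set
```

If the flag were only set after an `await`, the close handler could run first and finalise the original stream, so that assumption is worth checking against `callToolFunction`.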

The non-streaming path already does this correctly: it receives the whole `output[]` from `result.output` and calls `handleResponsesFunctionCalls(allCalls)` once. This fix makes the streaming path identical in semantics: collect all function calls from the stream, then process them together in one `makeAnotherRequest`.

Bug: Streaming + parallel tool calls causes infinite request loop when using the Responses API #509
