
Fix AI chat signed file cache busting#21639

Closed
abdulrahmancodes wants to merge 2 commits into
mainfrom
fix/ai-chat-signed-file-cache-busting

Conversation

@abdulrahmancodes

@abdulrahmancodes abdulrahmancodes commented Jun 15, 2026

Contributor

What

When you upload a file in AI chat, the prompt cache gets busted on every turn, so any thread with a file never hits the cache.

Why: we store only the fileId and re-sign a fresh URL each time the thread is loaded. That signed URL gets handed straight to the provider, and since the token changes every turn, the cached conversation prefix changes too and the cache misses.

Fix

Instead of giving the provider a signed URL, we download the file on our side and inline it as base64 bytes right before the model call. The bytes don't change between turns, so the cached prefix stays stable.
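A minimal sketch of the inlining step, assuming hypothetical part shapes and a hypothetical `loadFile` callback that resolves a fileId to its stored bytes. The function name `inlineFilePartsSketch` and the types are illustrative and are not the PR's actual `inlineFilePartsAsBase64` implementation.

```typescript
// Hypothetical part shapes for illustration; the real types in the PR may differ.
type TextPart = { type: "text"; text: string };
type FilePart = { type: "file"; fileId: string; mediaType: string; url?: string };
type Part = TextPart | FilePart;

// Assumed loader signature: resolves a fileId to the file's bytes.
type LoadFile = (fileId: string) => Promise<Uint8Array>;

// Replace each file part's URL with a data: URL built from the stored bytes.
// Identical bytes always encode to the identical string, so the prompt
// prefix stays stable from one turn to the next.
async function inlineFilePartsSketch(parts: Part[], loadFile: LoadFile): Promise<Part[]> {
  return Promise.all(
    parts.map(async (part): Promise<Part> => {
      if (part.type !== "file") return part;
      const bytes = await loadFile(part.fileId);
      const base64 = Buffer.from(bytes).toString("base64");
      return { ...part, url: `data:${part.mediaType};base64,${base64}` };
    }),
  );
}
```

Calling it twice with the same stored bytes yields the same parts, which is the property the cached prefix depends on.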

Follow-ups / things considered

  • File byte cache (later if needed): we re-download and re-encode the thread's files from storage on every turn. It's fine for now (storage reads are cheap and the provider caches the bytes after the first turn), but if it ever shows up in latency we can cache the bytes by fileId since file content is immutable. Skipped for now to keep this focused.
  • Per-provider Files API (didn't do): the "cleanest" version is to upload each file to the provider once and reference it by their file id, so bytes never get re-sent. We didn't go this route because we support 8 providers, not all of them have a files API, they all differ, and they come with expiry/lifecycle we'd have to manage. base64 is the one representation that works the same across every provider. Worth revisiting if we ever narrow down the provider list.
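If the byte cache from the first follow-up is ever needed, it could look roughly like the sketch below. This is not part of the PR: `withFileByteCache` and the `LoadFile` signature are hypothetical, and the cache is unbounded, so a real version would need a size limit or eviction policy.

```typescript
// Assumed loader signature: resolves a fileId to the file's bytes.
type LoadFile = (fileId: string) => Promise<Uint8Array>;

// Wrap a loader with an in-memory cache keyed by fileId.
// Storing the promise (rather than the bytes) also de-duplicates
// concurrent loads of the same file.
function withFileByteCache(loadFile: LoadFile): LoadFile {
  const cache = new Map<string, Promise<Uint8Array>>();

  return (fileId) => {
    const cached = cache.get(fileId);
    if (cached) return cached;

    const pending = loadFile(fileId).catch((error) => {
      // Do not keep failed loads in the cache; the next call retries.
      cache.delete(fileId);
      throw error;
    });
    cache.set(fileId, pending);
    return pending;
  };
}
```

Keying by fileId alone is only safe because file content is immutable, as noted above.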


…d file handling

- Removed dependency on `FileUrlService` in `AgentChatStreamingService` and simplified message processing by directly mapping database parts to UI message parts.
- Introduced `inlineFilePartsAsBase64` utility to handle inlining of file parts as base64 in `ChatExecutionService`, enhancing file content retrieval and integration.
- Updated `ChatExecutionService` to utilize the new utility for processing messages with inlined file content, improving overall message handling efficiency.
@twenty-ci-bot-public


👋 Thanks for contributing to Twenty!

Your PR has been set to draft while you work on it. Once you're done, mark it as Ready for review and our automated checks will run.

Looking forward to your contribution!

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor


2 issues found across 3 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/twenty-server/src/engine/metadata-modules/ai/ai-chat/services/chat-execution.service.ts">

<violation number="1" location="packages/twenty-server/src/engine/metadata-modules/ai/ai-chat/services/chat-execution.service.ts:269">
P1: Pruning decision uses stale `conversationSizeTokens` after file base64 inlining. Large attachments can push actual prompt over model context without triggering compaction.</violation>
</file>


};

const rawModelMessages = await convertToModelMessages(processedMessages);
const messagesWithInlinedFiles = await inlineFilePartsAsBase64(

Contributor


P1: Pruning decision uses stale conversationSizeTokens after file base64 inlining. Large attachments can push actual prompt over model context without triggering compaction.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/twenty-server/src/engine/metadata-modules/ai/ai-chat/services/chat-execution.service.ts, line 269:

<comment>Pruning decision uses stale `conversationSizeTokens` after file base64 inlining. Large attachments can push actual prompt over model context without triggering compaction.</comment>

<file context>
@@ -263,7 +266,19 @@ export class ChatExecutionService {
     };
 
-    const rawModelMessages = await convertToModelMessages(processedMessages);
+    const messagesWithInlinedFiles = await inlineFilePartsAsBase64(
+      processedMessages,
+      (fileId) =>
</file context>
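To give a sense of how much inlining can add, base64 encodes every 3 bytes as 4 characters, so the encoded size can be computed before deciding whether to compact. The sketch below is an assumption-laden illustration: `shouldCompact` and its character budget are hypothetical stand-ins for the service's real token accounting, and how each provider counts attachment bytes against its context window is not stated in this thread.

```typescript
// Length in characters of the padded base64 encoding of `byteLength` bytes.
const base64Length = (byteLength: number): number => Math.ceil(byteLength / 3) * 4;

// Hypothetical check: include inlined attachment sizes in the conversation
// size before making the pruning decision, instead of using the pre-inlining size.
function shouldCompact(
  textChars: number,
  attachmentByteLengths: number[],
  maxChars: number,
): boolean {
  const inlinedChars = attachmentByteLengths.reduce(
    (sum, byteLength) => sum + base64Length(byteLength),
    0,
  );
  return textChars + inlinedChars > maxChars;
}
```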

…se64 utility

- Updated `ChatExecutionService` to log warnings when AI chat attachments cannot be loaded, improving error visibility.
- Modified `inlineFilePartsAsBase64` to return a placeholder message for unavailable attachments, ensuring better user feedback in chat messages.
- These changes enhance the robustness of file content retrieval and improve overall user experience in the chat interface.
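A rough sketch of the fallback behavior this commit describes, under stated assumptions: the part shapes, the `loadBase64` and `warn` callbacks, and the placeholder wording are all hypothetical and do not reproduce the code in the PR.

```typescript
// Hypothetical part shapes for illustration only.
type FilePart = { type: "file"; fileId: string; filename: string };
type InlinedFilePart = { type: "file"; filename: string; base64: string };
type TextPart = { type: "text"; text: string };

// Inline one attachment, or degrade to a text placeholder when loading fails,
// so a single missing file does not fail the whole model call.
async function inlineOrPlaceholder(
  part: FilePart,
  loadBase64: (fileId: string) => Promise<string>,
  warn: (message: string) => void,
): Promise<InlinedFilePart | TextPart> {
  try {
    const base64 = await loadBase64(part.fileId);
    return { type: "file", filename: part.filename, base64 };
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    warn(`AI chat attachment ${part.fileId} could not be loaded: ${reason}`);
    // Placeholder wording is an assumption, not the PR's actual message.
    return { type: "text", text: `[Attachment "${part.filename}" is unavailable]` };
  }
}
```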
@twenty-ci-bot-public


🔍 Automated Pre-Review

No issues detected - This PR is ready for human review.



Automated pre-review — human approval still required.

@etiennejouan

etiennejouan commented Jun 17, 2026

Contributor

Hi @abdulrahmancodes, I'm not sure this is the right approach (though I'm not sure what the right approach is either). For example, PDF files are parsed and provided in an XML shape at Anthropic. It would be great to compare cost and processing time before and after with a PDF of a few pages.

I wonder whether we should create a readFile tool that returns a format specific to each file type.

@etiennejouan

etiennejouan commented Jun 17, 2026

Contributor

@abdulrahmancodes I tried to find another way but found nothing really interesting for the moment.

I'll test your PR and approve it if it works with the main providers!

AI Chat File Handling — Investigation Summary

The PR (fix/ai-chat-signed-file-cache-busting, #21639)

Before: AI-chat attachments were stored as just a fileId. On every turn the thread loader re-signed a fresh URL (${SERVER_URL}/file/...?token=<JWT>) and handed that signed URL to the LLM provider, which fetched it.

The bug: prompt caching. The JWT token changes every turn → the file part's URL changes → the cached conversation prefix changes → cache miss every turn for any thread containing a file.

The fix: inlineFilePartsAsBase64 downloads the bytes server-side and embeds them as a data: base64 part right before convertToModelMessages. Bytes are deterministic → stable cached prefix. Signing was removed from the message loader.
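To make the before/after concrete, here is a small sketch of how a per-turn token shortens the reusable prefix. Everything in it is a hypothetical stand-in: the message shape, `signUrl`, the example host, and `sharedPrefixLength`. Real provider caches match on the request prefix in their own way, which this only approximates by comparing serialized messages.

```typescript
// Hypothetical, simplified message shape (not the real Twenty types).
type Message = { role: "user" | "assistant"; content: string };

// Stand-in for re-signing: the token differs every time the thread is loaded.
const signUrl = (fileId: string, token: string): string =>
  `https://example.test/file/${fileId}?token=${token}`;

// Build the history that would be sent on a given turn.
const buildHistory = (fileRef: string, laterTurns: string[]): Message[] => [
  { role: "user", content: "Summarize the attached report." },
  { role: "user", content: `[file] ${fileRef}` },
  ...laterTurns.map((text): Message => ({ role: "user", content: text })),
];

// Count the leading messages that are identical in two requests.
// A prefix cache can only reuse work up to the first difference.
const sharedPrefixLength = (a: Message[], b: Message[]): number => {
  let i = 0;
  while (i < a.length && i < b.length && JSON.stringify(a[i]) === JSON.stringify(b[i])) {
    i++;
  }
  return i;
};

// Signed URL: the file message differs between turns because the token changed.
const signedTurn1 = buildHistory(signUrl("file-1", "token-A"), []);
const signedTurn2 = buildHistory(signUrl("file-1", "token-B"), ["And the key risks?"]);

// Inlined bytes: the file message is identical on every turn.
const dataUrl = "data:application/pdf;base64,JVBERi0=";
const inlinedTurn1 = buildHistory(dataUrl, []);
const inlinedTurn2 = buildHistory(dataUrl, ["And the key risks?"]);

const signedShared = sharedPrefixLength(signedTurn1, signedTurn2);
const inlinedShared = sharedPrefixLength(inlinedTurn1, inlinedTurn2);
```

With the signed URL the shared prefix stops at the file message; with the inlined bytes it extends through the whole of the first turn.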

The Idea Explored: inline once, then inventory by fileId + a fetch_file tool

  1. "Inline only on the first turn" alone doesn't help — LLM APIs are stateless, so the full history (including the original file message) is re-sent every turn. The real optimization is: strip the file from history → keep a lightweight reference → re-deliver via a tool on demand. This is orthogonal to the PR's base64-vs-signed-URL change.

  2. A tool returning a signed URL string is useless — the model can't fetch URLs at inference time. (Correctly spotted: "I can't pass it to the model.")

  3. Correction: at the AI SDK level a tool can return content parts via toModelOutput, which returns { type: 'content', value: [...] } and supports image-data, file-data, file-url, and file-id parts. But this codebase's Tool abstraction only returns plain JSON (ToolOutput), so it would need extending.
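The kind of extension point 3 refers to might look like the sketch below. All types here are local stand-ins modeled on the names quoted in the list, not the AI SDK's real typings, and the `ToolOutput` fields (`success`, `message`, `file`) are assumptions about a shape this thread does not show.

```typescript
// Local stand-ins; nothing here is imported from the AI SDK.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "file-data"; data: string; mediaType: string };

type ModelOutput =
  | { type: "json"; value: unknown }
  | { type: "content"; value: ContentPart[] };

// Hypothetical JSON-only tool result, extended with an optional file payload.
type ToolOutput = {
  success: boolean;
  message: string;
  file?: { base64: string; mediaType: string };
};

// Map a tool result to model-facing output: plain JSON when there is no file,
// content parts when a file payload is present.
function toModelOutputSketch(output: ToolOutput): ModelOutput {
  if (!output.file) {
    return { type: "json", value: output };
  }
  return {
    type: "content",
    value: [
      { type: "text", text: output.message },
      { type: "file-data", data: output.file.base64, mediaType: output.file.mediaType },
    ],
  };
}
```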

AI SDK Deep-Dive (the decisive part)

  • Tool-result content gets no SDK normalization — no URL download, no supportedUrls capability check, no fallback. It's a pure pass-through (mapToolResultOutput only renames the deprecated media type).
  • The download/fallback safety net (downloadAssets) applies only to user-message file/image parts, never tool results.
  • Behavior is therefore 100% per-provider, and Twenty's 8 providers diverge badly:
| Provider | file-url | base64 (image-data/file-data) |
| --- | --- | --- |
| OpenAI (default = Responses API) | ✅ provider fetches it | |
| Anthropic | ✅ provider fetches it | ⚠️ image ok; file-data PDF-only else dropped; no file-id |
| Google | | |
| xAI (Responses) | | |
| Mistral | JSON.stringify'd to text | ❌ stringified (token bloat) |
| openai-compatible (DeepSeek/custom) | JSON.stringify'd to text | ❌ stringified (token bloat) |

Conclusions

  • file-url from a tool is the worst option: where it works it relies on the provider fetching your URL → reintroduces the exact expiry/staleness/cache-busting this PR removed; where it doesn't, it's silently stringified; and the SDK never rescues you.
  • Even base64 from a tool isn't universally safe (Mistral/openai-compatible stringify it; Anthropic drops non-PDF files).
  • This validates the PR's choice: inlining base64 at the message-part level inherits the SDK's mature, uniform handling; moving the same content into a tool result forfeits it.
  • Net guidance: a fetch_file tool to strip files from history is viable only for text output across all providers. Anything binary (images/PDFs) must live in a message part — exactly where this PR put it.
  • The genuinely "clean" alternative (per-provider Files API + file-id) was deliberately skipped by the author due to the 8-provider spread and lifecycle management.
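The net guidance could be expressed as a routing rule like the one below. This is a sketch under assumptions: `deliveryFor` is a hypothetical helper, and treating `text/*` and `application/json` as the text-safe set is an illustrative choice, not something the thread specifies.

```typescript
// Where a file's content should be delivered to the model.
type Delivery = "tool-text" | "message-part";

// Text content can be returned from a fetch_file-style tool on every provider;
// anything binary (images, PDFs) stays in a message part.
function deliveryFor(mediaType: string): Delivery {
  // Strip parameters such as "; charset=utf-8" before comparing.
  const base = (mediaType.split(";")[0] ?? "").trim().toLowerCase();
  const isText = base.startsWith("text/") || base === "application/json";
  return isText ? "tool-text" : "message-part";
}
```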

@etiennejouan

Contributor

Hi @abdulrahmancodes, it works well, and it has the same performance as main, which is a good point. However, I tried to break the cache by uploading many files and did not manage to bust it. That seems odd, but since the initial issue is not confirmed, we should close this.

That said, your approach is still interesting for users whose server is not publicly reachable.

@abdulrahmancodes

Contributor Author

@etiennejouan When I tested the cache busting, it seemed to be working, but I'll take another look to double-check.
