Summary
Streaming responses live only in RAM until generation completes. If the app or model crashes mid-generation, the partial output is lost even though the user could see it on screen. Even a partial response is better than losing everything and regenerating from scratch.
Problem
- User chats with a model
- Model generates a long response (streaming, token by token)
- App crashes mid-generation (background kill, model crash, OOM)
- User reopens the app
- The partial response that was already visible is gone
- User has to regenerate from scratch
Root cause: streaming text is stored in Zustand state (RAM). It only persists to AsyncStorage when finalizeStreamingMessage() runs after generation completes. If the app crashes before that, the function never fires and the output is lost.
Proposed Solution
Auto-save the streaming buffer to AsyncStorage every ~100 tokens during generation. On next app launch, detect the partial response and offer recovery.
During streaming:
- Roughly every 100 tokens, write the current streaming content to a partial_response key in AsyncStorage. The write is asynchronous and is not awaited, so it does not block token rendering
- When generation completes successfully, delete partial_response
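The streaming steps above could be sketched roughly as follows. This is a minimal illustration, not the app's actual code: StreamCheckpointer, KeyValueStore, MemoryStore, and the conversationId and savedAt fields are hypothetical names introduced here. KeyValueStore mirrors only the getItem/setItem/removeItem subset of AsyncStorage so the sketch can run outside React Native, and storing a conversation ID alongside the text is an assumption made so that recovery knows where the message belongs.

```typescript
// Hypothetical subset of the AsyncStorage API (getItem/setItem/removeItem).
interface KeyValueStore {
  getItem(key: string): Promise<string | null>;
  setItem(key: string, value: string): Promise<void>;
  removeItem(key: string): Promise<void>;
}

// In-memory stand-in for AsyncStorage, for demonstration only.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  async getItem(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }
  async setItem(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
  async removeItem(key: string): Promise<void> {
    this.data.delete(key);
  }
}

const PARTIAL_KEY = "partial_response";

// Hypothetical helper: accumulates tokens and checkpoints every N of them.
class StreamCheckpointer {
  private readonly store: KeyValueStore;
  private readonly conversationId: string; // assumption: not in the original proposal
  private readonly every: number;
  private content = "";
  private tokenCount = 0;
  private pending: Promise<void> = Promise.resolve();

  constructor(store: KeyValueStore, conversationId: string, every: number = 100) {
    this.store = store;
    this.conversationId = conversationId;
    this.every = every;
  }

  // Called for each streamed token. Never awaits storage, so rendering is not blocked.
  appendToken(token: string): void {
    this.content += token;
    this.tokenCount += 1;
    if (this.tokenCount % this.every !== 0) return;

    const snapshot = JSON.stringify({
      conversationId: this.conversationId,
      content: this.content,
      savedAt: Date.now(),
    });
    // Chain writes so an older snapshot cannot land after a newer one.
    this.pending = this.pending
      .then(() => this.store.setItem(PARTIAL_KEY, snapshot))
      .catch(() => {
        // A failed checkpoint should not interrupt streaming.
      });
  }

  // Resolves once all queued checkpoint writes have settled.
  flush(): Promise<void> {
    return this.pending;
  }

  // Called when generation completes successfully: clear the checkpoint.
  async finalize(): Promise<string> {
    await this.pending;
    await this.store.removeItem(PARTIAL_KEY);
    return this.content;
  }
}
```

In the app, appendToken would presumably be driven from appendStreamingToken in chatStore.ts, and finalize from finalizeStreamingMessage(), with AsyncStorage passed in place of MemoryStore. Chaining the writes is a design choice of this sketch: it keeps snapshots ordered at the cost of queuing them if storage is slow.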
On app launch:
- Check for partial_response in AsyncStorage
- If found, prompt: "A response was interrupted. Recover partial output?"
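- "Recover" appends the partial message to the conversation, then deletes partial_response so the prompt does not reappear on the next launch
- "Discard" deletes partial_response without restoring anything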
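The launch-time check above might look something like the sketch below. It is only an illustration under stated assumptions: checkForPartialResponse, PartialResponse, KeyValueStore, and MemoryStore are hypothetical names, the stored value is assumed to be JSON containing conversationId and content, and the prompt and the conversation append are passed in as callbacks because the real UI and store APIs are not described here. Treating an unreadable checkpoint as absent is also an assumption, made because a crash could in principle interrupt a write.

```typescript
// Hypothetical subset of the AsyncStorage API (getItem/setItem/removeItem).
interface KeyValueStore {
  getItem(key: string): Promise<string | null>;
  setItem(key: string, value: string): Promise<void>;
  removeItem(key: string): Promise<void>;
}

// In-memory stand-in for AsyncStorage, for demonstration only.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  async getItem(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }
  async setItem(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
  async removeItem(key: string): Promise<void> {
    this.data.delete(key);
  }
}

const PARTIAL_KEY = "partial_response";

// Assumed shape of the stored checkpoint.
interface PartialResponse {
  conversationId: string;
  content: string;
}

type RecoveryChoice = "recover" | "discard";

// Type guard: rejects truncated or otherwise malformed checkpoints.
function isPartialResponse(value: unknown): value is PartialResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.conversationId === "string" && typeof v.content === "string";
}

// Run once on app launch. Returns what happened so the caller can log or test it.
async function checkForPartialResponse(
  store: KeyValueStore,
  askUser: (partial: PartialResponse) => Promise<RecoveryChoice>,
  appendMessage: (conversationId: string, content: string) => void,
): Promise<RecoveryChoice | "none"> {
  const raw = await store.getItem(PARTIAL_KEY);
  if (raw === null) return "none";

  let parsed: unknown = null;
  try {
    parsed = JSON.parse(raw);
  } catch {
    parsed = null; // unreadable checkpoint, handled below
  }
  if (!isPartialResponse(parsed)) {
    await store.removeItem(PARTIAL_KEY);
    return "none";
  }

  // "A response was interrupted. Recover partial output?"
  const choice = await askUser(parsed);
  if (choice === "recover") {
    appendMessage(parsed.conversationId, parsed.content);
  }
  // Either way, clear the key so the prompt is shown only once.
  await store.removeItem(PARTIAL_KEY);
  return choice;
}
```

At the app entry point, askUser would wrap whatever dialog the app uses, and appendMessage would call into the chat store. Deleting the key after both choices keeps the prompt from reappearing on later launches.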
Files involved
- src/stores/chatStore.ts - streamingMessage and appendStreamingToken
- App entry point - recovery check on launch
Reported by
Community member on Slack. Original feedback: partial output should auto-save to storage instead of staying solely in RAM. Even a partial output is better than having to regenerate everything.
Platform