# 🌊 Resumable Streams
LibreChat features a resilient streaming architecture that ensures you never lose AI-generated content. Whether your connection drops, you switch tabs, or you pick up on another device, your responses are always preserved and synchronized.
## Why It Matters
Traditional chat applications lose all streaming content when your connection drops. With resumable streams, LibreChat:
- **Preserves every response** — Network hiccups, browser refreshes, or server restarts won’t cause data loss
- **Keeps multiple tabs in sync** — Open the same conversation in two browser tabs and watch them update together in real time
- **Enables seamless device switching** — Start a conversation on your desktop and continue on your phone
- **Lets you multitask freely** — Start a generation, browse other tabs, and come back to a complete response
## How It Works
When you send a message to an AI model, LibreChat creates a generation job that tracks all streamed content. The magic happens when something interrupts your connection:
1. **Automatic detection** — The client detects the disconnection instantly
2. **State reconstruction** — Upon reconnecting, the server rebuilds all previously streamed content
3. **Seamless sync** — Missing content is delivered via a sync event
4. **Transparent continuation** — Streaming resumes from the current position
This all happens automatically—no user action required.
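
The sketch below illustrates this resume loop from the client’s perspective. It is a minimal sketch, not LibreChat’s actual client code: the `/api/stream/:jobId` endpoint and the `sync`/`delta` event names are assumptions made for illustration.

```typescript
// Minimal sketch of the client resume loop described above. The endpoint
// (`/api/stream/:jobId`) and event names (`sync`, `delta`) are assumptions,
// not LibreChat's actual API.
function followStream(jobId: string, render: (text: string) => void): void {
  let content = ''; // everything streamed so far

  const connect = (): void => {
    const source = new EventSource(`/api/stream/${jobId}`);

    // On (re)connect, the server replays missing content as one sync event.
    source.addEventListener('sync', (e) => {
      content = JSON.parse((e as MessageEvent).data).content;
      render(content);
    });

    // Normal streaming then continues from the current position.
    source.addEventListener('delta', (e) => {
      content += JSON.parse((e as MessageEvent).data).text;
      render(content);
    });

    // Automatic detection: `error` fires on disconnect; retry shortly after.
    source.onerror = () => {
      source.close();
      setTimeout(connect, 1_000);
    };
  };

  connect();
}
```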
## Multi-Tab & Multi-Device Experience
One of the most powerful aspects of resumable streams is real-time synchronization:
- **Same chat, multiple windows** — Open a conversation in two browser tabs and both receive updates simultaneously
- **Cross-device continuity** — Start a long generation on your laptop, then check the result on your phone
- **Team collaboration** — In shared conversations, all viewers see content appear in real time
## Deployment Modes
LibreChat supports two deployment configurations:
### Single-Instance Mode (Default)
Uses in-memory storage with Node.js `EventEmitter` for pub/sub. Perfect for:
- Local development
- Single-server deployments
- Docker Compose setups
**No configuration required** — works out of the box.
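
For illustration, a single-instance pub/sub layer can be as small as the following sketch. The channel naming and function names are hypothetical, not LibreChat’s internals.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical per-process event bus: every streamed chunk is published on a
// per-job channel, and every connected tab on this instance subscribes to it.
const bus = new EventEmitter();
bus.setMaxListeners(0); // many tabs may watch the same job

function publishChunk(jobId: string, chunk: string): void {
  bus.emit(`stream:${jobId}`, chunk);
}

function subscribe(jobId: string, onChunk: (chunk: string) => void): () => void {
  bus.on(`stream:${jobId}`, onChunk);
  return () => bus.off(`stream:${jobId}`, onChunk); // unsubscribe handle
}
```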
### Redis Mode (Production)
Uses Redis Streams and Pub/Sub for cross-instance communication. Essential for:
- Horizontally scaled deployments
- Load-balanced production environments
- High-availability setups
- Kubernetes clusters
With Redis mode, a user can start a generation on one server instance and seamlessly resume on another—perfect for rolling deployments and auto-scaling.
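
The cross-instance relay can be pictured roughly as follows, using ioredis. Channel names are illustrative assumptions; a Redis connection enters subscriber mode once it subscribes, hence the two separate clients.

```typescript
import Redis from 'ioredis';

// Publishing and subscribing use separate connections because a Redis
// client in subscriber mode cannot issue other commands.
const pub = new Redis(process.env.REDIS_URI ?? 'redis://localhost:6379');
const sub = new Redis(process.env.REDIS_URI ?? 'redis://localhost:6379');

// Instance A publishes each chunk as the model streams it.
async function publishChunk(jobId: string, chunk: string): Promise<void> {
  await pub.publish(`stream:${jobId}`, chunk);
}

// Any other instance relays the same chunks to its own connected clients.
async function relay(jobId: string, onChunk: (chunk: string) => void): Promise<void> {
  await sub.subscribe(`stream:${jobId}`);
  sub.on('message', (channel, message) => {
    if (channel === `stream:${jobId}`) onChunk(message);
  });
}
```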
**Note:** If you only run a single LibreChat instance, Redis for resumable streams is typically unnecessary—the in-memory mode handles everything. Redis becomes valuable when you have multiple LibreChat instances behind a load balancer. That said, Redis is still useful for other features like caching and session storage even in single-instance deployments.
## Configuration
### Enabling Redis Streams
When Redis is enabled (`USE_REDIS=true`), resumable streams automatically use Redis. You can also explicitly enable it:
```bash
USE_REDIS=true
REDIS_URI=redis://localhost:6379

# Resumable streams will use Redis automatically when USE_REDIS=true
# To explicitly control it:
USE_REDIS_STREAMS=true
```

### Redis Cluster Support
For Redis Cluster deployments:
```bash
USE_REDIS_STREAMS=true
USE_REDIS_CLUSTER=true
REDIS_URI=redis://node1:7001,redis://node2:7002,redis://node3:7003
```

LibreChat automatically uses hash-tagged keys to ensure multi-key operations stay within the same cluster slot.
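
As a rough illustration of why hash tags matter: Redis Cluster hashes only the substring inside `{...}` when assigning a key to a slot, so keys that share a tag always land together. The key names below are assumptions, not LibreChat’s actual schema.

```typescript
// Both (hypothetical) keys share the hash tag `{job:abc123}`, so Redis
// Cluster places them in the same slot, and multi-key operations on them
// (e.g. a MULTI/EXEC touching both) remain valid in cluster mode.
const jobId = 'abc123';
const chunksKey = `{job:${jobId}}:chunks`; // Stream of streamed deltas
const metaKey = `{job:${jobId}}:meta`;     // Hash of job metadata
```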
## Use Cases
### Unstable Networks
On spotty WiFi or cellular connections, responses automatically resume when connectivity returns. No need to re-send your prompt.
### Mobile Users
Switch from WiFi to cellular (or vice versa) without losing your response. The stream picks up exactly where it left off.
### Long-Running Generations
For complex prompts that generate lengthy responses, feel free to check other tabs or apps. Your response will be waiting when you return.
### Multi-Device Workflows
Start a conversation on your work computer, commute home, and check the result on your phone—the full response is there.
### Production Deployments
Scale horizontally across multiple server instances while maintaining stream continuity. Rolling deployments won’t interrupt active generations.
## Technical Details
### Content Reconstruction
The system aggregates all streamed delta events to rebuild:
- Message content (text, tool calls, citations)
- Agent run steps and intermediate reasoning
- Metadata and state information
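
Conceptually, reconstruction is a fold over the ordered delta events. The sketch below uses hypothetical delta shapes to show the idea; the real event types carry more fields.

```typescript
// Hypothetical delta shapes; real events also carry citations, run steps,
// and metadata.
type Delta =
  | { type: 'text'; value: string }
  | { type: 'tool_call'; name: string; args: string };

interface Message {
  text: string;
  toolCalls: { name: string; args: string }[];
}

// Fold the ordered deltas back into a complete message.
function reconstruct(deltas: Delta[]): Message {
  const message: Message = { text: '', toolCalls: [] };
  for (const delta of deltas) {
    if (delta.type === 'text') {
      message.text += delta.value; // concatenate text fragments in order
    } else {
      message.toolCalls.push({ name: delta.name, args: delta.args });
    }
  }
  return message;
}
```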
### Performance Optimizations
**Memory-first approach:** When reconnecting to the same server instance, LibreChat uses a local cache for zero-latency content recovery, avoiding unnecessary Redis round trips.

**Automatic cleanup:** Stale job entries are removed during queries to prevent memory leaks. Completed streams expire automatically.

**Efficient storage:** In-memory mode uses `WeakRef` for graph storage, enabling automatic garbage collection when conversations end.
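
A memory-first lookup might look like the following sketch, assuming a hypothetical `WeakRef` cache sitting in front of a Redis fallback.

```typescript
// Hypothetical local cache: a WeakRef lets the GC reclaim finished jobs
// while live ones are served without touching Redis.
const localJobs = new Map<string, WeakRef<{ content: string }>>();

async function getJobState(
  jobId: string,
  fetchFromRedis: (id: string) => Promise<{ content: string }>,
): Promise<{ content: string }> {
  const cached = localJobs.get(jobId)?.deref();
  if (cached) return cached; // zero-latency local hit

  // Reclaimed by GC, or the job started on another instance: go to Redis.
  const state = await fetchFromRedis(jobId);
  localJobs.set(jobId, new WeakRef(state));
  return state;
}
```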
### Data Flow
| Component | Storage Mechanism |
|---|---|
| Chunks | Redis Streams (XADD/XRANGE) |
| Job metadata | Redis Hash structures |
| Real-time events | Redis Pub/Sub channels |
| Expiration | Automatic TTL after stream completion |
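
Putting the table together, a minimal ioredis sketch of the chunk path might look like this. Key names, field names, and the TTL value are assumptions for illustration, not LibreChat’s actual schema.

```typescript
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URI ?? 'redis://localhost:6379');

// Chunks: appended to a Redis Stream as they arrive (XADD).
async function appendChunk(jobId: string, text: string): Promise<void> {
  await redis.xadd(`{job:${jobId}}:chunks`, '*', 'text', text);
}

// Reconstruction: read the whole stream back in order (XRANGE).
async function readAllChunks(jobId: string): Promise<string[]> {
  const entries = await redis.xrange(`{job:${jobId}}:chunks`, '-', '+');
  return entries.map(([, fields]) => fields[1]); // value of the `text` field
}

// Completion: mark the metadata Hash and set TTLs so both keys expire.
async function finishJob(jobId: string): Promise<void> {
  await redis.hset(`{job:${jobId}}:meta`, 'status', 'complete');
  await redis.expire(`{job:${jobId}}:chunks`, 3600); // 1 hour, illustrative
  await redis.expire(`{job:${jobId}}:meta`, 3600);
}
```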
## Testing Resumable Streams
You can verify the feature is working:
- Start a streaming conversation with any AI model
- **Tab test:** Open the same chat in a new browser tab—both should sync
- **Disconnect test:** Turn off your network briefly, then reconnect
- **Navigation test:** Navigate away mid-stream, then return
In all cases, you should see the complete response with no data loss.
## Troubleshooting
### Streams not resuming?
Check Redis connectivity:
```bash
docker exec -it librechat-redis redis-cli ping
# Should return: PONG
```

Verify environment variables:
```bash
# Ensure USE_REDIS_STREAMS is set
echo $USE_REDIS_STREAMS
```

### Content appears duplicated?
This typically indicates a client version mismatch. Ensure you’re running the latest version of LibreChat.
### High memory usage in single-instance mode?
Completed streams are automatically garbage collected. If you’re seeing high memory usage, check for:
- Very long-running streams that haven’t completed
- Streams that errored without proper cleanup
## Related Documentation
- **Redis Configuration** — Setting up Redis for caching and horizontal scaling
- **Agents** — AI agents with tool use capabilities
- **Docker Deployment** — Container-based deployment guide
For implementation details, see PR #10926.