🌊 Resumable Streams

LibreChat features a resilient streaming architecture that ensures you never lose AI-generated content. Whether your connection drops, you switch tabs, or you pick up on another device, your responses are always preserved and synchronized.

Why It Matters

Traditional chat applications lose all streaming content when your connection drops. With resumable streams, LibreChat:

  • Preserves every response — Network hiccups, browser refreshes, or server restarts won’t cause data loss
  • Keeps multiple tabs in sync — Open the same conversation in two browser tabs and watch them update together in real-time
  • Enables seamless device switching — Start a conversation on your desktop and continue on your phone
  • Lets you multitask freely — Start a generation, browse other tabs, and come back to a complete response

How It Works

When you send a message to an AI model, LibreChat creates a generation job that tracks all streamed content. The magic happens when something interrupts your connection:

  1. Automatic detection — The client detects the disconnection instantly
  2. State reconstruction — Upon reconnecting, the server rebuilds all previously streamed content
  3. Seamless sync — Missing content is delivered via a sync event
  4. Transparent continuation — Streaming resumes from the current position

This all happens automatically—no user action required.
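
For intuition, here is a minimal client-side sketch of that sequence. The endpoint path and the 'delta'/'sync' event names are illustrative assumptions, not LibreChat's actual wire format:

const jobId = 'job-123';
// Hypothetical render helpers for this sketch.
const appendToMessage = (text: string) => { /* append a delta to the UI */ };
const replaceMessage = (text: string) => { /* swap in rebuilt content */ };

// EventSource reconnects automatically when the connection drops.
const source = new EventSource(`/api/messages/stream?jobId=${jobId}`);

// Live deltas while the stream is healthy.
source.addEventListener('delta', (event) => {
  appendToMessage(JSON.parse((event as MessageEvent).data).text);
});

// After a reconnect, the server rebuilds everything streamed so far and
// delivers it in one 'sync' event before live deltas resume.
source.addEventListener('sync', (event) => {
  replaceMessage(JSON.parse((event as MessageEvent).data).aggregatedText);
});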

Multi-Tab & Multi-Device Experience

One of the most powerful aspects of resumable streams is real-time synchronization:

  • Same chat, multiple windows — Open a conversation in two browser tabs and both receive updates simultaneously
  • Cross-device continuity — Start a long generation on your laptop, then check the result on your phone
  • Team collaboration — In shared conversations, all viewers see content appear in real-time

Deployment Modes

LibreChat supports two deployment configurations:

Single-Instance Mode (Default)

Uses in-memory storage with Node.js EventEmitter for pub/sub. Perfect for:

  • Local development
  • Single-server deployments
  • Docker Compose setups

No configuration required — Works out of the box.
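
As a rough illustration of this mode, the sketch below buffers chunks in a Map and fans them out with an EventEmitter. The class and method names are hypothetical, not LibreChat's internals:

import { EventEmitter } from 'node:events';

class InMemoryStreamStore {
  private emitter = new EventEmitter();
  private chunks = new Map<string, string[]>();

  // Called for each delta produced by the model.
  publish(jobId: string, chunk: string) {
    const buffer = this.chunks.get(jobId) ?? [];
    buffer.push(chunk);
    this.chunks.set(jobId, buffer);
    this.emitter.emit(jobId, chunk); // fan out to every live subscriber (tab)
  }

  // A reconnecting client first replays the buffer, then follows live chunks.
  subscribe(jobId: string, onChunk: (chunk: string) => void): () => void {
    (this.chunks.get(jobId) ?? []).forEach(onChunk);
    this.emitter.on(jobId, onChunk);
    return () => this.emitter.off(jobId, onChunk); // unsubscribe handle
  }
}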

Redis Mode (Production)

Uses Redis Streams and Pub/Sub for cross-instance communication. Essential for:

  • Horizontally scaled deployments
  • Load-balanced production environments
  • High-availability setups
  • Kubernetes clusters

With Redis mode, a user can start a generation on one server instance and seamlessly resume on another—perfect for rolling deployments and auto-scaling.

Note: If you only run a single LibreChat instance, Redis for resumable streams is typically unnecessary—the in-memory mode handles everything. Redis becomes valuable when you have multiple LibreChat instances behind a load balancer. That said, Redis is still useful for other features like caching and session storage even in single-instance deployments.
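
To make the cross-instance behavior concrete, here is a condensed sketch using ioredis. The key schema is an illustrative assumption:

import { Redis } from 'ioredis';

const redis = new Redis(process.env.REDIS_URI ?? 'redis://localhost:6379');

// Instance A appends each delta to a per-job Redis Stream as it is generated.
async function appendChunk(jobId: string, text: string) {
  await redis.xadd(`stream:${jobId}`, '*', 'text', text);
}

// Instance B, which never saw the original connection, can still replay
// everything streamed so far when the client reconnects there.
async function replayChunks(jobId: string): Promise<string> {
  const entries = await redis.xrange(`stream:${jobId}`, '-', '+');
  return entries.map(([, fields]) => fields[1]).join('');
}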

Configuration

Enabling Redis Streams

When Redis is enabled (USE_REDIS=true), resumable streams automatically use Redis. You can also explicitly enable it:

.env
USE_REDIS=true
REDIS_URI=redis://localhost:6379
# Resumable streams will use Redis automatically when USE_REDIS=true
# To explicitly control it:
USE_REDIS_STREAMS=true

Redis Cluster Support

For Redis Cluster deployments:

.env
USE_REDIS_STREAMS=true
USE_REDIS_CLUSTER=true
REDIS_URI=redis://node1:7001,redis://node2:7002,redis://node3:7003

LibreChat automatically uses hash-tagged keys to ensure multi-key operations stay within the same cluster slot.
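
To see why hash tags matter, consider this sketch (the key names are hypothetical):

// In Redis Cluster, only the part inside {braces} is hashed, so every key
// for one job maps to the same cluster slot.
const jobId = 'abc123';
const chunksKey = `{job:${jobId}}:chunks`; // Redis Stream of deltas
const metaKey = `{job:${jobId}}:meta`; // job metadata hash
// Both keys share the slot for "job:abc123", so multi-key operations on
// them never fail with a CROSSSLOT error.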

Use Cases

Unstable Networks

On spotty WiFi or cellular connections, responses automatically resume when connectivity returns. No need to re-send your prompt.

Mobile Users

Switch from WiFi to cellular (or vice versa) without losing your response. The stream picks up exactly where it left off.

Long-Running Generations

For complex prompts that generate lengthy responses, feel free to check other tabs or apps. Your response will be waiting when you return.

Multi-Device Workflows

Start a conversation on your work computer, commute home, and check the result on your phone—the full response is there.

Production Deployments

Scale horizontally across multiple server instances while maintaining stream continuity. Rolling deployments won’t interrupt active generations.

Technical Details

Content Reconstruction

The system aggregates all streamed delta events to rebuild the following (see the sketch after this list):

  • Message content (text, tool calls, citations)
  • Agent run steps and intermediate reasoning
  • Metadata and state information
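
A simplified sketch of that aggregation follows; the event shapes are illustrative, as the real delta events carry richer structures:

type DeltaEvent =
  | { type: 'text'; value: string }
  | { type: 'tool_call'; name: string }
  | { type: 'metadata'; key: string; value: string };

interface RebuiltState {
  text: string;
  toolCalls: string[];
  metadata: Record<string, string>;
}

// Replaying the full event log in order reproduces the interrupted state.
function rebuild(events: DeltaEvent[]): RebuiltState {
  const state: RebuiltState = { text: '', toolCalls: [], metadata: {} };
  for (const event of events) {
    if (event.type === 'text') state.text += event.value;
    else if (event.type === 'tool_call') state.toolCalls.push(event.name);
    else state.metadata[event.key] = event.value;
  }
  return state;
}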

Performance Optimizations

Memory-first approach: When reconnecting to the same server instance, LibreChat uses local cache for zero-latency content recovery, avoiding unnecessary Redis round trips.

Automatic cleanup: Stale job entries are removed during queries to prevent memory leaks. Completed streams expire automatically.

Efficient storage: In-memory mode uses WeakRef for graph storage, enabling automatic garbage collection when conversations end.
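
A minimal sketch of that WeakRef pattern (names are illustrative):

// When nothing else references the graph, the garbage collector may reclaim
// it, and deref() starts returning undefined.
const graphs = new Map<string, WeakRef<object>>();

function storeGraph(jobId: string, graph: object) {
  graphs.set(jobId, new WeakRef(graph));
}

function getGraph(jobId: string): object | undefined {
  const ref = graphs.get(jobId);
  const graph = ref?.deref();
  if (ref && !graph) graphs.delete(jobId); // prune entries the GC reclaimed
  return graph;
}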

Data Flow

In Redis mode, each component maps to its own storage mechanism:

  • Chunks — Redis Streams (XADD/XRANGE)
  • Job metadata — Redis Hash structures
  • Real-time events — Redis Pub/Sub channels
  • Expiration — Automatic TTL after stream completion
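
The write path below ties these mechanisms together in one sketch, with hypothetical key and channel names:

import { Redis } from 'ioredis';

const redis = new Redis(process.env.REDIS_URI ?? 'redis://localhost:6379');

async function onDelta(jobId: string, text: string) {
  await redis.xadd(`{job:${jobId}}:chunks`, '*', 'text', text); // chunk log
  await redis.publish(`{job:${jobId}}:events`, text); // real-time fanout
}

async function onComplete(jobId: string) {
  await redis.hset(`{job:${jobId}}:meta`, 'status', 'complete'); // job metadata
  // Expire both keys so completed streams clean themselves up.
  await redis.expire(`{job:${jobId}}:chunks`, 600);
  await redis.expire(`{job:${jobId}}:meta`, 600);
}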

Testing Resumable Streams

You can verify the feature is working:

  1. Start a streaming conversation with any AI model
  2. Tab test: Open the same chat in a new browser tab—both should sync
  3. Disconnect test: Turn off your network briefly, then reconnect
  4. Navigation test: Navigate away mid-stream, then return

In all cases, you should see the complete response with no data loss.

Troubleshooting

Streams not resuming?

Check Redis connectivity:

docker exec -it librechat-redis redis-cli ping
# Should return: PONG

Verify environment variables:

# Ensure USE_REDIS_STREAMS is set
echo $USE_REDIS_STREAMS

Content appears duplicated?

This typically indicates a client version mismatch. Ensure you’re running the latest version of LibreChat.

High memory usage in single-instance mode?

Completed streams are automatically garbage collected. If you’re seeing high memory usage, check for:

  • Very long-running streams that haven’t completed
  • Streams that errored without proper cleanup

Related Documentation

For implementation details, see PR #10926.