🌊 Resumable Streams

LibreChat features a resilient streaming architecture that ensures you never lose AI-generated content. Whether your connection drops, you switch tabs, or you pick up on another device, your responses are always preserved and synchronized.

Why It Matters

Traditional chat applications lose all streaming content when your connection drops. With resumable streams, LibreChat:

  • Preserves every response — Network hiccups, browser refreshes, or server restarts won’t cause data loss
  • Keeps multiple tabs in sync — Open the same conversation in two browser tabs and watch them update together in real-time
  • Enables seamless device switching — Start a conversation on your desktop and continue on your phone
  • Lets you multitask freely — Start a generation, browse other tabs, and come back to a complete response

How It Works

When you send a message to an AI model, LibreChat creates a generation job that tracks all streamed content. The magic happens when something interrupts your connection:

  1. Automatic detection — The client detects the disconnection instantly
  2. State reconstruction — Upon reconnecting, the server rebuilds all previously streamed content
  3. Seamless sync — Missing content is delivered via a sync event
  4. Transparent continuation — Streaming resumes from the current position

This all happens automatically—no user action required.
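
For intuition, here is a minimal client-side sketch of that sequence. The endpoint path and the 'delta'/'sync' event names are illustrative assumptions, not LibreChat's actual wire format:

const jobId = 'job-123';
// Hypothetical render helpers for this sketch.
const appendToMessage = (text: string) => { /* append a delta to the UI */ };
const replaceMessage = (text: string) => { /* swap in rebuilt content */ };

// EventSource reconnects automatically when the connection drops.
const source = new EventSource(`/api/messages/stream?jobId=${jobId}`);

// Live deltas while the stream is healthy.
source.addEventListener('delta', (event) => {
  appendToMessage(JSON.parse((event as MessageEvent).data).text);
});

// After a reconnect, the server rebuilds everything streamed so far and
// delivers it in one 'sync' event before live deltas resume.
source.addEventListener('sync', (event) => {
  replaceMessage(JSON.parse((event as MessageEvent).data).aggregatedText);
});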

Multi-Tab & Multi-Device Experience

One of the most powerful aspects of resumable streams is real-time synchronization:

  • Same chat, multiple windows — Open a conversation in two browser tabs and both receive updates simultaneously
  • Cross-device continuity — Start a long generation on your laptop, then check the result on your phone
  • Team collaboration — In shared conversations, all viewers see content appear in real-time

Deployment Modes

LibreChat supports two deployment configurations:

Single-Instance Mode (Default)

Uses in-memory storage with Node.js EventEmitter for pub/sub. Perfect for:

  • Local development
  • Single-server deployments
  • Docker Compose setups

No configuration required — Works out of the box.
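
As a rough illustration of this mode, the sketch below buffers chunks in a Map and fans them out with an EventEmitter. The class and method names are hypothetical, not LibreChat's internals:

import { EventEmitter } from 'node:events';

class InMemoryStreamStore {
  private emitter = new EventEmitter();
  private chunks = new Map<string, string[]>();

  // Called for each delta produced by the model.
  publish(jobId: string, chunk: string) {
    const buffer = this.chunks.get(jobId) ?? [];
    buffer.push(chunk);
    this.chunks.set(jobId, buffer);
    this.emitter.emit(jobId, chunk); // fan out to every live subscriber (tab)
  }

  // A reconnecting client first replays the buffer, then follows live chunks.
  subscribe(jobId: string, onChunk: (chunk: string) => void): () => void {
    (this.chunks.get(jobId) ?? []).forEach(onChunk);
    this.emitter.on(jobId, onChunk);
    return () => this.emitter.off(jobId, onChunk); // unsubscribe handle
  }
}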

Redis Mode (Production)

Uses Redis Streams and Pub/Sub for cross-instance communication. Essential for:

  • Horizontally scaled deployments
  • Load-balanced production environments
  • High-availability setups
  • Kubernetes clusters

With Redis mode, a user can start a generation on one server instance and seamlessly resume on another—perfect for rolling deployments and auto-scaling.

Note: If you only run a single LibreChat instance, Redis for resumable streams is typically unnecessary—the in-memory mode handles everything. Redis becomes valuable when you have multiple LibreChat instances behind a load balancer. That said, Redis is still useful for other features like caching and session storage even in single-instance deployments.
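
To make the cross-instance behavior concrete, here is a condensed sketch using ioredis. The key schema is an illustrative assumption:

import { Redis } from 'ioredis';

const redis = new Redis(process.env.REDIS_URI ?? 'redis://localhost:6379');

// Instance A appends each delta to a per-job Redis Stream as it is generated.
async function appendChunk(jobId: string, text: string) {
  await redis.xadd(`stream:${jobId}`, '*', 'text', text);
}

// Instance B, which never saw the original connection, can still replay
// everything streamed so far when the client reconnects there.
async function replayChunks(jobId: string): Promise<string> {
  const entries = await redis.xrange(`stream:${jobId}`, '-', '+');
  return entries.map(([, fields]) => fields[1]).join('');
}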

Configuration

Enabling Redis Streams

When Redis is enabled (USE_REDIS=true), resumable streams automatically use Redis. You can also explicitly enable it:

.env
USE_REDIS=true
REDIS_URI=redis://localhost:6379
# Resumable streams will use Redis automatically when USE_REDIS=true
# To explicitly control it:
USE_REDIS_STREAMS=true

Redis Cluster Support

For Redis Cluster deployments:

.env
USE_REDIS_STREAMS=true
USE_REDIS_CLUSTER=true
REDIS_URI=redis://node1:7001,redis://node2:7002,redis://node3:7003

LibreChat automatically uses hash-tagged keys to ensure multi-key operations stay within the same cluster slot.
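
To see why hash tags matter, consider this sketch (the key names are hypothetical):

// In Redis Cluster, only the part inside {braces} is hashed, so every key
// for one job maps to the same cluster slot.
const jobId = 'abc123';
const chunksKey = `{job:${jobId}}:chunks`; // Redis Stream of deltas
const metaKey = `{job:${jobId}}:meta`; // job metadata hash
// Both keys share the slot for "job:abc123", so multi-key operations on
// them never fail with a CROSSSLOT error.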

Use Cases

Unstable Networks

On spotty WiFi or cellular connections, responses automatically resume when connectivity returns. No need to re-send your prompt.

Mobile Users

Switch from WiFi to cellular (or vice versa) without losing your response. The stream picks up exactly where it left off.

Long-Running Generations

For complex prompts that generate lengthy responses, feel free to check other tabs or apps. Your response will be waiting when you return.

Multi-Device Workflows

Start a conversation on your work computer, commute home, and check the result on your phone—the full response is there.

Production Deployments

Scale horizontally across multiple server instances while maintaining stream continuity. Rolling deployments won’t interrupt active generations.

Technical Details

Content Reconstruction

The system aggregates all streamed delta events to rebuild the following (see the sketch after this list):

  • Message content (text, tool calls, citations)
  • Agent run steps and intermediate reasoning
  • Metadata and state information
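
A simplified sketch of that aggregation follows; the event shapes are illustrative, as the real delta events carry richer structures:

type DeltaEvent =
  | { type: 'text'; value: string }
  | { type: 'tool_call'; name: string }
  | { type: 'metadata'; key: string; value: string };

interface RebuiltState {
  text: string;
  toolCalls: string[];
  metadata: Record<string, string>;
}

// Replaying the full event log in order reproduces the interrupted state.
function rebuild(events: DeltaEvent[]): RebuiltState {
  const state: RebuiltState = { text: '', toolCalls: [], metadata: {} };
  for (const event of events) {
    if (event.type === 'text') state.text += event.value;
    else if (event.type === 'tool_call') state.toolCalls.push(event.name);
    else state.metadata[event.key] = event.value;
  }
  return state;
}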

Performance Optimizations

Memory-first approach: When reconnecting to the same server instance, LibreChat uses local cache for zero-latency content recovery, avoiding unnecessary Redis round trips.

Automatic cleanup: Stale job entries are removed during queries to prevent memory leaks. Completed streams expire automatically.

Efficient storage: In-memory mode uses WeakRef for graph storage, enabling automatic garbage collection when conversations end.
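
A minimal sketch of that WeakRef pattern (names are illustrative):

// When nothing else references the graph, the garbage collector may reclaim
// it, and deref() starts returning undefined.
const graphs = new Map<string, WeakRef<object>>();

function storeGraph(jobId: string, graph: object) {
  graphs.set(jobId, new WeakRef(graph));
}

function getGraph(jobId: string): object | undefined {
  const ref = graphs.get(jobId);
  const graph = ref?.deref();
  if (ref && !graph) graphs.delete(jobId); // prune entries the GC reclaimed
  return graph;
}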

Data Flow

In Redis mode, each component maps to its own storage mechanism:

  • Chunks — Redis Streams (XADD/XRANGE)
  • Job metadata — Redis Hash structures
  • Real-time events — Redis Pub/Sub channels
  • Expiration — Automatic TTL after stream completion
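
The write path below ties these mechanisms together in one sketch, with hypothetical key and channel names:

import { Redis } from 'ioredis';

const redis = new Redis(process.env.REDIS_URI ?? 'redis://localhost:6379');

async function onDelta(jobId: string, text: string) {
  await redis.xadd(`{job:${jobId}}:chunks`, '*', 'text', text); // chunk log
  await redis.publish(`{job:${jobId}}:events`, text); // real-time fanout
}

async function onComplete(jobId: string) {
  await redis.hset(`{job:${jobId}}:meta`, 'status', 'complete'); // job metadata
  // Expire both keys so completed streams clean themselves up.
  await redis.expire(`{job:${jobId}}:chunks`, 600);
  await redis.expire(`{job:${jobId}}:meta`, 600);
}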

Testing Resumable Streams

You can verify the feature is working:

  1. Start a streaming conversation with any AI model
  2. Tab test: Open the same chat in a new browser tab—both should sync
  3. Disconnect test: Turn off your network briefly, then reconnect
  4. Navigation test: Navigate away mid-stream, then return

In all cases, you should see the complete response with no data loss.

Troubleshooting

Streams not resuming?

Check Redis connectivity:

docker exec -it librechat-redis redis-cli ping
# Should return: PONG

Verify environment variables:

# Ensure USE_REDIS_STREAMS is set
echo $USE_REDIS_STREAMS

Content appears duplicated?

This typically indicates a client version mismatch. Ensure you’re running the latest version of LibreChat.

High memory usage in single-instance mode?

Completed streams are automatically garbage collected. If you’re seeing high memory usage, check for:

  • Very long-running streams that haven’t completed
  • Streams that errored without proper cleanup

Related Documentation

For implementation details, see PR #10926.