User Memory
A persistent key/value store for user-specific context, managed by a memory agent that runs on every chat request in LibreChat
Overview
User Memory in LibreChat is a key/value store that persists user-specific information across conversations. A dedicated memory agent runs at the start of every chat request, reading from and writing to this store to provide personalized context to the main AI response.
Key/Value Store, Not Conversation Memory
This is not semantic memory over your entire conversation history. It does not index, embed, or search past conversations. Instead, it maintains a structured set of key/value pairs (e.g., user_preferences, learned_facts) that are injected into each request as context. Think of it as a persistent notepad the AI reads before every response.
For context about previous messages within a single conversation, LibreChat already uses the standard message history window — that is separate from this feature.
⚠️ Configuration Required
Memory functionality must be explicitly configured in your librechat.yaml file to work. It is not enabled by default.
Key Features
- Runs Every Request: The memory agent executes at the start of each chat request, ensuring stored context is always available
- Key/Value Storage: Information is stored as structured key/value pairs, not as raw conversation logs
- Manual Entries: Users can manually add, edit, or remove memory entries directly, giving full control over what the AI remembers
- User Control: When enabled, users can toggle memory on/off for their individual chats
- Customizable Keys: Restrict what categories of information can be stored using `validKeys`
- Token Management: Set limits on memory usage to control costs
- Agent Integration: Use AI agents to intelligently manage what gets remembered
Configuration
To enable memory features, you need to add the memory configuration to your librechat.yaml file:
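A minimal sketch of the `memory` block is shown below. Field names follow the LibreChat memory schema described in this guide (`disabled`, `personalize`, `tokenLimit`, `messageWindowSize`, and the `agent` block); the provider and model values are placeholders, so check the Memory Configuration Guide for the authoritative options:

```yaml
# librechat.yaml (fragment)
memory:
  disabled: false          # must be false for memory to run
  personalize: true        # expose the per-chat memory toggle to users
  tokenLimit: 2000         # cap on tokens the memory store may consume
  messageWindowSize: 5     # recent messages analyzed for memory updates
  agent:
    provider: "openAI"     # must match an accepted provider value
    model: "gpt-4o-mini"   # example model; use any model your provider offers
```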
The provider field should match the accepted values as defined in the Model Spec Guide.
Note: If you are using a custom endpoint, the endpoint value must match the defined custom endpoint name exactly.
See the Memory Configuration Guide for detailed configuration options.
How It Works
Memory Agent Execution
The memory agent runs on every chat request when memory is enabled. It executes concurrently with the main chat response — it begins before the main response starts and is limited to the duration of the main request plus up to 3 seconds after it finishes.
This means every message you send triggers the memory agent to:
- Read the current key/value store and inject relevant entries as context
- Analyze the recent message window for information worth storing or updating
- Write any new or modified entries back to the store
1. Key/Value Storage
Memory entries are stored as key/value pairs. When memory is enabled, the system can store entries such as:
- User preferences (communication style, topics of interest)
- Important facts explicitly shared by users
- Ongoing projects or tasks mentioned
- Any category you define via `validKeys`
Users can also manually create, edit, and delete memory entries through the interface, giving direct control over what the AI knows about them.
2. Context Window
The messageWindowSize parameter determines how many recent messages are analyzed for memory updates. This helps the memory agent decide what information is worth storing or updating in the key/value store.
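For example, to limit analysis to the five most recent messages (a sketch; tune the value to your own traffic and cost profile):

```yaml
memory:
  messageWindowSize: 5  # only the 5 most recent messages are considered for updates
```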
3. User Control
When personalize is set to true:
- Users see a memory toggle in their chat interface
- They can enable/disable memory for individual conversations
- Memory settings persist across sessions
4. Valid Keys
You can restrict what categories of information are stored by specifying validKeys:
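A sketch of a `validKeys` restriction, using the category names mentioned earlier in this guide (the exact keys you choose are up to you):

```yaml
memory:
  validKeys:
    - "user_preferences"   # communication style, topics of interest
    - "learned_facts"      # facts explicitly shared by the user
    - "ongoing_projects"   # projects or tasks the user mentions
```

Entries with keys outside this list are rejected by the memory agent, keeping the store limited to the categories you intend.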
Best Practices
1. Token Limits
Set appropriate token limits to balance functionality with cost:
- Higher limits allow more comprehensive memory
- Lower limits reduce processing costs
- Consider your usage patterns and budget
2. Custom Instructions
When using validKeys, provide custom instructions to the memory agent:
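A hedged sketch of pairing `validKeys` with agent `instructions`, so the memory agent knows which key each kind of information belongs under (the instruction text is illustrative):

```yaml
memory:
  validKeys:
    - "user_preferences"
    - "learned_facts"
  agent:
    provider: "openAI"
    model: "gpt-4o-mini"
    instructions: |
      Store only durable facts the user explicitly shares.
      Use the key "user_preferences" for style and formatting requests,
      and "learned_facts" for stated personal details.
```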
3. Privacy Considerations
- Memory stores user information across conversations
- Ensure users understand what information is being stored
- Consider implementing data retention policies
- Provide clear documentation about memory usage
Examples
Basic Configuration
Enable memory with default settings:
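A minimal sketch, assuming all other fields fall back to their defaults:

```yaml
memory:
  disabled: false
  personalize: true
```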
Advanced Configuration
Full configuration with all options:
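A sketch combining the options covered in this guide; the model, instruction text, and `model_parameters` values are placeholders to adapt to your provider:

```yaml
memory:
  disabled: false
  personalize: true
  tokenLimit: 3000
  messageWindowSize: 8
  validKeys:
    - "user_preferences"
    - "learned_facts"
    - "ongoing_projects"
  agent:
    provider: "openAI"
    model: "gpt-4o"
    instructions: "Only store information the user explicitly asks you to remember."
    model_parameters:
      temperature: 0.2   # example parameter; valid fields vary by provider
```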
For valid model parameters per provider, see the Model Spec Preset Fields.
Using Predefined Agents
Reference an existing agent by ID:
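A sketch referencing a prebuilt agent instead of an inline provider/model pair; the ID shown is a placeholder for an agent you have already created:

```yaml
memory:
  disabled: false
  agent:
    id: "agent_abc123"  # placeholder; substitute your agent's actual ID
```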
Custom Endpoints with Memory
Memory fully supports custom endpoints, including those with custom headers and environment variables. When using a custom endpoint, header placeholders and environment variables are properly resolved during memory processing.
- All custom endpoint headers are supported
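A sketch of memory backed by a custom endpoint. The endpoint name, base URL, and model are placeholders; the key detail is that `memory.agent.provider` must match the custom endpoint's `name` exactly, and headers may use `${ENV_VAR}` and `{{LIBRECHAT_USER_*}}` placeholders as described in the troubleshooting section below:

```yaml
endpoints:
  custom:
    - name: "MyCustomAPI"              # placeholder endpoint name
      apiKey: "${MY_CUSTOM_API_KEY}"   # resolved from your .env file
      baseURL: "https://api.example.com/v1"
      headers:
        X-User: "{{LIBRECHAT_USER_ID}}"  # user placeholder, resolved per request
      models:
        default: ["my-model"]

memory:
  agent:
    provider: "MyCustomAPI"  # must match the endpoint name above exactly
    model: "my-model"
```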
Troubleshooting
Memory Not Working
- Verify memory is configured in `librechat.yaml`
- Check that `disabled` is set to `false`
- Ensure the configured agent/model is available
- Verify users have enabled memory in their chat interface
- For custom endpoints: ensure the `provider` name matches the custom endpoint `name` exactly
High Token Usage
- Reduce `tokenLimit` to control costs
- Decrease `messageWindowSize` to analyze fewer messages
- Use `validKeys` to restrict what gets stored
- Review and optimize agent instructions
Inconsistent Memory
- Check if users are toggling memory on/off
- Verify token limits aren't being exceeded
- Ensure consistent agent configuration
- Review stored memory for conflicts
Custom Endpoint Authentication Issues
- Verify environment variables are set correctly in your `.env` file
- Ensure custom headers use the correct syntax (`${ENV_VAR}` for environment variables, `{{LIBRECHAT_USER_*}}` for user placeholders)
- Check that the custom endpoint is working for regular chat completions before testing with memory
- Review server logs for authentication errors from the custom endpoint API
Future Improvements
The current implementation runs the memory agent on every chat request unconditionally. Planned improvements include:
- Semantic Trigger for Writes: Detect when a user has explicitly asked the model to remember something (e.g., "Remember that I prefer Python") and only run the memory write agent in those cases, reducing unnecessary processing on routine messages.
- Vector Similarity Recall: Instead of injecting all stored memory entries into every request, use vector embeddings to retrieve only the entries most relevant to the current conversation context, improving both efficiency and relevance.
Related Features
- Agents - Build custom AI assistants
- Presets - Save conversation settings
- Fork Messages - Branch conversations while maintaining context