Image Generation & Editing
Comprehensive guide to LibreChat's built-in image generation and editing tools
LibreChat ships with built-in image tools that you add to an Agent. Each tool has its own model, price point, and setup, which usually requires only an API key or a URL. There is no separate image page: you generate or edit images by chatting with an Agent that has an image tool enabled.
How image generation works
Upload an image when you want an edit, or send a plain text prompt when you want a new image. Generated images are stored according to the configured fileStrategy, and the tool output is sent to the LLM as part of the chat context immediately after generation.
Quick Start
Get image generation working in a few minutes with OpenAI Image Tools.
Create an agent. Select Agents from the endpoint menu, open the Agent Builder from the side panel, and create a new agent. Give it a name like "Image Creator".
Add OpenAI Image Tools. Open the agent's Tools list, select OpenAI Image Tools, and save the agent. This adds both image generation and image editing capabilities.
Set your API key. Add the following to your .env file:
Restart and test. Restart LibreChat, then send a message like "Generate an image of a sunset over mountains" to your agent.
| Deployment | Command |
|---|---|
| Docker | docker compose down && docker compose up -d |
| Local | Stop the server (Ctrl+C), then run npm run backend |
Good to know
- API keys can be omitted to let users enter their own key from the UI.
- Image outputs are sent to the LLM only immediately after generation, not on every message. The LLM otherwise gets vision context only from images attached to user messages. See Image Storage and Handling.
- MCP server tools can also output images, though they may not always use the correct format. See the MCP section.
OpenAI Image Tools
"OpenAI Image Tools" is an agent toolkit made up of two separate tools:
- Image Generation creates brand-new images from text prompts (no upload required).
- Image Editing edits or remixes images you uploaded: change colors, add objects, extend the canvas, and more.
Both default to GPT-Image-1, which is strong at instruction following, text rendering, detailed editing, and real-world knowledge. Use IMAGE_GEN_OAI_MODEL to choose a different OpenAI image model when your deployment supports it. See OpenAI's Image Generation documentation for more details.
Generation vs. Editing
| Use case | Invokes |
|---|---|
| "Start from scratch" | Image Generation |
| "Use existing image(s)" | Image Editing |
Both tools are always available, and the agent chooses the appropriate one based on the request:
- Image Generation creates new images from text descriptions only.
- Image Editing modifies or remixes existing images using their image IDs. These can be images from the current message or previously generated and referenced images. The LLM keeps track of image IDs as long as they remain in the context window and includes them in its call to the editing tool.
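The choice can be pictured as a simple rule keyed on whether the request refers to existing images. The TypeScript sketch below is a hypothetical illustration only: in LibreChat the LLM decides which tool to call, and the ImageRequest type and the tool labels are made up for this example.

```typescript
// Hypothetical model of the generation-vs-editing choice; in LibreChat the LLM makes this decision.
type ImageTool = "image_generation" | "image_editing";

interface ImageRequest {
  prompt: string;
  imageIds: string[]; // IDs of uploaded or previously generated images the request refers to
}

function chooseImageTool(request: ImageRequest): ImageTool {
  // Existing images to work from mean an edit; a text-only request means a new image.
  return request.imageIds.length > 0 ? "image_editing" : "image_generation";
}

console.log(chooseImageTool({ prompt: "Generate a sunset over mountains", imageIds: [] })); // image_generation
console.log(chooseImageTool({ prompt: "Make the sky purple", imageIds: ["img_123"] })); // image_editing
```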
Image editing relies on image IDs
- Image IDs are retained in the chat history. When files are uploaded to the current request, their IDs are added to the LLM's context before any tokens are generated.
- Previously referenced or generated image IDs can be used for editing as long as they remain within the context window. The LLM includes any relevant IDs in the image_ids array when calling the editing tool.
- You can attach previously uploaded images from the side panel without uploading them again. This also gives a vision model the image context, which can help inform the prompt for the editing tool.
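Whether an older image ID is still usable comes down to the token budget. This hypothetical sketch walks a chat history from newest to oldest, keeps messages while they fit in a fixed budget, and collects the image IDs that remain visible. The ChatMessage shape and the token counts are assumptions, and the sketch ignores how LibreChat actually trims context.

```typescript
// Hypothetical message shape; real LibreChat messages carry much more metadata.
interface ChatMessage {
  tokens: number;     // approximate token cost of the message
  imageIds: string[]; // image IDs attached to or referenced by the message
}

// Collects the image IDs from messages that still fit in a token budget, newest first.
function idsInContextWindow(messages: ChatMessage[], maxTokens: number): string[] {
  const ids: string[] = [];
  let used = 0;
  // Walk backwards: when the history is too long, the oldest messages are dropped first.
  for (const message of [...messages].reverse()) {
    if (used + message.tokens > maxTokens) {
      break; // this message and everything older fall outside the window
    }
    used += message.tokens;
    ids.push(...message.imageIds);
  }
  return ids;
}

const chatHistory: ChatMessage[] = [
  { tokens: 3000, imageIds: ["img_old"] },
  { tokens: 500, imageIds: ["img_recent"] },
  { tokens: 200, imageIds: [] },
];

console.log(idsInContextWindow(chatHistory, 1000)); // [ 'img_recent' ]
```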
Parameters
Image Generation
- prompt: text description (required)
- size: auto (default), 1024x1024 (square), 1536x1024 (landscape), or 1024x1536 (portrait)
- quality: auto (default), high, medium, or low
- background: auto (default), transparent, or opaque (transparent requires PNG or WebP format)
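A minimal sketch of how the generation parameters could be normalized before a request, assuming a hypothetical normalizeGenArgs helper rather than LibreChat's actual schema. It fills in the documented auto defaults and enforces the transparency rule. The outputFormat field is an assumption: it is not in the parameter list above, so the sketch models it as optional and defaults it to PNG purely to show the constraint.

```typescript
// Hypothetical validator for the Image Generation parameters listed above.
type GenSize = "auto" | "1024x1024" | "1536x1024" | "1024x1536";
type Quality = "auto" | "high" | "medium" | "low";
type Background = "auto" | "transparent" | "opaque";
// Assumption: output format is not a documented tool parameter; modeled only to show the rule.
type OutputFormat = "png" | "webp" | "jpeg";

interface GenArgs {
  prompt: string;
  size?: GenSize;
  quality?: Quality;
  background?: Background;
  outputFormat?: OutputFormat;
}

function normalizeGenArgs(args: GenArgs): Required<GenArgs> {
  if (args.prompt.trim() === "") {
    throw new Error("prompt is required");
  }
  const normalized: Required<GenArgs> = {
    prompt: args.prompt,
    size: args.size ?? "auto",
    quality: args.quality ?? "auto",
    background: args.background ?? "auto",
    outputFormat: args.outputFormat ?? "png", // assumed default for this sketch
  };
  // Transparency needs a format with an alpha channel (PNG or WebP).
  if (normalized.background === "transparent" && normalized.outputFormat === "jpeg") {
    throw new Error("transparent background requires PNG or WebP output");
  }
  return normalized;
}

console.log(normalizeGenArgs({ prompt: "A flat logo of a fox", background: "transparent" }));
```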
Image Editing
- image_ids: array of image IDs to use as reference for editing (required)
- prompt: text description of the changes (required)
- size: auto (default), 1024x1024, 1536x1024, 1024x1536, 256x256, or 512x512
- quality: auto (default), high, medium, or low
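For editing, the distinctive requirement is the image_ids array. The sketch below is a hypothetical check, not LibreChat's code: it rejects an empty ID list or prompt and resolves each ID against a knownImages map, an assumed stand-in for wherever the conversation's files are stored.

```typescript
// Hypothetical check for the Image Editing arguments listed above.
type EditSize = "auto" | "1024x1024" | "1536x1024" | "1024x1536" | "256x256" | "512x512";

interface EditArgs {
  image_ids: string[];
  prompt: string;
  size?: EditSize;
  quality?: "auto" | "high" | "medium" | "low";
}

// Resolves each image ID to a stored file reference, failing loudly on unknown IDs.
function resolveEditInputs(args: EditArgs, knownImages: Map<string, string>): string[] {
  if (args.image_ids.length === 0) {
    throw new Error("image_ids must contain at least one image ID");
  }
  if (args.prompt.trim() === "") {
    throw new Error("prompt is required");
  }
  return args.image_ids.map((id) => {
    const file = knownImages.get(id);
    if (file === undefined) {
      throw new Error(`unknown image ID: ${id}`);
    }
    return file;
  });
}

// Hypothetical store: image ID -> file path.
const knownImages = new Map([["img_1", "uploads/cat.png"]]);
console.log(resolveEditInputs({ image_ids: ["img_1"], prompt: "Add a red hat" }, knownImages)); // [ 'uploads/cat.png' ]
```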
Setup
Create or reuse an OpenAI key and add it to .env, then add "OpenAI Image Tools" to your agent's Tools list:
For Azure OpenAI deployments, first request access at https://aka.ms/oai/gptimage1access, then add your credentials to .env:
Advanced Configuration
Customize the tool descriptions and prompt guidance with these environment variables:
Pricing
See the GPT-Image-1 pricing page and Image Generation documentation for image generation costs.
Gemini Image Tools
Gemini Image Tools integrate Google's latest image generation models, supporting both text-to-image generation and image context-aware editing.
- Text-to-image generation: create high-quality images from detailed text descriptions.
- Image context support: use existing images as context or inspiration for new generations.
- Image editing: generate new images based on modifications to existing ones (include the original image ID).
- Multiple models: choose gemini-2.5-flash-image (default) or gemini-3-pro-image-preview.
- Dual API support: works with both simple Gemini API keys and Google Cloud Vertex AI.
Parameters
- prompt: detailed text description of the desired image (required, up to 32,000 characters)
- image_ids: optional array of image IDs to use as visual context for generation
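The Gemini parameters reduce to a long-form prompt with a 32,000-character ceiling plus optional image context. A minimal sketch under that description follows. The buildGeminiImageRequest function and the request shape are hypothetical, and only the model names come from the list above.

```typescript
// Hypothetical request builder for the Gemini Image Tools parameters; names are illustrative.
const MAX_PROMPT_CHARS = 32_000;
const DEFAULT_GEMINI_IMAGE_MODEL = "gemini-2.5-flash-image";

interface GeminiImageArgs {
  prompt: string;
  image_ids?: string[];
}

interface GeminiImageRequest {
  model: string;
  prompt: string;
  contextImageIds: string[];
}

function buildGeminiImageRequest(args: GeminiImageArgs, model?: string): GeminiImageRequest {
  if (args.prompt.trim() === "") {
    throw new Error("prompt is required");
  }
  // String length counts UTF-16 code units, a close approximation of characters.
  if (args.prompt.length > MAX_PROMPT_CHARS) {
    throw new Error(`prompt exceeds ${MAX_PROMPT_CHARS} characters`);
  }
  return {
    model: model ?? DEFAULT_GEMINI_IMAGE_MODEL,
    prompt: args.prompt,
    // No IDs: plain text-to-image. With IDs: the images act as context or as the image to modify.
    contextImageIds: args.image_ids ?? [],
  };
}

console.log(buildGeminiImageRequest({ prompt: "A watercolour lighthouse at dusk" }).model); // gemini-2.5-flash-image
console.log(buildGeminiImageRequest({ prompt: "Same scene at night", image_ids: ["img_42"] }, "gemini-3-pro-image-preview"));
```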
Setup
For the Gemini API, get a key from Google AI Studio:
For Vertex AI (Google Cloud users with Vertex AI access):
Model Selection
Advanced Configuration
Customize tool descriptions via environment variables:
More details are in the dedicated Gemini Image Gen guide.
DALL·E (legacy)
DALL·E provides legacy image generation using OpenAI's dall-e-3 image model.
Parameters
- prompt: text description of the desired image (required, up to 4000 characters)
- style: vivid (hyper-real, dramatic, default) or natural (less hyper-real)
- quality: standard (default) or hd
- size: 1024x1024 (default, square), 1792x1024 (wide), or 1024x1792 (tall)
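The DALL·E defaults above can be expressed as a small normalization step. This is a hypothetical sketch, not LibreChat's implementation; it applies the documented defaults and the 4000-character prompt limit.

```typescript
// Hypothetical helper applying the documented DALL·E (dall-e-3) defaults.
interface DalleArgs {
  prompt: string;
  style?: "vivid" | "natural";
  quality?: "standard" | "hd";
  size?: "1024x1024" | "1792x1024" | "1024x1792";
}

const MAX_DALLE_PROMPT_CHARS = 4000;

function withDalleDefaults(args: DalleArgs): Required<DalleArgs> {
  if (args.prompt.trim() === "") {
    throw new Error("prompt is required");
  }
  if (args.prompt.length > MAX_DALLE_PROMPT_CHARS) {
    throw new Error(`prompt exceeds ${MAX_DALLE_PROMPT_CHARS} characters`);
  }
  return {
    prompt: args.prompt,
    style: args.style ?? "vivid", // hyper-real, dramatic
    quality: args.quality ?? "standard",
    size: args.size ?? "1024x1024", // square
  };
}

console.log(withDalleDefaults({ prompt: "A quiet harbour at dawn", style: "natural" }));
```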
Setup
Enable the DALL·E tool for the agent and start prompting.
Advanced Configuration
For Azure OpenAI deployments, configure the base URL and API version:
Pricing
See the DALL·E pricing page and the Image Generation documentation for image generation costs.
Stable Diffusion (local)
Generate images entirely on your own machine or server. Point LibreChat at any Automatic1111 (or compatible) endpoint and you're set.
Parameters
- prompt: detailed keywords describing desired elements in the image (required)
- negative_prompt: keywords describing elements to exclude from the image (required)
The Stable Diffusion implementation uses these fixed default parameters, which produce good results for most use cases:
- cfg_scale: 4.5
- steps: 22
- width: 1024
- height: 1024
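To see what the fixed defaults look like on the wire, here is a sketch of a request body for Automatic1111's txt2img API. It is an illustration rather than LibreChat's code: the buildTxt2ImgRequest helper is hypothetical, 127.0.0.1:7860 is simply the WebUI's usual local address, and the WebUI must be launched with the --api flag for the endpoint to exist.

```typescript
// Sketch of an Automatic1111 txt2img request using the fixed defaults above (hypothetical helper).
// The /sdapi/v1/txt2img endpoint is only available when the WebUI is started with --api.
interface Txt2ImgPayload {
  prompt: string;
  negative_prompt: string;
  cfg_scale: number;
  steps: number;
  width: number;
  height: number;
}

function buildTxt2ImgRequest(baseUrl: string, prompt: string, negativePrompt: string) {
  const payload: Txt2ImgPayload = {
    prompt,
    negative_prompt: negativePrompt,
    cfg_scale: 4.5,
    steps: 22,
    width: 1024,
    height: 1024,
  };
  // An absolute path resolves against the base URL's origin, so a trailing slash does not matter.
  const url = new URL("/sdapi/v1/txt2img", baseUrl).toString();
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

const request = buildTxt2ImgRequest("http://127.0.0.1:7860", "a lighthouse on a cliff, oil painting", "blurry, low quality");
console.log(request.url); // http://127.0.0.1:7860/sdapi/v1/txt2img
// To send it: const res = await fetch(request.url, request.init);
// The JSON response's images field holds base64-encoded images.
```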
Setup
No API key is required, just the reachable URL of your Automatic1111 WebUI:
More details on setting up Automatic1111 are in the dedicated Stable Diffusion guide.
Flux
Cloud generator with an emphasis on speed and optional fine-tuned models.
- Fast cloud-based image generation
- Support for fine-tuned models
- Multiple quality levels and aspect ratios
- Raw mode for less processed, more natural-looking images
Parameters
The Flux tool supports three main actions:
- generate: create a new image from a text prompt
- generate_finetuned: create an image using a fine-tuned model
- list_finetunes: list available custom models for the user
More details are in the dedicated Flux guide.
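The three actions map naturally onto a discriminated union. The sketch below is hypothetical: the finetuneId field name and the describeFluxCall function are illustrative and do not reflect the Flux tool's actual argument names.

```typescript
// Hypothetical model of the three Flux tool actions as a discriminated union.
type FluxAction =
  | { action: "generate"; prompt: string }
  | { action: "generate_finetuned"; prompt: string; finetuneId: string }
  | { action: "list_finetunes" };

function describeFluxCall(call: FluxAction): string {
  switch (call.action) {
    case "generate":
      return `generate one image from: ${call.prompt}`;
    case "generate_finetuned":
      return `generate one image with fine-tune ${call.finetuneId} from: ${call.prompt}`;
    case "list_finetunes":
      return "list the user's fine-tuned models";
    default: {
      // Compile-time exhaustiveness check: adding a new action without a case fails here.
      const unreachable: never = call;
      throw new Error(`unknown action: ${JSON.stringify(unreachable)}`);
    }
  }
}

console.log(describeFluxCall({ action: "generate", prompt: "a red fox in snow" }));
```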
Setup
Add the Flux tool to your agent's Tools list. Prompts are plain text, and one call produces one image.
Pricing
See the Flux pricing page for image generation costs.
Model Context Protocol (MCP)
LibreChat supports image outputs from MCP servers. For example, the Puppeteer MCP Server can take screenshots of web pages; it returns each image in the expected format, so LibreChat treats it the same way as output from its built-in image tools.
MCP image support is still emerging
- The examples below assume LibreChat runs outside of Docker, directly using Node.js. The Model Context Protocol is a relatively new framework, and many developers are still learning how to serve their systems with uv/node for scalable distribution.
- Few image-generating servers exist, and many have yet to adopt the correct response format for images.
- Many MCP servers work well inside Docker, but the following examples do not, at least not without more advanced configuration, which illustrates the current inconsistency between MCP servers.
The following is an example of an image generation server that creates images with the Replicate API but returns image URLs, which do not conform to MCP's image response format.
Global install required
For this particular server, install the @gongrzhe/image-gen-server package globally with npm install -g @gongrzhe/image-gen-server, then point to the package's compiled files as shown below.
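For comparison, the MCP specification defines an image result as a content item with type "image", base64-encoded data, and a mimeType. A server that only returns a URL does not produce that shape. The hypothetical type guard below checks a single content item; the base64 test is a character-set check rather than a full decode, and the URL example assumes the server returns its link as a text item.

```typescript
// MCP ImageContent shape: { type: "image", data: <base64>, mimeType }.
interface McpImageContent {
  type: "image";
  data: string;     // base64-encoded image bytes
  mimeType: string; // for example "image/png"
}

// Simple character-set check; it does not fully decode or validate the base64 payload.
const BASE64_PATTERN = /^[A-Za-z0-9+/]+={0,2}$/;

// Hypothetical guard for one item of an MCP tool result's content array.
function isMcpImageContent(item: unknown): item is McpImageContent {
  if (typeof item !== "object" || item === null) {
    return false;
  }
  const record = item as Record<string, unknown>;
  const kind = record.type;
  const data = record.data;
  const mimeType = record.mimeType;
  return (
    kind === "image" &&
    typeof data === "string" &&
    typeof mimeType === "string" &&
    mimeType.startsWith("image/") &&
    BASE64_PATTERN.test(data)
  );
}

console.log(isMcpImageContent({ type: "image", data: "iVBORw0KGgo=", mimeType: "image/png" })); // true
// A URL returned as text (assumed shape for a URL-returning server) is not image content.
console.log(isMcpImageContent({ type: "text", text: "https://example.com/output.png" })); // false
```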
Image Storage and Handling
All generated images are:
- Saved according to the configured fileStrategy
- Displayed directly in the chat interface
- Sent to the LLM as part of the immediate chat context following generation
A few caveats apply to that last point:
- This may cause issues with an LLM that does not support image inputs. An option to disable the behavior per agent is planned.
- Outputs are sent to the LLM only upon generation, not on every message.
- To include an image in later turns, attach it to the message from the side panel.
- In short, the LLM gets vision context only from images attached to user messages, and from generations or edits immediately after they happen.
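The rules above can be modeled as a function of the conversation so far. This is a hypothetical sketch: ChatEvent is an invented event format, and it assumes that attachments on earlier user messages remain in the history sent to the model.

```typescript
// Hypothetical event model for one chat; not LibreChat's message format.
type ChatEvent =
  | { kind: "user"; attachments: string[] }
  | { kind: "tool_output"; images: string[] };

// Images the LLM would receive as vision input on its next call.
function imagesForNextCall(events: ChatEvent[]): string[] {
  const images: string[] = [];
  // Images attached to user messages stay part of the vision context (assumption noted above).
  for (const entry of events) {
    if (entry.kind === "user") {
      images.push(...entry.attachments);
    }
  }
  // Generated or edited images are sent only on the call right after the tool ran.
  const latest = events[events.length - 1];
  if (latest !== undefined && latest.kind === "tool_output") {
    images.push(...latest.images);
  }
  return images;
}

const chat: ChatEvent[] = [
  { kind: "user", attachments: [] },
  { kind: "tool_output", images: ["gen_1"] },
];
console.log(imagesForNextCall(chat)); // [ 'gen_1' ]
chat.push({ kind: "user", attachments: [] });
console.log(imagesForNextCall(chat)); // []
```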
Proxy Support
All image generation tools support proxy configuration through the PROXY environment variable:
When PROXY is unset, supported server-side clients honor HTTP_PROXY, HTTPS_PROXY, and NO_PROXY/no_proxy.
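A simplified sketch of the precedence described above, assuming PROXY applies to every request when set and that NO_PROXY uses comma-separated host or domain entries. Real clients differ in the details (ports, CIDR ranges, wildcard forms), so treat the hypothetical proxyFor function as illustrative only.

```typescript
// Simplified, hypothetical proxy selection: PROXY wins; otherwise HTTP(S)_PROXY with NO_PROXY bypass.
type Env = Record<string, string | undefined>;

function proxyFor(targetUrl: string, env: Env): string | undefined {
  if (env.PROXY) {
    return env.PROXY; // assumption: PROXY applies to all requests when set
  }
  const { protocol, hostname } = new URL(targetUrl);
  const noProxy = env.NO_PROXY ?? env.no_proxy ?? "";
  // Simplified NO_PROXY matching: "*", exact host, or domain suffix (a leading dot is ignored).
  const bypass = noProxy
    .split(",")
    .map((entry) => entry.trim().replace(/^\./, ""))
    .filter((entry) => entry !== "")
    .some((entry) => entry === "*" || hostname === entry || hostname.endsWith(`.${entry}`));
  if (bypass) {
    return undefined;
  }
  return protocol === "https:" ? env.HTTPS_PROXY : env.HTTP_PROXY;
}

const env: Env = { HTTPS_PROXY: "http://proxy.internal:3128", NO_PROXY: "localhost,.corp.example" };
console.log(proxyFor("https://api.openai.com/v1/images", env)); // http://proxy.internal:3128
console.log(proxyFor("https://images.corp.example/x", env)); // undefined
```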
Error Handling
If a tool encounters an error, it returns a message explaining what went wrong. Common issues include:
- Invalid API key
- API unavailability
- Content policy violations
- Proxy/network issues
- Invalid parameters
- Unsupported image payload (see Image Storage and Handling above)
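A tool typically turns a low-level failure into one of the messages above. The mapping below is a hypothetical sketch: the HTTP status codes, the content_policy_violation code, and the Node network error codes it checks are assumptions about how a provider might report each case, not a description of LibreChat's error handling.

```typescript
// Hypothetical failure description; providers report errors in different shapes.
interface ImageApiFailure {
  status?: number; // HTTP status, if the request reached the server
  code?: string;   // provider error code or Node network error code, if any
}

function explainFailure(failure: ImageApiFailure): string {
  if (failure.status === 401 || failure.status === 403) {
    return "Invalid API key or insufficient permissions.";
  }
  // Checked before generic 400s, since policy rejections often arrive as bad requests.
  if (failure.code === "content_policy_violation") {
    return "The request was rejected by the provider's content policy.";
  }
  if (failure.status === 400 || failure.status === 422) {
    return "Invalid parameters; check the size, quality, and other options.";
  }
  if (failure.status !== undefined && failure.status >= 500) {
    return "The image API is currently unavailable; try again later.";
  }
  if (failure.code === "ECONNREFUSED" || failure.code === "ETIMEDOUT" || failure.code === "ENOTFOUND") {
    return "Network or proxy error: the image API could not be reached.";
  }
  return "The image tool failed for an unknown reason.";
}

console.log(explainFailure({ status: 401 })); // Invalid API key or insufficient permissions.
```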
Prompting
You can customize the prompts for OpenAI Image Tools and DALL·E. The following tips shape the default prompts these tools supply, and they are also useful when writing your own:
- Start with the subject and style (photo, oil painting, etc.).
- Add composition and camera/medium ("wide-angle shot of…", "watercolour…").
- Mention lighting and mood ("golden hour", "dramatic shadows").
- Finish with detail keywords (textures, colours, expressions).
- Keep negatives positive: describe what should be included, not what to avoid.
Example:
A cinematic photo of an antique library bathed in warm afternoon sunlight. Tall wooden shelves overflow with leather-bound books, and dust particles shimmer in the light. A single green-shaded banker's lamp illuminates an open atlas on a polished mahogany desk in the foreground. 85 mm lens, shallow depth of field, rich amber tones, ultra-high detail.
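The ordering in these tips can be captured in a small template. The buildImagePrompt helper below is hypothetical and only concatenates sections in the suggested order; it does not reproduce the tools' built-in prompt text.

```typescript
// Hypothetical helper that assembles a prompt in the order suggested by the tips above.
interface PromptParts {
  subjectAndStyle: string; // e.g. "A cinematic photo of an antique library"
  composition?: string;    // camera, lens, or medium
  lighting?: string;       // lighting and mood
  details?: string[];      // textures, colours, expressions
}

function buildImagePrompt(parts: PromptParts): string {
  const sections: (string | undefined)[] = [parts.subjectAndStyle, parts.composition, parts.lighting];
  if (parts.details && parts.details.length > 0) {
    sections.push(parts.details.join(", "));
  }
  // Drop missing or blank sections and end each one as a sentence.
  return sections
    .filter((s): s is string => s !== undefined && s.trim() !== "")
    .map((s) => (s.trim().endsWith(".") ? s.trim() : `${s.trim()}.`))
    .join(" ");
}

console.log(
  buildImagePrompt({
    subjectAndStyle: "A watercolour painting of a fishing village",
    composition: "Wide-angle view from the harbour wall",
    lighting: "Golden hour light with long, soft shadows",
    details: ["weathered wooden boats", "rippling water", "muted blues and ochres"],
  }),
);
```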