Skip to main content
LibreChat is joining ClickHouse to power the open-source Agentic Data Stack 🎉 Learn more
LibreChat

Image Generation & Editing

Comprehensive guide to LibreChat's built-in image generation and editing tools

LibreChat ships with built-in image tools that you add to an Agent. Each tool has its own model, price point, and setup, usually just an API key or a URL. There is no separate image page: you generate or edit images by chatting with an Agent that has an image tool enabled.

How image generation works

Upload an image when you want an edit, or send a plain text prompt when you want a new image. Generated images follow the configured fileStrategy and the tool output is sent to the LLM as part of the chat context immediately after generation.

Quick Start

Get image generation working in a few minutes with OpenAI Image Tools.

Create an agent. Select Agents from the endpoint menu, open the Agent Builder from the side panel, and create a new agent. Give it a name like "Image Creator".

Add OpenAI Image Tools. Open the agent's Tools list, select OpenAI Image Tools, and save the agent. This adds both image generation and image editing capabilities.

Set your API key. Add the following to your .env file:

IMAGE_GEN_OAI_API_KEY=sk-your-openai-api-key
# Optional; defaults to gpt-image-1
IMAGE_GEN_OAI_MODEL=gpt-image-1

Restart and test. Restart LibreChat, then send a message like "Generate an image of a sunset over mountains" to your agent.

DeploymentCommand
Dockerdocker compose down && docker compose up -d
LocalStop (Ctrl+C) then npm run backend

Good to know

  • API keys can be omitted to let users enter their own key from the UI.
  • Image outputs are sent to the LLM only immediately after generation, not on every message. The LLM otherwise gets vision context only from images attached to user messages. See Image Storage and Handling.
  • MCP server tools can also output images, though they may not always use the correct format. See the MCP section.

OpenAI Image Tools

"OpenAI Image Tools" is an agent toolkit made up of two separate tools:

  • Image Generation creates brand-new images from text prompts (no upload required).
  • Image Editing edits or remixes images you uploaded: change colors, add objects, extend the canvas, and more.

Both default to GPT-Image-1 for instruction following, text rendering, detailed editing, and real-world knowledge. Use IMAGE_GEN_OAI_MODEL to choose a different OpenAI image model when your deployment supports it. See OpenAI's Image Generation documentation for more details.

Generation vs. Editing

Use caseInvokes
"Start from scratch"Image Generation
"Use existing image(s)"Image Editing

Both tools are always available, and the agent chooses the appropriate one based on the request:

  • Image Generation creates new images from text descriptions only.
  • Image Editing modifies or remixes existing images using their image IDs. These can be images from the current message or previously generated and referenced images. The LLM keeps track of image IDs as long as they remain in the context window and includes them in the tool output.

Image editing relies on image IDs

  • Image IDs are retained in the chat history. When files are uploaded to the current request, their IDs are added to the LLM's context before any tokens are generated.
  • Previously referenced or generated image IDs can be used for editing as long as they remain within the context window. The LLM includes any relevant IDs in the image_ids array when calling the editing tool.
  • You can attach previously uploaded images from the side panel without uploading them again. This also gives a vision model the image context, which can help inform the prompt for the editing tool.

Parameters

Image Generation

  • prompt: text description (required)
  • size: auto (default), 1024x1024 (square), 1536x1024 (landscape), or 1024x1536 (portrait)
  • quality: auto (default), high, medium, or low
  • background: auto (default), transparent, or opaque (transparent requires PNG or WebP format)

Image Editing

  • image_ids: array of image IDs to use as reference for editing (required)
  • prompt: text description of the changes (required)
  • size: auto (default), 1024x1024, 1536x1024, 1024x1536, 256x256, or 512x512
  • quality: auto (default), high, medium, or low

Setup

Create or reuse an OpenAI key and add it to .env, then add "OpenAI Image Tools" to your agent's Tools list:

IMAGE_GEN_OAI_API_KEY=sk-...
# optional extras
IMAGE_GEN_OAI_MODEL=gpt-image-1
IMAGE_GEN_OAI_BASEURL=https://...

For Azure OpenAI deployments, first request access at https://aka.ms/oai/gptimage1access, then add your credentials to .env:

IMAGE_GEN_OAI_API_KEY=your-api-key
# optional extras
IMAGE_GEN_OAI_MODEL=gpt-image-1
IMAGE_GEN_OAI_BASEURL=https://deploymentname.openai.azure.com/openai/deployments/gpt-image-1/
IMAGE_GEN_OAI_AZURE_API_VERSION=2025-04-01-preview

Advanced Configuration

Customize the tool descriptions and prompt guidance with these environment variables:

# Image Model
IMAGE_GEN_OAI_MODEL=gpt-image-1
 
# Image Generation Tool Descriptions
IMAGE_GEN_OAI_DESCRIPTION=...
IMAGE_GEN_OAI_PROMPT_DESCRIPTION=...
 
# Image Editing Tool Descriptions
IMAGE_EDIT_OAI_DESCRIPTION=...
IMAGE_EDIT_OAI_PROMPT_DESCRIPTION=...

Pricing

See the GPT-Image-1 pricing page and Image Generation documentation for image generation costs.

Gemini Image Tools

Gemini Image Tools integrate Google's latest image generation models, supporting both text-to-image generation and image context-aware editing.

  • Text-to-image generation: create high-quality images from detailed text descriptions.
  • Image context support: use existing images as context or inspiration for new generations.
  • Image editing: generate new images based on modifications to existing ones (include the original image ID).
  • Multiple models: choose gemini-2.5-flash-image (default) or gemini-3-pro-image-preview.
  • Dual API support: works with both simple Gemini API keys and Google Cloud Vertex AI.

Parameters

  • prompt: detailed text description of the desired image (required, up to 32,000 characters)
  • image_ids: optional array of image IDs to use as visual context for generation

Setup

For the Gemini API, get a key from Google AI Studio:

GEMINI_API_KEY=your_api_key_here

For Vertex AI (Google Cloud users with Vertex AI access):

GOOGLE_SERVICE_KEY_FILE=/path/to/service-account.json
GOOGLE_CLOUD_LOCATION=us-central1  # optional, default: global

Model Selection

# Default model (fast and efficient)
GEMINI_IMAGE_MODEL=gemini-2.5-flash-image
 
# Higher quality model
GEMINI_IMAGE_MODEL=gemini-3-pro-image-preview

Advanced Configuration

Customize tool descriptions via environment variables:

GEMINI_IMAGE_GEN_DESCRIPTION=...
GEMINI_IMAGE_GEN_PROMPT_DESCRIPTION=...
GEMINI_IMAGE_IDS_DESCRIPTION=...

More details are in the dedicated Gemini Image Gen guide.

DALL·E (legacy)

DALL·E provides legacy image generation using OpenAI's dall-e-3 image model.

Parameters

  • prompt: text description of the desired image (required, up to 4000 characters)
  • style: vivid (hyper-real, dramatic, default) or natural (less hyper-real)
  • quality: standard (default) or hd
  • size: 1024x1024 (default, square), 1792x1024 (wide), or 1024x1792 (tall)

Setup

# Required
DALLE_API_KEY=sk-...  # or DALLE3_API_KEY=sk-...
 
# Optional
DALLE_REVERSE_PROXY=https://...  # Alternative endpoint
DALLE3_BASEURL=https://...  # For Azure or custom endpoints
DALLE3_AZURE_API_VERSION=2023-12-01-preview  # For Azure deployments
DALLE3_SYSTEM_PROMPT=...  # Custom system prompt for DALL·E

Enable the DALL·E tool for the agent and start prompting.

Advanced Configuration

For Azure OpenAI deployments, configure the base URL and API version:

DALLE3_BASEURL=https://your-resource-name.openai.azure.com/openai/deployments/your-deployment-name
DALLE3_AZURE_API_VERSION=2023-12-01-preview
DALLE3_API_KEY=your-azure-api-key

Pricing

See the DALL-E pricing page and Image Generation documentation for image generation costs.

Stable Diffusion (local)

Run images entirely on your own machine or server. Point LibreChat at any Automatic1111 (or compatible) endpoint and you're set.

Parameters

  • prompt: detailed keywords describing desired elements in the image (required)
  • negative_prompt: keywords describing elements to exclude from the image (required)

The Stable Diffusion implementation uses these fixed default parameters, which produce good results for most use cases:

  • cfg_scale: 4.5
  • steps: 22
  • width: 1024
  • height: 1024

Setup

No API key is required, just the reachable URL of your Automatic1111 WebUI:

SD_WEBUI_URL=http://127.0.0.1:7860  # URL to your Automatic1111 WebUI

More details on setting up Automatic1111 are in the dedicated Stable Diffusion guide.

Flux

Cloud generator with an emphasis on speed and optional fine-tuned models.

  • Fast cloud-based image generation
  • Support for fine-tuned models
  • Multiple quality levels and aspect ratios
  • Raw mode for less processed, more natural-looking images

Parameters

The Flux tool supports three main actions:

  1. generate: create a new image from a text prompt
  2. generate_finetuned: create an image using a fine-tuned model
  3. list_finetunes: list available custom models for the user

More details are in the dedicated Flux guide.

Setup

Choose the Flux tool inside the agent. Prompts are plain text, and one call produces one image.

FLUX_API_KEY=flux_live_...
FLUX_API_BASE_URL=https://api.us1.bfl.ai   # default is fine for most users

Pricing

See the Flux pricing page for image generation costs.

Model Context Protocol (MCP)

Image outputs are supported from MCP servers. For example, the Puppeteer MCP Server can generate screenshots of web pages, which output the image in the expected format and are treated the same as LibreChat's built-in image tools.

MCP image support is still emerging

  • The examples below assume LibreChat runs outside of Docker, directly using Node.js. The Model Context Protocol is a relatively new framework, and many developers are still learning how to serve their systems with uv/node for scalable distribution.
  • Few image-generating servers exist, and many have yet to adopt the correct response format for images.
  • While many MCP servers function well within Docker, the following examples do not, or not without more advanced configurations, showing some of the current inconsistency between MCP servers.
mcpServers:
  puppeteer:
    command: npx
    args:
      - -y
      - '@modelcontextprotocol/server-puppeteer'

The following is an example of an Image Generation server that outputs images using the Replicate API, but returns URLs of the images, which doesn't conform to MCP's image response standard.

Global install required

For this particular server, install the @gongrzhe/image-gen-server package globally with npm install -g @gongrzhe/image-gen-server, then point to the package's compiled files as shown below.

mcpServers:
  image-gen:
    command: 'node'
    # First, install the package globally using npm:
    # `npm install -g @gongrzhe/image-gen-server`
    # Then, point to the location of the installed package,
    # which you can find by running `npm root -g`
    args:
      - '{REPLACE_WITH_NODE_MODULES_LOCATION}/@gongrzhe/image-gen-server/build/index.js'
      # Example with output from `npm root -g`:
      # - "/home/danny/.nvm/versions/node/v24.16.0/lib/node_modules/@gongrzhe/image-gen-server/build/index.js"
    env:
      # Do not hardcode the API token here, use the environment variable instead
      # The following will pick up the token from your .env file or environment
      REPLICATE_API_TOKEN: '${REPLICATE_API_TOKEN}'
      MODEL: 'google/imagen-3'

Image Storage and Handling

All generated images are:

  1. Saved according to the configured fileStrategy
  2. Displayed directly in the chat interface
  3. Sent to the LLM as part of the immediate chat context following generation

A few caveats apply to that last point:

  • This may cause issues with an LLM that does not support image inputs. An option to disable the behavior per agent is planned.
  • Outputs are sent to the LLM only upon generation, not on every message.
  • To include an image in later turns, attach it to the message from the side panel.
  • In short, the LLM gets vision context only from images attached to user messages, and from generations or edits immediately after they happen.

Proxy Support

All image generation tools support proxy configuration through the PROXY environment variable:

PROXY=http://proxy-url:port

When PROXY is unset, supported server-side clients honor HTTP_PROXY, HTTPS_PROXY, and NO_PROXY/no_proxy.

Error Handling

If a tool encounters an error, it returns a message explaining what went wrong. Common issues include:

  • Invalid API key
  • API unavailability
  • Content policy violations
  • Proxy/network issues
  • Invalid parameters
  • Unsupported image payload (see Image Storage and Handling above)

Prompting

You can customize the prompts for OpenAI Image Tools and DALL·E, but the following tips inform the default prompts the tools supply, which is useful to know for your own writing:

  1. Start with the subject and style (photo, oil painting, etc.).
  2. Add composition and camera/medium ("wide-angle shot of…", "watercolour…").
  3. Mention lighting and mood ("golden hour", "dramatic shadows").
  4. Finish with detail keywords (textures, colours, expressions).
  5. Keep negatives positive: describe what should be included, not what to avoid.

Example:

A cinematic photo of an antique library bathed in warm afternoon sunlight. Tall wooden shelves overflow with leather-bound books, and dust particles shimmer in the light. A single green-shaded banker's lamp illuminates an open atlas on a polished mahogany desk in the foreground. 85 mm lens, shallow depth of field, rich amber tones, ultra-high detail.

How is this guide?