AI Image Model Capabilities — OpenRouter

Last updated: 2026-03-17
Source: OpenRouter model pages and documentation
Purpose: Guide for selecting the right model per creative task, particularly for Clarity Diamonds ad production

Key Finding: Text & Logo in Images

All 8 models now support text rendering in images, with varying quality. All 8 also accept image inputs (logos, reference photos, brand assets), so you can pass the Clarity Diamonds logo as a reference image and have the model incorporate it into the generated ad creative.

Model Comparison Table

Model	Text Rendering	Image Input (logo/refs)	Best For	Speed	Price Tier
GPT-5 Image	Excellent	Yes	Complex reasoning, detailed editing, text-heavy ads	Medium	High
GPT-5 Image Mini	Excellent	Yes	Same as above, faster and cheaper	Fast	Low-Mid
Gemini 3 Pro Image	Industry-leading — including long passages and multilingual	Yes	Professional creative, multi-subject, brand identity preservation	Medium	High
Gemini 3.1 Flash Image	Good	Yes	Fast iteration, professional output	Fastest	Low
Gemini 2.5 Flash Image	Good	Yes	Cost-sensitive bulk generation	Fast	Lowest
Seedream 4.5	Improved (esp. small text)	Yes	Portrait/lifestyle, colour/lighting preservation	Medium	Very Low (flat $0.04/image)
Riverflow V2 Fast	Good + custom font inputs	Yes (URLs preferred)	Production speed, custom typography	Fastest	Mid
FLUX.2 Flex	Excellent — complex typography	Yes	Typography-heavy creative, fine detail	Medium	Mid-High

Detailed Model Profiles

GPT-5 Image

Text: Superior instruction following and text rendering. Handles detailed copy, pricing, CTAs reliably.
Image input: Yes — accepts logo files and reference images. Can incorporate brand assets naturally.
Sizes: 10 standard + 4 extended aspect ratios; 1K, 2K, 4K resolution
Best for: Ads with copy baked in, logo placement, complex compositions
Pricing: $10/M input, $40/M image output
Context: 400K tokens

GPT-5 Image Mini

Text: Same quality as GPT-5 Image at roughly 4× lower input cost and 5× lower image output cost
Image input: Yes — same capabilities as full GPT-5 Image
Sizes: Same as GPT-5 Image
Best for: Most production work — same quality, better economics
Pricing: $2.50/M input, $8/M image output
Context: 400K tokens
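
The cost gap between the two GPT-5 models can be estimated from the listed per-million-token rates. The sketch below is illustrative only: `token_cost` is a helper name introduced here, and the token counts (1,000 input, 4,000 image output) are hypothetical placeholders, not measured usage.

```shell
#!/bin/sh
# token_cost INPUT_TOKENS OUTPUT_TOKENS INPUT_RATE OUTPUT_RATE
# Rates are USD per million tokens, taken from the two profiles above.
token_cost() {
  awk -v i="$1" -v o="$2" -v ri="$3" -v ro="$4" \
    'BEGIN { printf "%.4f\n", (i * ri + o * ro) / 1000000 }'
}

# Hypothetical token counts for one generated image (assumption).
IN_TOKENS=1000
OUT_TOKENS=4000

echo "GPT-5 Image:      \$$(token_cost "$IN_TOKENS" "$OUT_TOKENS" 10 40)"
echo "GPT-5 Image Mini: \$$(token_cost "$IN_TOKENS" "$OUT_TOKENS" 2.50 8)"
```

Because image output tokens dominate the total, the effective saving for a real job depends mostly on the output rate; substitute token counts from actual API responses before budgeting.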

Gemini 3 Pro Image (Nano Banana Pro)

Text: Industry-leading — best in class for long text, multilingual, detailed layout
Image input: Yes — multimodal reasoning, identity preservation for up to 5 subjects. Ideal for passing logo + product shot and asking it to compose a complete ad.
Sizes: 2K/4K, flexible aspect ratios
Best for: Final production ads needing precise text and logo integration, consistent brand identity
Pricing: $2/M input, $12/M output
Context: 65K tokens

Gemini 3.1 Flash Image (Nano Banana 2)

Text: Good — handles single headlines and short copy well
Image input: Yes — accepts images and text
Sizes: 0.5K to 4K; customisable aspect ratios via image_config
Best for: Fast iteration and testing, professional output at low cost
Pricing: $0.50/M input, $3/M output, $60/M image output
Released: February 2026 (newest Gemini image model)
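
As a rough illustration of the image_config option mentioned above, the sketch below prints a request body that asks for a specific aspect ratio and size. Several things here are assumptions to verify against the OpenRouter documentation: the model slug (guessed by analogy with the Pro slug used later in this note), the `aspect_ratio` and `image_size` field names, and the helper name `flash_payload`. The `modalities` value mirrors the curl example in this note.

```shell
#!/bin/sh
# Assumed model slug; confirm it on the OpenRouter model page before use.
MODEL="google/gemini-3.1-flash-image-preview"

# flash_payload ASPECT_RATIO IMAGE_SIZE PROMPT
# Prints a JSON request body. The image_config field names are assumptions.
# PROMPT must not contain double quotes or backslashes (no JSON escaping here).
flash_payload() {
  cat <<EOF
{
  "model": "$MODEL",
  "messages": [{"role": "user", "content": "$3"}],
  "modalities": ["image"],
  "image_config": {"aspect_ratio": "$1", "image_size": "$2"}
}
EOF
}

flash_payload "4:5" "1K" "Solitaire ring on dark velvet, soft window light"
```

The printed body can be saved to a file and sent with the same curl headers used in the API section of this note.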

Gemini 2.5 Flash Image (Nano Banana)

Text: Good — standard text rendering
Image input: Yes
Sizes: Customisable aspect ratios
Best for: High-volume or budget-sensitive generation
Pricing: $0.30/M input, $2.50/M output (cheapest token-priced option)
Context: 32K tokens

Seedream 4.5

Text: Improved over v4.0, particularly at small text rendering
Image input: Yes — editing consistency, preserves subject details, lighting, colour tone
Sizes: Variable
Best for: Lifestyle/portrait imagery, colour-accurate product editing, preserving brand identity across variations
Pricing: Flat $0.04 per output image — simplest pricing, great for volume
Context: 4K tokens

Riverflow V2 Fast

Text: Good — integrated reasoning for text accuracy. Supports custom font inputs ($0.03 each, max 2 fonts) — you can specify Inter or Montserrat exactly
Image input: Yes — recommends image URLs rather than base64. Also supports super-resolution references ($0.20 each, max 4) to enhance specific elements.
Sizes: 1K and 2K — no 4K support
Best for: Production-speed generation with brand-specific typography
Pricing: $0.02/image (1K), $0.04/image (2K)
Limitation: 4.5MB request size limit; no 4K
Released: February 2026
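
One likely reason URLs are preferred is that base64 inflates an image by about a third, which eats into the 4.5MB request limit. The sketch below estimates the encoded size of an inline image. It assumes 1 MB means 1,000,000 bytes (the source does not say which convention applies) and ignores the rest of the request body; `base64_size` and `fits_inline` are names introduced here.

```shell
#!/bin/sh
# Request size limit from the profile above, with 1 MB taken as 1,000,000 bytes (assumption).
LIMIT_BYTES=4500000

# base64_size RAW_BYTES -> encoded length in bytes
# (4 output characters for every 3 input bytes, rounded up for padding)
base64_size() {
  echo $(( ($1 + 2) / 3 * 4 ))
}

# fits_inline RAW_BYTES -> succeeds if the encoded image alone stays under the limit
fits_inline() {
  [ "$(base64_size "$1")" -lt "$LIMIT_BYTES" ]
}

for raw in 1000000 3000000 3500000; do
  if fits_inline "$raw"; then verdict="fits inline"; else verdict="use a URL"; fi
  echo "$raw bytes raw -> $(base64_size "$raw") bytes encoded: $verdict"
done
```

In practice the prompt, font inputs, and any other references share the same budget, so the safe raw size for a single inline image is lower than this check suggests.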

FLUX.2 Flex

Text: Excellent — best for complex typography and fine detail rendering
Image input: Yes — multi-reference editing in a unified architecture (pass multiple reference images in one request)
Sizes: Flexible aspect ratios, megapixel-based pricing
Best for: Typography-driven creatives, ads where the headline IS the visual, multi-reference compositions
Pricing: $0.06/megapixel (input + output combined)
Note: Does not use submissions for model training. Retains prompts for 30 days only.
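
Megapixel-based pricing means the cost scales with the pixel count of the output plus every reference image passed in. A minimal sketch of that arithmetic follows, assuming one megapixel is 1,000,000 pixels and that no rounding is applied per image; neither detail is stated in the source, and `flux_cost` is a helper name introduced here.

```shell
#!/bin/sh
# flux_cost WIDTHxHEIGHT [WIDTHxHEIGHT ...]
# First argument is the output image, the rest are reference images.
# Uses the $0.06/megapixel rate from the profile above (input + output combined).
# One megapixel is taken as 1,000,000 pixels (assumption).
flux_cost() {
  for dims in "$@"; do echo "$dims"; done |
    awk -F'x' '{ px += $1 * $2 } END { printf "%.4f\n", px / 1000000 * 0.06 }'
}

flux_cost 1024x1024            # output only
flux_cost 1024x1024 512x512    # output plus one logo reference
```

Downscaling reference images before sending them reduces the billed input megapixels, which matters when several references are passed in one request.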

Recommendations for Clarity Diamonds Ad Production

For ads with copy baked in (headlines, pricing, CTA)

Best: Gemini 3 Pro Image — industry-leading text, accepts logo as input
Good: FLUX.2 Flex — excellent typography
Budget: GPT-5 Image Mini — reliable text at lower cost

For passing the Clarity logo as a reference

All 8 models accept image inputs. The recommended workflow:

Pass clarity_logo.png as an image input in the message
Describe placement: “include the Clarity Diamonds logo in the bottom-right corner”
Gemini 3 Pro handles this most reliably (identity preservation up to 5 subjects)

For lifestyle/warmth (Ad 4A, ring on hand)

Best: Seedream 4.5 — excellent portrait refinement, colour warmth

For fast iteration and A/B testing

Best: Gemini 3.1 Flash Image — fastest, still professional quality

For carousel card production (volume)

Best: Seedream 4.5 — flat $0.04/image, good consistency across variations

For custom brand typography (Inter/Montserrat)

Best: Riverflow V2 Fast — only model supporting custom font file inputs
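
Custom fonts and super-resolution references are priced as add-ons, so the per-image cost for branded typography is higher than the base rate. The sketch below adds up the prices listed in the Riverflow V2 Fast profile. It assumes add-ons are billed once per request, which the source does not state; `riverflow_cost` is a helper name introduced here.

```shell
#!/bin/sh
# riverflow_cost RESOLUTION FONTS SUPERRES_REFS
# Prices from the profile: $0.02 (1K) or $0.04 (2K) per image,
# $0.03 per custom font (max 2), $0.20 per super-resolution reference (max 4).
# Assumes add-ons are billed per request (not stated in the source).
riverflow_cost() {
  case "$1" in
    1K) base=0.02 ;;
    2K) base=0.04 ;;
    *)  echo "unsupported resolution: $1" >&2; return 1 ;;
  esac
  if [ "$2" -gt 2 ] || [ "$3" -gt 4 ]; then
    echo "too many add-ons (max 2 fonts, 4 references)" >&2
    return 1
  fi
  awk -v b="$base" -v f="$2" -v s="$3" \
    'BEGIN { printf "%.2f\n", b + f * 0.03 + s * 0.20 }'
}

riverflow_cost 2K 2 0   # Inter + Montserrat, no super-resolution
riverflow_cost 2K 2 1   # same, plus one super-resolution reference
```

Under these assumptions, two fonts more than double the cost of a 2K image, and a single super-resolution reference costs more than the image and fonts together.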

How to Pass a Logo/Reference Image via OpenRouter API

curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3-pro-image-preview",
    "messages": [{
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,<BASE64_OF_LOGO>"
          }
        },
        {
          "type": "text",
          "text": "Create a luxury jewellery advertisement. Use the provided logo in the bottom-right corner of the image. [REST OF PROMPT]"
        }
      ]
    }],
    "modalities": ["image"]
  }'

To encode the logo: base64 -i clarity_logo.png | tr -d '\n'
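
The request body and the encoding step above can be combined into one small script. This is a sketch rather than a tested production workflow: `build_payload` is a helper name introduced here, and the prompt is assumed to contain no double quotes or backslashes because the script does no JSON escaping. Writing the body to a file and sending it with `--data-binary @payload.json` avoids putting a long base64 string on the command line.

```shell
#!/bin/sh
# build_payload LOGO_PATH PROMPT
# Prints the request body from the curl example with the logo inlined as base64.
# PROMPT must not contain double quotes or backslashes (no JSON escaping here).
build_payload() {
  logo_b64=$(base64 < "$1" | tr -d '\n')
  cat <<EOF
{
  "model": "google/gemini-3-pro-image-preview",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "image_url", "image_url": {"url": "data:image/png;base64,$logo_b64"}},
      {"type": "text", "text": "$2"}
    ]
  }],
  "modalities": ["image"]
}
EOF
}

# Usage (network call, left commented out):
# build_payload clarity_logo.png "Create a luxury jewellery advertisement. Use the provided logo in the bottom-right corner of the image." > payload.json
# curl -s https://openrouter.ai/api/v1/chat/completions \
#   -H "Authorization: Bearer $OPENROUTER_API_KEY" \
#   -H "Content-Type: application/json" \
#   --data-binary @payload.json
```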

Reference: https://openrouter.ai/docs/guides/overview/multimodal/image-generation
