AI Image Model Capabilities — OpenRouter

Last updated: 2026-03-17
Source: OpenRouter model pages and documentation
Purpose: Guide for selecting the right model per creative task, particularly for Clarity Diamonds ad production


Key Finding: Text & Logo in Images

All 8 models now support text rendering in images, with varying quality. All 8 also accept image inputs (logos, reference photos, brand assets), so you can pass the Clarity Diamonds logo as a reference image and have the model incorporate it into the generated ad creative.


Model Comparison Table

| Model | Text Rendering | Image Input (logo/refs) | Best For | Speed | Price Tier |
|---|---|---|---|---|---|
| GPT-5 Image | Excellent | Yes | Complex reasoning, detailed editing, text-heavy ads | Medium | High |
| GPT-5 Image Mini | Excellent | Yes | Same as GPT-5 Image, faster and cheaper | Fast | Low-Mid |
| Gemini 3 Pro Image | Industry-leading, including long passages and multilingual | Yes | Professional creative, multi-subject, brand identity preservation | Medium | High |
| Gemini 3.1 Flash Image | Good | Yes | Fast iteration, professional output | Fastest | Low |
| Gemini 2.5 Flash Image | Good | Yes | Cost-sensitive bulk generation | Fast | Lowest |
| Seedream 4.5 | Improved (esp. small text) | Yes | Portrait/lifestyle, colour/lighting preservation | Medium | Very Low (flat $0.04/image) |
| Riverflow V2 Fast | Good, plus custom font inputs | Yes (URLs preferred) | Production speed, custom typography | Fastest | Mid |
| FLUX.2 Flex | Excellent, complex typography | Yes | Typography-heavy creative, fine detail | Medium | Mid-High |

Detailed Model Profiles

GPT-5 Image

  • Text: Superior instruction following and text rendering. Handles detailed copy, pricing, CTAs reliably.
  • Image input: Yes — accepts logo files and reference images. Can incorporate brand assets naturally.
  • Sizes: 10 standard + 4 extended aspect ratios; 1K, 2K, 4K resolution
  • Best for: Ads with copy baked in, logo placement, complex compositions
  • Pricing: $40/M image output
  • Context: 400K tokens

GPT-5 Image Mini

  • Text: Same quality as GPT-5 Image at a fraction of the cost (image output $8/M vs $40/M)
  • Image input: Yes — same capabilities as full GPT-5 Image
  • Sizes: Same as GPT-5 Image
  • Best for: Most production work — same quality, better economics
  • Pricing: $8/M image output
  • Context: 400K tokens

Gemini 3 Pro Image (Nano Banana Pro)

  • Text: Industry-leading; best in class for long passages, multilingual copy, and detailed layouts
  • Image input: Yes — multimodal reasoning, identity preservation for up to 5 subjects. Ideal for passing logo + product shot and asking it to compose a complete ad.
  • Sizes: 2K/4K, flexible aspect ratios
  • Best for: Final production ads needing precise text and logo integration, consistent brand identity
  • Pricing: $12/M output
  • Context: 65K tokens

Gemini 3.1 Flash Image (Nano Banana 2)

  • Text: Good — handles single headlines and short copy well
  • Image input: Yes — accepts images and text
  • Sizes: 0.5K to 4K; customisable aspect ratios via image_config
  • Best for: Fast iteration and testing, professional output at low cost
  • Pricing: $3/M output, $60/M image output
  • Released: February 2026 (newest Gemini image model)
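The `image_config` option above can be sketched as a request body. This is a minimal sketch, assuming OpenRouter's top-level `image_config.aspect_ratio` field as described in its image-generation guide; `build_image_request` and `json_escape` are hypothetical helper names, and `$MODEL_SLUG` is a placeholder for the exact Gemini 3.1 Flash Image slug, which you should copy from its OpenRouter model page. It sends a text-only prompt; to add the logo, use the content array from the curl example in this guide.

```bash
#!/usr/bin/env bash
# Sketch: print an OpenRouter chat/completions body that asks a Gemini image
# model for a specific aspect ratio via image_config. Helper names are hypothetical.

# Escape backslashes and double quotes so the prompt is a valid JSON string.
# Assumes a single-line prompt: newlines and other control characters are NOT escaped.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Usage: build_image_request MODEL_SLUG ASPECT_RATIO PROMPT
build_image_request() {
  local model=$1 ratio=$2 prompt
  prompt=$(json_escape "$3")
  cat <<EOF
{
  "model": "$model",
  "messages": [{"role": "user", "content": "$prompt"}],
  "modalities": ["image", "text"],
  "image_config": {"aspect_ratio": "$ratio"}
}
EOF
}

# Example (MODEL_SLUG is a placeholder for the slug on the model page):
# build_image_request "$MODEL_SLUG" "4:5" "Diamond solitaire on white silk, soft light" |
#   curl -s https://openrouter.ai/api/v1/chat/completions \
#     -H "Authorization: Bearer $OPENROUTER_API_KEY" \
#     -H "Content-Type: application/json" \
#     --data-binary @-
```

Piping the body into `curl --data-binary @-` keeps the JSON out of the shell's quoting rules, which is easier to maintain than a hand-edited `-d '...'` string.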

Gemini 2.5 Flash Image (Nano Banana)

  • Text: Good — standard text rendering
  • Image input: Yes
  • Sizes: Customisable aspect ratios
  • Best for: High-volume or budget-sensitive generation
  • Pricing: $2.50/M output (cheapest option)
  • Context: 32K tokens

Seedream 4.5

  • Text: Improved over v4.0, particularly for small text
  • Image input: Yes — editing consistency, preserves subject details, lighting, colour tone
  • Sizes: Variable
  • Best for: Lifestyle/portrait imagery, colour-accurate product editing, preserving brand identity across variations
  • Pricing: Flat $0.04 per output image — simplest pricing, great for volume
  • Context: 4K tokens

Riverflow V2 Fast

  • Text: Good — integrated reasoning for text accuracy. Supports custom font inputs ($0.03 each, max 2 fonts) — you can specify Inter or Montserrat exactly
  • Image input: Yes — recommends image URLs rather than base64. Also supports super-resolution references ($0.20 each, max 4) to enhance specific elements.
  • Sizes: 1K and 2K — no 4K support
  • Best for: Production-speed generation with brand-specific typography
  • Pricing: $0.04/image (2K)
  • Limitation: 4.5MB request size limit; no 4K
  • Released: February 2026
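Riverflow's add-on fees and request limit lend themselves to a quick pre-flight check before sending a job. This is a minimal sketch, assuming the prices listed above, one 2K image per request, font and super-resolution fees charged once per request, and that "4.5MB" means 4,500,000 bytes; the function names are hypothetical, not part of OpenRouter. Base64 inflates data by roughly 4/3, so a logo embedded inline uses more of the limit than its file size suggests, which is one reason URLs are preferred.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers for Riverflow V2 Fast (not an OpenRouter tool).
# Prices taken from this guide; assumes one 2K image per request and that font
# and super-resolution fees are charged once per request.

# Usage: riverflow_request_cost FONTS SUPERRES_REFS -> prints estimated USD
riverflow_request_cost() {
  local fonts=$1 refs=$2
  if [ "$fonts" -gt 2 ] || [ "$refs" -gt 4 ]; then
    echo "riverflow_request_cost: max 2 fonts and 4 super-resolution references" >&2
    return 1
  fi
  # $0.04 per 2K image + $0.03 per font + $0.20 per super-resolution reference
  awk -v f="$fonts" -v r="$refs" 'BEGIN { printf "%.2f\n", 0.04 + f * 0.03 + r * 0.20 }'
}

# Usage: fits_request_limit FILE... -> succeeds if the base64 form of all files
# stays under an assumed 4,500,000-byte limit (JSON and prompt overhead ignored).
fits_request_limit() {
  local total=0 len f
  for f in "$@"; do
    len=$(base64 < "$f" | tr -d '\n' | wc -c)
    total=$((total + len))
  done
  echo "base64 payload: $total bytes" >&2
  [ "$total" -lt 4500000 ]
}

# Example: Inter + Montserrat and one super-resolution reference
# riverflow_request_cost 2 1   # prints 0.30
```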

FLUX.2 Flex

  • Text: Excellent — best for complex typography and fine detail rendering
  • Image input: Yes — multi-reference editing in a unified architecture (pass multiple reference images in one request)
  • Sizes: Flexible aspect ratios, megapixel-based pricing
  • Best for: Typography-driven creatives, ads where the headline IS the visual, multi-reference compositions
  • Pricing: $0.06/megapixel (input + output combined)
  • Note: Does not use submissions for model training. Retains prompts for 30 days only.
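Per-megapixel pricing is easiest to reason about with a worked estimate. This is a minimal sketch, assuming 1 MP = 1,000,000 pixels and linear billing with no rounding or minimum charge (confirm on the model page); `flux_cost` is a hypothetical helper. Under those assumptions a 1024x1024 output alone costs about $0.063, and passing one 1024x1024 reference image roughly doubles it, since input and output are counted together.

```bash
#!/usr/bin/env bash
# Hypothetical cost estimate for FLUX.2 Flex at $0.06 per megapixel, counting
# the output image and any input reference images together.
# Assumes 1 MP = 1,000,000 pixels, linear billing, no rounding or minimum charge.

# Usage: flux_cost WIDTHxHEIGHT [WIDTHxHEIGHT ...] -> prints estimated USD
flux_cost() {
  local dims w h pixels=0
  for dims in "$@"; do
    w=${dims%x*}   # text before the "x"
    h=${dims#*x}   # text after the "x"
    pixels=$((pixels + w * h))
  done
  awk -v p="$pixels" 'BEGIN { printf "%.4f\n", p / 1000000 * 0.06 }'
}

# Example: 1024x1024 output plus one 1024x1024 logo reference
# flux_cost 1024x1024 1024x1024   # prints 0.1258
```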

Recommendations for Clarity Diamonds Ad Production

For ads with copy baked in (headlines, pricing, CTA)

Best: Gemini 3 Pro Image (industry-leading text, accepts logo as input)
Good: FLUX.2 Flex (excellent typography)
Budget: GPT-5 Image Mini (reliable text at lower cost)

For passing the Clarity logo as a reference

All 8 models accept image inputs. The recommended workflow:

  1. Pass clarity_logo.png as an image input in the message
  2. Describe placement: “include the Clarity Diamonds logo in the bottom-right corner”
  3. Gemini 3 Pro handles this most reliably (identity preservation up to 5 subjects)

For lifestyle/warmth (Ad 4A, ring on hand)

Best: Seedream 4.5 — excellent portrait refinement, colour warmth

For fast iteration and A/B testing

Best: Gemini 3.1 Flash Image — fastest, still professional quality

Budget: Seedream 4.5 (flat $0.04/image, good consistency across variations)

For custom brand typography (Inter/Montserrat)

Best: Riverflow V2 Fast — only model supporting custom font file inputs


How to Pass a Logo/Reference Image via OpenRouter API

```bash
curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3-pro-image-preview",
    "messages": [{
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,<BASE64_OF_LOGO>"
          }
        },
        {
          "type": "text",
          "text": "Create a luxury jewellery advertisement. Use the provided logo in the bottom-right corner of the image. [REST OF PROMPT]"
        }
      ]
    }],
    "modalities": ["image", "text"]
  }'
```

To encode the logo: base64 -i clarity_logo.png | tr -d '\n'
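Pasting the base64 string into curl's `-d` argument by hand is error-prone, and Linux caps a single command-line argument at 128 KiB, so a larger logo can fail with "Argument list too long". The sketch below streams the encoded logo through `jq` into curl's stdin instead, then decodes the first returned image. It assumes `jq` is installed, `OPENROUTER_API_KEY` is exported, and the response carries images as data URLs at `choices[0].message.images[].image_url.url`, per OpenRouter's image-generation guide (verify against the reference below); the function and file names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: send a logo plus prompt to OpenRouter and save the first returned image.
# Assumes curl, jq and base64 are installed and OPENROUTER_API_KEY is exported.
# Function and file names are hypothetical.

MODEL="google/gemini-3-pro-image-preview"

# Print a file as single-line base64 (same idea as the base64 | tr command above).
encode_b64() {
  base64 < "$1" | tr -d '\n'
}

# Read base64 from stdin and wrap it in a chat/completions body.
# jq -Rs takes all of stdin as one raw string, so the logo never becomes a CLI argument.
build_body() {
  jq -Rs --arg model "$MODEL" --arg prompt "$1" '{
    model: $model,
    messages: [{
      role: "user",
      content: [
        {type: "image_url", image_url: {url: ("data:image/png;base64," + .)}},
        {type: "text", text: $prompt}
      ]
    }],
    modalities: ["image", "text"]
  }'
}

# Usage: generate_ad LOGO.png OUTPUT_FILE "PROMPT"
generate_ad() {
  local url
  url=$(encode_b64 "$1" | build_body "$3" |
    curl -sS https://openrouter.ai/api/v1/chat/completions \
      -H "Authorization: Bearer $OPENROUTER_API_KEY" \
      -H "Content-Type: application/json" \
      --data-binary @- |
    jq -r '.choices[0].message.images[0].image_url.url // empty')
  if [ -z "$url" ]; then
    echo "generate_ad: no image in response" >&2
    return 1
  fi
  # Strip the "data:image/...;base64," prefix and decode the rest.
  # The MIME type in that prefix tells you the real format; older macOS uses base64 -D.
  printf '%s' "${url#*,}" | base64 -d > "$2"
}

# Example:
# generate_ad clarity_logo.png ad_draft.png \
#   "Create a luxury jewellery advertisement. Use the provided logo in the bottom-right corner of the image."
```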


Reference: https://openrouter.ai/docs/guides/overview/multimodal/image-generation