OpenAI API Pricing Guide 2026: GPT-4o, o3, and Beyond

Category: OpenAI
Published: April 6, 2026
Reading Time: 6 min
Core Topic: Complete OpenAI API pricing guide for 2026. GPT-4o, GPT-4o mini, o3, o3-mini pricing per token, embeddings, DALL-E, Whisper, and cost optimization tips.

GoITReels Editorial

OpenAI’s API pricing has evolved significantly in 2026. With models ranging from the cost-efficient GPT-4o mini to the powerful reasoning o3, understanding which model to use for which task can mean the difference between a $50/month API bill and a $5,000 one.

This guide breaks down every OpenAI API pricing tier, what you get for your money, and how to optimize costs for production applications.

Quick Reference: OpenAI Pricing 2026

Text Generation (Chat Completions)

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 128K |
| GPT-4o mini | $0.15 | $0.60 | 128K |
| o3 | $10.00 | $40.00 | 200K |
| o3-mini | $1.10 | $4.40 | 200K |
| GPT-3.5 Turbo | $0.50 | $1.50 | 16K |

Embeddings

| Model | Price (per 1M tokens) | Dimensions |
|---|---|---|
| text-embedding-3-small | $0.02 | 1536 |
| text-embedding-3-large | $0.13 | 3072 |
| ada-002 (legacy) | $0.10 | 1536 |

Image Generation (DALL-E 3)

| Resolution | Quality | Price |
|---|---|---|
| 1024×1024 | Standard | $0.040/image |
| 1024×1024 | HD | $0.080/image |
| 1792×1024 | Standard | $0.080/image |
| 1792×1024 | HD | $0.120/image |

Audio

| Service | Pricing |
|---|---|
| Whisper (speech-to-text) | $0.006/minute |
| TTS Standard voices | $15.00/1M characters |
| TTS HD voices | $30.00/1M characters |

Fine-Tuning

| Model | Training | Usage Input | Usage Output |
|---|---|---|---|
| GPT-4o mini fine-tune | $3.00/1M tokens | $0.30/1M | $1.20/1M |
| GPT-3.5 Turbo fine-tune | $8.00/1M tokens | $3.00/1M | $6.00/1M |

Understanding Tokens

A token is roughly 4 characters or 0.75 words in English. Practical token counts:

  • Short message (“Hello, how are you?”): ~5 tokens
  • Typical user query (50 words): ~65 tokens
  • System prompt (200 words): ~260 tokens
  • Full response (500 words): ~650 tokens
  • Complete transaction (prompt + response): ~1,000–2,000 tokens typical

Calculation example: 10,000 API calls × 1,500 tokens each = 15M tokens total

At GPT-4o mini ($0.15/1M input + $0.60/1M output, assume 50/50 split):

  • Cost: 7.5M × $0.15 + 7.5M × $0.60 = $1.125 + $4.50 = ~$5.63

The same workload on GPT-4o comes to ~$93.75. That roughly 17× price gap between the two models is significant at scale.
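The arithmetic above generalizes into a small helper. A minimal sketch (the 4-characters-per-token rule is a rough heuristic, and both function names are our own, not an SDK API):

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def cost_usd(input_tokens: int, output_tokens: int,
             input_per_m: float, output_per_m: float) -> float:
    """Dollar cost given token counts and per-1M-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# 15M tokens at a 50/50 input/output split:
mini = cost_usd(7_500_000, 7_500_000, 0.15, 0.60)   # GPT-4o mini: ~$5.63
full = cost_usd(7_500_000, 7_500_000, 2.50, 10.00)  # GPT-4o: ~$93.75
```

For exact counts rather than the heuristic, OpenAI's tiktoken library tokenizes text with the same encoding the models use.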

Which Model Should You Use?

GPT-4o mini: The Default Choice

For the vast majority of applications, GPT-4o mini should be your first choice. It delivers approximately 80% of GPT-4o’s capability at 6% of the cost.

Use GPT-4o mini for:

  • Customer service chatbots
  • Text classification and extraction
  • Content summarization
  • Code completions and explanations
  • Q&A systems
  • Data transformation

GPT-4o: When Quality Matters

Use GPT-4o when:

  • Complex multi-step reasoning is required
  • Code generation for non-trivial problems
  • Long-form creative content
  • Tasks where accuracy is more important than cost
  • Medical, legal, or technical domains requiring higher accuracy

o3 and o3-mini: Reasoning Tasks

The o3 model family is designed for tasks requiring step-by-step reasoning:

  • Mathematical problem solving
  • Complex coding challenges
  • Scientific analysis
  • Multi-hop reasoning over documents
  • Competition-level problems

o3-mini is the cost-effective version for reasoning tasks. o3 is the full model for maximum reasoning capability.

When to use o3-mini over GPT-4o: When your task explicitly benefits from chain-of-thought reasoning (math, code, logic). For most generation tasks, GPT-4o or GPT-4o mini are better value.

Cost Optimization Strategies

1. Cache Repeated Prompts

If you have a large system prompt (instructions, context) that doesn’t change between calls, OpenAI’s Prompt Caching reduces the cost of cached tokens by 50%:

  • Cache writes: Full price
  • Cache reads: 50% discount on input tokens

For applications with a large static system prompt, caching can save 30–40% of input token costs.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Same system prompt prefix hits the cache on repeated calls
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": very_long_system_prompt},  # cached after first call
        {"role": "user", "content": user_query},
    ],
)
```
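To see what the discount is worth, you can model the effective input spend as a function of how much of each prompt is cached and how often the cache hits. A rough sketch (our own helper, not an SDK feature; assumes a flat 50% discount on cached reads):

```python
def input_cost_with_caching(total_input_tokens: int, price_per_m: float,
                            cached_fraction: float, hit_rate: float) -> float:
    """Input-token spend in dollars, with 50% off the cached portion of prompts."""
    cached = total_input_tokens * cached_fraction * hit_rate  # billed at half price
    uncached = total_input_tokens - cached
    return (uncached * price_per_m + cached * price_per_m * 0.5) / 1_000_000

# 20M input tokens on GPT-4o mini; 80% of each prompt is a static prefix that
# always hits the cache, giving a 40% saving on input spend
no_cache = input_cost_with_caching(20_000_000, 0.15, 0.0, 0.0)     # ~$3.00
with_cache = input_cost_with_caching(20_000_000, 0.15, 0.8, 1.0)   # ~$1.80
```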

2. Use Streaming for Better UX, Not Cost

Streaming doesn’t reduce token costs — it just makes responses feel faster. Use it for user-facing interfaces. For batch processing, skip streaming.

3. Truncate Context Aggressively

Every token in your context window costs money. Common mistakes:

  • Including entire conversation history when only the last 3–5 turns matter
  • Injecting full documents when only summaries are needed
  • Verbose system prompts

Review your context window usage and trim. Reducing context by 40% reduces costs by 40%.
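A simple way to enforce this is to trim the message list before every call. A minimal sketch (the helper name and the `keep_turns` default are our own choices):

```python
def trim_history(messages: list[dict], keep_turns: int = 5) -> list[dict]:
    """Keep system messages plus only the last `keep_turns` user/assistant exchanges."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_turns * 2:]  # one turn = user message + assistant reply

# A 20-turn conversation shrinks to the system prompt + the last 5 turns
history = [{"role": "system", "content": "You are helpful."}]
for i in range(20):
    history += [{"role": "user", "content": f"q{i}"},
                {"role": "assistant", "content": f"a{i}"}]
trimmed = trim_history(history)  # 11 messages instead of 41
```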

4. Use GPT-4o mini for Classification, GPT-4o for Generation

A common architecture: use cheap models for classification/routing, expensive models only when needed.

```python
# Classify intent cheaply (classify_intent/generate_* are app-level helpers)
intent = classify_intent(user_message, model="gpt-4o-mini")  # $0.15/1M input

# Only route complex work to the expensive model
if intent == "complex_analysis":
    response = generate_analysis(user_message, model="gpt-4o")  # $2.50/1M input
else:
    response = generate_response(user_message, model="gpt-4o-mini")
```

5. Batch API for Offline Workloads

OpenAI’s Batch API provides 50% cost reduction for non-real-time workloads:

  • Asynchronous processing (24-hour turnaround)
  • Same models, half the price
  • Perfect for data pipelines, content generation, and classification jobs
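In practice you submit a JSONL file of requests and collect results when the batch completes. A sketch of the request-building step (the `custom_id` naming is our own; the commented upload/create calls follow OpenAI's documented Batch API flow):

```python
import json

def build_batch_lines(prompts: list[str], model: str = "gpt-4o-mini") -> list[str]:
    """One JSONL line per request, in the Batch API's expected shape."""
    return [json.dumps({
        "custom_id": f"task-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {"model": model, "messages": [{"role": "user", "content": p}]},
    }) for i, p in enumerate(prompts)]

lines = build_batch_lines(["Summarize doc A", "Summarize doc B"])

# Then write the lines to a file, upload it, and create the batch:
#   batch_file = client.files.create(file=open("batch.jsonl", "rb"), purpose="batch")
#   client.batches.create(input_file_id=batch_file.id,
#                         endpoint="/v1/chat/completions",
#                         completion_window="24h")
```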

6. Embeddings: Always Use text-embedding-3-small

text-embedding-3-small at $0.02/1M tokens is 6.5× cheaper than text-embedding-3-large ($0.13/1M) and performs comparably on most tasks. Only use text-embedding-3-large if benchmarks show a meaningful quality improvement for your specific use case.
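At these rates, embedding even a large corpus is cheap. A quick sketch of the arithmetic (the helper name and corpus sizes are illustrative):

```python
def embedding_cost_usd(num_chunks: int, avg_tokens_per_chunk: int,
                       price_per_m: float = 0.02) -> float:
    """Dollar cost to embed a corpus at a given per-1M-token price."""
    return num_chunks * avg_tokens_per_chunk * price_per_m / 1_000_000

# 100K chunks of ~500 tokens each (50M tokens total):
small = embedding_cost_usd(100_000, 500)        # text-embedding-3-small: ~$1.00
large = embedding_cost_usd(100_000, 500, 0.13)  # text-embedding-3-large: ~$6.50
```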

Real-World Cost Examples

Customer Service Chatbot (100K conversations/month)

  • Average: 500 tokens per conversation (250 input + 250 output)
  • Total: 50M tokens/month (25M input + 25M output)
  • GPT-4o mini cost: 25M × $0.15 + 25M × $0.60 ≈ $18.75/month
  • GPT-4o cost: 25M × $2.50 + 25M × $10.00 ≈ $312.50/month

RAG Application (10K queries/day)

  • Context: 2,000 tokens per query (retrieved documents)
  • Response: 500 tokens
  • Total monthly (30 days): 750M tokens (600M input + 150M output)
  • GPT-4o mini: ~$180/month
  • GPT-4o: ~$3,000/month

Code Review Tool (1K reviews/day)

  • Average prompt: 3,000 tokens (code + instructions)
  • Average response: 1,000 tokens
  • Monthly (30 days): 120M tokens (90M input + 30M output)
  • GPT-4o (better for code): ~$525/month
  • GPT-4o mini: ~$31.50/month (may be acceptable for simpler reviews)
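The examples above all come down to the same formula. A sketch of a reusable estimator (prices copied from the table at the top; assumes 30 billable days; the helper names are our own):

```python
PRICES = {  # $ per 1M tokens, from the pricing table above
    "gpt-4o":      {"in": 2.50, "out": 10.00},
    "gpt-4o-mini": {"in": 0.15, "out": 0.60},
}

def monthly_cost_usd(calls_per_day: int, input_tokens: int, output_tokens: int,
                     model: str, days: int = 30) -> float:
    """Projected monthly spend given per-call token counts and daily volume."""
    p = PRICES[model]
    calls = calls_per_day * days
    return (calls * input_tokens * p["in"] + calls * output_tokens * p["out"]) / 1e6

rag_mini = monthly_cost_usd(10_000, 2_000, 500, "gpt-4o-mini")  # ~$180
reviews_4o = monthly_cost_usd(1_000, 3_000, 1_000, "gpt-4o")    # ~$525
```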

Setting Up Cost Monitoring

Always implement cost monitoring before going to production:

  1. Set usage limits in OpenAI dashboard: Monthly hard cap prevents runaway bills
  2. Track tokens per request: Log response.usage.total_tokens for every call
  3. Set up billing alerts: Email alerts at 50% and 90% of your monthly budget
  4. Use LangSmith or custom logging: Track cost per feature, per user, per day

```python
response = client.chat.completions.create(...)
u = response.usage
# GPT-4o mini rates: $0.15/1M input, $0.60/1M output
print(f"Cost: ${(u.prompt_tokens * 0.15 + u.completion_tokens * 0.60) / 1e6:.6f}")
```

Free Credits for New Accounts

New OpenAI API accounts receive $5 in free credits. At GPT-4o mini pricing, $5 provides:

  • ~33 million input tokens
  • ~8.3 million output tokens
  • Thousands of API calls for typical use cases

Enough to build and test a complete application before paying anything.

Get started with the OpenAI API →

For building applications with OpenAI, pair with LangChain for orchestration and Pinecone or Supabase for vector storage in RAG pipelines.