Calculate and compare the cost of using the latest OpenAI, Azure, Anthropic, Llama 3, Google Gemini, Mistral, and Cohere APIs with our powerful free pricing calculator. Updated continuously as models and pricing change.
AI LLM Model Pricing Overview (Free to Use!)
Key Players: OpenAI, Anthropic, Google, Cohere, Mistral, and Meta
These providers offer diverse models tailored to specific tasks and capabilities. Understanding their pricing structures is crucial for businesses and developers.
Tokens: The Pricing Unit
LLM pricing is based on "tokens." Typically, 1,000 tokens equal about 750 words; in English, one token is roughly four characters. For example, the phrase "This paragraph is 5 tokens" is itself 5 tokens long. Token counts vary by language.
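The rules of thumb above can be sketched in a few lines. This is only a rough estimator for ballparking costs; for exact counts you would use the provider's tokenizer (e.g. OpenAI's tiktoken library). The function names here are illustrative, not from any official API.

```python
def estimate_tokens(text: str) -> int:
    """Approximate token count: roughly 4 characters per token in English."""
    return max(1, round(len(text) / 4))

def words_to_tokens(word_count: int) -> int:
    """Approximate tokens from a word count: 1,000 tokens ~ 750 words."""
    return round(word_count * 1000 / 750)

print(estimate_tokens("This paragraph is 5 tokens"))  # roughly 6
print(words_to_tokens(750))  # 1000
```

Real tokenizers split on subword units, so actual counts can differ noticeably from this estimate, especially for code or non-English text.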
Context Length
Context length refers to the number of tokens a model can process at once, impacting performance and cost. For instance, a model with an 8K token context can handle 8,000 tokens in a single pass. Longer contexts allow for more complex tasks and continuous conversations but are costlier.
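One reason longer contexts cost more in practice: in a chat application, the full conversation history is typically re-sent (and re-billed as input tokens) on every turn. A minimal sketch, using a hypothetical price rather than any provider's real rate:

```python
def conversation_cost(turn_tokens: list[int], price_per_1k_input: float) -> float:
    """Total input cost when each turn resends the entire prior history."""
    total = 0.0
    history = 0
    for tokens in turn_tokens:
        history += tokens          # history grows with every turn
        total += history / 1000 * price_per_1k_input
    return total

# Five turns of 500 tokens each, at a hypothetical $0.01 per 1K input tokens:
print(conversation_cost([500] * 5, 0.01))  # 0.075
```

Note that cost grows quadratically with conversation length under this scheme, which is why trimming or summarizing history matters for long-running chats.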
Language Models and Pricing
1. OpenAI:
- GPT-4o: Advanced multimodal model, 2x faster and 50% cheaper than GPT-4 Turbo, with 128K context and upcoming audio support.
- GPT-4: High general knowledge and precision, slower and more expensive. GPT-4 Turbo is 3x cheaper with 128K context.
- GPT-3.5 Turbo: Optimized for dialogue, fast, and cost-effective.
2. Anthropic:
- Claude 3: Includes Haiku, Sonnet, and Opus models. Opus matches GPT-4, while Haiku is cost-effective and outperforms GPT-3.5 Turbo. Sonnet balances cost and performance. Claude 3 has a 200K context window.
3. Meta:
- Llama 3: Open-source, similar to GPT-3.5 Turbo, free for research and commercial use under Meta's license, though primarily English.
4. Google:
- Gemini: Successor to PaLM 2, with Ultra, Pro, and Flash models. Gemini Ultra rivals GPT-4, Pro matches GPT-3.5, and Flash offers 1M context with multimodal support.
5. Cohere, Mistral, and Other Models:
- Mistral: Small, fast, and cheap open-source models like Mistral 7B and Mixtral 8x7B, comparable to GPT-3.5 Turbo. Mistral Large is near GPT-4 level for reasoning tasks.
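Comparing models like the ones above comes down to the same arithmetic the calculator performs: separate input and output rates applied per token. The model names and rates below are hypothetical placeholders; substitute the providers' current published prices, which change frequently.

```python
# (input, output) USD per 1K tokens -- hypothetical example rates only
PRICES_PER_1K = {
    "model-a": (0.0005, 0.0015),
    "model-b": (0.0100, 0.0300),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call: input and output tokens billed at separate rates."""
    price_in, price_out = PRICES_PER_1K[model]
    return input_tokens / 1000 * price_in + output_tokens / 1000 * price_out

for model in PRICES_PER_1K:
    print(model, request_cost(model, 2000, 500))
```

Output tokens are usually billed at a higher rate than input tokens, so workloads that generate long completions can cost far more than prompt-heavy ones at the same total token count.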
Customization and Embedding Models
OpenAI offers fine-tuning for custom models, billed per token used. Embedding models support advanced search, clustering, and classification tasks, essential for applications like AI support chatbots using Retrieval Augmented Generation (RAG).
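The retrieval step of RAG mentioned above can be sketched as follows: documents and the user's query are embedded as vectors, and the most similar document is retrieved to ground the chatbot's answer. In a real system the vectors come from an embeddings API billed per token; here they are hard-coded toy values so the example stays self-contained.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy document embeddings (a real embedding model outputs hundreds of dims).
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.2],
}
query = [0.8, 0.2, 0.1]  # toy embedding of "How do I get my money back?"

best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # refund policy
```

The retrieved text is then prepended to the prompt, which is why embedding and generation costs both factor into a RAG application's total bill.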
This is such a useful tool for developers working with various LLM APIs. Coincidentally, I was wondering about this same question earlier today.
How often do you update the pricing information? Also, are there plans to add more models in the future?
Thanks for providing such a helpful resource, Clara!
@charlestehio The list is per LLM model, not per provider, right now. I've just listed prices from the cheapest production providers for the open models so far; Groq is more expensive (though fast!).
LLM API Pricing Calculator