Inference Pricing

BETA

Our pricing scraper is still in beta. Please double-check prices before making any decisions based on this information.

Example: AI Assistant Chat (10-message thread) × (number of users)

Sample Conversation (5 user messages + 5 AI responses)
User: I need to build a simple website for my small business. What's the best approach?
User: I sell handmade leather goods like wallets, belts, and ...

Input tokens (user messages) are typically charged at a lower rate.
Output tokens (AI responses) are typically charged at a higher rate.
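The example cost on each model card follows from simple per-token arithmetic: token count times the listed price per 1M tokens, computed separately for input and output. The sketch below illustrates that calculation. The function name `estimate_thread_cost` and the token counts (150 input, 200 output) are hypothetical values chosen for illustration, not figures taken from this page; the prices are the Claude 3.7 Sonnet rates listed here.

```python
def estimate_thread_cost(input_tokens, output_tokens,
                         input_price_per_m, output_price_per_m, users=1):
    """Return (input_cost, output_cost, total_cost) in USD.

    Prices are quoted per 1M tokens, so token counts are divided by 1,000,000.
    """
    input_cost = input_tokens * users * input_price_per_m / 1_000_000
    output_cost = output_tokens * users * output_price_per_m / 1_000_000
    return input_cost, output_cost, input_cost + output_cost


# Hypothetical thread: 150 input tokens and 200 output tokens in total.
# Prices: Claude 3.7 Sonnet as listed on this page ($3 in, $15 out per 1M).
in_cost, out_cost, total = estimate_thread_cost(150, 200, 3.00, 15.00)
print(f"In: ${in_cost:.3f} | Out: ${out_cost:.3f} | Total: ${total:.3f}")
# prints: In: $0.000 | Out: $0.003 | Total: $0.003
```

At three decimal places, any cost below half a tenth of a cent displays as $0.000, which is likely why many of the cheaper models show an example cost of $0.000 for a short thread.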

Top-tier General-purpose

The most advanced and capable models with strong performance across all tasks. These models excel at complex reasoning, content generation, and understanding nuanced instructions.

Example Use Case:

Building enterprise-grade AI assistants with enhanced reasoning, contextual understanding, and sophisticated task handling capabilities.

Models

Claude 3.7 Sonnet

Vendor: Anthropic | Category: Top-tier General-purpose
Parameters: 142B
Input ($/1M): $3.000
Output ($/1M): $15.000
Example Cost: $0.003 (In: $0.000 | Out: $0.003)

GPT-4.5

Vendor: OpenAI | Category: Top-tier General-purpose
Parameters: 200B
Input ($/1M): $75.000
Output ($/1M): $150.000
Example Cost: $0.036 (In: $0.011 | Out: $0.025)

Pricing shown per 1M tokens. Example costs are estimates only and may vary based on actual tokenization.

Last updated: 7/8/2025

High-performance

Powerful models offering excellent capabilities with better price-performance ratio than top-tier options. These models handle most complex tasks efficiently.

Example Use Case:

Developing production applications requiring strong reasoning and generation abilities while managing costs, such as automated content creation platforms.

Models

Llama 3.1 70B

Vendor: Inference.net | Category: High-performance
Parameters: 70B
Input ($/1M): $0.400
Output ($/1M): $0.400
Example Cost: $0.000 (In: $0.000 | Out: $0.000)

DeepSeek R1 Distill Llama 70B (FP8)

Vendor: Inference.net | Category: High-performance
Parameters: 70B
Input ($/1M): $0.400
Output ($/1M): $0.400
Example Cost: $0.000 (In: $0.000 | Out: $0.000)

Llama 3.3 70B

Vendor: Inference.net | Category: High-performance
Parameters: 70B
Input ($/1M): $0.400
Output ($/1M): $0.400
Example Cost: $0.000 (In: $0.000 | Out: $0.000)

Llama 3 70B

Vendor: Groq | Category: High-performance
Parameters: 70B
Input ($/1M): $0.590
Output ($/1M): $0.790
Example Cost: $0.000 (In: $0.000 | Out: $0.000)

Qwen2-72B

Vendor: Together AI | Category: High-performance
Parameters: 72B
Input ($/1M): $0.900
Output ($/1M): $0.900
Example Cost: $0.000 (In: $0.000 | Out: $0.000)

Llama 3 70B

Vendor: Together AI | Category: High-performance
Parameters: 70B
Input ($/1M): $0.900
Output ($/1M): $0.900
Example Cost: $0.000 (In: $0.000 | Out: $0.000)

DeepSeek V3 (FP8)

Vendor: Inference.net | Category: High-performance
Parameters: 671B
Input ($/1M): $1.200
Output ($/1M): $1.200
Example Cost: $0.000 (In: $0.000 | Out: $0.000)

GPT-4o

Vendor: OpenAI | Category: High-performance
Parameters: 1.5T
Input ($/1M): $2.500
Output ($/1M): $10.000
Example Cost: $0.002 (In: $0.000 | Out: $0.002)

Mid-range

Balanced models offering good capabilities at moderate costs. These models handle common tasks well and are suitable for most general applications.

Example Use Case:

Creating customer support chatbots and knowledge retrieval systems where cost efficiency and good performance are both important.

Models

Llama 3.2 11B Vision Instruct (FP16)

Vendor: Inference.net | Category: Mid-range
Parameters: 11B
Input ($/1M): $0.055
Output ($/1M): $0.055
Example Cost: $0.000 (In: $0.000 | Out: $0.000)

Mistral Nemo 12B Instruct (FP8)

Vendor: Inference.net | Category: Mid-range
Parameters: 12B
Input ($/1M): $0.100
Output ($/1M): $0.100
Example Cost: $0.000 (In: $0.000 | Out: $0.000)

o3-mini

Vendor: OpenAI | Category: Mid-range
Parameters: 25B
Input ($/1M): $0.150
Output ($/1M): $0.600
Example Cost: $0.000 (In: $0.000 | Out: $0.000)

Gemini 2.0 Flash

Vendor: Google | Category: Mid-range
Parameters: 100B
Input ($/1M): $0.150
Output ($/1M): $0.600
Example Cost: $0.000 (In: $0.000 | Out: $0.000)

Claude 3.5 Haiku

Vendor: Anthropic | Category: Mid-range
Parameters: 20B
Input ($/1M): $0.800
Output ($/1M): $4.000
Example Cost: $0.001 (In: $0.000 | Out: $0.001)

Cost-effective

Budget-friendly models optimized for efficiency and lower costs. These models perform well on straightforward tasks while minimizing token usage expenses.

Example Use Case:

Building high-volume applications like content tagging, classification systems, or user intent recognition where scale matters.

Models

Llama 3.2 1B Instruct (FP16)

Vendor: Inference.net | Category: Cost-effective
Parameters: 1B
Input ($/1M): $0.010
Output ($/1M): $0.010
Example Cost: $0.000 (In: $0.000 | Out: $0.000)

Llama 3.2 3B Instruct

Vendor: Inference.net | Category: Cost-effective
Parameters: 3B
Input ($/1M): $0.020
Output ($/1M): $0.020
Example Cost: $0.000 (In: $0.000 | Out: $0.000)

Llama 3.1 8B Instruct (FP8)

Vendor: Inference.net | Category: Cost-effective
Parameters: 8B
Input ($/1M): $0.025
Output ($/1M): $0.025
Example Cost: $0.000 (In: $0.000 | Out: $0.000)

Llama 3.1 8B Instruct (FP16)

Vendor: Inference.net | Category: Cost-effective
Parameters: 8B
Input ($/1M): $0.030
Output ($/1M): $0.030
Example Cost: $0.000 (In: $0.000 | Out: $0.000)

GPT-4o Mini

Vendor: OpenAI | Category: Cost-effective
Parameters: 250B
Input ($/1M): $0.150
Output ($/1M): $0.600
Example Cost: $0.000 (In: $0.000 | Out: $0.000)

Llama 3.1 7B

Vendor: Together AI | Category: Cost-effective
Parameters: 7B
Input ($/1M): $0.200
Output ($/1M): $0.200
Example Cost: $0.000 (In: $0.000 | Out: $0.000)

Reasoning-focused

Models specialized in logical reasoning, problem-solving, and analytical tasks. These models excel at tasks requiring step-by-step thinking and mathematical operations.

Example Use Case:

Implementing code generation tools, math problem solvers, or data analysis assistants requiring precise logical reasoning.

Models

Qwen 2.5 72B Instruct (FP8)

Vendor: Inference.net | Category: Reasoning-focused
Parameters: 72B
Input ($/1M): $0.350
Output ($/1M): $0.350
Example Cost: $0.000 (In: $0.000 | Out: $0.000)

DeepSeek R1 (FP8)

Vendor: Inference.net | Category: Reasoning-focused
Parameters: 671B
Input ($/1M): $3.000
Output ($/1M): $3.000
Example Cost: $0.001 (In: $0.000 | Out: $0.001)

DeepSeek R1

Vendor: Together AI | Category: Reasoning-focused
Parameters: 671B
Input ($/1M): $3.000
Output ($/1M): $7.000
Example Cost: $0.002 (In: $0.000 | Out: $0.001)

Claude 3.7 Sonnet Thinking

Vendor: Anthropic | Category: Reasoning-focused
Parameters: 142B
Input ($/1M): $8.000
Output ($/1M): $24.000
Example Cost: $0.005 (In: $0.001 | Out: $0.004)

Download all available models and pricing data instantly.

Navigate the complex landscape of AI models with ease. This tool helps developers find the right model for their projects by comparing pricing across providers. We are also working on improving our model categorizations. If you have suggestions for how we can improve our grouping or tags/categories, please create an issue in our GitHub repo.
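
As a rough illustration of comparing pricing across providers, the sketch below ranks a handful of the models listed on this page by estimated cost for a given workload. The `ModelPrice` class, the `rank_by_cost` helper, and the workload size (1M input tokens, 250K output tokens) are hypothetical and not part of this tool; the prices are copied from the listings above and should be re-checked before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    name: str
    vendor: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens

    def cost(self, input_tokens, output_tokens):
        """Estimated USD cost for the given token counts."""
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000


# A small sample of the prices listed on this page.
PRICES = [
    ModelPrice("Claude 3.7 Sonnet", "Anthropic", 3.00, 15.00),
    ModelPrice("GPT-4o", "OpenAI", 2.50, 10.00),
    ModelPrice("Llama 3.3 70B", "Inference.net", 0.40, 0.40),
    ModelPrice("Llama 3 70B", "Groq", 0.59, 0.79),
    ModelPrice("Claude 3.5 Haiku", "Anthropic", 0.80, 4.00),
]


def rank_by_cost(models, input_tokens, output_tokens):
    """Return the models sorted from cheapest to most expensive."""
    return sorted(models, key=lambda m: m.cost(input_tokens, output_tokens))


# Hypothetical workload: 1M input tokens and 250K output tokens.
for model in rank_by_cost(PRICES, 1_000_000, 250_000):
    print(f"{model.name} ({model.vendor}): "
          f"${model.cost(1_000_000, 250_000):.2f}")
```

Because output tokens often cost several times more than input tokens, the ranking can change with the input-to-output ratio of the workload, so it is worth running the comparison with token counts close to your own usage.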