Models Directory

Browse all 51 available language models and their capabilities

Free Models (21)

These models are available to all users without any subscription or pay-as-you-go charges.

mistralai/mistral-small-3.2-24b-instruct

Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on...

Context: 256000 tokens

Max output: 16384 tokens

z-ai/glm-4-32b

baidu/ernie-4.5-21b-a3b

mistralai/ministral-3b-2512

The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.

Context: 131072 tokens

Max output: N/A

mistralai/ministral-8b-2512

A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.

Context: 262144 tokens

Max output: N/A

ibm-granite/granite-4.0-h-micro

Granite-4.0-H-Micro is a 3B parameter model from the Granite 4 family. These models are the latest in a series of models released by IBM. They are fine-tuned for long...

Context: 131000 tokens

Max output: 131000 tokens

nousresearch/hermes-4-70b

Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either...

Context: 131072 tokens

Max output: N/A

openai/gpt-5-nano

GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments. While limited in reasoning depth compared to its larger...

Context: 400000 tokens

Max output: 128000 tokens

openai/gpt-oss-20b

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for...

Context: 131072 tokens

Max output: 131072 tokens

google/gemini-2.5-flash-lite

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance...

Context: 1048576 tokens

Max output: 65535 tokens

meta-llama/llama-4-scout

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input...

Context: 1310720 tokens

Max output: 16384 tokens

nvidia/nemotron-nano-9b-v2

qwen/qwen3-30b-a3b

Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique...

Context: 131072 tokens

Max output: 16384 tokens

qwen/qwen3-8b

Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dialogue. It supports seamless switching between "thinking" mode for math,...

Context: 131072 tokens

Max output: 8192 tokens

qwen/qwen3-14b

Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for...

Context: 131072 tokens

Max output: 8192 tokens

qwen/qwen3-32b

Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for...

Context: 131072 tokens

Max output: 16384 tokens

google/gemma-3-4b-it

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

Context: 131072 tokens

Max output: 16384 tokens

google/gemma-3-12b-it

Context: 131072 tokens

Max output: 16384 tokens

essentialai/rnj-1-instruct

x-ai/grok-4-fast

x-ai/grok-4.1-fast

Pro Models (18)

These models are available to Pro subscribers with unlimited usage included in the subscription.

z-ai/glm-4.5-air

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter...

Context: 131072 tokens

Max output: 98304 tokens

mistralai/mistral-small-creative

mistralai/ministral-14b-2512

The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language...

Context: 262144 tokens

Max output: N/A

minimax/minimax-m2

MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion activated parameters (230 billion total), it delivers near-frontier intelligence across general reasoning,...

Context: 204800 tokens

Max output: 131072 tokens

minimax/minimax-m2.1

MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world...

Context: 204800 tokens

Max output: 131072 tokens

openai/gpt-oss-120b

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized...

Context: 131072 tokens

Max output: 131072 tokens

google/gemma-3-27b-it

Context: 262144 tokens

Max output: 131072 tokens

meta-llama/llama-4-maverick

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward...

Context: 1048576 tokens

Max output: 16384 tokens

meta-llama/llama-3.3-70b-instruct

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model...

Context: 131072 tokens

Max output: 128000 tokens

deepseek/deepseek-chat-v3-0324

DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the DeepSeek V3 model and performs really well...

Context: 163840 tokens

Max output: 65536 tokens

deepseek/deepseek-chat-v3.1

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context...

Context: 163840 tokens

Max output: 32768 tokens

deepseek/deepseek-v3.2

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism...

Context: 163840 tokens

Max output: 65536 tokens

minimax/minimax-m2-her

MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-driven chat, and expressive multi-turn conversations. Designed to stay consistent in tone and personality, it supports rich message...

Context: 65536 tokens

Max output: 2048 tokens

nvidia/llama-3.3-nemotron-super-49b-v1.5

nousresearch/hermes-4-405b

Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with...

Context: 131072 tokens

Max output: N/A

qwen/qwen3-next-80b-a3b-instruct

Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without “thinking” traces. It targets complex tasks across reasoning, code generation, knowledge QA, and multilingual...

Context: 262144 tokens

Max output: 262144 tokens

qwen/qwen3-235b-a22b-2507

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following,...

Context: 262144 tokens

Max output: 16384 tokens

qwen/qwen3-235b-a22b

Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supports seamless switching between a "thinking" mode for complex reasoning, math, and...

Context: 131072 tokens

Max output: 8192 tokens

Pro Metered Models (12)

These premium models are available on a pay-as-you-go basis with per-token pricing.
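Per-token prices are small enough that a quick calculation helps when comparing models. The following is a minimal sketch, not part of this service's API: the function names `estimate_cost` and `per_million` are hypothetical, the rates are copied from the openai/gpt-5.1-chat entry in this directory, and the token counts are made-up illustration values. It assumes billing is simply input tokens times the input rate plus output tokens times the output rate, with no extra fees.

```python
from decimal import Decimal

# Rates copied from the openai/gpt-5.1-chat entry (USD per token).
INPUT_RATE = Decimal("0.00000125")
OUTPUT_RATE = Decimal("0.00001")


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: Decimal = INPUT_RATE,
                  output_rate: Decimal = OUTPUT_RATE) -> Decimal:
    """Estimate the USD cost of one request (assumed linear per-token billing)."""
    return input_tokens * input_rate + output_tokens * output_rate


def per_million(rate: Decimal) -> Decimal:
    """Convert a per-token rate into the more readable per-million-tokens rate."""
    return rate * 1_000_000


# Hypothetical request: 10,000 prompt tokens and 2,000 completion tokens.
cost = estimate_cost(10_000, 2_000)
input_per_million = per_million(INPUT_RATE)
```

Under these assumptions the hypothetical request costs $0.0325, and the input rate works out to $1.25 per million tokens. `Decimal` is used instead of `float` so that very small rates do not pick up binary rounding error.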

openai/gpt-5.1-chat

Input: $0.00000125 per token

Output: $0.00001 per token

GPT-5.1 Chat (AKA Instant) is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on...

Context: 128000 tokens

Max output: 32000 tokens

✗ Unmoderated

z-ai/glm-4.6

Input: $0.0000005 per token

Output: $0.000002 per token

Compared with GLM-4.5, this generation brings several key improvements: Longer context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex...

Context: 204800 tokens

Max output: 131072 tokens

✗ Unmoderated

mistralai/mistral-large-2512

Input: $0.0000005 per token

Output: $0.0000015 per token

Mistral Large 3 2512 is Mistral’s most capable model to date, featuring a sparse mixture-of-experts architecture with 41B active parameters (675B total), and released under the Apache 2.0 license.

Context: 262144 tokens

Max output: N/A

✗ Unmoderated

anthropic/claude-sonnet-4.5

Input: $0.000003 per token

Output: $0.000015 per token

Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with...

Context: 1000000 tokens

Max output: 64000 tokens

✓ Moderated

anthropic/claude-haiku-4.5

Input: $0.000001 per token

Output: $0.000005 per token

Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance...

Context: 200000 tokens

Max output: 64000 tokens

✓ Moderated

google/gemini-2.5-pro

Input: $0.00000125 per token

Output: $0.00001 per token

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...

Context: 1048576 tokens

Max output: 65536 tokens

✗ Unmoderated

google/gemini-2.5-flash

Input: $0.0000003 per token

Output: $0.0000025 per token

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater...

Context: 1048576 tokens

Max output: 65535 tokens

✗ Unmoderated

google/gemini-3-flash-preview

Input: $0.0000005 per token

Output: $0.000003 per token

Gemini 3 Flash Preview is a high speed, high value thinking model designed for agentic workflows, multi turn chat, and coding assistance. It delivers near Pro level reasoning and tool...

Context: 1048576 tokens

Max output: 65535 tokens

✗ Unmoderated

amazon/nova-premier-v1

Input: $0.0000025 per token

Output: $0.0000125 per token

Amazon Nova Premier is the most capable of Amazon’s multimodal models for complex reasoning tasks and for use as the best teacher for distilling custom models.

Context: 1000000 tokens

Max output: 32000 tokens

✓ Moderated

mistralai/mistral-medium-3.1

Input: $0.0000004 per token

Output: $0.000002 per token

Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances...

Context: 131072 tokens

Max output: N/A

✗ Unmoderated

deepseek/deepseek-r1-0528

Input: $0.0000005 per token

Output: $0.00000215 per token

May 28th update to the original DeepSeek R1. Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active...

Context: 163840 tokens

Max output: 32768 tokens

✗ Unmoderated

moonshotai/kimi-k2-0905

Input: $0.0000006 per token

Output: $0.0000025 per token

Kimi K2 0905 is the September update of Kimi K2 0711. It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32...

Context: 262144 tokens

Max output: 100352 tokens

✗ Unmoderated
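The Context and Max output figures listed for each model can be used to check whether a planned request is likely to fit before sending it. This is a rough sketch under two assumptions the directory does not state: that the context window is shared between the prompt and the generated output, and that an "N/A" max output means no separate output cap is published. The function name `fits_model` is hypothetical, and the example values come from the mistralai/mistral-small-3.2-24b-instruct entry.

```python
from typing import Optional


def fits_model(prompt_tokens: int, requested_output: int,
               context: int, max_output: Optional[int]) -> bool:
    """Return True if the request fits the listed limits.

    Assumes prompt and output share the context window, and treats
    max_output=None (listed as N/A) as "no separate output cap".
    """
    if max_output is not None and requested_output > max_output:
        return False
    return prompt_tokens + requested_output <= context


# Limits from the mistralai/mistral-small-3.2-24b-instruct entry.
CONTEXT = 256000
MAX_OUTPUT = 16384

ok = fits_model(200_000, 16_384, CONTEXT, MAX_OUTPUT)        # fits
too_long = fits_model(240_000, 16_384, CONTEXT, MAX_OUTPUT)  # exceeds context
over_cap = fits_model(1_000, 20_000, CONTEXT, MAX_OUTPUT)    # exceeds max output
```

Token counts depend on each model's tokenizer, so treat the result as an estimate and leave some headroom.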