Models

Model providers have different strengths. Use the summaries below to choose a provider, then pick a model family that matches the task.

OpenAI (GPT)

Strong all-around for writing, planning, coding, and everyday questions. Includes GPT models from OpenAI plus GPT OSS variants served through Fireworks and Cerebras.

GPT-4o series

Model	Requests	Best For
GPT-4o · May 2024	2 premium requests	May 2024 checkpoint of gpt-4o for that special voice.
GPT-4o · Aug. 2024	1 premium request	August 2024 checkpoint of gpt-4o with enhanced capabilities.
GPT-4o · Nov. 2024	1 premium request	November 2024 checkpoint of gpt-4o with latest improvements.
GPT-4o	1 premium request	OpenAI's default gpt-4o through API.
GPT-4o mini	1 base request	Agile, cost-efficient 4o variant ideal for everyday conversation.
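
The Requests column in the tables on this page pairs a count with a tier (base or premium). As a minimal sketch, assuming rows are tab-separated exactly as shown here, the cost string could be parsed like this. The names `RequestCost`, `parse_cost`, and `parse_row` are hypothetical helpers for illustration, not part of any published API.

```python
import re
from typing import NamedTuple


class RequestCost(NamedTuple):
    count: int  # number of requests charged per send
    tier: str   # "base" or "premium"


# Matches strings such as "1 base request" or "2 premium requests".
_COST_RE = re.compile(r"^(\d+)\s+(base|premium)\s+requests?$")


def parse_cost(text: str) -> RequestCost:
    """Turn a Requests cell into a (count, tier) pair."""
    match = _COST_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized cost string: {text!r}")
    return RequestCost(int(match.group(1)), match.group(2))


def parse_row(row: str) -> tuple[str, RequestCost, str]:
    """Split a tab-separated table row into (model, cost, description)."""
    model, cost, best_for = row.split("\t", 2)
    return model, parse_cost(cost), best_for
```

Keeping the tier separate from the count makes it easy to compare models within one tier without mixing base and premium requests.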

GPT-3.5 Turbo

Model	Requests	Best For
GPT-3.5 Turbo	2 base requests	Legacy GPT model for cheaper chat and non-chat tasks.
GPT-3.5 Turbo · 0125	2 base requests	Pinned January 2024 snapshot of GPT-3.5 Turbo.
GPT-3.5 Turbo · 1106	2 base requests	Pinned November 2023 snapshot of GPT-3.5 Turbo.

GPT-audio series

Model	Requests	Best For
GPT audio mini	3 base requests	Cost-efficient audio-native chat model. Supports text + audio output in chat completions.
GPT audio mini · Dec. 2025	3 base requests	Pinned December 2025 snapshot of GPT Audio mini for stable behavior.

GPT-5.6 series

Model	Requests	Best For
GPT-5.6 Sol	5 premium requests	OpenAI's frontier GPT-5.6 model for complex professional work, coding, research, and agentic tool use.
GPT-5.6 Terra	2 premium requests	GPT-5.6's balanced model for capable everyday work, coding, vision, and tool-using chat at a lower cost.
GPT-5.6 Luna	1 premium request	GPT-5.6's cost-sensitive model for fast, high-volume chat, extraction, ranking, and lightweight tool use.

GPT-5.4 series

Model	Requests	Best For
GPT-5.4	2 premium requests	Latest GPT-5.4 flagship chat model with stronger reasoning and accuracy.
GPT-5.4 mini	1 premium request	Higher-capability GPT-5.4 mini for high-volume coding, computer use, and subagent workflows.
GPT-5.4 nano	2 base requests	Cheapest GPT-5.4-class model for simple high-volume tasks such as extraction, ranking, and lightweight subagents.

GPT-5.3 series

Model	Requests	Best For
GPT-5.3-Codex	1 premium request	The most capable agentic coding model to date. Optimized for agentic coding tasks in Codex or similar environments. 400K context, 128K max output. Reasoning off by default.
GPT-5.3 latest	1 premium request	GPT-5.3 model used in ChatGPT. Best general-purpose model with high intelligence and vision support. Pricing is assumed to match GPT-5.2/5.1 chat latest until announced.

GPT-5.2 series

Model	Requests	Best For
GPT-5.2 latest	1 premium request	GPT-5.2 model used in ChatGPT. Best general-purpose model with high intelligence and vision support.
GPT-5.2	1 premium request	Pinned GPT-5.2 snapshot for stable behavior.

GPT-5.1 series

Model	Requests	Best For
GPT-5.1	1 premium request	Pinned snapshot gpt-5.1-2025-11-13. The most intelligent model yet, with faster responses and increased steerability.

GPT-5 series

Model	Requests	Best For
GPT-5	1 premium request	Frontier reasoning depth with best-in-class reliability.
GPT-5 mini	3 base requests	Responsive, budget-friendly member of the GPT-5 family.
GPT-5 nano	1 base request	Ultra light-touch assistant for simple interactions.

GPT-4.1 series

Model	Requests	Best For
GPT-4.1	1 premium request	GPT-4 refinement designed for coding with broad tool compatibility.
GPT-4.1 mini	2 base requests	Compact GPT-4.1 option for consistent tone and speed.
GPT-4.1 nano	1 base request	Minimal-footprint GPT-4.1 for background automation tasks.

O-series

Model	Requests	Best For
o3	1 premium request	Reasoning-focused o-series model optimized for long-horizon tasks.
o4 mini	1 premium request	Lean o-series model for high-volume creative projects.
o3 mini	1 premium request	Balanced o-series variant with emphasis on tool use during reasoning.

GPT-oss series

Model	Requests	Best For
GPT OSS 120b	1 base request	OpenAI's open-weight 117B MoE via Fireworks. Production-grade reasoning, agentic tasks, function calling. 131k context. Does not support web search or image input.
GPT OSS 120B Fast	1 premium request	OpenAI's GPT OSS 120B routed through Cerebras chat completions for very fast tool-capable replies. 131k context. Does not support web search or image input.
GPT OSS 20b	1 base request	OpenAI's open-weight 21B MoE via Fireworks. Lower latency, local or specialized use-cases. 131k context. Does not support web search, image input, or function calling.
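
The capability notes in the GPT-oss rows above (function calling, image input, web search) can be treated as flags when shortlisting a model. The following is a rough sketch under that reading. `Model`, `GPT_OSS`, and `supporting` are hypothetical names, and marking GPT OSS 120B Fast as supporting function calling is an assumption based on its "tool-capable" description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    function_calling: bool
    image_input: bool
    web_search: bool


# Capability flags transcribed from the GPT-oss table on this page.
# Assumption: "tool-capable" for the Fast variant means function calling.
GPT_OSS = [
    Model("GPT OSS 120b", function_calling=True, image_input=False, web_search=False),
    Model("GPT OSS 120B Fast", function_calling=True, image_input=False, web_search=False),
    Model("GPT OSS 20b", function_calling=False, image_input=False, web_search=False),
]


def supporting(models, **required):
    """Return names of models whose flags match every required capability."""
    return [
        m.name
        for m in models
        if all(getattr(m, flag) == value for flag, value in required.items())
    ]


print(supporting(GPT_OSS, function_calling=True))
```

The same pattern extends to other providers on this page by adding their rows to the list.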

Google (Gemini)

Great for long instructions, large context, and quick iteration on bigger tasks.

Gemini 2.5 series

Model	Requests	Best For
Gemini 2.5 Pro	1 premium request	Google's state-of-the-art multipurpose model, which excels at coding and complex reasoning tasks.
Gemini 2.5 Flash	3 base requests	Google's first hybrid reasoning model, with a 1M-token context window and thinking budgets.
Gemini 2.5 Flash-Lite	1 base request	Google's smallest and most cost-effective 2.5 model, built for at-scale usage.

Gemini 3.1 series

Model	Requests	Best For
Gemini 3.1 Pro Preview	3 premium requests	Next iteration of Gemini 3 Pro, with performance, behavior, and intelligence improvements. 1M-token context, 64k max output. Suited to agentic workflows, autonomous coding, and complex multimodal tasks. Knowledge cutoff: Jan 2025.
Gemini 3.1 Flash-Lite	2 base requests	Stable Gemini 3.1 Flash-Lite model for high-volume agentic tasks, translation, and simple data processing. 1M-token context, 65k max output.

Gemini 3 series

Model	Requests	Best For
Gemini 3 Flash Preview	4 base requests	Preview of Gemini 3 Flash. 1M-token context, 64k max output. Knowledge cutoff: Jan 2025.

Gemini 3.5 family

Model	Requests	Best For
Gemini 3.5 Flash	2 premium requests	Gemini 3.5 Flash combines frontier intelligence with fast responses, search grounding, and multimodal strengths. Uses 2 premium requests per send before length multipliers.
Gemini 3.5 Flash-Lite	3 base requests	Google's fastest Gemini 3.5 model, optimized for high-volume agentic work, document processing, translation, and classification. Uses 3 base requests per send before length multipliers.

Gemini 3.6 family

Model	Requests	Best For
Gemini 3.6 Flash	2 premium requests	Google's production workhorse for coding, agentic execution, knowledge work, and multimodal analysis. Uses 2 premium requests per send before length multipliers.
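
Several rows on this page quote a cost "per send before length multipliers". The page does not define how those multipliers are derived or rounded, so the sketch below only illustrates the general idea, assuming the multiplier scales the listed cost multiplicatively. The function `requests_charged` and the parameter `length_multiplier` are hypothetical, and the value 1.5 is purely illustrative.

```python
def requests_charged(per_send: int, length_multiplier: float = 1.0) -> float:
    """Per-send request cost scaled by a hypothetical length multiplier.

    Assumption: the multiplier scales the listed cost multiplicatively.
    The page does not specify how multipliers are derived or rounded.
    """
    if per_send < 1 or length_multiplier <= 0:
        raise ValueError("per_send must be >= 1 and length_multiplier > 0")
    return per_send * length_multiplier


# Gemini 3.6 Flash lists 2 premium requests per send.
print(requests_charged(2))       # multiplier of 1.0, i.e. no length adjustment
print(requests_charged(2, 1.5))  # 1.5 is an illustrative value, not a published rate
```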

Anthropic (Claude)

Good for careful writing, nuanced edits, and thoughtful longer responses.

Claude series

Model	Requests	Best For
Claude Sonnet 4.6	5 premium requests	Anthropic's most capable Sonnet yet. Full upgrade across coding, long-context reasoning, agent planning, and design. 1M token context window in beta. Same pricing as Sonnet 4.5.
Claude Sonnet 4.5	5 premium requests	Anthropic's balanced Claude model with strong reasoning and efficiency.
Claude Haiku 4.5	1 premium request	Anthropic's fastest Claude model, optimized for speed and cost efficiency.

xAI (Grok)

Good for quick back-and-forth, practical answers, and fast drafting.

Grok-3

Model	Requests	Best For
Grok-3 Mini	1 base request	Compact Grok-3 variant for cost-effective conversations.

Grok-4.20

Model	Requests	Best For
Grok-4.20 Reasoning	1 premium request	xAI's flagship Grok 4.20 reasoning model with a 2M-token context window, stronger multi-step reasoning, and native tool support.
Grok-4.20 Non-Reasoning	1 premium request	Latency-optimized Grok 4.20 variant with a 2M-token context window, image understanding, and native tool support.

Grok-4.3

Model	Requests	Best For
Grok 4.3	1 premium request	xAI's latest flagship Grok model with 1M context, image input, configurable reasoning, and strong agentic tool use.

Grok-4.5

Model	Requests	Best For
Grok 4.5	1 premium request	xAI's latest flagship for coding, agentic tasks, knowledge work, and tool-using chat with configurable reasoning.

Grok-build

Model	Requests	Best For
Grok Build 0.1	1 premium request	xAI's dedicated coding model, built for agentic software engineering, code review, and debugging across large repositories.

DeepSeek

Strong for reasoning and complex tasks. DeepSeek V3.x models and DeepSeek V4 Flash are available on all tiers; DeepSeek V4 Pro is Premium. V4 Flash and V4 Pro have 1M context and function calling. No web search or image input.

DeepSeek

Model	Requests	Best For
DeepSeek V4 Flash	1 base request	DeepSeek-V4-Flash via Fireworks: streamlined open-source MoE model optimized for fast, cost-efficient inference while preserving strong reasoning and coding performance at 1M context scale. Function calling supported. Uses 1 base request per send before length multipliers. Does not support web search or image input.
DeepSeek V4 Pro	1 premium request	DeepSeek-V4-Pro via Fireworks: flagship open-source 1.6T MoE model for frontier reasoning, advanced coding, and long-context agentic workflows. 1M context. Function calling supported. Uses 1 premium request per send before length multipliers. Does not support web search or image input.

Qwen

Strong for multimodal chat, tool use, and general flagship work. Available on all tiers; Qwen 3.6 Plus is served via Fireworks and Qwen 3.7 Plus via the Vercel AI Gateway. Supports image input, but not web search.

Qwen

Model	Requests	Best For
Qwen 3.7 Plus	2 base requests	Alibaba's Qwen 3.7 Plus via the Vercel AI Gateway: flagship Qwen model with function calling and image input support. 1M context. Available serverless on just4o.chat at the base tier. Uses 2 base requests per send before length multipliers. Does not support web search.

Moonshot (Kimi)

Good for complex reasoning, multimodal agentic tasks, and long-horizon coding. Kimi K2.5 and K2.6 support images. No web search.

Kimi

Model	Requests	Best For
Kimi K2.6	2 premium requests	Moonshot AI's Kimi K2.6 via Fireworks: open-source, native multimodal agentic model for long-horizon coding, coding-driven design, autonomous execution, and task orchestration. 1T MoE, 262k context. Supports image input and function calling. Uses 2 premium requests per send before length multipliers. Does not support web search.
Kimi K2.7 Code	2 premium requests	Moonshot AI's Kimi K2.7 Code via the Vercel AI Gateway: open-source, native multimodal agentic model tuned for long-horizon coding, coding-driven design, autonomous execution, and task orchestration. 262k context. Supports image input and function calling. Uses 2 premium requests per send before length multipliers. Does not support web search.
Kimi K2.7 Code Highspeed	3 premium requests	Moonshot AI's Kimi K2.7 Code Highspeed via the Vercel AI Gateway: a low-latency, high-throughput serving tier of Kimi K2.7 Code for long-horizon coding, coding-driven design, autonomous execution, and task orchestration. 262k context. Supports image input and function calling. Uses 3 premium requests per send before length multipliers. Does not support web search.

MiniMax

Strong for coding, complex tasks, and office work. MiniMax M2.7 supports image input. Available on all tiers. No web search.

MiniMax

Model	Requests	Best For
MiniMax M2.7	2 base requests	MiniMax M2.7 via Fireworks: 228B MoE model for complex agent harnesses, productivity tasks, Agent Teams, Skills, and dynamic tool search. 196k context. Supports image input and function calling. Does not support web search.
MiniMax M3	2 base requests	MiniMax M3 via the Vercel AI Gateway: agentic model for complex tool use, productivity tasks, Agent Teams, and long-horizon workflows with a 512k-token context window. Supports image input and function calling. Does not support web search.

Z.ai (GLM)

Strong for coding, reasoning, and long-horizon agentic workflows. GLM models are available through Fireworks and Cerebras; GLM 4.7 Fast is the Cerebras-backed Premium OSS variant. No web search or images.

GLM 4.7 family

Model	Requests	Best For
GLM 4.7 Fast	2 premium requests	Z.ai's GLM-4.7 routed through Cerebras chat completions for lower-latency coding and agentic work. 131k context. Does not support web search or image input. Cerebras currently lists it as a preview model.

GLM 5 family

Model	Requests	Best For
GLM 5.2	2 premium requests	Z.ai's GLM-5.2 via the Vercel AI Gateway: flagship agentic engineering and coding with a 1M-token context window. Adds function calling (not available on GLM-5/5.1). Uses 2 premium requests per send before length multipliers. Does not support web search or image input.