Claude 3.7 Sonnet
About
Claude 3.7 Sonnet is a hybrid-reasoning language model built to give users a practical balance between speed and depth. It offers two operational modes: a fast standard mode for quick, high-quality replies, and an extended thinking mode that performs step-by-step reasoning, planning, and multi-perspective analysis before returning an answer. This makes it useful both for near-instant support and for tackling complex problems that need careful deliberation.
The model excels at software engineering tasks — it achieves industry-leading coding benchmark results and is Anthropic’s most capable model for creative and context-aware coding. Developers can use agentic workflows via the Claude Code command-line tool to delegate substantial engineering tasks directly from the terminal. Claude 3.7 Sonnet also supports very large interactions: up to 200,000 input tokens and up to 128,000 output tokens (64K generally available, 128K in beta), letting you handle long codebases, detailed reports, and extensive research in a single session.
API users get fine-grained control over how long the model ‘thinks’, helping balance response speed, depth, and cost. Practical features include batch predictions, prompt caching, function calling, and token counting. The model is widely available across Anthropic’s plans and major cloud providers (Anthropic API, Amazon Bedrock, Google Vertex AI); extended thinking is included on paid tiers. Knowledge is current through November 2024.
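As a sketch of the thinking-budget control described above, the snippet below builds a Messages API request payload with extended thinking enabled. The model ID, token values, and prompt are illustrative examples, and the exact field names should be checked against Anthropic's current API reference before use.

```python
# Sketch: a Messages API request payload with extended thinking enabled.
# Model ID and budget values here are illustrative, not prescriptive.
request = {
    "model": "claude-3-7-sonnet-20250219",  # example model ID
    "max_tokens": 16000,                    # total output budget
    "thinking": {
        "type": "enabled",
        "budget_tokens": 8000,              # tokens reserved for internal reasoning
    },
    "messages": [
        {"role": "user", "content": "Find the bug in this sorting routine."}
    ],
}

# The thinking budget must stay below max_tokens, leaving room
# for the visible answer after reasoning completes.
assert request["thinking"]["budget_tokens"] < request["max_tokens"]
print(request["max_tokens"] - request["thinking"]["budget_tokens"])  # → 8000
```

Raising `budget_tokens` buys deeper analysis at higher cost and latency; lowering it shifts the balance back toward fast replies.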
Use Claude 3.7 Sonnet for complex coding and debugging, AI agents requiring multi-step workflows, long-form content and technical documentation, and problems that benefit from deliberate, stepwise reasoning. While Sonnet 4 improves on some capabilities, Claude 3.7 remains a powerful, flexible choice when you need strong coding performance, large-context handling, and controllable reasoning depth.
Perks
Large context
Strong coding
High accuracy
File upload support
Settings
Diversity control (Top P)- Controls response diversity by filtering based on cumulative probability. Lower values (0.1-0.5) produce focused, deterministic responses. Higher values (0.7-1.0) allow more creative and diverse outputs.
Temperature- Controls randomness and creativity. Lower values (0.1-0.5) make responses more focused and deterministic. Higher values (0.6-1.0) increase creativity and variability.
Response length- Maximum number of tokens in the AI's response. Higher values allow longer answers. 1000 tokens ≈ 750 words.
Context length- Maximum input length the model can process. Higher values allow more conversation history or longer documents.
Reasoning- Enable deep thinking mode where the model works through problems step-by-step before responding. Best for complex analysis, debugging, and strategic planning.
Reasoning Tokens- Token budget allocated for internal reasoning process. Must be less than max_tokens. Higher values allow deeper analysis but increase cost.
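The constraints in the settings above can be checked before sending a request. This is a minimal validation sketch based on the ranges and the budget rule described in this card; the function name and exact limits are assumptions, and the API may enforce different server-side bounds.

```python
def validate_settings(temperature, top_p, max_tokens, thinking_budget=None):
    """Validate sampling and budget settings against the rules above.

    Hypothetical helper: ranges follow the descriptions in this card,
    not necessarily the API's server-side limits.
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be in [0.0, 1.0]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    if max_tokens < 1:
        raise ValueError("max_tokens must be positive")
    # Reasoning budget must leave room for the visible response.
    if thinking_budget is not None and thinking_budget >= max_tokens:
        raise ValueError("reasoning token budget must be less than max_tokens")
    return True


print(validate_settings(0.3, 0.9, 4096, thinking_budget=2048))  # → True
```

For example, `validate_settings(0.3, 0.9, 4096, thinking_budget=4096)` raises a `ValueError`, since the reasoning budget would consume the entire output allowance.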
