API Primer — Integrating Claude into Your Application
With Anthropic's Claude API, you can embed Claude's AI capabilities into your own applications and services. It opens up possibilities beyond what a chat UI can offer, including automating backend processing, building custom interfaces, and batch-processing large volumes of text.
This article covers everything from obtaining an API key to sending requests, Function Calling, streaming, and cost optimization, with practical examples in Python and TypeScript.
API Overview
Endpoint Structure
The base URL for the Claude API is https://api.anthropic.com/v1. The main endpoints are as follows.
| Endpoint | Purpose |
|---|---|
| POST /v1/messages | Text generation (the most fundamental API) |
| POST /v1/messages/batches | Batch processing (asynchronous handling of large volumes of requests) |
| POST /v1/messages/count_tokens | Count tokens in advance |
Getting an API Key
Visit the Anthropic Console and create an account. You can generate a new key in the "API Keys" section.
Keep your API key strictly confidential. If your API key is leaked, third parties can send requests on your account and incur charges. Follow these rules:
- Do not hardcode the API key in your source code
- Do not commit it to a Git repository (especially a public one)
- Add `.env` files to `.gitignore`
- In production, use environment variables or a secret manager (AWS Secrets Manager, Google Secret Manager, etc.)
Rate Limits
The Claude API has rate limits on the number of requests (RPM) and tokens (TPM). Limits vary depending on your account usage and tier. When a limit is reached, the API returns 429 Too Many Requests.
Common pitfall: If you see frequent `429` errors in a production app, add retry logic with exponential backoff (gradually increasing the retry interval). This behavior is built into Anthropic's official SDKs.
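As a sketch of what exponential backoff looks like (the official SDKs already retry rate-limited requests automatically, configurable via their `max_retries` option, so you rarely need to hand-roll this; `with_backoff` below is a hypothetical helper, not part of the SDK):

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, retryable=(Exception,)):
    """Call fn(), retrying with exponential backoff plus jitter on retryable errors."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Sleep 1s, 2s, 4s, ... plus jitter to avoid synchronized retries
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

You would then wrap a call as `with_backoff(lambda: client.messages.create(...), retryable=(anthropic.RateLimitError,))`.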
Quickstart
Installing the SDK
First, install Anthropic's official SDK.
Python:
pip install anthropic
TypeScript / Node.js:
npm install @anthropic-ai/sdk
Setting the Environment Variable
Set your API key as an environment variable.
export ANTHROPIC_API_KEY="sk-ant-..."
During local development, it is convenient to keep the key in a `.env` file (added to `.gitignore`); in production, prefer environment variables or a secret manager.
# .env
ANTHROPIC_API_KEY=sk-ant-...
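If you would rather load the `.env` file from code than `export` the variable, the third-party `python-dotenv` package is the usual choice. As a dependency-free sketch, a minimal loader might look like this (`load_env_file` is illustrative and ignores the quoting rules real `.env` parsers handle):

```python
import os

def load_env_file(path=".env"):
    """Minimal .env loader: KEY=VALUE lines, '#' comments, no quoting rules."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Do not override variables already set in the real environment
            os.environ.setdefault(key.strip(), value.strip())
```

After calling `load_env_file()`, `anthropic.Anthropic()` picks up `ANTHROPIC_API_KEY` from the environment as usual.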
Your First Request
Python:
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude!"}
    ]
)
print(message.content[0].text)
TypeScript:
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const message = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello, Claude!" }],
});
console.log(message.content[0].text);
cURL (handy for verifying API behavior):
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-opus-4-6",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'
The response comes back as JSON in the following format.
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I assist you today?"
    }
  ],
  "model": "claude-opus-4-6",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 10,
    "output_tokens": 12
  }
}
Messages API Basics
Request Parameters
The main parameters for the /v1/messages endpoint are as follows.
| Parameter | Required | Description |
|---|---|---|
| model | Yes | The model ID to use (e.g., claude-opus-4-6) |
| max_tokens | Yes | Maximum number of tokens to generate |
| messages | Yes | Array of conversation history (role and content pairs) |
| system | No | System prompt (sets the model's role and constraints) |
| temperature | No | Randomness of output (0 to 1, default 1) |
| top_p | No | Nucleus sampling threshold |
| stop_sequences | No | Array of sequences that stop generation |
| stream | No | Whether to enable streaming (true/false) |
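To see how the optional parameters fit together, here is one illustrative combination (the specific values are arbitrary, chosen only to demonstrate the shape of a request):

```python
# Illustrative parameter combination; pass to client.messages.create(**request_params)
request_params = {
    "model": "claude-opus-4-6",
    "max_tokens": 512,
    "system": "You are a terse assistant. Answer in one sentence.",
    "temperature": 0.3,           # mostly deterministic, with a little variety
    "stop_sequences": ["\n\n"],   # stop at the first blank line
    "messages": [{"role": "user", "content": "What is an API key?"}],
}
```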
System Prompt
Using a system prompt allows you to set a role and constraints for the model.
Python:
message = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    system="You are a customer support agent. Reply politely and concisely.",
    messages=[
        {"role": "user", "content": "What is your return policy?"}
    ]
)
TypeScript:
const message = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 1024,
  system: "You are a customer support agent. Reply politely and concisely.",
  messages: [{ role: "user", content: "What is your return policy?" }],
});
Multi-Turn Conversations
Multi-turn conversations are achieved by accumulating conversation history in the messages array.
Python:
conversation_history = []
def chat(user_message: str) -> str:
    conversation_history.append({
        "role": "user",
        "content": user_message
    })
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        messages=conversation_history
    )
    assistant_message = response.content[0].text
    conversation_history.append({
        "role": "assistant",
        "content": assistant_message
    })
    return assistant_message
# Usage example
print(chat("Please introduce yourself."))
print(chat("What are you good at?"))
TypeScript:
const conversationHistory: { role: "user" | "assistant"; content: string }[] = [];

async function chat(userMessage: string): Promise<string> {
  conversationHistory.push({ role: "user", content: userMessage });
  const response = await client.messages.create({
    model: "claude-opus-4-6",
    max_tokens: 1024,
    messages: conversationHistory,
  });
  const assistantMessage = (response.content[0] as { text: string }).text;
  conversationHistory.push({ role: "assistant", content: assistantMessage });
  return assistantMessage;
}

// Usage example
console.log(await chat("Please introduce yourself."));
console.log(await chat("What are you good at?"));
Common pitfall: Continuously accumulating conversation history will eventually exceed the context window limit of a given request (which varies by model — Claude Opus 4.6 / Sonnet 4.6 support up to 1 million tokens, while Haiku 4.5 supports up to 200K tokens). For long-running conversations, you need a design that removes old messages or summarizes them before passing them to the next request.
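As a minimal sketch of the "remove old messages" approach (summarization gives better recall but costs an extra model call; `trim_history` is a hypothetical helper, not part of the SDK):

```python
def trim_history(history, max_messages=20):
    """Keep at most max_messages of the most recent messages.

    Drops older messages from the front, then ensures the remaining list
    still opens with a "user" turn, as the Messages API expects.
    """
    if len(history) <= max_messages:
        return history
    trimmed = history[-max_messages:]
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed
```

Call it right before each request, e.g. `messages=trim_history(conversation_history)`.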
When to Use temperature
temperature controls the diversity of the output.
| temperature | Suitable use cases |
|---|---|
| 0.0 | Analysis, classification, fact-checking (when deterministic output is needed) |
| 0.3 to 0.7 | Chatbots, Q&A (balanced output) |
| 0.8 to 1.0 | Creative writing, brainstorming (when varied output is desired) |
Function Calling (Tool Use)
Function Calling (Tool Use) is a feature that gives Claude the ability to call external functions or tools. For example, you can integrate processing that Claude cannot do on its own, such as retrieving current weather data, searching a database, or running calculations.
Execution Flow
- The app sends a request containing "tool definitions"
- When Claude determines it needs to call a tool, it returns a `tool_use` block
- The app executes the actual function and returns the result to Claude
- Claude uses the result to generate the final answer
Writing Tool Definitions
Python:
tools = [
    {
        "name": "get_weather",
        "description": "Retrieves the current weather information for a specified city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The name of the city to check the weather for (e.g., Tokyo, London)"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The unit for temperature"
                }
            },
            "required": ["city"]
        }
    }
]

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What is the weather like in Tokyo today?"}
    ]
)
# Handling a tool call response
if response.stop_reason == "tool_use":
    tool_use_block = next(
        block for block in response.content if block.type == "tool_use"
    )
    tool_name = tool_use_block.name
    tool_input = tool_use_block.input
    tool_use_id = tool_use_block.id

    # Actual tool processing (using a mock response here)
    if tool_name == "get_weather":
        tool_result = {
            "temperature": 22,
            "condition": "Sunny",
            "humidity": 60
        }

    # Return the tool execution result to Claude
    final_response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "What is the weather like in Tokyo today?"},
            {"role": "assistant", "content": response.content},
            {
                "role": "user",
                "content": [
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_use_id,
                        "content": str(tool_result)
                    }
                ]
            }
        ]
    )
    print(final_response.content[0].text)
TypeScript:
const tools: Anthropic.Messages.Tool[] = [
  {
    name: "get_weather",
    description: "Retrieves the current weather information for a specified city.",
    input_schema: {
      type: "object",
      properties: {
        city: {
          type: "string",
          description: "The name of the city to check the weather for (e.g., Tokyo, London)",
        },
        unit: {
          type: "string",
          enum: ["celsius", "fahrenheit"],
          description: "The unit for temperature",
        },
      },
      required: ["city"],
    },
  },
];

const response = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 1024,
  tools,
  messages: [{ role: "user", content: "What is the weather like in Tokyo today?" }],
});

if (response.stop_reason === "tool_use") {
  const toolUseBlock = response.content.find(
    (block): block is Anthropic.Messages.ToolUseBlock =>
      block.type === "tool_use"
  )!;
  // Actual tool processing (using a mock response here)
  const toolResult = { temperature: 22, condition: "Sunny", humidity: 60 };
  const finalResponse = await client.messages.create({
    model: "claude-opus-4-6",
    max_tokens: 1024,
    tools,
    messages: [
      { role: "user", content: "What is the weather like in Tokyo today?" },
      { role: "assistant", content: response.content },
      {
        role: "user",
        content: [
          {
            type: "tool_result",
            tool_use_id: toolUseBlock.id,
            content: JSON.stringify(toolResult),
          },
        ],
      },
    ],
  });
  console.log(finalResponse.content[0]);
}
Common pitfall: If `stop_reason` is neither `"tool_use"` nor `"end_turn"` (e.g., `"max_tokens"`), generation was cut off without a `tool_use` block being returned. Increase `max_tokens` or add logic to determine in advance whether a tool call is needed.
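A defensive pattern is to branch on every `stop_reason` value you might receive instead of assuming `tool_use` or `end_turn`. The mapping below is this article's suggestion for structuring that branch, not SDK behavior:

```python
def next_action(stop_reason):
    """Map a Messages API stop_reason to the app's next step."""
    if stop_reason == "tool_use":
        return "run_tool"      # execute the tool, then send back a tool_result
    if stop_reason in ("end_turn", "stop_sequence"):
        return "done"          # the model considers its answer complete
    if stop_reason == "max_tokens":
        return "retry_larger"  # generation was cut off: raise max_tokens and resend
    return "inspect"           # unknown value: log it and investigate
```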
Streaming
With streaming, you can receive tokens one by one as the response is generated, before it is complete. In chat apps or long-form generation, this allows you to display results to users in real time without having them wait.
The Claude API supports streaming in SSE (Server-Sent Events) format.
Python:
with client.messages.stream(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Please explain quantum computing in detail."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    print()  # Final newline

    # Get the final message after the stream completes
    final_message = stream.get_final_message()
    print(f"\nToken usage: {final_message.usage}")
TypeScript:
const stream = await client.messages.stream({
  model: "claude-opus-4-6",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: "Please explain quantum computing in detail.",
    },
  ],
});

for await (const chunk of stream) {
  if (
    chunk.type === "content_block_delta" &&
    chunk.delta.type === "text_delta"
  ) {
    process.stdout.write(chunk.delta.text);
  }
}

// Get the final message after the stream completes
const finalMessage = await stream.getFinalMessage();
console.log("\nToken usage:", finalMessage.usage);
Server-Sent Events Implementation in Next.js
TypeScript (Next.js App Router):
// app/api/chat/route.ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

export async function POST(req: Request) {
  const { messages } = await req.json();
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      const anthropicStream = await client.messages.stream({
        model: "claude-opus-4-6",
        max_tokens: 1024,
        messages,
      });
      for await (const chunk of anthropicStream) {
        if (
          chunk.type === "content_block_delta" &&
          chunk.delta.type === "text_delta"
        ) {
          controller.enqueue(encoder.encode(chunk.delta.text));
        }
      }
      controller.close();
    },
  });
  return new Response(stream, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
Extended Thinking
Extended Thinking is a feature that allows Claude to internally develop a thinking process before producing the final answer. It is particularly effective for tasks requiring high accuracy, such as complex math problems, multi-step reasoning, and detailed analysis.
Enable it by specifying a thinking parameter with budget_tokens (the maximum number of tokens Claude can use for thinking).
Python:
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Number of tokens available for the thinking process
    },
    messages=[
        {
            "role": "user",
            "content": "Prove that there are infinitely many prime numbers."
        }
    ]
)
# The response contains both a thinking block and a text block
for block in response.content:
    if block.type == "thinking":
        print("=== Thinking Process ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== Final Answer ===")
        print(block.text)
TypeScript:
const response = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 16000,
  thinking: {
    type: "enabled",
    budget_tokens: 10000,
  },
  messages: [
    {
      role: "user",
      content: "Prove that there are infinitely many prime numbers.",
    },
  ],
});

for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("=== Thinking Process ===");
    console.log(block.thinking);
  } else if (block.type === "text") {
    console.log("=== Final Answer ===");
    console.log(block.text);
  }
}
Extended Thinking is only available on supported models such as claude-opus-4-6. Increasing budget_tokens improves reasoning accuracy but also increases cost. Set an appropriate value based on your use case.
Batch Processing
When processing large volumes of requests, the Message Batches API is the efficient choice. It offers up to 50% cost savings compared to the standard API and processes requests asynchronously in the background.
Submitting a Batch Request
Python:
# Creating a batch job
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "request-1",
            "params": {
                "model": "claude-opus-4-6",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": "Tell me about Tokyo."}
                ]
            }
        },
        {
            "custom_id": "request-2",
            "params": {
                "model": "claude-opus-4-6",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": "Tell me about London."}
                ]
            }
        }
    ]
)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")
TypeScript:
const batch = await client.messages.batches.create({
  requests: [
    {
      custom_id: "request-1",
      params: {
        model: "claude-opus-4-6",
        max_tokens: 1024,
        messages: [{ role: "user", content: "Tell me about Tokyo." }],
      },
    },
    {
      custom_id: "request-2",
      params: {
        model: "claude-opus-4-6",
        max_tokens: 1024,
        messages: [{ role: "user", content: "Tell me about London." }],
      },
    },
  ],
});
console.log(`Batch ID: ${batch.id}`);
console.log(`Status: ${batch.processing_status}`);
Retrieving Batch Results
Batches can take anywhere from a few minutes to several hours to complete. Poll for the status to check progress.
Python:
import time

# Wait for batch completion (polling)
while True:
    batch = client.messages.batches.retrieve(batch.id)
    if batch.processing_status == "ended":
        break
    print(f"Processing... ({batch.request_counts.processing} remaining)")
    time.sleep(60)  # Wait 1 minute and check again

# Retrieve results
for result in client.messages.batches.results(batch.id):
    print(f"ID: {result.custom_id}")
    if result.result.type == "succeeded":
        print(result.result.message.content[0].text)
    else:
        print(f"Error: {result.result.error}")
Common pitfall: Batch processing completes within 24 hours; beyond that it times out. If you have a large number of items to process, consider splitting them into multiple smaller batches. The maximum number of requests per batch is 10,000.
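Splitting can be as simple as slicing the request list before submission (`chunk_requests` is a hypothetical helper, not part of the SDK):

```python
def chunk_requests(requests, chunk_size=10_000):
    """Yield successive slices of at most chunk_size requests."""
    for start in range(0, len(requests), chunk_size):
        yield requests[start:start + chunk_size]
```

Then submit each slice separately: `for chunk in chunk_requests(all_requests): client.messages.batches.create(requests=chunk)`.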
Cost Optimization
Pricing Overview
Claude's API pricing is usage-based, charged per input and output token. The following is a rough guide as of March 2026. Always check the Anthropic official pricing page for the latest rates.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | Fastest and lowest cost. Best for simple tasks |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Excellent balance of performance and cost |
| Claude Opus 4.6 | $5.00 | $25.00 | Highest performance. Best for complex reasoning and analysis |
The prices above are estimates. Anthropic may change its pricing. Check actual rates at https://www.anthropic.com/pricing.
Reducing Costs with Prompt Caching
Prompt Caching is a feature that caches long system prompts or context that you use repeatedly. When the cache is hit, the cost for input tokens is significantly reduced (approximately 90% off).
It is effective in cases where the same content is sent every time, such as long system prompts, referencing large volumes of documents, or repeating few-shot examples.
Python:
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an AI assistant with deep expertise in law.\n\n" + very_long_legal_document,  # Long reference document
            "cache_control": {"type": "ephemeral"}  # Enable caching
        }
    ],
    messages=[
        {"role": "user", "content": "Please explain the interpretation of Article 3."}
    ]
)

# Check cache usage
print(f"Cache write tokens: {response.usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
Model Selection Guidelines
| Use Case | Recommended Model | Reason |
|---|---|---|
| Classification, routing, simple Q&A | Claude Haiku 4.5 | Prioritizes speed and cost |
| Chatbots, text generation, code completion | Claude Sonnet 4.6 | Best overall balance |
| Complex reasoning, analysis, research | Claude Opus 4.6 | Accuracy is the top priority |
| Batch processing (large volumes of simple tasks) | Claude Haiku 4.5 | Low-cost, high-volume processing |
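The table above can be encoded as a small routing function. The Haiku and Sonnet model ID strings below are assumptions following the naming pattern of `claude-opus-4-6` used throughout this article; check the official model list for the exact IDs:

```python
# Task-to-model mapping derived from the guidelines table (model IDs are assumptions)
MODEL_BY_TASK = {
    "classification": "claude-haiku-4-5",
    "routing": "claude-haiku-4-5",
    "chat": "claude-sonnet-4-6",
    "codegen": "claude-sonnet-4-6",
    "analysis": "claude-opus-4-6",
    "research": "claude-opus-4-6",
}

def pick_model(task: str) -> str:
    # Fall back to the balanced mid-tier model for unrecognized task types
    return MODEL_BY_TASK.get(task, "claude-sonnet-4-6")
```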
Checking Token Counts in Advance
Before sending a request, you can use /v1/messages/count_tokens to check the token count.
Python:
token_count = client.messages.count_tokens(
    model="claude-opus-4-6",
    messages=[
        {"role": "user", "content": "How many tokens does this request use?"}
    ]
)
print(f"Input token count: {token_count.input_tokens}")
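Combined with the per-token prices from the earlier table, count_tokens gives a quick pre-flight cost ceiling. The figures below are this article's illustrative March 2026 prices (and the non-Opus model IDs follow this article's naming), so verify both against the official documentation; `estimate_cost_usd` is a hypothetical helper:

```python
# USD per 1M tokens (illustrative figures; check the official pricing page)
PRICES = {
    "claude-opus-4-6": {"input": 5.00, "output": 25.00},
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
    "claude-haiku-4-5": {"input": 1.00, "output": 5.00},
}

def estimate_cost_usd(model, input_tokens, max_output_tokens):
    """Upper bound on request cost; actual output is usually shorter than max_tokens."""
    p = PRICES[model]
    return (input_tokens * p["input"] + max_output_tokens * p["output"]) / 1_000_000
```

For example, `estimate_cost_usd("claude-opus-4-6", token_count.input_tokens, 1024)` bounds the cost of the request above.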
Next Steps
Now that you understand the basics of the Claude API, expand your usage further.
- MCP Guide — How to connect external tools to Claude using the Model Context Protocol
- Anthropic Official API Reference — Detailed specifications for all parameters
- Anthropic Cookbook (GitHub) — Practical recipes and sample code
- Plan Comparison — Differences between the API and a claude.ai subscription