Building an AI chatbot in Python with the Claude API

Chatbots are one of the most popular first AI (artificial intelligence) projects for new developers, partly because the result is satisfying and partly because the moving parts are easier to reason about than image generation or fine-tuning.

This guide walks through building a working chatbot in Python from a clean install, then upgrading it from a one-shot demo into something that handles multi-turn conversations, system prompts, and predictable errors.

We will use Anthropic’s Claude model for the work, because its API has a clean separation between system prompts and conversation history that makes the patterns easier to teach. The full code is included at every step so you can copy-paste, run, and modify.

By the end of this tutorial, you’ll have a chatbot you can talk to in the terminal, plus a clear sense of what would need to change before it could be used in something a real business depends on.

What you’ll need

Python 3.9 or later installed on your machine
An Anthropic account and an API (application programming interface) key
A terminal and a code editor

You can sign up for an Anthropic account at platform.claude.com. Once you’re in, generate an API key from the API keys section in the dashboard. You will also need to add a small amount of credit to your account if you have not used the API before.

The examples in this tutorial each cost a fraction of a penny at current Claude Haiku pricing, so a pound or two of credit is plenty for working through the tutorial and experimenting afterwards.

Treat the API key like a password. It should never be committed to git or pasted into shared documents, and the code in this tutorial uses an environment variable so the key never appears in your source files at all.

Step 1: Install the Anthropic Python library

Open a terminal and run:

pip install anthropic

That installs the official Anthropic Python client. We will use this library for every request in this tutorial.

Step 2: Set your API key as an environment variable

The Anthropic client looks for an environment variable called ANTHROPIC_API_KEY by default. Set it in your shell before running any code.

On macOS or Linux:

export ANTHROPIC_API_KEY="sk-ant-..."

On Windows PowerShell:

$env:ANTHROPIC_API_KEY = "sk-ant-..."

This keeps the key out of your source files. If you want it set permanently, add the line to your shell profile (.zshrc, .bashrc, or your PowerShell $PROFILE script).
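Before moving on, it can help to confirm that Python actually sees the variable, since a key exported in one terminal window is not visible in another. The following is a minimal sketch: the helper name api_key_is_set is an illustration and not part of the Anthropic library, and it only checks that the variable is present, not that the key is valid.

```python
import os


def api_key_is_set(env=None):
    """Return True if ANTHROPIC_API_KEY is present and non-empty.

    `env` defaults to os.environ; passing a plain dict makes the check
    easy to test. This helper is hypothetical, not part of the SDK.
    """
    env = os.environ if env is None else env
    return bool(env.get("ANTHROPIC_API_KEY", "").strip())


if __name__ == "__main__":
    if api_key_is_set():
        print("ANTHROPIC_API_KEY is set.")
    else:
        print("ANTHROPIC_API_KEY is missing. Set it before running the chatbot.")
```

If the check reports the key as missing, re-run the export command in the same terminal window you will use to run the chatbot.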

Step 3: Make your first request

Create a file called chatbot.py and paste in the following:

from anthropic import Anthropic
client = Anthropic()

message = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello! What is the capital of France?"}
    ],
)
print(message.content[0].text)

Run it from the terminal:

python chatbot.py

You should see something like:

The capital of France is Paris.

That is a working chatbot interaction in ten lines of code. Here is what just happened:

Anthropic() creates a client. It reads your API key from the environment automatically.
messages.create() sends a request to the Messages endpoint, which is the standard way to talk to Claude. Older tutorials sometimes reference a completions endpoint; that is a separate, deprecated API and not what we are using here.
model="claude-haiku-4-5" selects Anthropic’s small, fast, cheap Claude model. You can swap it for a Sonnet or Opus model later if you want stronger responses, at a higher per-request cost. The alias claude-haiku-4-5 currently resolves to the pinned snapshot claude-haiku-4-5-20251001; for production code where you want the version to be explicit and stable across any future aliasing change, you can pin the dated form directly. Anthropic publishes the current list of model identifiers in its documentation at platform.claude.com.
max_tokens=1024 sets the maximum length of the reply. Anthropic requires this parameter on every request, which is a useful nudge toward thinking about response length and cost from the start.
messages is a list of turns in the conversation. Each turn has a role (user or assistant) and content (the text of that turn).
The reply comes back at message.content[0].text. Claude responses are made up of content blocks; for plain text replies the first block is the one you want.
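Because a reply is a list of content blocks, indexing [0] assumes the first block is text. A slightly more defensive sketch joins every text block instead. It assumes each block exposes a type attribute and, for text blocks, a text attribute, which matches the response shape described above. The helper name reply_text and the stand-in objects are illustrative only.

```python
from types import SimpleNamespace


def reply_text(content_blocks):
    """Join the text of every block whose type is "text", skipping others."""
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )


# Stand-in objects that mimic the shape of message.content for illustration.
fake_content = [
    SimpleNamespace(type="text", text="The capital of France "),
    SimpleNamespace(type="text", text="is Paris."),
]
print(reply_text(fake_content))
```

For the plain text replies in this tutorial the two approaches give the same result, so the rest of the code keeps the shorter message.content[0].text form.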

So far this is a one-shot request that does not remember what you said previously. That is the next thing to fix.

Step 4: Add multi-turn memory

A real chatbot needs to remember the conversation so far. The model itself is stateless, so to give it memory you keep the list of messages on your side and send the whole list with every new request.

Replace the contents of chatbot.py with:

from anthropic import Anthropic
client = Anthropic()

messages = []
print("Type 'quit' to exit.\n")

while True:
    user_input = input("You: ")
    if user_input.lower() in ("quit", "exit"):
        break

    # Add the new user turn, then send the whole history so far.
    messages.append({"role": "user", "content": user_input})
    message = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=1024,
        messages=messages,
    )
    # Store the reply so the next request includes it.
    assistant_reply = message.content[0].text
    messages.append({"role": "assistant", "content": assistant_reply})
    print(f"Assistant: {assistant_reply}\n")

Run it and try a multi-turn exchange:

You: My name is Sam.
Assistant: Nice to meet you, Sam. How can I help today?
You: What did I just tell you my name was?
Assistant: You told me your name is Sam.

The bot remembers because the entire conversation history is being sent on every call. The history grows with each turn, which matters later when we talk about cost.
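To watch that growth happen, you can record the token counts the API reports with each reply. The sketch below assumes the response carries a usage object with input_tokens and output_tokens fields, as the Messages API does at the time of writing. The UsageLog class is a hypothetical helper, and the stand-in responses use made-up numbers in place of real API calls so the example runs offline.

```python
from types import SimpleNamespace


class UsageLog:
    """Hypothetical helper that accumulates token counts across turns."""

    def __init__(self):
        self.turns = []  # one (input_tokens, output_tokens) tuple per reply

    def record(self, message):
        usage = message.usage
        self.turns.append((usage.input_tokens, usage.output_tokens))

    def total_input(self):
        return sum(inp for inp, _ in self.turns)

    def total_output(self):
        return sum(out for _, out in self.turns)


# Stand-in responses with invented token counts, for illustration only.
log = UsageLog()
for input_tokens, output_tokens in [(12, 20), (45, 18), (80, 25)]:
    fake_message = SimpleNamespace(
        usage=SimpleNamespace(
            input_tokens=input_tokens, output_tokens=output_tokens
        )
    )
    log.record(fake_message)

print(log.turns)
print(log.total_input(), log.total_output())
```

In the real loop you would call log.record(message) right after client.messages.create(...). The input count should rise on every turn, because each request re-sends everything that came before.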

Step 5: Add a system prompt to control the bot

A system prompt is an instruction to the model that sits outside the conversation history and shapes how it replies. In Claude’s API it is a separate top-level parameter called system, which keeps it visually distinct from the user-and-assistant exchange. It is the most direct way you have to control tone, scope, and behaviour without retraining anything.

Add a system_prompt variable and pass it to the create call:

system_prompt = (
    "You are a helpful assistant who answers questions about Python "
    "programming. Be concise. If asked about anything unrelated to "
    "Python, politely redirect the user back to Python topics."
)
# inside the while loop:
message = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=1024,
    system=system_prompt,
    messages=messages,
)

Re-run and try asking something off-topic:

You: What is the capital of France?
Assistant: That is outside Python’s scope. Did you have a Python question I can help with?

The system prompt is doing all the work there, with no filter or post-processing layer anywhere in the code. You are telling the model how to behave, and it tries to follow.

A few practical points on system prompts:

They are not magic. The model will sometimes break character, especially with persistent or clever user prompts. Real production systems usually combine a system prompt with input filtering and output checks.
Long system prompts cost more per request because they are sent on every call. Keep them as short as you can while still describing the behaviour you want.
Iterate on them like code. Small wording changes can produce large behavioural changes, and the only way to know what works is to test.

Step 6: Handle errors that you will hit

The minimal version above breaks the first time the API has a hiccup. Before adding your own retry logic, note that the SDK retries certain transient failures up to two times by default with short exponential backoff. Adding an explicit retry layer on top gives you visibility into what is failing and control over the policy. Three errors are particularly common in early projects and worth handling explicitly:

from anthropic import Anthropic, RateLimitError, APIConnectionError, APIStatusError
import time
client = Anthropic()

def send_message(messages, system_prompt, retries=3):
    for attempt in range(retries):
        try:
            return client.messages.create(
                model="claude-haiku-4-5",
                max_tokens=1024,
                system=system_prompt,
                messages=messages,
                timeout=30,
            )
        except RateLimitError:
            # Exponential backoff: wait 1s, then 2s, then 4s.
            wait = 2 ** attempt
            print(f"Rate limited. Retrying in {wait}s...")
            time.sleep(wait)
        except APIConnectionError:
            # Network-level failure: pause briefly and try again.
            print("Network problem. Retrying...")
            time.sleep(1)
        except APIStatusError as e:
            # Any other error status from the API: report it and stop.
            print(f"API returned an error: {e}")
            return None
    # Every attempt was rate limited or hit a connection error.
    print("Gave up after multiple retries.")
    return None

Now your main loop can call send_message(messages, system_prompt) and only proceed if the response is not None. Three patterns are worth noticing here:

Exponential backoff on rate limits. Each retry waits twice as long as the last, which is the standard way to back off without hammering the API.
Connection errors are common on flaky networks, and a retry almost always succeeds.
Generic API errors get logged so you can read them later, instead of silently disappearing.

At this scale, that level of error handling is sufficient. The list of failure modes grows quickly once you have actual users.
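One detail the main loop has to get right: if send_message returns None, the user's message is already in the history with no reply after it. A minimal sketch of one way to handle that follows. The function name run_turn is hypothetical, send stands for any callable that takes the same two arguments as send_message, and removing the unanswered message is a design choice rather than something the API requires.

```python
from types import SimpleNamespace


def run_turn(messages, user_input, system_prompt, send):
    """Append the user turn, call `send`, and record the reply.

    Returns the reply text, or None if `send` failed, in which case the
    unanswered user message is removed so the history stays consistent.
    """
    messages.append({"role": "user", "content": user_input})
    response = send(messages, system_prompt)
    if response is None:
        messages.pop()  # drop the user turn that never got a reply
        return None
    reply = response.content[0].text
    messages.append({"role": "assistant", "content": reply})
    return reply


# Stand-ins for send_message so the sketch runs without network access.
def fake_ok(messages, system_prompt):
    return SimpleNamespace(content=[SimpleNamespace(type="text", text="Hi!")])


def fake_fail(messages, system_prompt):
    return None


history = []
print(run_turn(history, "Hello", "Be concise.", fake_ok))
print(run_turn(history, "Still there?", "Be concise.", fake_fail))
print(len(history))
```

In the chatbot itself you would pass send_message as the send argument and print a short apology when run_turn returns None, so the user knows to try again.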

Where this stops working

What you have now runs. It will answer questions in your terminal for hours. There is a meaningful distance between this and something a business can put in front of customers or use to handle internal queries at scale. Specifically:

The conversation history grows without limit. Every turn appends to messages. After enough turns you will exceed the model’s context window, and even before that, each request gets more expensive because every previous turn is being re-sent. Real systems summarise or truncate old turns.
No application-level moderation. Claude is trained to refuse a wide range of harmful requests, but that does not handle every business-specific edge case. You will need to think about what content is appropriate for your specific application and add filtering or routing around the model as needed.
No persistence across sessions. Close the terminal and the conversation is gone. Real bots store the relevant context somewhere a future session can read it.
No factual grounding. The model will confidently make up information that sounds right. For a Python tutor that is mostly harmless. For a customer service bot answering questions about your product, hallucination is a recurring reason real pilot projects get shelved. The standard fix is retrieval over your own data, so the model is answering from sources you control rather than reaching into its training memory.
No cost monitoring. A loop somewhere that accidentally fires 10,000 requests can rack up real money before you notice. Real deployments meter every call and alert on anomalies.
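The first item on that list has a simple, if blunt, mitigation: keep only the most recent turns. The sketch below is one possible approach under stated assumptions. The name trim_history and the max_messages default are illustrative, and dropping old turns loses whatever they contained, which is why real systems often summarise instead.

```python
def trim_history(messages, max_messages=20):
    """Return the most recent messages, starting on a user turn.

    The cut is moved forward past any leading assistant message so the
    trimmed conversation still opens with a user turn.
    """
    recent = messages[-max_messages:] if max_messages > 0 else []
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


# Build a six-exchange history to demonstrate the trimming.
history = []
for i in range(6):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history, max_messages=5)
print(len(trimmed), trimmed[0])
```

One way to use it is to keep the full messages list locally and pass messages=trim_history(messages) in the create call, so only the request is shortened.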

These are the issues that turn a working demo into something users can rely on. Red Eagle Tech has written up what makes AI chatbots actually work in production for anyone wondering what those production patterns look like in practice.

What to try next

The code above is the skeleton. The interesting work starts when you push on it:

Add a summarisation step that compresses old turns once the conversation gets long.
Add streaming so replies appear chunk-by-chunk instead of all at once.
Wrap the bot in a Flask or FastAPI endpoint so it can be called from a web page.
Add an input filter that rejects or rewrites obviously off-topic or unsafe requests before they reach the model.
Connect a vector database so the bot can look up real information instead of making it up.

Each of those is a separate tutorial in itself. You now have the base they all build on.