Build a RAG Agent

Build a Python agent that searches your knowledge base, retrieves context, and generates grounded answers.

This guide walks through building a complete Retrieval-Augmented Generation (RAG) agent in Python that uses Context Harness as its knowledge backend. The agent searches your docs, retrieves relevant context, and generates grounded answers — no hallucination.

What you’ll build

A command-line agent that:

Takes a natural language question
Searches your knowledge base via Context Harness
Retrieves the most relevant documents
Generates an answer grounded in those documents
Cites its sources

Prerequisites

Context Harness installed and running (ctx serve mcp)
Python 3.10+ with pip
An OpenAI API key (or any OpenAI-compatible provider)

Step 1: Set up your knowledge base

If you haven’t already, create a config and index your docs:

$ cat > config/ctx.toml << 'EOF'
[db]
path = "./data/ctx.sqlite"

[chunking]
max_tokens = 700
overlap_tokens = 80

[embedding]
provider = "openai"
model = "text-embedding-3-small"
dims = 1536

[retrieval]
final_limit = 10
hybrid_alpha = 0.6

[server]
bind = "127.0.0.1:7331"

[connectors.git.repo]
url = "https://github.com/your-org/your-repo.git"
branch = "main"
include_globs = ["docs/**/*.md", "src/**/*.rs", "README.md"]
shallow = true
EOF

$ ctx init && ctx sync git:repo && ctx embed pending
$ ctx serve mcp &

Step 2: Install Python dependencies

$ pip install openai requests

Step 3: Build the agent

#!/usr/bin/env python3
"""rag_agent.py — A RAG agent powered by Context Harness."""

import json
import requests
import openai

CTX_URL = "http://localhost:7331"
client = openai.OpenAI()

# ── Tool definitions (fetched from Context Harness) ──────────

def get_tools():
    """Fetch tool schemas from Context Harness and convert to OpenAI format."""
    resp = requests.get(f"{CTX_URL}/tools/list")
    ctx_tools = resp.json()["tools"]
    return [
        {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t["description"],
                "parameters": t["parameters"],
            },
        }
        for t in ctx_tools
    ]

# ── Tool execution ───────────────────────────────────────────

def call_tool(name: str, arguments: dict) -> dict:
    """Execute a tool via Context Harness HTTP API."""
    if name in ("search", "get", "sources"):
        # Built-in tools use their dedicated endpoints
        endpoint = {"search": "search", "get": "get", "sources": "sources"}[name]
        method = "POST" if name != "sources" else "GET"
        if method == "POST":
            resp = requests.post(f"{CTX_URL}/tools/{endpoint}", json=arguments)
        else:
            resp = requests.get(f"{CTX_URL}/tools/{endpoint}")
    else:
        # Lua tools use the dynamic endpoint
        resp = requests.post(f"{CTX_URL}/tools/{name}", json=arguments)

    return resp.json()

# ── Agent loop ───────────────────────────────────────────────

SYSTEM_PROMPT = """You are a helpful technical assistant with access to a knowledge base.

When answering questions:
1. ALWAYS search the knowledge base first using the 'search' tool
2. Use 'hybrid' mode for natural language questions, 'keyword' for specific terms
3. If a search result looks relevant, use 'get' to retrieve the full document
4. Ground your answers in the retrieved documents
5. Cite your sources with titles and URLs when available
6. If you can't find relevant information, say so honestly

Never make up information that isn't in the knowledge base."""

def agent(question: str) -> str:
    """Run the RAG agent on a question."""
    tools = get_tools()
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

    # Agent loop — allow up to 5 tool calls
    for _ in range(5):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )

        msg = response.choices[0].message
        messages.append(msg)

        # If the model wants to call tools
        if msg.tool_calls:
            for tool_call in msg.tool_calls:
                name = tool_call.function.name
                args = json.loads(tool_call.function.arguments)

                print(f"  🔧 Calling {name}({json.dumps(args, indent=2)[:100]}...)")
                result = call_tool(name, args)

                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result),
                })
        else:
            # Model is done — return the final answer
            return msg.content

    return msg.content

# ── CLI interface ────────────────────────────────────────────

if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        question = " ".join(sys.argv[1:])
    else:
        question = input("Ask a question: ")

    print(f"\n🤔 Thinking about: {question}\n")
    answer = agent(question)
    print(f"\n📝 Answer:\n{answer}")

Step 4: Run it

$ python rag_agent.py "How does the hybrid search scoring work?"

🤔 Thinking about: How does the hybrid search scoring work?

  🔧 Calling search({"query": "hybrid search scoring", "mode": "hybrid", "limit": 5}...)
  🔧 Calling get({"id": "a1b2c3d4-..."}...)

📝 Answer:
Context Harness uses a weighted merge of keyword (BM25/FTS5) and semantic
(vector similarity) scores to produce hybrid search results.

The scoring algorithm works as follows:
1. **Candidate retrieval**: The system fetches the top `candidate_k_keyword`
   results from FTS5 and top `candidate_k_vector` from vector search.
2. **Score normalization**: Both scores are normalized to [0, 1] via min-max scaling.
3. **Weighted merge**: `final_score = (1 - alpha) * keyword + alpha * vector`
   where `hybrid_alpha` defaults to 0.6.

Sources:
- [Search & Retrieval](https://github.com/parallax-labs/context-harness/blob/main/docs/HYBRID_SCORING.md)

Step 5: Add conversation memory

Extend the agent to support multi-turn conversations:

def chat():
    """Interactive chat with conversation memory."""
    tools = get_tools()
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]

    print("💬 Chat with your knowledge base (type 'quit' to exit)\n")

    while True:
        question = input("You: ").strip()
        if question.lower() in ("quit", "exit", "q"):
            break

        messages.append({"role": "user", "content": question})

        # Agent loop
        for _ in range(5):
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                tools=tools,
                tool_choice="auto",
            )

            msg = response.choices[0].message
            messages.append(msg)

            if msg.tool_calls:
                for tc in msg.tool_calls:
                    result = call_tool(
                        tc.function.name,
                        json.loads(tc.function.arguments),
                    )
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tc.id,
                        "content": json.dumps(result),
                    })
            else:
                print(f"\nAssistant: {msg.content}\n")
                break

Step 6: Use with Anthropic / Claude

The same pattern works with Anthropic’s API:

import anthropic

client = anthropic.Anthropic()

# Convert OpenAI tool format to Anthropic format
def openai_to_anthropic_tools(tools):
    return [
        {
            "name": t["function"]["name"],
            "description": t["function"]["description"],
            "input_schema": t["function"]["parameters"],
        }
        for t in tools
    ]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    system=SYSTEM_PROMPT,
    tools=openai_to_anthropic_tools(get_tools()),
    messages=[{"role": "user", "content": "How do I deploy this?"}],
)

What’s next?

Add streaming responses for real-time output
Build a web UI with FastAPI or Flask
Add authentication for multi-user access
Deploy with Docker (see Deployment)
Add custom Lua tools for write operations (see Lua Tools)

← Previous Multi-Repo Context Next → Build a Knowledge-Base Chatbot