Multi-Repo Context

Index multiple repositories and data sources into a single searchable knowledge base.

A common pattern is indexing multiple repositories, wikis, and data sources into a single Context Harness instance. This gives your AI agents unified search across your entire engineering organization’s knowledge.

The idea

Configure multiple named Git connectors, filesystem mounts, S3 buckets, and Lua scripts — all feeding into the same SQLite database. When an agent searches, it gets results from all sources ranked together.

┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ git:platform │  │  git:infra   │  │ s3:runbooks  │  │ script:jira  │
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │                 │                 │                 │
       └─────────────────┴─────────────────┴─────────────────┘
                                   │
                          ┌────────▼────────┐
                          │ SQLite (single  │
                          │    database)    │
                          └────────┬────────┘
                                   │
                          ┌────────▼────────┐
                          │  ctx serve mcp  │
                          │  :7331          │
                          └─────────────────┘
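
To make "ranked together" concrete, here is a minimal sketch of one way results from different connectors could compete in a single list. It assumes each candidate already carries a keyword score and a vector score normalized to [0, 1], and that a weight alpha blends the two (the config later on this page sets hybrid_alpha = 0.6 and final_limit = 15). The Hit type, the blended and rank functions, and the blend formula itself are illustrative assumptions, not Context Harness's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str     # connector label, e.g. "git:platform"
    doc: str        # hypothetical document identifier
    keyword: float  # assumed: normalized keyword score in [0, 1]
    vector: float   # assumed: normalized vector score in [0, 1]


def blended(hit: Hit, alpha: float) -> float:
    # Assumed blend: alpha weights the vector score, (1 - alpha) the keyword score.
    return alpha * hit.vector + (1.0 - alpha) * hit.keyword


def rank(hits: list[Hit], alpha: float = 0.6, limit: int = 15) -> list[Hit]:
    # Every source competes in one list; the source label does not affect order.
    return sorted(hits, key=lambda h: blended(h, alpha), reverse=True)[:limit]
```

Under this assumed formula, alpha = 0.0 would rank on keyword score alone and alpha = 1.0 on vector score alone; a Jira issue and a Rust source file are ordered purely by their blended score.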

Complete multi-repo config

Here’s a real-world config indexing an engineering org’s docs, multiple services, and external knowledge sources:

# config/ctx.toml — Multi-repo engineering context

[db]
path = "./data/ctx.sqlite"

[chunking]
max_tokens = 700
overlap_tokens = 80

[embedding]
provider = "openai"
model = "text-embedding-3-small"
dims = 1536
batch_size = 64

[retrieval]
final_limit = 15
hybrid_alpha = 0.6
candidate_k_keyword = 100
candidate_k_vector = 100

[server]
bind = "127.0.0.1:7331"

# ── Local project docs ──────────────────────────────────────

[connectors.filesystem.local]
root = "./docs"
include_globs = ["**/*.md", "**/*.txt"]

# ── Main platform service ───────────────────────────────────

[connectors.git.platform]
url = "https://github.com/acme/platform.git"
branch = "main"
root = "."
include_globs = [
    "docs/**/*.md",
    "src/**/*.rs",
    "README.md",
    "CHANGELOG.md",
    "ADR/**/*.md",
]
exclude_globs = ["**/target/**", "**/node_modules/**"]
shallow = true
cache_dir = "./data/.git-cache/platform"

# ── Additional Git repos ────────────────────────────────────

[connectors.git.infra]
url = "https://github.com/acme/infrastructure.git"
branch = "main"
root = "docs/"
include_globs = ["**/*.md"]
shallow = true

[connectors.git.auth-service]
url = "https://github.com/acme/auth-service.git"
branch = "main"
include_globs = ["src/**/*.rs", "docs/**/*.md", "README.md"]
shallow = true

[connectors.git.payments]
url = "https://github.com/acme/payments.git"
branch = "main"
include_globs = ["src/**/*.rs", "docs/**/*.md"]
shallow = true

# ── Runbooks from S3 ────────────────────────────────────────

[connectors.s3.runbooks]
bucket = "acme-engineering"
prefix = "runbooks/"
region = "us-east-1"
include_globs = ["**/*.md"]

# ── Jira issues ─────────────────────────────────────────────

[connectors.script.jira]
path = "connectors/jira.lua"
timeout = 60
url = "https://acme.atlassian.net"
project = "PLATFORM"
api_token = "${JIRA_API_TOKEN}"

Syncing all sources

Sync everything in one command — connectors run in parallel:

$ ctx sync all
Syncing 7 connector instances (parallel scan)...
sync filesystem:local
  fetched: 47 items
  upserted documents: 47
  chunks written: 203
ok
sync git:auth-service
  fetched: 24 items
  upserted documents: 24
  chunks written: 112
ok
sync git:infra
  fetched: 31 items
  upserted documents: 31
  chunks written: 89
ok
sync git:payments
  fetched: 18 items
  upserted documents: 18
  chunks written: 67
ok
sync git:platform
  fetched: 89 items
  upserted documents: 89
  chunks written: 412
ok
sync s3:runbooks
  fetched: 34 items
  upserted documents: 34
  chunks written: 156
ok
sync script:jira
  fetched: 142 items
  upserted documents: 142
  chunks written: 284
ok

Or sync specific types or instances:

$ ctx sync git               # All git connectors (parallel)
$ ctx sync git:platform      # Just one repo
$ ctx sync s3                # All S3 connectors
$ ctx sync script:jira       # One Lua connector
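
The three target forms shown above ("all", a bare connector type, and "type:name") can be modeled in a few lines. This is only a sketch of the documented behavior; how ctx resolves targets internally is not shown on this page, and select_instances is a hypothetical helper.

```python
def select_instances(target: str, labels: list[str]) -> list[str]:
    """Resolve a sync target ("all", "<type>", or "<type>:<name>") to labels."""
    if target == "all":
        return list(labels)
    if ":" in target:
        # Exact instance, e.g. "git:platform".
        return [label for label in labels if label == target]
    # Bare type, e.g. "git": every instance of that connector type.
    return [label for label in labels if label.split(":", 1)[0] == target]
```

Given the seven instances from the config, select_instances("git", labels) returns the four Git repos, while select_instances("git:platform", labels) returns just one.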

Filtering by source

When searching, you can filter results to a specific source:

# Search only the platform repo
$ ctx search "auth middleware" --source git:platform

# Search only Jira issues
$ ctx search "payment timeout" --source script:jira

# Search everything (default)
$ ctx search "deployment procedure"

Via the API:

$ curl -s localhost:7331/tools/search \
    -H 'Content-Type: application/json' \
    -d '{"query": "error handling", "source": "git:auth-service"}' | jq .

Cursor workspace with multi-repo context

If you work in a Cursor workspace with multiple repos, a single Context Harness instance can provide unified context across all of them.

Setup:

Create a shared config directory:

~/ctx-workspace/
├── config/
│   └── ctx.toml          # Multi-repo config (as above)
├── data/
│   └── ctx.sqlite         # Shared database
├── connectors/
│   └── jira.lua           # Lua connector for Jira
└── scripts/
    └── sync-all.sh        # Sync script

Add to every Cursor workspace — create .cursor/mcp.json:

{
  "mcpServers": {
    "org-context": {
      "url": "http://127.0.0.1:7331/mcp"
    }
  }
}
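
If you have many repos, writing this file by hand gets tedious. The following sketch creates or updates .cursor/mcp.json in a given workspace directory, keeping any servers that are already configured there. The add_mcp_server helper and the example repo paths are hypothetical; the server name and URL are the ones shown above.

```python
import json
from pathlib import Path

SERVER_NAME = "org-context"
SERVER_URL = "http://127.0.0.1:7331/mcp"


def add_mcp_server(workspace: Path) -> Path:
    """Create or update <workspace>/.cursor/mcp.json with the shared server entry."""
    config_path = workspace / ".cursor" / "mcp.json"
    config_path.parent.mkdir(parents=True, exist_ok=True)
    # Preserve any servers already configured for this workspace.
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config.setdefault("mcpServers", {})[SERVER_NAME] = {"url": SERVER_URL}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config_path


# Usage (hypothetical repo locations):
#   for repo in ["~/code/platform", "~/code/auth-service"]:
#       add_mcp_server(Path(repo).expanduser())
```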

Start the server once and use it everywhere:

$ cd ~/ctx-workspace
$ ctx serve mcp --config ./config/ctx.toml

Now every Cursor window in every repo has access to the full org knowledge base.

CI-based multi-repo index

For teams, build the index in CI so everyone gets fresh context:

# .github/workflows/build-context.yml
name: Build Context Index
on:
  schedule:
    - cron: '0 */6 * * *'  # Every 6 hours
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo install --git https://github.com/parallax-labs/context-harness

      - name: Sync all sources
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          JIRA_API_TOKEN: ${{ secrets.JIRA_API_TOKEN }}
          GITHUB_TOKEN: ${{ secrets.GH_PAT }}
        run: |
          ctx init --config ./config/ctx.toml
          ctx sync all --full --config ./config/ctx.toml
          ctx embed pending --config ./config/ctx.toml

      - name: Upload database
        uses: actions/upload-artifact@v4
        with:
          name: ctx-database
          path: data/ctx.sqlite

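Team members download the latest ctx.sqlite (the ctx-database artifact uploaded by the workflow above), place it at the db path from the config (./data/ctx.sqlite), and run ctx serve mcp locally. They get multi-repo context without needing API keys or waiting for syncs.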
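
Before pointing the server at a downloaded file, it can be worth checking that the download is an intact SQLite database. The sketch below opens the file read-only with Python's standard sqlite3 module, runs SQLite's built-in integrity check, and lists whatever tables exist. It makes no assumption about the Context Harness schema, and inspect_index is a hypothetical helper.

```python
import sqlite3
from pathlib import Path


def inspect_index(db_path: str) -> list[str]:
    """Open the database read-only, run an integrity check, and list its tables."""
    # mode=ro makes SQLite refuse to create or modify the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        (status,) = conn.execute("PRAGMA integrity_check").fetchone()
        if status != "ok":
            raise RuntimeError(f"integrity check failed: {status}")
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()


# Usage:
#   print(inspect_index("data/ctx.sqlite"))
```

A missing or truncated download fails here with an sqlite3 error, which is easier to diagnose than a server that starts and returns no results.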
