Search & Retrieval

Keyword, semantic, and hybrid search modes, scoring, and tuning.

Context Harness supports three search modes that can be mixed and tuned for your use case.

Keyword search (FTS5/BM25)

Uses SQLite’s FTS5 extension with BM25 ranking. Fast, zero-cost (no API calls), and works without embeddings.

$ ctx search "deployment procedure" --mode keyword

Good for: exact term matching, code symbols, error messages, specific identifiers.
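To make the ranking concrete, here is a minimal sketch of BM25-ranked search using SQLite's FTS5 from Python. The chunks table, its columns, and the keyword_search helper are hypothetical and only for illustration; they are not Context Harness's actual schema or API. It also assumes the local SQLite build includes FTS5.

```python
import sqlite3


def keyword_search(conn, query, limit=12):
    """Return (title, score) pairs ranked by FTS5's built-in bm25()."""
    # bm25() yields lower (more negative) values for better matches,
    # so ascending order puts the best match first.
    return conn.execute(
        "SELECT title, bm25(chunks) FROM chunks "
        "WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    ).fetchall()


# Hypothetical in-memory index with three tiny documents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(title, body)")
conn.executemany(
    "INSERT INTO chunks (title, body) VALUES (?, ?)",
    [
        ("Deploy guide", "The deployment procedure for production releases."),
        ("Auth middleware", "Authentication middleware setup and token checks."),
        ("Rollback runbook", "Rollback procedure after a failed deployment."),
    ],
)

# Both terms must appear (FTS5 treats space-separated terms as AND).
hits = keyword_search(conn, "deployment procedure")
for title, score in hits:
    print(title, score)
```

Because matching is on exact tokens, a query such as "how to ship code" would return nothing here even though the first document is clearly relevant.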

Semantic search

Vector similarity search over embeddings. Requires the [embedding] section to be configured and ctx embed pending to have been run first.

$ ctx search "how to ship code to production" --mode semantic

Good for: natural language questions, conceptual queries, finding related content that uses different terminology.
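As a rough illustration of vector similarity search, the sketch below ranks chunks by cosine similarity against a query vector. Cosine similarity, the three-dimensional toy vectors, and the semantic_search helper are all assumptions made for the example; the page does not say which similarity metric or embedding model Context Harness uses.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, chunk_vecs, k=3):
    """Return the k (chunk_id, similarity) pairs closest to query_vec."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real ones come from an embedding model and have
# hundreds or thousands of dimensions.
chunk_vecs = {
    "deploy-guide": [0.9, 0.1, 0.0],
    "auth-notes": [0.0, 0.2, 0.9],
    "release-checklist": [0.8, 0.3, 0.1],
}
query_vec = [1.0, 0.2, 0.0]  # stands in for "how to ship code to production"

results = semantic_search(query_vec, chunk_vecs)
for chunk_id, sim in results:
    print(chunk_id, round(sim, 3))
```

The query shares no words with any chunk id or text; the ranking depends only on how close the vectors are, which is why this mode can find content that uses different terminology.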

Hybrid search

Combines keyword and semantic search with weighted scoring. The hybrid_alpha parameter controls the mix:

$ ctx search "auth middleware" --mode hybrid

hybrid_alpha   Behavior
0.0            100% keyword (BM25 only)
0.3            Mostly keyword, some semantic
0.6            Default: balanced (recommended)
0.8            Mostly semantic
1.0            100% semantic (vectors only)

How hybrid scoring works

  1. Candidate retrieval: Fetch the top candidate_k_keyword results from FTS5 and the top candidate_k_vector results from vector search.
  2. Score normalization: Normalize each result set's scores to the [0, 1] range with min-max scaling.
  3. Weighted merge: final_score = (1 - alpha) * keyword_score + alpha * vector_score, where alpha is hybrid_alpha.
  4. Deduplication: A chunk that appears in both result sets is kept once, with its two scores merged.
  5. Document grouping: Chunks are grouped by parent document and aggregated using the doc_agg strategy.
  6. Final ranking: The top final_limit results are returned.
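The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual implementation: the function names are hypothetical, raw keyword scores are assumed to be "higher is better" (raw BM25 values from FTS5 would need their sign flipped first), and a chunk missing from one result set is assumed to contribute 0 from that side.

```python
from collections import defaultdict


def min_max(scores):
    """Scale a {chunk_id: raw_score} map to the [0, 1] range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: when all scores are equal, treat them all as 1.0.
        return {cid: 1.0 for cid in scores}
    return {cid: (s - lo) / (hi - lo) for cid, s in scores.items()}


def hybrid_rank(keyword, vector, chunk_doc, alpha=0.6, doc_agg="max",
                max_chunks_per_doc=3, final_limit=12):
    """Merge two candidate sets into a ranked list of (doc_id, score)."""
    kw, vec = min_max(keyword), min_max(vector)

    # Weighted merge; taking the union of ids also deduplicates chunks
    # that appear in both candidate sets.
    merged = {
        cid: (1 - alpha) * kw.get(cid, 0.0) + alpha * vec.get(cid, 0.0)
        for cid in kw.keys() | vec.keys()
    }

    # Group chunk scores by parent document.
    by_doc = defaultdict(list)
    for cid, score in merged.items():
        by_doc[chunk_doc[cid]].append(score)

    ranked = []
    for doc_id, scores in by_doc.items():
        top = sorted(scores, reverse=True)[:max_chunks_per_doc]
        agg = max(top) if doc_agg == "max" else sum(top) / len(top)
        ranked.append((doc_id, agg))

    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:final_limit]


# Toy candidates: chunk a2 appears in both sets.
keyword = {"a1": 5.0, "a2": 3.0, "b1": 1.0}
vector = {"a2": 0.9, "b1": 0.8, "c1": 0.5}
chunk_doc = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}

ranking = hybrid_rank(keyword, vector, chunk_doc, alpha=0.6)
print(ranking)  # document A ranks first, then B, then C
```

With alpha = 0.6, chunk a2 scores 0.4 * 0.5 + 0.6 * 1.0 = 0.8, which becomes document A's score under doc_agg = "max". Switching to "avg" would average a2 with a1 (0.4) and lower A's score to 0.6.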

Retrieval tuning

[retrieval]
final_limit = 12          # Max results returned
hybrid_alpha = 0.6        # 0.0 = keyword, 1.0 = semantic
candidate_k_keyword = 80  # FTS5 candidate pool size
candidate_k_vector = 80   # Vector candidate pool size
group_by = "document"     # Group chunks by parent doc
doc_agg = "max"           # Aggregation: "max" or "avg"
max_chunks_per_doc = 3    # Max chunks per doc in results
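When generating or editing this section programmatically, it can help to sanity-check the values first. The sketch below mirrors the keys and defaults shown above in a Python dataclass. The RetrievalConfig class and its checks are hypothetical: the alpha range and the doc_agg choices come from the comments above, while the candidate-pool check is only a suggested heuristic, not a documented rule.

```python
from dataclasses import dataclass


@dataclass
class RetrievalConfig:
    # Defaults copied from the [retrieval] example above.
    final_limit: int = 12
    hybrid_alpha: float = 0.6
    candidate_k_keyword: int = 80
    candidate_k_vector: int = 80
    group_by: str = "document"
    doc_agg: str = "max"
    max_chunks_per_doc: int = 3

    def problems(self):
        """Return a list of human-readable issues (empty if none found)."""
        issues = []
        if not 0.0 <= self.hybrid_alpha <= 1.0:
            issues.append("hybrid_alpha must be between 0.0 and 1.0")
        if self.doc_agg not in ("max", "avg"):
            issues.append('doc_agg must be "max" or "avg"')
        if self.final_limit < 1 or self.max_chunks_per_doc < 1:
            issues.append("final_limit and max_chunks_per_doc must be >= 1")
        # Heuristic (assumption): pools smaller than final_limit cannot
        # fill the result list on their own.
        if min(self.candidate_k_keyword, self.candidate_k_vector) < self.final_limit:
            issues.append("candidate pools are smaller than final_limit")
        return issues


defaults = RetrievalConfig()
print(defaults.problems())                          # no issues with the defaults
print(RetrievalConfig(hybrid_alpha=1.5).problems())
```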

Guidelines:

# Default keyword search
$ ctx search "error handling"

# Hybrid with source filter
$ ctx search "deploy" --mode hybrid --source git

# With custom limit
$ ctx search "config" --mode hybrid --limit 3

# Hybrid search over the HTTP API
$ curl -s localhost:7331/tools/search \
    -H "Content-Type: application/json" \
    -d '{
      "query": "how to handle authentication",
      "mode": "hybrid",
      "limit": 5,
      "source": "git"
    }' | jq '.results[] | {title, score, source}'
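The same request can be made from a script. Below is a minimal sketch using only Python's standard library. The endpoint, payload fields, and the title, score, and source result fields are taken from the curl example above; the helper names and the 10-second timeout are assumptions, and the rest of the response shape is not documented here.

```python
import json
import urllib.request


def build_search_request(query, mode="hybrid", limit=5, source=None,
                         base_url="http://localhost:7331"):
    """Build the POST request that the curl example sends."""
    payload = {"query": query, "mode": mode, "limit": limit}
    if source is not None:
        payload["source"] = source
    return urllib.request.Request(
        f"{base_url}/tools/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, **kwargs):
    """Send the request and keep only title, score, and source per result."""
    request = build_search_request(query, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.load(response)
    return [
        {"title": r.get("title"), "score": r.get("score"), "source": r.get("source")}
        for r in body.get("results", [])
    ]


# Building the request needs no running server; search() does.
request = build_search_request("how to handle authentication", source="git")
print(request.full_url, request.data.decode("utf-8"))
```

Calling search("how to handle authentication", source="git") against a running server should yield the same fields that the jq filter selects.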

Client-side search (ctx-search.js)

For static sites, ctx-search.js provides a ⌘K search modal that runs entirely in the browser:

<script src="/ctx-search.js"
    data-json="/data.json"
    data-trigger="#search-btn"
    data-placeholder="Search docs...">
</script>

Features: