Crate context_harness


§Context Harness

A local-first context ingestion and retrieval framework for AI tools.

Context Harness provides a connector-driven pipeline for ingesting documents from multiple sources (filesystem, Git repositories, S3 buckets, Lua scripts), chunking and embedding them, and exposing hybrid search (keyword + semantic) via a CLI and MCP-compatible HTTP server.

§Architecture

┌─────────────┐   ┌─────────────┐   ┌──────────┐
│ Connectors  │──▶│  Pipeline   │──▶│  SQLite  │
│ FS/Git/S3   │   │ Chunk+Embed │   │ FTS5+Vec │
└─────────────┘   └─────────────┘   └────┬─────┘
                                         │
                     ┌───────────────────┤
                     ▼                   ▼
                ┌──────────┐       ┌──────────┐
                │   CLI    │       │   HTTP   │
                │  (ctx)   │       │  (MCP)   │
                └──────────┘       └──────────┘

§Data Flow

  1. Connectors scan external sources and produce models::SourceItem values.
  2. The ingestion pipeline (ingest) normalizes each item into a models::Document, computes its deduplication hash, and upserts it into SQLite.
  3. Each document is split into models::Chunk values by the paragraph-boundary chunker (chunk).
  4. Chunks are indexed in FTS5 for keyword search and optionally embedded via the embedding provider (embedding) for vector search.
  5. The query engine (search) supports keyword, semantic, and hybrid retrieval with min-max normalized scoring.
  6. Results are exposed via the CLI (ctx) and the MCP HTTP server (server).
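
Step 3 above can be illustrated with a minimal sketch of paragraph-boundary chunking. This is not the crate's chunk API: the function name chunk_paragraphs and the max_bytes parameter are hypothetical, and the sketch assumes paragraphs are separated by blank lines ("\n\n") and that size is measured in bytes.

```rust
/// Split `text` on blank-line paragraph boundaries and pack consecutive
/// paragraphs into one chunk until adding the next paragraph would push the
/// chunk past `max_bytes`. A single paragraph larger than `max_bytes` is kept
/// whole rather than cut mid-paragraph.
///
/// Hypothetical helper for illustration only; not the crate's `chunk` module.
pub fn chunk_paragraphs(text: &str, max_bytes: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();

    let paragraphs = text
        .split("\n\n")
        .map(str::trim)
        .filter(|p| !p.is_empty());

    for para in paragraphs {
        // Size of the chunk if this paragraph were appended; the extra 2 bytes
        // are the "\n\n" separator re-inserted between paragraphs.
        let needed = if current.is_empty() {
            para.len()
        } else {
            current.len() + 2 + para.len()
        };

        // Close the current chunk before it would overflow.
        if !current.is_empty() && needed > max_bytes {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push_str("\n\n");
        }
        current.push_str(para);
    }

    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Calling chunk_paragraphs("aaaa\n\nbbbb\n\ncccc", 10) packs the first two paragraphs into one chunk and leaves the third on its own. Breaking only at paragraph boundaries keeps each chunk a coherent unit of text, at the cost of uneven chunk sizes.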

§Quick Start

ctx init                      # create database
ctx sync all                  # ingest all configured sources (parallel)
ctx sync git:platform         # ingest a specific git connector
ctx embed pending             # generate embeddings
ctx search "deployment" --mode hybrid
ctx serve mcp                 # start HTTP server

§Connectors

| Connector | Source | Module |
|---|---|---|
| Filesystem | Local directories | connector_fs |
| Git | Any Git repository (local or remote) | connector_git |
| S3 | Amazon S3 / S3-compatible buckets | connector_s3 |
| Lua Script | Any source via custom Lua scripts | connector_script |

§Search Modes

| Mode | Engine | Requires Embeddings |
|---|---|---|
| keyword | SQLite FTS5 (BM25) | No |
| semantic | Cosine similarity over vectors | Yes |
| hybrid | Weighted merge (configurable α) | Yes |
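
The scoring behind these modes can be sketched as follows. This is a minimal illustration, not the crate's search implementation: the function names are hypothetical, and the sketch assumes that α weights the semantic score and (1 − α) the keyword score, and that both inputs are already oriented so that higher means better. (SQLite's FTS5 bm25() reports better matches as numerically smaller values, so raw BM25 scores would need to be negated first.)

```rust
/// Rescale scores into [0, 1] with min-max normalization.
/// Assumption: when every score is identical, each maps to 1.0.
pub fn min_max_normalize(scores: &[f32]) -> Vec<f32> {
    if scores.is_empty() {
        return Vec::new();
    }
    let min = scores.iter().copied().fold(f32::INFINITY, f32::min);
    let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    scores
        .iter()
        .map(|&s| if range > 0.0 { (s - min) / range } else { 1.0 })
        .collect()
}

/// Cosine similarity between two vectors. Returns 0.0 when the lengths
/// differ or either vector has zero magnitude.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Weighted merge of two normalized scores.
/// Assumption: `alpha` is the weight of the semantic score, so alpha = 0.0
/// is pure keyword ranking and alpha = 1.0 is pure semantic ranking.
pub fn hybrid_score(keyword: f32, semantic: f32, alpha: f32) -> f32 {
    (1.0 - alpha) * keyword + alpha * semantic
}
```

Normalizing each result list before merging puts BM25 and cosine scores on the same [0, 1] scale, so α expresses a relative preference rather than compensating for different score ranges.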

§Modules

| Module | Purpose |
|---|---|
| config | TOML configuration parsing and validation |
| models | Core data types: SourceItem, Document, Chunk, SearchResult |
| connector_fs | Filesystem connector: walk local directories |
| connector_git | Git connector: clone/pull repos with per-file metadata |
| connector_s3 | S3 connector: list and download objects with SigV4 signing |
| connector_script | Lua scripted connectors: custom data sources via Lua 5.4 scripts |
| lua_runtime | Shared Lua 5.4 VM runtime: sandboxing, host APIs, value conversions |
| tool_script | Lua MCP tool extensions: load, validate, execute Lua tool scripts |
| traits | Extension traits: Connector, Tool, ToolContext, registries |
| agents | Agent system: Agent trait, AgentPrompt, AgentRegistry, TomlAgent |
| agent_script | Lua scripted agents: load, resolve, scaffold, test |
| chunk | Paragraph-boundary text chunker |
| embedding | Embedding provider trait, OpenAI implementation, vector utilities |
| embed_cmd | Embedding CLI commands: pending and rebuild |
| export | JSON export for static site search (ctx export) |
| stats | Database statistics: document, chunk, and embedding counts |
| ingest | Ingestion pipeline: connector → normalize → chunk → embed → store |
| search | Keyword, semantic, and hybrid search with score normalization |
| get | Document retrieval by UUID |
| sources | Connector health and status listing |
| server | MCP-compatible HTTP server (Axum) with CORS |
| db | SQLite connection pool with WAL mode |
| migrate | Database schema migrations (idempotent) |

§Configuration

Context Harness is configured via a TOML file (default: config/ctx.toml). See config for all available options and config::load_config for validation rules.

§Re-exports

pub use agents::Agent;
pub use agents::AgentPrompt;
pub use agents::AgentRegistry;
pub use agents::TomlAgent;
pub use traits::Connector;
pub use traits::ConnectorRegistry;
pub use traits::GetTool;
pub use traits::SearchTool;
pub use traits::SourcesTool;
pub use traits::Tool;
pub use traits::ToolContext;
pub use traits::ToolRegistry;

§Modules

agent_script
Lua scripted agent runtime.
agents
Agent system for MCP prompts and personas.
chunk
Paragraph-boundary text chunker — re-exported from context-harness-core.
config
Configuration parsing and validation.
connector_fs
Filesystem connector.
connector_git
Git repository connector.
connector_s3
Amazon S3 connector.
connector_script
Lua scripted connector runtime.
db
SQLite database connection management.
embed_cmd
Embedding CLI commands: ctx embed pending and ctx embed rebuild.
embedding
Embedding provider abstraction and implementations.
export
Export the search index as JSON for static site search.
extract
Multi-format text extraction for binary documents (PDF, OOXML).
get
Document retrieval by ID.
ingest
Ingestion pipeline orchestration.
lua_runtime
Shared Lua 5.4 VM runtime for connectors and tools.
mcp
MCP JSON-RPC protocol bridge.
migrate
Database schema migrations.
models
Core data models — re-exported from context-harness-core.
progress
Sync and embed progress reporting.
registry
Extension registry system for community connectors, tools, and agents.
search
Search engine with keyword, semantic, and hybrid retrieval modes.
server
MCP-compatible HTTP server.
sources
Connector health and status listing.
sqlite_store
SQLite-backed Store implementation.
stats
Database statistics and health overview.
store
Storage abstraction for Context Harness.
tool_script
Lua MCP tool extensions.
traits
Extension traits for custom connectors and tools.

§Structs

SourceItem
Raw item produced by a connector before normalization.