§Context Harness
A local-first context ingestion and retrieval framework for AI tools.
Context Harness provides a connector-driven pipeline for ingesting documents from multiple sources (filesystem, Git repositories, S3 buckets, Lua scripts), chunking and embedding them, and exposing hybrid search (keyword + semantic) via a CLI and an MCP-compatible HTTP server.
§Architecture
┌─────────────┐   ┌─────────────┐   ┌──────────┐
│ Connectors  │──▶│  Pipeline   │──▶│  SQLite  │
│  FS/Git/S3  │   │ Chunk+Embed │   │ FTS5+Vec │
└─────────────┘   └─────────────┘   └────┬─────┘
                                         │
                     ┌───────────────────┤
                     ▼                   ▼
                ┌──────────┐        ┌──────────┐
                │   CLI    │        │   HTTP   │
                │  (ctx)   │        │  (MCP)   │
                └──────────┘        └──────────┘
§Data Flow
- Connectors scan external sources and produce models::SourceItems.
- The ingestion pipeline (ingest) normalizes items into models::Documents, computes deduplication hashes, and upserts them into SQLite.
- Documents are split into models::Chunks by the paragraph-boundary chunker (chunk).
- Chunks are indexed in FTS5 for keyword search and optionally embedded via the embedding provider (embedding) for vector search.
- The query engine (search) supports keyword, semantic, and hybrid retrieval with min-max normalized scoring.
- Results are exposed via the CLI (ctx) and the MCP HTTP server (server).
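The paragraph-boundary chunking step can be illustrated with a small sketch. This is not the crate's `chunk` implementation; the function name `chunk_paragraphs`, the `max_bytes` parameter, and the choice to split on blank lines and pack paragraphs greedily are all assumptions made for illustration.

```rust
/// Hypothetical sketch of paragraph-boundary chunking (not the crate's API).
///
/// Splits `text` on blank lines and packs consecutive paragraphs into one
/// chunk until adding the next paragraph would exceed `max_bytes`.
/// A single paragraph longer than `max_bytes` is kept whole as its own chunk.
pub fn chunk_paragraphs(text: &str, max_bytes: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();

    let paragraphs = text
        .split("\n\n")
        .map(str::trim)
        .filter(|p| !p.is_empty());

    for para in paragraphs {
        // Size of the current chunk if this paragraph were appended;
        // the +2 accounts for the blank-line separator re-inserted below.
        let needed = if current.is_empty() {
            para.len()
        } else {
            current.len() + 2 + para.len()
        };

        // Close the current chunk when the paragraph would not fit.
        if !current.is_empty() && needed > max_bytes {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push_str("\n\n");
        }
        current.push_str(para);
    }

    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Because chunks only ever break between paragraphs, no sentence is cut in half; the trade-off is that chunk sizes are uneven and an oversized paragraph is not split further in this sketch.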
§Quick Start
ctx init # create database
ctx sync all # ingest all configured sources (parallel)
ctx sync git:platform # ingest a specific git connector
ctx embed pending # generate embeddings
ctx search "deployment" --mode hybrid
ctx serve mcp # start HTTP server
§Connectors
| Connector | Source | Module |
|---|---|---|
| Filesystem | Local directories | connector_fs |
| Git | Any Git repository (local or remote) | connector_git |
| S3 | Amazon S3 / S3-compatible buckets | connector_s3 |
| Lua Script | Any source via custom Lua scripts | connector_script |
§Search Modes
| Mode | Engine | Requires Embeddings |
|---|---|---|
| keyword | SQLite FTS5 (BM25) | No |
| semantic | Cosine similarity over vectors | Yes |
| hybrid | Weighted merge (configurable α) | Yes |
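To make the scoring in the table concrete, here is a minimal sketch of cosine similarity, min-max normalization, and an α-weighted merge. It is not the crate's `search` code. The function names, the use of `HashMap<String, f64>` keyed by chunk ID, the assumption that higher scores are better in both inputs, and the convention that α weights the semantic side (with `1 - α` on the keyword side) are all illustrative assumptions.

```rust
use std::collections::HashMap;

/// Cosine similarity of two vectors; returns 0.0 for mismatched lengths
/// or when either vector has zero magnitude.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Rescale scores to [0, 1]. If every score is equal, each maps to 1.0.
pub fn min_max_normalize(scores: &HashMap<String, f64>) -> HashMap<String, f64> {
    let min = scores.values().cloned().fold(f64::INFINITY, f64::min);
    let max = scores.values().cloned().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    scores
        .iter()
        .map(|(id, &s)| {
            let n = if range > 0.0 { (s - min) / range } else { 1.0 };
            (id.clone(), n)
        })
        .collect()
}

/// Merge keyword and semantic scores as (1 - alpha) * keyword + alpha * semantic
/// after normalizing each side. An ID missing from one side contributes 0 there.
/// Results are sorted best-first, with ties broken by ID for determinism.
pub fn hybrid_merge(
    keyword: &HashMap<String, f64>,
    semantic: &HashMap<String, f64>,
    alpha: f64,
) -> Vec<(String, f64)> {
    let mut merged: HashMap<String, f64> = HashMap::new();
    for (id, s) in min_max_normalize(keyword) {
        *merged.entry(id).or_insert(0.0) += (1.0 - alpha) * s;
    }
    for (id, s) in min_max_normalize(semantic) {
        *merged.entry(id).or_insert(0.0) += alpha * s;
    }
    let mut ranked: Vec<(String, f64)> = merged.into_iter().collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked
}
```

Normalizing each side before merging matters because BM25 scores and cosine similarities live on different scales; without it, one engine would dominate the ranking regardless of α. Under the convention assumed here, `alpha = 0.0` reduces to keyword-only ranking and `alpha = 1.0` to semantic-only ranking.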
§Modules
| Module | Purpose |
|---|---|
| config | TOML configuration parsing and validation |
| models | Core data types: SourceItem, Document, Chunk, SearchResult |
| connector_fs | Filesystem connector: walk local directories |
| connector_git | Git connector: clone/pull repos with per-file metadata |
| connector_s3 | S3 connector: list and download objects with SigV4 signing |
| connector_script | Lua scripted connectors: custom data sources via Lua 5.4 scripts |
| lua_runtime | Shared Lua 5.4 VM runtime: sandboxing, host APIs, value conversions |
| tool_script | Lua MCP tool extensions: load, validate, execute Lua tool scripts |
| traits | Extension traits: Connector, Tool, ToolContext, registries |
| agents | Agent system: Agent trait, AgentPrompt, AgentRegistry, TomlAgent |
| agent_script | Lua scripted agents: load, resolve, scaffold, test |
| chunk | Paragraph-boundary text chunker |
| embedding | Embedding provider trait, OpenAI implementation, vector utilities |
| embed_cmd | Embedding CLI commands: pending and rebuild |
| export | JSON export for static site search (ctx export) |
| stats | Database statistics: document, chunk, and embedding counts |
| ingest | Ingestion pipeline: connector → normalize → chunk → embed → store |
| search | Keyword, semantic, and hybrid search with score normalization |
| get | Document retrieval by UUID |
| sources | Connector health and status listing |
| server | MCP-compatible HTTP server (Axum) with CORS |
| db | SQLite connection pool with WAL mode |
| migrate | Database schema migrations (idempotent) |
§Configuration
Context Harness is configured via a TOML file (default: config/ctx.toml).
See config for all available options and config::load_config for
validation rules.
Re-exports§
pub use agents::Agent;
pub use agents::AgentPrompt;
pub use agents::AgentRegistry;
pub use agents::TomlAgent;
pub use traits::Connector;
pub use traits::ConnectorRegistry;
pub use traits::GetTool;
pub use traits::SearchTool;
pub use traits::SourcesTool;
pub use traits::Tool;
pub use traits::ToolContext;
pub use traits::ToolRegistry;
Modules§
- agent_script: Lua scripted agent runtime.
- agents: Agent system for MCP prompts and personas.
- chunk: Paragraph-boundary text chunker, re-exported from context-harness-core.
- config: Configuration parsing and validation.
- connector_fs: Filesystem connector.
- connector_git: Git repository connector.
- connector_s3: Amazon S3 connector.
- connector_script: Lua scripted connector runtime.
- db: SQLite database connection management.
- embed_cmd: Embedding CLI commands: ctx embed pending and ctx embed rebuild.
- embedding: Embedding provider abstraction and implementations.
- export: Export the search index as JSON for static site search.
- extract: Multi-format text extraction for binary documents (PDF, OOXML).
- get: Document retrieval by ID.
- ingest: Ingestion pipeline orchestration.
- lua_runtime: Shared Lua 5.4 VM runtime for connectors and tools.
- mcp: MCP JSON-RPC protocol bridge.
- migrate: Database schema migrations.
- models: Core data models, re-exported from context-harness-core.
- progress: Sync and embed progress reporting.
- registry: Extension registry system for community connectors, tools, and agents.
- search: Search engine with keyword, semantic, and hybrid retrieval modes.
- server: MCP-compatible HTTP server.
- sources: Connector health and status listing.
- sqlite_store: SQLite-backed Store implementation.
- stats: Database statistics and health overview.
- store: Storage abstraction for Context Harness.
- tool_script: Lua MCP tool extensions.
- traits: Extension traits for custom connectors and tools.
Structs§
- SourceItem: Raw item produced by a connector before normalization.