Module connector_git

Git repository connector.

Clones or updates a Git repository and walks files within a configurable subdirectory. Extracts rich metadata from git log: per-file commit timestamps, authors, and the HEAD commit SHA. Automatically generates web-browsable URLs for GitHub and GitLab repositories.

§Configuration

[connectors.git.platform]
url = "https://github.com/acme/platform.git"
branch = "main"
root = "docs/"
include_globs = ["**/*.md"]
shallow = true
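
As a rough illustration of how the keys above might map onto a typed value, here is a minimal sketch. The struct name `GitConnectorConfig`, the field types, and the helper `example_config` are assumptions made for this example; the page does not show the connector's actual config type or its defaults.

```rust
/// Hypothetical mirror of one `[connectors.git.<name>]` table.
/// Field names follow the keys shown above; the types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct GitConnectorConfig {
    /// Remote URL of the repository to clone.
    pub url: String,
    /// Branch to check out and keep in sync.
    pub branch: String,
    /// Subdirectory of the repository to walk.
    pub root: String,
    /// Glob patterns selecting which files to include.
    pub include_globs: Vec<String>,
    /// Whether to request a shallow clone.
    pub shallow: bool,
}

/// Builds the value corresponding to the `platform` example above.
pub fn example_config() -> GitConnectorConfig {
    GitConnectorConfig {
        url: "https://github.com/acme/platform.git".to_string(),
        branch: "main".to_string(),
        root: "docs/".to_string(),
        include_globs: vec!["**/*.md".to_string()],
        shallow: true,
    }
}
```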

§Cache Directory

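Cloned repositories are cached locally. By default the cache lives alongside the SQLite DB in data/.git-cache/<url-hash>/, where <url-hash> is a short (12-character) SHA-256 hash (see short_hash). Subsequent syncs update the cached clone with git fetch followed by a hard reset.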
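
The clone-or-update decision can be sketched as a function that returns the git argument lists a sync would run. This is a sketch under stated assumptions, not the connector's code: the names `sync_plan` and `run_git`, the remote name `origin`, and the exact flags (`--depth 1`, `--branch`, `-C`) are choices made for this example. The page confirms only that the first sync clones and later syncs do a fetch plus a hard reset.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: the git invocations for one sync, as argument lists.
/// The flags are assumptions for illustration, not taken from the connector.
pub fn sync_plan(
    url: &str,
    branch: &str,
    shallow: bool,
    cache_dir: &Path,
    already_cloned: bool,
) -> Vec<Vec<String>> {
    let dir = cache_dir.to_string_lossy().into_owned();
    // A shallow sync limits history to the most recent commit.
    let depth: Vec<String> = if shallow {
        vec!["--depth".to_string(), "1".to_string()]
    } else {
        Vec::new()
    };

    if !already_cloned {
        // First sync: clone the requested branch into the cache directory.
        let mut clone = vec!["clone".to_string()];
        clone.extend(depth);
        clone.extend([
            "--branch".to_string(),
            branch.to_string(),
            url.to_string(),
            dir,
        ]);
        return vec![clone];
    }

    // Later syncs: fetch the branch, then hard-reset the working tree to it.
    let mut fetch = vec!["-C".to_string(), dir.clone(), "fetch".to_string()];
    fetch.extend(depth);
    fetch.extend(["origin".to_string(), branch.to_string()]);
    let reset = vec![
        "-C".to_string(),
        dir,
        "reset".to_string(),
        "--hard".to_string(),
        format!("origin/{}", branch),
    ];
    vec![fetch, reset]
}

/// Runs one planned invocation; returns Ok(true) when git exits successfully.
pub fn run_git(args: &[String]) -> std::io::Result<bool> {
    Ok(Command::new("git").args(args).status()?.success())
}
```

Keeping the plan separate from its execution makes the decision easy to inspect without touching the network or the file system.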

§Metadata Extraction

For each file, the connector extracts:

  • updated_at — last commit timestamp (Unix epoch) from git log -1 --format=%ct
  • author — author name of the last commit from git log -1 --format=%an
  • source_url — web URL (GitHub/GitLab blob link) for the file
  • metadata_json — JSON with git_sha and repo_url
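
The two git log lookups above can be sketched as follows. The function names (`git_log_field`, `parse_commit_time`, `parse_author`) are hypothetical and the error handling is an assumption. Only the `git log -1 --format=%ct` and `--format=%an` invocations come from the list above.

```rust
use std::path::Path;
use std::process::Command;

/// Runs `git log -1 --format=<format> -- <file>` inside `repo` and returns
/// the raw stdout, or None if git could not be run or exited with an error.
pub fn git_log_field(repo: &Path, file: &str, format: &str) -> Option<String> {
    let format_arg = format!("--format={}", format);
    let out = Command::new("git")
        .arg("-C")
        .arg(repo)
        .args(["log", "-1", format_arg.as_str(), "--", file])
        .output()
        .ok()?;
    if !out.status.success() {
        return None;
    }
    Some(String::from_utf8_lossy(&out.stdout).into_owned())
}

/// Parses the output of `--format=%ct` (seconds since the Unix epoch).
/// Empty output, e.g. for a file with no commits, yields None.
pub fn parse_commit_time(stdout: &str) -> Option<i64> {
    stdout.trim().parse().ok()
}

/// Parses the output of `--format=%an` (author name of the commit).
pub fn parse_author(stdout: &str) -> Option<String> {
    let name = stdout.trim();
    if name.is_empty() {
        None
    } else {
        Some(name.to_string())
    }
}
```

A caller would combine them as, for example, `git_log_field(repo, "docs/a.md", "%ct").and_then(|s| parse_commit_time(&s))`.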

§Web URL Generation

The connector auto-detects GitHub and GitLab URLs and generates browsable blob links:

| Input URL | Generated URL |
|---|---|
| git@github.com:org/repo.git | https://github.com/org/repo/blob/<sha>/<path> |
| https://github.com/org/repo.git | https://github.com/org/repo/blob/<sha>/<path> |
| git@gitlab.com:org/repo.git | https://gitlab.com/org/repo/-/blob/<sha>/<path> |
| Other | git://<url>/<path> |
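
The mapping in the table can be sketched as a small pure function. This is a hypothetical re-implementation written for illustration, not the private build_web_url itself. The name `web_url`, its signature, and the parsing strategy are assumptions, and it handles only the URL shapes listed in the table.

```rust
/// Hypothetical sketch of the table above: maps a remote URL, commit SHA,
/// and repo-relative path to a browsable link.
pub fn web_url(remote: &str, sha: &str, path: &str) -> String {
    // Split the remote into (host, "org/repo.git") for the two known shapes:
    // SSH (`git@host:org/repo.git`) and HTTPS (`https://host/org/repo.git`).
    let parsed = if let Some(rest) = remote.strip_prefix("git@") {
        rest.split_once(':')
    } else if let Some(rest) = remote.strip_prefix("https://") {
        rest.split_once('/')
    } else {
        None
    };

    if let Some((host, repo)) = parsed {
        // Drop the trailing `.git` so the path matches the web UI.
        let repo = repo.strip_suffix(".git").unwrap_or(repo);
        match host {
            "github.com" => {
                return format!("https://github.com/{}/blob/{}/{}", repo, sha, path)
            }
            "gitlab.com" => {
                return format!("https://gitlab.com/{}/-/blob/{}/{}", repo, sha, path)
            }
            _ => {}
        }
    }

    // Fallback row of the table: git://<url>/<path>.
    format!("git://{}/{}", remote, path)
}
```

Pinning the link to the commit SHA rather than the branch name keeps the URL pointing at the exact file revision that was indexed.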

Structs§

GitConnector
A Git connector instance that implements the Connector trait.

Functions§

build_globset 🔒
Build a GlobSet from a list of glob pattern strings.
build_web_url 🔒
Attempt to build a web-browsable URL from the git remote URL.
file_to_source_item 🔒
Convert a file in the cloned repo to a SourceItem.
git_clone 🔒
Clone a Git repository into the cache directory.
git_file_last_author 🔒
Get the last commit author name for a specific file.
git_file_last_commit_time 🔒
Get the last commit timestamp (Unix epoch) for a specific file.
git_head_sha 🔒
Get the HEAD commit SHA of a repository.
git_pull 🔒
Update an existing cached repository via fetch + hard reset.
scan_git
Scan a Git repository and produce SourceItems.
short_hash 🔒
Generate a short (12-char) SHA-256 hash of input, used for cache directory naming.