pub struct SourceItem {
pub source: String,
pub source_id: String,
pub source_url: Option<String>,
pub title: Option<String>,
pub author: Option<String>,
pub created_at: DateTime<Utc>,
pub updated_at: DateTime<Utc>,
pub content_type: String,
pub body: String,
pub metadata_json: String,
pub raw_json: Option<String>,
pub raw_bytes: Option<Vec<u8>>,
}
Raw item produced by a connector before normalization.
Connectors (filesystem, Git, S3) emit SourceItems that are then
normalized into Documents during the ingestion pipeline.
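As a rough illustration of the connector side, the sketch below builds a SourceItem the way a filesystem-style connector might for a single text file. It is not the crate's actual connector code. The helper `item_from_text_file` is hypothetical, and the struct is a local stand-in that mirrors the documented fields but swaps `DateTime<Utc>` for `std::time::SystemTime` so the example compiles with the standard library alone.

```rust
use std::path::Path;
use std::time::SystemTime;

// Stand-in mirroring the documented fields. Assumption: SystemTime replaces
// chrono's DateTime<Utc> so the sketch builds with the standard library only.
#[derive(Clone, Debug)]
pub struct SourceItem {
    pub source: String,
    pub source_id: String,
    pub source_url: Option<String>,
    pub title: Option<String>,
    pub author: Option<String>,
    pub created_at: SystemTime,
    pub updated_at: SystemTime,
    pub content_type: String,
    pub body: String,
    pub metadata_json: String,
    pub raw_json: Option<String>,
    pub raw_bytes: Option<Vec<u8>>,
}

// Hypothetical helper: what a filesystem-style connector might emit for one text file.
pub fn item_from_text_file(relative_path: &str, body: &str, mtime: SystemTime) -> SourceItem {
    let path = Path::new(relative_path);
    // The title is typically the filename.
    let title = path.file_name().map(|n| n.to_string_lossy().into_owned());
    // Pick a MIME type from the extension; anything unknown falls back to plain text.
    let content_type = match path.extension().and_then(|e| e.to_str()) {
        Some("md") => "text/markdown",
        _ => "text/plain",
    };
    SourceItem {
        source: "filesystem".to_string(),
        source_id: relative_path.to_string(),
        source_url: None,
        title,
        author: None,
        // Assumption: only one timestamp is known here, so it is used for both fields.
        created_at: mtime,
        updated_at: mtime,
        content_type: content_type.to_string(),
        body: body.to_string(),
        metadata_json: "{}".to_string(),
        raw_json: None,
        raw_bytes: None,
    }
}
```

Calling `item_from_text_file("docs/guide.md", "# Guide", SystemTime::UNIX_EPOCH)` yields an item whose `source_id` is the relative path and whose `title` is the filename.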
§Fields
| Field | Description |
|---|---|
| source | Connector name, e.g. "filesystem", "git", "s3" |
| source_id | Unique identifier within the source (e.g. relative file path, S3 key) |
| source_url | Optional web-browsable URL (e.g. GitHub blob URL, s3:// URI) |
| title | Human-readable title, typically the filename |
| author | Author extracted from source metadata (e.g. last Git committer) |
| created_at / updated_at | Timestamps from the source (commit time, mtime, S3 LastModified) |
| content_type | MIME type, e.g. "text/plain", "text/markdown" |
| body | Full text content of the document |
| metadata_json | Connector-specific metadata as a JSON string |
| raw_json | Optional raw API response for debugging |
| raw_bytes | When set, the pipeline runs extraction and sets body before upsert; content_type identifies the format |
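Because source_id is unique only within its source, one plausible way to identify an item globally is the pair (source, source_id). The sketch below shows an in-memory upsert keyed that way. It is an assumption for illustration, not the pipeline's real storage logic: the `upsert` function, the `ItemKey` alias, the "newer updated_at wins" rule, and the reduced stand-in struct (with `SystemTime` in place of `DateTime<Utc>`) are all hypothetical.

```rust
use std::collections::HashMap;
use std::time::SystemTime;

// Reduced stand-in with only the fields this sketch needs. Assumption:
// SystemTime replaces chrono's DateTime<Utc> to stay within the standard library.
#[derive(Clone, Debug)]
pub struct SourceItem {
    pub source: String,
    pub source_id: String,
    pub updated_at: SystemTime,
    pub body: String,
}

// Hypothetical composite key: (source, source_id).
pub type ItemKey = (String, String);

// Inserts or replaces an item. Returns true when the store changed.
// Assumed policy: an incoming item replaces the stored one only if it is strictly newer.
pub fn upsert(store: &mut HashMap<ItemKey, SourceItem>, item: SourceItem) -> bool {
    let key = (item.source.clone(), item.source_id.clone());
    let stored_is_current = store
        .get(&key)
        .map_or(false, |existing| existing.updated_at >= item.updated_at);
    if stored_is_current {
        return false;
    }
    store.insert(key, item);
    true
}
```

Keying on the pair means that the same relative path reported by two different connectors is stored as two separate entries.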
Fields§
source: String
Connector name: "filesystem", "git", or "s3".
source_id: String
Unique identifier within the source (e.g. relative file path or S3 object key).
source_url: Option<String>
Web-browsable URL for the source item, if available.
title: Option<String>
Human-readable title (typically the filename).
author: Option<String>
Author extracted from source metadata (e.g. last Git committer).
created_at: DateTime<Utc>
Creation timestamp from the source.
updated_at: DateTime<Utc>
Last modification timestamp from the source.
content_type: String
MIME content type (e.g. "text/plain", "text/markdown").
body: String
Full text content of the document.
metadata_json: String
Connector-specific metadata serialized as JSON.
raw_json: Option<String>
Optional raw API/connector response for debugging.
raw_bytes: Option<Vec<u8>>
When set, the pipeline runs extraction and sets body from the result before upsert; content_type identifies the format.
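The raw_bytes behaviour can be pictured with a minimal sketch: if raw_bytes is present, an extractor chosen by content_type fills body; otherwise body is left alone. The function `extract_body`, its error type, and the choice to decode text formats as lossy UTF-8 are assumptions made for illustration. The real pipeline's extractors and supported formats are not described in this page. The struct is a reduced stand-in holding only the three fields involved.

```rust
// Reduced stand-in with only the fields this sketch touches.
#[derive(Clone, Debug)]
pub struct SourceItem {
    pub content_type: String,
    pub body: String,
    pub raw_bytes: Option<Vec<u8>>,
}

// Hypothetical extraction step, run before an item is stored.
pub fn extract_body(item: &mut SourceItem) -> Result<(), String> {
    // Without raw bytes there is nothing to extract; body already holds the text.
    let bytes = match item.raw_bytes.as_deref() {
        Some(b) => b,
        None => return Ok(()),
    };
    match item.content_type.as_str() {
        // Assumption: text formats are decoded as UTF-8, replacing invalid sequences.
        "text/plain" | "text/markdown" => {
            item.body = String::from_utf8_lossy(bytes).into_owned();
            Ok(())
        }
        // Any other format would need a dedicated extractor, which this sketch omits.
        other => Err(format!("no extractor for content type {}", other)),
    }
}
```

Returning an error for unknown formats, rather than silently leaving body empty, is a design choice of this sketch so that unsupported content is visible to the caller.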
Trait Implementations§
impl Clone for SourceItem
fn clone(&self) -> SourceItem
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.