Python modules

Public surface area of the Python library, sourced from ragforge/.

ragforge (top-level)

Re-exports for the most common workflow.

| Name | Kind | Description |
|---|---|---|
| `parse_file(path)` | function | Parse any supported file into a Document. |
| `chunk_document(doc, strategy)` | function | Split a Document into a list of Chunk objects. |
| `Document` | class | Parsed document: text, source, doc_type, metadata, id. |
| `Chunk` | class | One chunk: text, doc_id, index, metadata, id. |
| `available(kind)` | function | List registered names for a plugin kind. |

ragforge.core

core.models — models.py

| Name | Description |
|---|---|
| `Document` | @dataclass with text, source, doc_type, metadata, id; .token_count property. |
| `Chunk` | @dataclass with text, doc_id, index, metadata, id; .token_count property. |
| `estimate_tokens(text)` | Approximates roughly 4 characters per token; no tokenizer dependency. |
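
To make the 4-characters-per-token heuristic concrete, here is a minimal sketch of how such an estimate and a `.token_count` property could fit together. It is not copied from models.py: the rounding rule, the default values, and the blank `id` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


def estimate_tokens(text: str) -> int:
    """Approximate the token count as roughly one token per 4 characters."""
    # Ceiling division, so any non-empty string counts as at least one token.
    # The rounding rule is an assumption; the real function may round differently.
    return (len(text) + 3) // 4


@dataclass
class Document:
    """Stand-in with the fields listed above; defaults are illustrative only."""

    text: str
    source: str
    doc_type: str = "text"
    metadata: dict[str, Any] = field(default_factory=dict)
    id: str = ""  # how ragforge derives ids is not stated, so left blank here

    @property
    def token_count(self) -> int:
        # Derived on demand from the text, so it never goes stale.
        return estimate_tokens(self.text)


doc = Document(text="Retrieval-augmented generation.", source="notes.txt")
print(doc.token_count)
```

Because the estimate depends only on string length, it needs no tokenizer package, at the cost of being inexact for non-English text or code.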

core.registry — registry.py

| Name | Description |
|---|---|
| `@register(kind, name)` | Decorator to register a class under a (kind, name) pair. |
| `get(kind, name)` | Look up a registered class. |
| `available(kind)` | List all registered names for a kind. |
| `registered_info()` | Full registry dump across all kinds. |
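```python
from ragforge.core.registry import register, get, available

# Register a class under the ("chunker", "fixed") pair.
@register("chunker", "fixed")
class FixedChunker(...):  # base class omitted here
    ...

# Look the class up again by kind and name.
chunker_cls = get("chunker", "fixed")
print(available("chunker"))  # ['fixed', 'structure', ...]
```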
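
For readers curious how a (kind, name) registry of this shape can work, the following self-contained sketch mirrors the four names in the table above. It is an illustration only: the storage layout, the sorted output, and the error message are assumptions, and ragforge's actual registry.py may differ.

```python
from typing import Callable

# kind -> name -> class; a plain nested dict is assumed for this sketch.
_REGISTRY: dict[str, dict[str, type]] = {}


def register(kind: str, name: str) -> Callable[[type], type]:
    """Decorator factory that stores the decorated class under (kind, name)."""
    def decorator(cls: type) -> type:
        _REGISTRY.setdefault(kind, {})[name] = cls
        return cls  # return the class unchanged so it stays usable directly
    return decorator


def get(kind: str, name: str) -> type:
    """Look up a registered class, raising KeyError with a readable message."""
    try:
        return _REGISTRY[kind][name]
    except KeyError:
        raise KeyError(f"no {kind!r} registered as {name!r}") from None


def available(kind: str) -> list[str]:
    """List registered names for one kind (empty list if the kind is unknown)."""
    return sorted(_REGISTRY.get(kind, {}))


def registered_info() -> dict[str, list[str]]:
    """Dump every kind together with its registered names."""
    return {kind: sorted(names) for kind, names in _REGISTRY.items()}


@register("chunker", "fixed")
class FixedChunker:
    """Placeholder class used only to exercise the registry."""


print(available("chunker"))
print(registered_info())
```

Returning the class from the decorator keeps registration a side effect, so a registered class can still be imported and instantiated without going through `get`.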

ragforge.parsing

| Name | Description |
|---|---|
| `parse_file(path, parser=None)` | Auto-detect by extension and parse; optional override. |
| `TextParser` | Registered as 'text'; handles .txt and .md. |
| `HtmlParser` | Registered as 'html'; strips HTML tags. |
| `PdfParser` | Registered as 'pdf'; requires the [pdf] extra (pypdf). |
| `DoclingParser` | Registered as 'docling', for complex layouts; requires the [docling] extra. |

ragforge.chunking

| Name | Description |
|---|---|
| `chunk_document(doc, strategy, **kw)` | Dispatch to the named strategy; returns list[Chunk]. |
| `FixedChunker` | 'fixed': sliding window by token count (chunk_tokens, overlap). |
| `StructureChunker` | 'structure': split on Markdown headings, respecting max_tokens. |
| `DoclingChunker` | 'docling': Docling-aware; keeps tables and code blocks intact. |
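
As an illustration of the 'fixed' strategy, a sliding window with overlap can be sketched as follows. The parameter names `chunk_tokens` and `overlap` come from the table above; everything else is an assumption, including the function name `fixed_chunks`, the default values, and the use of whitespace-separated words as a stand-in for tokens.

```python
def fixed_chunks(text: str, chunk_tokens: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_tokens words that share `overlap` words."""
    if chunk_tokens <= 0:
        raise ValueError("chunk_tokens must be positive")
    if not 0 <= overlap < chunk_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_tokens")

    tokens = text.split()          # assumption: one word stands in for one token
    step = chunk_tokens - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_tokens]))
        if start + chunk_tokens >= len(tokens):
            break  # this window already reached the end; avoid a redundant tail
    return chunks


for chunk in fixed_chunks("a b c d e f g h i j", chunk_tokens=4, overlap=1):
    print(chunk)
```

The overlap repeats the last few tokens of each window at the start of the next, which helps retrieval when a relevant sentence straddles a chunk boundary.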

ragforge.pipeline

KnowledgeBase — knowledge.py

| Method | Description |
|---|---|
| `KnowledgeBase.build(...)` | classmethod: parse, chunk, embed, and persist a new KB. |
| `KnowledgeBase.load(name)` | classmethod: load a previously built KB. |
| `kb.query(question, top_k, mode, rerank)` | Returns list[(Chunk, score)] using dense, BM25, or hybrid retrieval. |

Module-level functions

| Function | Description |
|---|---|
| `build_knowledge_base(name, sources, ...)` | Convenience wrapper around KnowledgeBase.build(). |
| `query_knowledge_base(knowledge, question, ...)` | Wrapper around KnowledgeBase.load().query() plus optional generation. |

Sub-modules

| Module | Description |
|---|---|
| `pipeline.embeddings` | Embedder ABC; DefaultEmbedder (hash-based, zero deps); SentenceTransformerEmbedder; OpenAIEmbedder. |
| `pipeline.store` | InMemoryStore: vector store with .add(), .search(), .save(), .load(). |
| `pipeline.bm25` | BM25Index: keyword index with RRF fusion. |
| `pipeline.retriever` | Retriever: dense/BM25/hybrid with optional cross-encoder reranking. |
| `pipeline.generation` | LLMProvider ABC with OpenAI, Anthropic, and Ollama providers; grounded answers with refusal. |

ragforge.evaluation

| Name | Description |
|---|---|
| `GoldenItem` | Dataclass: question, expected_answer, relevant_chunk_ids, relevant_sources, notes. |
| `GoldenDataset` | .load(path) (JSON or CSV), .save(path). |
| `Evaluator(kb).run(golden, metrics, ...)` | Run an evaluation and return an EvalReport. |
| `Evaluator.compare(a, b, golden, ...)` | A/B comparison with delta table. |
| `generate_golden_draft(...)` | LLM-bootstrapped draft golden dataset (review required). |
| `RETRIEVAL_METRICS` | ['hit_rate', 'precision_at_k', 'recall_at_k', 'mrr']. |
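
The four names in RETRIEVAL_METRICS correspond to standard retrieval measures. The per-query definitions below are a sketch of the textbook formulas, not ragforge's implementation; the function signatures and the choice of `k` as the precision denominator are assumptions.

```python
def hit_rate(retrieved: list[str], relevant: set[str], k: int) -> float:
    """1.0 if any of the top-k results is relevant, else 0.0."""
    return 1.0 if any(cid in relevant for cid in retrieved[:k]) else 0.0


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k slots filled with relevant results."""
    hits = sum(1 for cid in retrieved[:k] if cid in relevant)
    return hits / k if k > 0 else 0.0


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for cid in set(retrieved[:k]) if cid in relevant)
    return hits / len(relevant)


def mrr(retrieved: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant result (0.0 if none is found)."""
    for rank, cid in enumerate(retrieved, start=1):
        if cid in relevant:
            return 1.0 / rank
    return 0.0


retrieved = ["c7", "c2", "c9", "c4"]  # illustrative ranked chunk ids
relevant = {"c2", "c4"}               # illustrative relevant_chunk_ids
print(hit_rate(retrieved, relevant, k=3))
print(precision_at_k(retrieved, relevant, k=3))
print(recall_at_k(retrieved, relevant, k=3))
print(mrr(retrieved, relevant))
```

A dataset-level score is typically the mean of these per-query values across all golden items.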

ragforge.quantization

See quantizer.py.

| Name | Description |
|---|---|
| `quantize_and_compare(target, knowledge, options)` | Quantize an embedding model and return a before/after CostQualityReport. |

ragforge.migration

| Name | Description |
|---|---|
| `migrate_knowledge_base(knowledge, from_model, to_model, validate, options)` | Full shadow-index migration. |

ragforge.coordination

Blackboard

| Name | Description |
|---|---|
| `BlackboardEntry` | Fields: key, value, author, timestamp, tags, version. |
| `Blackboard(name)` | SQLite-backed (WAL mode), thread-safe, persistent. |
| `InMemoryBlackboard(name)` | In-memory variant with the same API. |
| `board.write` / `read` / `read_all` / `read_by_tag` | Read and write entries with optional tag filters. |

Agent + Orchestrator

| Name | Description |
|---|---|
| `Agent(id, trigger_fn, action_fn)` | Agent with a trigger condition and an action. |
| `Orchestrator(board, agents, goal, max_steps)` | Loop until the goal is met or quiescence is reached. |
| `orchestrator.run()` | Returns OrchestrationResult (steps, tokens, cost, reason). |
| `run_benchmark(task)` | Blackboard vs direct-messaging cost comparison. |

ragforge.tracing

| Name | Description |
|---|---|
| `Tracer()` | Pipeline tracer backed by SQLite at ~/.ragforge/traces.db. |
| `tracer.trace(query)` | Context manager; records start/end and persists on exit. |
| `t.step(name, **data)` | Record a named pipeline step with arbitrary data. |
| `TraceStore.list_traces` / `get_trace` | Query the trace store. |
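
A tracer of this shape (a context manager that collects steps and persists them to SQLite on exit) can be sketched with the standard library alone. The classes `MiniTracer` and `MiniTrace`, the table schema, and the in-memory database are assumptions chosen so the example runs anywhere; ragforge's own tracer writes to ~/.ragforge/traces.db and may store different columns.

```python
import json
import sqlite3
import time
from contextlib import contextmanager


class MiniTrace:
    """Collects named steps for one query."""

    def __init__(self, query: str) -> None:
        self.query = query
        self.steps: list[dict] = []

    def step(self, name: str, **data) -> None:
        self.steps.append({"name": name, "data": data})


class MiniTracer:
    """Stand-in tracer; uses an in-memory SQLite database by default."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS traces ("
            "id INTEGER PRIMARY KEY, query TEXT, started REAL, ended REAL, steps TEXT)"
        )

    @contextmanager
    def trace(self, query: str):
        current = MiniTrace(query)
        started = time.time()
        try:
            yield current
        finally:
            # Persist even if the traced block raised, so failures stay visible.
            self._conn.execute(
                "INSERT INTO traces (query, started, ended, steps) VALUES (?, ?, ?, ?)",
                (query, started, time.time(), json.dumps(current.steps)),
            )
            self._conn.commit()

    def list_traces(self) -> list[tuple[int, str, list[dict]]]:
        rows = self._conn.execute("SELECT id, query, steps FROM traces ORDER BY id")
        return [(trace_id, query, json.loads(steps)) for trace_id, query, steps in rows]


tracer = MiniTracer()
with tracer.trace("what is RRF?") as t:
    t.step("retrieve", top_k=5, mode="hybrid")
    t.step("generate", tokens=128)

for trace_id, query, steps in tracer.list_traces():
    print(trace_id, query, [s["name"] for s in steps])
```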