Python modules

Public surface area of the Python library, sourced from ragforge/.

ragforge (top-level)

Re-exports for the most common workflow.

Name	Kind	Description
`parse_file(path, parser=None)`	function	Parse any supported file into a Document.
`chunk_document(doc, strategy, **kw)`	function	Split a Document into a list of Chunk.
`Document`	class	Parsed document: text, source, doc_type, metadata, id.
`Chunk`	class	One chunk: text, doc_id, index, metadata, id.
`available(kind)`	function	List registered names for a plugin kind.

ragforge.core

core.models — models.py

Name	Description
`Document`	@dataclass — text, source, doc_type, metadata, id; .token_count property.
`Chunk`	@dataclass — text, doc_id, index, metadata, id; .token_count property.
`estimate_tokens(text)`	~4 chars/token approximation, no tokenizer dependency.
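
The ~4 chars/token heuristic can be sketched in a few lines. This is an illustration rather than ragforge's actual implementation: the helper name, the ceiling rounding, and the handling of empty input are all assumptions.

```python
import math

CHARS_PER_TOKEN = 4  # taken from the "~4 chars/token" description above


def estimate_tokens_sketch(text: str) -> int:
    """Approximate a token count from character length alone (hypothetical helper)."""
    if not text:
        return 0
    # Assumption: round up, so any non-empty string counts as at least one token.
    return math.ceil(len(text) / CHARS_PER_TOKEN)


print(estimate_tokens_sketch("hello world"))  # 11 characters -> 3
```

An estimate like this is cheap and dependency-free, but it can drift noticeably from a real tokenizer on code or non-English text.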

core.registry — registry.py

Name	Description
`@register(kind, name)`	Decorator to register a class under a (kind, name) pair.
`get(kind, name)`	Look up a registered class.
`available(kind)`	List all registered names for a kind.
`registered_info()`	Full registry dump across all kinds.

```python
from ragforge.core.registry import register, get, available

@register("chunker", "fixed")
class FixedChunker(...):
    ...

chunker_cls = get("chunker", "fixed")
print(available("chunker"))  # ['fixed', 'structure', ...]
```
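
For readers curious how a (kind, name) registry of this shape can work, here is a minimal self-contained sketch. It does not import ragforge and is not its implementation; the module-level dict, the sorted return order, and the error message are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Type

# kind -> name -> class. A plain module-level dict stands in for the real registry.
_REGISTRY: Dict[str, Dict[str, Type]] = {}


def register(kind: str, name: str) -> Callable[[Type], Type]:
    """Class decorator that files the decorated class under (kind, name)."""
    def decorator(cls: Type) -> Type:
        _REGISTRY.setdefault(kind, {})[name] = cls
        return cls  # return the class unchanged so it stays usable directly
    return decorator


def get(kind: str, name: str) -> Type:
    """Return the class registered under (kind, name), or raise KeyError."""
    try:
        return _REGISTRY[kind][name]
    except KeyError:
        raise KeyError(f"no {kind!r} registered as {name!r}") from None


def available(kind: str) -> List[str]:
    """List registered names for a kind (sorted here; ordering is an assumption)."""
    return sorted(_REGISTRY.get(kind, {}))


def registered_info() -> Dict[str, List[str]]:
    """Dump every kind with its registered names."""
    return {kind: sorted(names) for kind, names in _REGISTRY.items()}


@register("chunker", "fixed")
class FixedChunker:
    """Placeholder class used only to exercise the sketch."""


print(available("chunker"))  # ['fixed']
```

Returning the class unchanged from the decorator means registration has no effect on how the class is constructed or subclassed.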

ragforge.parsing

Name	Description
`parse_file(path, parser=None)`	Auto-detect by extension and parse; optional override.
`TextParser`	Registered as 'text' — handles .txt and .md.
`HtmlParser`	Registered as 'html' — strips HTML tags.
`PdfParser`	Registered as 'pdf' — requires [pdf] extra (pypdf).
`DoclingParser`	Registered as 'docling' — for complex layouts. Requires [docling].
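
Extension-based auto-detection with an optional override might look roughly like the sketch below. Only the `.txt` and `.md` mappings come from the table above; the other extensions, the helper name `pick_parser_name`, and the error type are assumptions.

```python
from pathlib import Path
from typing import Optional

# Extension -> registered parser name.
EXTENSION_TO_PARSER = {
    ".txt": "text",
    ".md": "text",
    ".html": "html",  # assumed extension
    ".htm": "html",   # assumed extension
    ".pdf": "pdf",    # assumed extension
}


def pick_parser_name(path: str, parser: Optional[str] = None) -> str:
    """Return the parser name for a path; an explicit override always wins."""
    if parser is not None:
        return parser
    suffix = Path(path).suffix.lower()
    try:
        return EXTENSION_TO_PARSER[suffix]
    except KeyError:
        raise ValueError(f"unsupported file extension: {suffix!r}") from None


print(pick_parser_name("notes/README.md"))               # text
print(pick_parser_name("report.pdf", parser="docling"))  # docling
```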

ragforge.chunking

Name	Description
`chunk_document(doc, strategy, **kw)`	Dispatch to strategy; returns list[Chunk].
`FixedChunker`	'fixed' — sliding window by token count (chunk_tokens, overlap).
`StructureChunker`	'structure' — split on Markdown headings, respecting max_tokens.
`DoclingChunker`	'docling' — Docling-aware; keeps tables and code blocks intact.
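
The 'fixed' strategy's sliding window can be illustrated as follows. This is a sketch under stated assumptions, not the library's code: tokens are approximated by whitespace splitting, the function name is hypothetical, and the default values are arbitrary. Only the parameter names `chunk_tokens` and `overlap` come from the table above.

```python
from typing import List


def fixed_chunks(text: str, chunk_tokens: int = 200, overlap: int = 20) -> List[str]:
    """Split text into windows of chunk_tokens tokens that share `overlap` tokens."""
    if not 0 <= overlap < chunk_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_tokens")
    tokens = text.split()  # assumption: whitespace tokens stand in for real tokens
    step = chunk_tokens - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_tokens]))
        if start + chunk_tokens >= len(tokens):
            break  # this window already reached the end; avoid a redundant tail chunk
    return chunks


print(fixed_chunks("a b c d e f g h i j", chunk_tokens=4, overlap=1))
# ['a b c d', 'd e f g', 'g h i j']
```

Each chunk repeats the last `overlap` tokens of the previous one, which helps retrieval when a relevant sentence straddles a boundary.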

ragforge.pipeline

KnowledgeBase — knowledge.py

Method	Description
`KnowledgeBase.build(...)`	classmethod — parse, chunk, embed, persist a new KB.
`KnowledgeBase.load(name)`	classmethod — load a previously built KB.
`kb.query(question, top_k, mode, rerank)`	Returns list[(Chunk, score)] using dense, BM25, or hybrid retrieval.

Module-level functions

Function	Description
`build_knowledge_base(name, sources, ...)`	Convenience wrapper around KnowledgeBase.build().
`query_knowledge_base(knowledge, question, ...)`	Wrapper around KnowledgeBase.load(...).query(...), with optional answer generation.

Sub-modules

Module	Description
`pipeline.embeddings`	Embedder ABC; DefaultEmbedder (hash-based, zero deps); SentenceTransformerEmbedder; OpenAIEmbedder.
`pipeline.store`	InMemoryStore — vector store with .add(), .search(), .save(), .load().
`pipeline.bm25`	BM25Index — keyword index with RRF fusion.
`pipeline.retriever`	Retriever — dense/BM25/hybrid with optional cross-encoder reranking.
`pipeline.generation`	LLMProvider ABC; OpenAI/Anthropic/Ollama; grounded answers with refusal.
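
Reciprocal rank fusion (RRF), mentioned for `pipeline.bm25`, merges several ranked lists by summing 1 / (k + rank) for each item. The sketch below shows the general technique; the function name is hypothetical, and whether ragforge uses the conventional k = 60 is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse ranked ID lists; items ranked high in several lists score best."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense = ["c1", "c2", "c3"]  # hypothetical dense-retrieval order
bm25 = ["c3", "c1", "c4"]   # hypothetical BM25 order
fused = rrf_fuse([dense, bm25])
print([chunk_id for chunk_id, _ in fused])  # ['c1', 'c3', 'c2', 'c4']
```

Because RRF uses ranks only, it needs no normalization between cosine similarities and BM25 scores, which live on different scales.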

ragforge.evaluation

Name	Description
`GoldenItem`	Dataclass: question, expected_answer, relevant_chunk_ids, relevant_sources, notes.
`GoldenDataset`	.load(path) (JSON or CSV), .save(path).
`Evaluator(kb).run(golden, metrics, ...)`	Run an evaluation and return an EvalReport.
`Evaluator.compare(a, b, golden, ...)`	A/B comparison with delta table.
`generate_golden_draft(...)`	LLM-bootstrapped draft golden dataset (review required).
`RETRIEVAL_METRICS`	['hit_rate', 'precision_at_k', 'recall_at_k', 'mrr'].
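
The four names in `RETRIEVAL_METRICS` correspond to standard retrieval measures. A per-question sketch using their textbook definitions is shown below; the function name and signature are hypothetical, and how ragforge aggregates across a golden dataset is not covered here.

```python
from typing import Dict, Sequence, Set


def retrieval_metrics(retrieved: Sequence[str], relevant: Set[str], k: int) -> Dict[str, float]:
    """Score one ranked result list against the set of relevant chunk IDs."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(retrieved[:k])  # assumes retrieved IDs are unique
    hits = [cid for cid in top_k if cid in relevant]
    # 1-based rank of the first relevant result, or None if there is none.
    first_hit = next((rank for rank, cid in enumerate(top_k, start=1) if cid in relevant), None)
    return {
        "hit_rate": 1.0 if hits else 0.0,
        "precision_at_k": len(hits) / k,
        "recall_at_k": len(hits) / len(relevant) if relevant else 0.0,
        "mrr": 1.0 / first_hit if first_hit else 0.0,
    }


scores = retrieval_metrics(["c7", "c2", "c9", "c4"], {"c2", "c4", "c5"}, k=3)
print(scores)
```

In this example only `c2` appears in the top 3, at rank 2, so precision and recall are both 1/3 and the reciprocal rank is 0.5.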

ragforge.quantization

See quantizer.py.

Name	Description
`quantize_and_compare(target, knowledge, options)`	Quantize an embedding model and return a before/after CostQualityReport.

ragforge.migration

Name	Description
`migrate_knowledge_base(knowledge, from_model, to_model, validate, options)`	Full shadow-index migration.

ragforge.coordination

Blackboard

Name	Description
`BlackboardEntry`	key, value, author, timestamp, tags, version.
`Blackboard(name)`	SQLite-backed (WAL mode), thread-safe, persistent.
`InMemoryBlackboard(name)`	In-memory variant with the same API.
`board.write/read/read_all/read_by_tag`	Read/write entries with optional tag filters.
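
To make the blackboard operations concrete, here is a small in-memory sketch with the four operations listed above. The class names, method signatures, and the rule that `version` increments on overwrite are assumptions; the real classes may differ.

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class EntrySketch:
    key: str
    value: Any
    author: str
    timestamp: float
    tags: List[str] = field(default_factory=list)
    version: int = 1


class BoardSketch:
    """Dict-backed stand-in for a blackboard; a lock guards concurrent access."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._entries: Dict[str, EntrySketch] = {}
        self._lock = threading.Lock()

    def write(self, key: str, value: Any, author: str, tags: Sequence[str] = ()) -> EntrySketch:
        with self._lock:
            previous = self._entries.get(key)
            # Assumption: overwriting a key bumps its version.
            version = previous.version + 1 if previous else 1
            entry = EntrySketch(key, value, author, time.time(), list(tags), version)
            self._entries[key] = entry
            return entry

    def read(self, key: str) -> Optional[EntrySketch]:
        with self._lock:
            return self._entries.get(key)

    def read_all(self) -> List[EntrySketch]:
        with self._lock:
            return list(self._entries.values())

    def read_by_tag(self, tag: str) -> List[EntrySketch]:
        with self._lock:
            return [e for e in self._entries.values() if tag in e.tags]


board = BoardSketch("demo")
board.write("plan", "outline v1", author="planner", tags=["draft"])
board.write("plan", "outline v2", author="planner", tags=["draft"])
print(board.read("plan").version)  # 2
```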

Agent + Orchestrator

Name	Description
`Agent(id, trigger_fn, action_fn)`	Agent with trigger condition and action.
`Orchestrator(board, agents, goal, max_steps)`	Loop until goal met or quiescence.
`orchestrator.run()`	Returns OrchestrationResult (steps, tokens, cost, reason).
`run_benchmark(task)`	Blackboard vs direct-messaging cost comparison.
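
The "loop until goal met or quiescence" behavior can be sketched as below. A plain dict stands in for the board, and the names `SketchAgent`, `run_loop`, and the reason strings are hypothetical; the real `Orchestrator` also tracks tokens and cost, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Board = Dict[str, object]  # a plain dict stands in for the blackboard


@dataclass
class SketchAgent:
    id: str
    trigger_fn: Callable[[Board], bool]  # should this agent act now?
    action_fn: Callable[[Board], None]   # what it does to the board


def run_loop(board: Board, agents: List[SketchAgent],
             goal: Callable[[Board], bool], max_steps: int = 10) -> Tuple[int, str]:
    """Return (steps taken, reason) where reason names why the loop stopped."""
    for step in range(max_steps):
        if goal(board):
            return step, "goal_met"
        # Evaluate every trigger against the same board state before anyone acts.
        ready = [agent for agent in agents if agent.trigger_fn(board)]
        if not ready:
            return step, "quiescent"  # nobody wants to act, so nothing can change
        for agent in ready:
            agent.action_fn(board)
    return max_steps, ("goal_met" if goal(board) else "max_steps")


drafter = SketchAgent("drafter",
                      lambda b: "draft" not in b,
                      lambda b: b.update(draft="first pass"))
reviewer = SketchAgent("reviewer",
                       lambda b: "draft" in b and "review" not in b,
                       lambda b: b.update(review="ok"))

result = run_loop({}, [drafter, reviewer], goal=lambda b: "review" in b)
print(result)  # (2, 'goal_met')
```

The drafter fires in the first step, the reviewer in the second, and the goal check at the start of the third pass ends the run.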

ragforge.tracing

Name	Description
`Tracer()`	Pipeline tracer backed by SQLite at ~/.ragforge/traces.db.
`tracer.trace(query)`	Context manager — records start/end, persists on exit.
`t.step(name, **data)`	Record a named pipeline step with arbitrary data.
`TraceStore.list_traces / get_trace`	Query the trace store.
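
A tracer whose context manager records start and end times and persists on exit could be built along these lines. The sketch uses an in-memory SQLite database instead of ~/.ragforge/traces.db, and the class names, table schema, and JSON encoding of steps are assumptions.

```python
import json
import sqlite3
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List


class SketchTrace:
    """Collects named steps for one traced query."""

    def __init__(self, query: str) -> None:
        self.query = query
        self.steps: List[Dict[str, Any]] = []

    def step(self, name: str, **data: Any) -> None:
        self.steps.append({"name": name, "data": data})


class SketchTracer:
    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS traces "
            "(id INTEGER PRIMARY KEY, query TEXT, started REAL, ended REAL, steps TEXT)"
        )

    @contextmanager
    def trace(self, query: str) -> Iterator[SketchTrace]:
        current = SketchTrace(query)
        started = time.time()
        try:
            yield current
        finally:
            # Persist on exit, even if the traced block raised.
            self.conn.execute(
                "INSERT INTO traces (query, started, ended, steps) VALUES (?, ?, ?, ?)",
                (query, started, time.time(), json.dumps(current.steps)),
            )
            self.conn.commit()


tracer = SketchTracer()
with tracer.trace("what is RRF?") as t:
    t.step("retrieve", top_k=5, mode="hybrid")
    t.step("rerank", kept=3)

row = tracer.conn.execute("SELECT query, steps FROM traces").fetchone()
print(row[0], [s["name"] for s in json.loads(row[1])])
```

Writing in the `finally` clause means a failed query still leaves a trace behind, which is usually when a trace is most useful.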
