Python modules
Public surface area of the Python library, sourced from ragforge/.
ragforge (top-level)
Re-exports for the most common workflow.
| Name | Kind | Description |
|---|---|---|
| parse_file(path) | function | Parse any supported file into a Document. |
| chunk_document(doc, strategy) | function | Split a Document into a list of Chunk objects. |
| Document | class | Parsed document: text, source, doc_type, metadata, id. |
| Chunk | class | One chunk: text, doc_id, index, metadata, id. |
| available(kind) | function | List registered names for a plugin kind. |
ragforge.core
core.models — models.py
| Name | Description |
|---|---|
| Document | @dataclass with fields text, source, doc_type, metadata, id; .token_count property. |
| Chunk | @dataclass with fields text, doc_id, index, metadata, id; .token_count property. |
| estimate_tokens(text) | Approximation at roughly 4 characters per token; no tokenizer dependency. |
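The roughly 4 characters per token approximation and the .token_count property can be illustrated with a minimal sketch. The names SketchDocument and the round-up behaviour are assumptions made for this example; the field defaults are also illustrative and are not taken from models.py.

```python
from dataclasses import dataclass, field
import uuid


def estimate_tokens(text: str) -> int:
    """Approximate a token count at about 4 characters per token."""
    # Assumption of this sketch: partial tokens are rounded up.
    return (len(text) + 3) // 4


@dataclass
class SketchDocument:
    """Hypothetical stand-in mirroring the fields listed in the table."""
    text: str
    source: str
    doc_type: str = "text"
    metadata: dict = field(default_factory=dict)
    id: str = field(default_factory=lambda: uuid.uuid4().hex)

    @property
    def token_count(self) -> int:
        # Derived on demand, so it always reflects the current text.
        return estimate_tokens(self.text)


doc = SketchDocument(text="Retrieval-augmented generation.", source="notes.md")
print(doc.token_count)
```

Because the estimate depends only on string length, it needs no tokenizer package, at the cost of drifting from real token counts for code or non-English text.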
core.registry — registry.py
| Name | Description |
|---|---|
| @register(kind, name) | Decorator to register a class under a (kind, name) pair. |
| get(kind, name) | Look up a registered class. |
| available(kind) | List all registered names for a kind. |
| registered_info() | Full registry dump across all kinds. |
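As a rough illustration of how a (kind, name) registry with these four entry points might be put together, here is a self-contained sketch. It is not the code in registry.py: the nested-dict storage, the overwrite-on-duplicate behaviour, and the UpperParser demo class are all assumptions.

```python
# Hypothetical stand-in for a (kind, name) plugin registry; not ragforge's code.
_REGISTRY: dict[str, dict[str, type]] = {}


def register(kind: str, name: str):
    """Return a class decorator that files the class under (kind, name)."""
    def decorator(cls: type) -> type:
        # Assumption: a later registration silently replaces an earlier one.
        _REGISTRY.setdefault(kind, {})[name] = cls
        return cls
    return decorator


def get(kind: str, name: str) -> type:
    """Look up a registered class, failing loudly on unknown pairs."""
    try:
        return _REGISTRY[kind][name]
    except KeyError:
        raise KeyError(f"nothing registered for ({kind!r}, {name!r})") from None


def available(kind: str) -> list[str]:
    """List the names registered for one kind, sorted for stable output."""
    return sorted(_REGISTRY.get(kind, {}))


def registered_info() -> dict[str, list[str]]:
    """Dump every kind with its registered names."""
    return {kind: sorted(names) for kind, names in _REGISTRY.items()}


@register("parser", "upper")
class UpperParser:
    def parse(self, raw: str) -> str:
        return raw.upper()


parser_cls = get("parser", "upper")
print(parser_cls().parse("hello"), available("parser"), registered_info())
```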
```python
from ragforge.core.registry import register, get, available

# The base class and body are elided here; "..." is a placeholder.
@register("chunker", "fixed")
class FixedChunker(...):
    ...

chunker_cls = get("chunker", "fixed")
print(available("chunker"))  # ['fixed', 'structure', ...]
```
ragforge.parsing
| Name | Description |
|---|---|
| parse_file(path, parser=None) | Auto-detect the parser by file extension and parse; pass parser to override. |
| TextParser | Registered as 'text'; handles .txt and .md. |
| HtmlParser | Registered as 'html'; strips HTML tags. |
| PdfParser | Registered as 'pdf'; requires the [pdf] extra (pypdf). |
| DoclingParser | Registered as 'docling'; for complex layouts. Requires the [docling] extra. |
ragforge.chunking
| Name | Description |
|---|---|
| chunk_document(doc, strategy, **kw) | Dispatch to the named strategy; returns list[Chunk]. |
| FixedChunker | 'fixed': sliding window by token count (chunk_tokens, overlap). |
| StructureChunker | 'structure': split on Markdown headings, respecting max_tokens. |
| DoclingChunker | 'docling': Docling-aware; keeps tables and code blocks intact. |
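The sliding-window idea behind the 'fixed' strategy can be sketched as follows. This is an illustration only: the helper sliding_window is hypothetical, and it treats whitespace-separated words as tokens, which is an assumption and may differ from how the library counts tokens.

```python
def sliding_window(tokens: list[str], chunk_tokens: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of chunk_tokens that share overlap tokens."""
    if chunk_tokens <= 0:
        raise ValueError("chunk_tokens must be positive")
    if not 0 <= overlap < chunk_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_tokens")

    step = chunk_tokens - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_tokens])
        if start + chunk_tokens >= len(tokens):
            # This window already reaches the end; stop to avoid a
            # trailing chunk that only repeats overlapping tokens.
            break
    return chunks


# Assumption: words stand in for tokens in this demo.
words = "one two three four five six seven".split()
for window in sliding_window(words, chunk_tokens=4, overlap=2):
    print(" ".join(window))
```

A larger overlap preserves more context across chunk boundaries but produces more chunks to embed and store.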
ragforge.pipeline
KnowledgeBase — knowledge.py
| Method | Description |
|---|---|
| KnowledgeBase.build(...) | classmethod: parse, chunk, embed, and persist a new KB. |
| KnowledgeBase.load(name) | classmethod: load a previously built KB. |
| kb.query(question, top_k, mode, rerank) | Returns a list of (Chunk, score) pairs from dense, BM25, or hybrid retrieval. |
Module-level functions
| Function | Description |
|---|---|
| build_knowledge_base(name, sources, ...) | Convenience wrapper around KnowledgeBase.build(). |
| query_knowledge_base(knowledge, question, ...) | Wrapper around KnowledgeBase.load().query() plus optional generation. |
Sub-modules
| Module | Description |
|---|---|
| pipeline.embeddings | Embedder ABC; DefaultEmbedder (hash-based, zero deps); SentenceTransformerEmbedder; OpenAIEmbedder. |
| pipeline.store | InMemoryStore: vector store with .add(), .search(), .save(), .load(). |
| pipeline.bm25 | BM25Index: keyword index with RRF fusion. |
| pipeline.retriever | Retriever: dense, BM25, or hybrid retrieval with optional cross-encoder reranking. |
| pipeline.generation | LLMProvider ABC; OpenAI, Anthropic, and Ollama providers; grounded answers with refusal. |
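Reciprocal rank fusion (RRF), mentioned for pipeline.bm25, combines several ranked lists by summing 1 / (k + rank) for each item. The sketch below shows the general technique, not the library's implementation; the function rrf_fuse, the chunk ids, and the constant k = 60 (a commonly used default) are assumptions.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id lists with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per id, with rank starting at 1.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a dense retriever and a BM25 index.
dense = ["c1", "c2", "c3"]
bm25 = ["c3", "c1", "c4"]
for chunk_id, score in rrf_fuse([dense, bm25]):
    print(chunk_id, round(score, 5))
```

RRF uses only ranks, so dense similarity scores and BM25 scores never need to be normalized onto a common scale before hybrid retrieval.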
ragforge.evaluation
| Name | Description |
|---|---|
| GoldenItem | Dataclass: question, expected_answer, relevant_chunk_ids, relevant_sources, notes. |
| GoldenDataset | .load(path) (JSON or CSV), .save(path). |
| Evaluator(kb).run(golden, metrics, ...) | Run an evaluation and return an EvalReport. |
| Evaluator.compare(a, b, golden, ...) | A/B comparison with delta table. |
| generate_golden_draft(...) | LLM-bootstrapped draft golden dataset (review required). |
| RETRIEVAL_METRICS | ['hit_rate', 'precision_at_k', 'recall_at_k', 'mrr']. |
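For orientation, the four names in RETRIEVAL_METRICS have standard textbook definitions, which the sketch below computes for a single query. Whether the library uses exactly these conventions (for example, dividing precision by k rather than by the number of results returned) is an assumption, and retrieval_metrics is a hypothetical helper.

```python
def retrieval_metrics(retrieved: list[str], relevant: set[str], k: int) -> dict[str, float]:
    """Compute per-query retrieval metrics over the top-k retrieved ids.

    Assumes retrieved ids are unique and ordered best-first.
    """
    top_k = retrieved[:k]
    hits = [cid for cid in top_k if cid in relevant]
    # 1-based rank of the first relevant result, or None if there is none.
    first_hit_rank = next(
        (rank for rank, cid in enumerate(top_k, start=1) if cid in relevant),
        None,
    )
    return {
        "hit_rate": 1.0 if hits else 0.0,
        "precision_at_k": len(hits) / k,
        "recall_at_k": len(hits) / len(relevant) if relevant else 0.0,
        "mrr": 1.0 / first_hit_rank if first_hit_rank else 0.0,
    }


scores = retrieval_metrics(["a", "b", "c", "d"], relevant={"b", "d", "z"}, k=3)
print(scores)
```

Dataset-level figures would typically be the mean of each per-query value across all golden items.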
ragforge.quantization
See quantizer.py.
| Name | Description |
|---|---|
| quantize_and_compare(target, knowledge, options) | Quantize an embedding model and return a before/after CostQualityReport. |
ragforge.migration
| Name | Description |
|---|---|
| migrate_knowledge_base(knowledge, from_model, to_model, validate, options) | Full shadow-index migration. |
ragforge.coordination
Blackboard
| Name | Description |
|---|---|
| BlackboardEntry | Fields: key, value, author, timestamp, tags, version. |
| Blackboard(name) | SQLite-backed (WAL mode), thread-safe, persistent. |
| InMemoryBlackboard(name) | In-memory variant with the same API. |
| board.write/read/read_all/read_by_tag | Read and write entries, with optional tag filters. |
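To make the "SQLite-backed (WAL mode), thread-safe" description concrete, here is a minimal sketch of that pattern using only the standard library. MiniBlackboard is a hypothetical class, not the library's Blackboard; the table schema, JSON encoding of values, and the version-bumps-on-rewrite behaviour are assumptions.

```python
import json
import os
import sqlite3
import tempfile
import threading
import time


class MiniBlackboard:
    """Hypothetical key/value board persisted in a SQLite file."""

    def __init__(self, path: str):
        # check_same_thread=False lets threads share one connection;
        # the lock below serializes every access to it.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        self._conn.execute("PRAGMA journal_mode=WAL")
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS entries ("
            "key TEXT PRIMARY KEY, value TEXT, author TEXT, "
            "timestamp REAL, tags TEXT, version INTEGER)"
        )
        self._conn.commit()

    def write(self, key, value, author, tags=()):
        with self._lock:
            row = self._conn.execute(
                "SELECT version FROM entries WHERE key = ?", (key,)
            ).fetchone()
            # Assumption: rewriting a key increments its version.
            version = row[0] + 1 if row else 1
            self._conn.execute(
                "INSERT OR REPLACE INTO entries VALUES (?, ?, ?, ?, ?, ?)",
                (key, json.dumps(value), author, time.time(),
                 json.dumps(list(tags)), version),
            )
            self._conn.commit()
            return version

    def read(self, key):
        with self._lock:
            row = self._conn.execute(
                "SELECT value FROM entries WHERE key = ?", (key,)
            ).fetchone()
        return json.loads(row[0]) if row else None

    def read_by_tag(self, tag):
        with self._lock:
            rows = self._conn.execute(
                "SELECT key, tags FROM entries ORDER BY key"
            ).fetchall()
        return [key for key, tags in rows if tag in json.loads(tags)]

    def close(self):
        self._conn.close()


with tempfile.TemporaryDirectory() as tmp:
    board = MiniBlackboard(os.path.join(tmp, "board.db"))
    board.write("plan", {"steps": 3}, author="planner", tags=["draft"])
    board.write("plan", {"steps": 4}, author="critic", tags=["draft", "reviewed"])
    print(board.read("plan"), board.read_by_tag("reviewed"))
    board.close()
```

WAL mode needs a file-backed database, which is why the demo writes to a temporary directory rather than to ":memory:".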
Agent + Orchestrator
| Name | Description |
|---|---|
| Agent(id, trigger_fn, action_fn) | Agent with a trigger condition and an action. |
| Orchestrator(board, agents, goal, max_steps) | Loop until the goal is met or the system reaches quiescence. |
| orchestrator.run() | Returns OrchestrationResult (steps, tokens, cost, reason). |
| run_benchmark(task) | Blackboard vs direct-messaging cost comparison. |
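The "loop until goal met or quiescence" control flow can be sketched in a few lines. This is an illustration under stated assumptions: SketchAgent and run_loop are hypothetical, a plain dict stands in for the blackboard, all triggers are evaluated before any action runs in a round, and token and cost accounting is left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SketchAgent:
    """Hypothetical agent: acts on the board when its trigger fires."""
    id: str
    trigger_fn: Callable[[dict], bool]
    action_fn: Callable[[dict], None]


def run_loop(board: dict, agents: list, goal: Callable[[dict], bool], max_steps: int = 10) -> dict:
    """Run rounds until the goal holds, no agent triggers, or steps run out."""
    for completed in range(max_steps):
        if goal(board):
            return {"steps": completed, "reason": "goal_met"}
        # Evaluate every trigger against the same board snapshot.
        ready = [agent for agent in agents if agent.trigger_fn(board)]
        if not ready:
            # Quiescence: nothing can make further progress.
            return {"steps": completed, "reason": "quiescent"}
        for agent in ready:
            agent.action_fn(board)
    reason = "goal_met" if goal(board) else "max_steps"
    return {"steps": max_steps, "reason": reason}


drafter = SketchAgent(
    id="drafter",
    trigger_fn=lambda b: "draft" not in b,
    action_fn=lambda b: b.update(draft="first version"),
)
reviewer = SketchAgent(
    id="reviewer",
    trigger_fn=lambda b: "draft" in b and "review" not in b,
    action_fn=lambda b: b.update(review="looks fine"),
)

result = run_loop({}, [drafter, reviewer], goal=lambda b: "review" in b)
print(result)
```

The max_steps bound matters because agents whose triggers keep re-firing would otherwise never reach either stopping condition.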
ragforge.tracing
| Name | Description |
|---|---|
| Tracer() | Pipeline tracer backed by SQLite at ~/.ragforge/traces.db. |
| tracer.trace(query) | Context manager: records start and end, persists on exit. |
| t.step(name, **data) | Record a named pipeline step with arbitrary data. |
| TraceStore.list_traces / get_trace | Query the trace store. |
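The context-manager pattern described for tracer.trace(query) and t.step(name, **data) might be sketched as below. SketchTracer and its table layout are hypothetical, and the demo uses an in-memory SQLite database instead of ~/.ragforge/traces.db so that it leaves no files behind.

```python
import json
import sqlite3
import time
from contextlib import contextmanager


class _ActiveTrace:
    """Collects steps for one traced query until the context exits."""

    def __init__(self, query: str):
        self.query = query
        self.started = time.time()
        self.steps = []

    def step(self, name: str, **data):
        self.steps.append({"name": name, "data": data})


class SketchTracer:
    """Hypothetical tracer that persists each trace as one SQLite row."""

    def __init__(self, db_path: str = ":memory:"):
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS traces ("
            "id INTEGER PRIMARY KEY, query TEXT, "
            "started REAL, ended REAL, steps TEXT)"
        )

    @contextmanager
    def trace(self, query: str):
        active = _ActiveTrace(query)
        try:
            yield active
        finally:
            # Persist on exit, even if the traced block raised.
            self._conn.execute(
                "INSERT INTO traces (query, started, ended, steps) "
                "VALUES (?, ?, ?, ?)",
                (query, active.started, time.time(), json.dumps(active.steps)),
            )
            self._conn.commit()

    def list_traces(self):
        rows = self._conn.execute(
            "SELECT query, steps FROM traces ORDER BY id"
        ).fetchall()
        return [(query, json.loads(steps)) for query, steps in rows]


tracer = SketchTracer()
with tracer.trace("what is RRF?") as t:
    t.step("retrieve", top_k=5, mode="hybrid")
    t.step("rerank", kept=3)
print(tracer.list_traces())
```

Writing in the finally clause means a failed pipeline run still leaves a trace, which is usually when one is most needed.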