Architecture
How the modules connect, the plugin registry, and where data is stored. Sourced from README.md and core/registry.py.
Module layout
text
ragforge/
├── core/ # Shared models (Document, Chunk) + plugin registry
├── parsing/ # File → Document (txt, md, html, pdf, docling)
├── chunking/ # Document → Chunks (fixed, structure, docling)
├── pipeline/ # Embed + store + retrieve + generate
│ ├── embeddings.py
│ ├── store.py
│ ├── bm25.py
│ ├── retriever.py
│ ├── generation.py
│ └── knowledge.py
├── evaluation/ # Measure retrieval + generation quality
├── quantization/ # Compress embeddings + measure tradeoff
├── migration/ # Swap embedding models safely
├── coordination/ # Multi-agent blackboard
├── tracing.py # SQLite-backed trace store
├── api/ # FastAPI HTTP endpoints
│ └── routes/
├── cli.py # Command-line interface
└── ui_static/ # Pre-built dashboard assets
Plugin registry
Every capability registers itself via @register(kind, name). The CLI and API look components up by name with get(kind, name) and never import implementations directly, so adding a new parser, chunker, or metric takes one new file that registers itself. Current registry kinds: "parser", "chunker", "embedder", "metric".
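To make the lookup-by-name idea concrete, here is a minimal, self-contained sketch of how a registry with this interface could work. It is an illustrative stand-in, not the contents of core/registry.py: the `_REGISTRY` dict, the duplicate-name check, and the error messages are all assumptions.

```python
from collections import defaultdict
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical storage: kind -> name -> registered implementation.
_REGISTRY: dict[str, dict[str, type]] = defaultdict(dict)


def register(kind: str, name: str) -> Callable[[T], T]:
    """Return a class decorator that files the class under (kind, name)."""
    def decorator(obj: T) -> T:
        if name in _REGISTRY[kind]:
            raise ValueError(f"{kind} {name!r} is already registered")
        _REGISTRY[kind][name] = obj
        return obj  # the class itself is returned unchanged
    return decorator


def get(kind: str, name: str):
    """Return the implementation registered under (kind, name)."""
    try:
        return _REGISTRY[kind][name]
    except KeyError:
        raise KeyError(f"no {kind} named {name!r}; known: {available(kind)}") from None


def available(kind: str) -> list[str]:
    """List the names registered for one kind, sorted for stable output."""
    return sorted(_REGISTRY.get(kind, {}))


@register("chunker", "fixed")
class FixedChunker:
    """Toy class used only to exercise the registry."""


print(get("chunker", "fixed").__name__)
print(available("chunker"))
```

Because callers only ever pass strings to `get`, a new implementation becomes visible as soon as its module has been imported once and the decorator has run.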
python
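from ragforge.core.registry import register, get, available

# Register an implementation under a (kind, name) pair.
@register("chunker", "fixed")
class FixedChunker(...):  # base class elided
    ...

# Resolve the implementation by name instead of importing it directly.
chunker_cls = get("chunker", "fixed")
print(available("chunker"))  # ['fixed', 'structure', ...]
Request flow: ingest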
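As a rough illustration of the ingest flow diagrammed in this section (chunk, embed, persist), the toy pipeline below runs end to end with only the standard library. Everything in it is a stand-in: `chunk_fixed`, `toy_embed`, and `build_toy_kb` are hypothetical names, the hash-based "embedding" is not a real model, and the JSON layout inside the files is assumed. Only the directory shape and the file names come from the diagram.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def chunk_fixed(text: str, size: int = 40) -> list[str]:
    """Split text into fixed-size character windows (stand-in for a chunker)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def toy_embed(chunk: str, dim: int = 4) -> list[float]:
    """Deterministic fake embedding built from a hash; NOT a real model."""
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]


def build_toy_kb(name: str, sources: dict[str, str], root: Path) -> Path:
    """Chunk and embed every source, then persist the results as JSON."""
    records = []
    for source, text in sources.items():
        for index, chunk in enumerate(chunk_fixed(text)):
            records.append({"source": source, "index": index,
                            "text": chunk, "vector": toy_embed(chunk)})
    kb_dir = root / "knowledge_bases" / name
    kb_dir.mkdir(parents=True, exist_ok=True)
    # File names follow the diagram; the record layout is an assumption.
    (kb_dir / "vectors.json").write_text(json.dumps(records))
    (kb_dir / "meta.json").write_text(
        json.dumps({"name": name, "chunks": len(records)}))
    return kb_dir


with tempfile.TemporaryDirectory() as tmp:
    docs = {"faq.txt": "Refunds are issued within 14 days. " * 3}
    kb_dir = build_toy_kb("my-kb", docs, Path(tmp))
    meta = json.loads((kb_dir / "meta.json").read_text())
    print(meta)
```

The sketch skips parsing and the BM25 index; it is meant only to show the order of the steps and where the files end up.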
text
CLI: ragforge knowledge build my-kb ./docs/
↓
pipeline.build_knowledge_base()
↓
for each source file:
    parsing.parse_file() → Document
    chunking.chunk_document() → list[Chunk]
↓
pipeline.embeddings.encode(chunks) → vectors
↓
pipeline.store.InMemoryStore.add(chunks, vectors)
pipeline.bm25.BM25Index.add(chunks)
↓
persist to ~/.ragforge/knowledge_bases/<name>/
    vectors.json, bm25_index.json, meta.json
Request flow: query
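The hybrid mode in the query flow diagrammed in this section fuses the dense and BM25 rankings with RRF (Reciprocal Rank Fusion). The sketch below shows the standard formula, in which each list contributes 1 / (k + rank) per item. The function name `rrf_fuse`, the chunk ids, and the constant k = 60 (a commonly used default) are assumptions; the value ragforge actually uses is not stated here.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk ids with Reciprocal Rank Fusion.

    Every list adds 1 / (k + rank) to the score of each id it contains,
    with rank starting at 1 for the top hit.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_hits = ["c3", "c1", "c7"]  # hypothetical ids from the vector search
bm25_hits = ["c1", "c9", "c3"]   # hypothetical ids from the BM25 search
fused = rrf_fuse([dense_hits, bm25_hits])
print([chunk_id for chunk_id, _ in fused])  # ['c1', 'c3', 'c9', 'c7']
```

RRF works on ranks rather than raw scores, so cosine similarities and BM25 scores never need to be normalized onto a common scale before fusion.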
text
CLI/API: ragforge query my-kb "How do refunds work?"
↓
pipeline.query_knowledge_base()
↓
KnowledgeBase.load("my-kb") ← reads from ~/.ragforge/
↓
kb.query(question, mode="hybrid")
↓ dense: embedder.encode_single(question) → store.search()
↓ bm25: bm25_index.search(question)
↓ hybrid: RRF fusion of both ranked lists
↓
[optional] cross-encoder reranking
↓
[optional] generation.LLMProvider.generate(prompt, context_chunks)
↓
tracing.Tracer records each step → ~/.ragforge/traces.db
↓
return { chunks, answer }
Data storage
| Data | Path |
|---|---|
| Knowledge bases | ~/.ragforge/knowledge_bases/<name>/ |
| Vector store | vectors.json |
| BM25 index | bm25_index.json |
| KB metadata | meta.json |
| Migration backup | vectors_backup.json |
| Traces | ~/.ragforge/traces.db |
| Blackboards | ~/.ragforge/<board>.db |
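The layout in the table can be expressed as a few path helpers. This is a sketch under stated assumptions: the function names are hypothetical, and the three per-KB files are placed inside the knowledge base directory as the ingest flow describes. The migration backup is left out because the table does not say which directory it lives in.

```python
from pathlib import Path

# Default root taken from the table; helpers accept an override for testing.
RAGFORGE_HOME = Path.home() / ".ragforge"


def kb_dir(name: str, home: Path = RAGFORGE_HOME) -> Path:
    """Directory holding one knowledge base."""
    return home / "knowledge_bases" / name


def kb_files(name: str, home: Path = RAGFORGE_HOME) -> dict[str, Path]:
    """Per-KB files, assumed to sit directly inside the KB directory."""
    base = kb_dir(name, home)
    return {
        "vectors": base / "vectors.json",
        "bm25": base / "bm25_index.json",
        "meta": base / "meta.json",
    }


def traces_db(home: Path = RAGFORGE_HOME) -> Path:
    """SQLite trace store."""
    return home / "traces.db"


def blackboard_db(board: str, home: Path = RAGFORGE_HOME) -> Path:
    """SQLite file for one blackboard."""
    return home / f"{board}.db"


for label, path in kb_files("my-kb").items():
    print(f"{label}: {path}")
```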
Environment & config
- No dedicated .env template; LLM providers use their own SDK env vars (OPENAI_API_KEY, ANTHROPIC_API_KEY).
- Ollama defaults to http://localhost:11434.
- Server defaults: --host 0.0.0.0 --port 8000.
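Since configuration comes from the providers' own environment variables, a caller might gather it as in the sketch below. `LLMSettings` and `load_llm_settings` are hypothetical names, not part of ragforge, and treating an empty variable as unset is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

OLLAMA_DEFAULT_URL = "http://localhost:11434"  # default stated in the list above


@dataclass(frozen=True)
class LLMSettings:
    openai_api_key: Optional[str]
    anthropic_api_key: Optional[str]
    ollama_url: str


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Read provider keys from the environment; empty values count as unset."""
    return LLMSettings(
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        anthropic_api_key=env.get("ANTHROPIC_API_KEY") or None,
        ollama_url=OLLAMA_DEFAULT_URL,
    )


# A plain dict stands in for os.environ so the example is reproducible.
settings = load_llm_settings({"OPENAI_API_KEY": "sk-example"})
print(settings.openai_api_key is not None, settings.anthropic_api_key is not None)
```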