Build AI that reads your documents — and answers honestly.
Parsing, chunking, retrieval, grounded answers, and evaluation — all in one place. Usable from any programming language. Runs on your own machine. Free and open source.

Building RAG is messy
You have a pile of documents. You want AI to answer questions from them — accurately. Doing it yourself means gluing together six fragile tools.
Documents get mangled
Tables and code get chopped in half, so the AI reads broken pieces and gives wrong answers.
You can't trust the answers
The AI sometimes makes things up, with no sources you can check.
AI agents burn money
When several agents talk to each other, every message costs — and it adds up fast.
The broken pipeline most people end up with
Parsing, chunking, retrieval, answering, evaluation: skip any one of these stages and the next one inherits the mess.
One workshop for everything RAG
Your question travels through the forge, and each stage hands clean data to the next.
Everything is exposed over a simple HTTP API, so you can use it from any language — not just Python.
Don't migrate blind
When a new embedding model comes out, should you switch? RAGForge answers that before you spend a fortune re-embedding. A model that wins on public benchmarks can still lose on your domain.
Freeze your real queries as a golden set, then score the new embedding model against your current one on recall@k, precision@k, and MRR (mean reciprocal rank), measured on YOUR corpus.
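For readers new to these metrics, here is a minimal sketch of how they are usually computed over a golden set. It assumes each golden query maps to a set of relevant document IDs and each model returns a ranked list of IDs per query; the function and variable names are illustrative, not RAGForge's API.

```python
from statistics import mean

def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant docs that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)

def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top k slots that hold a relevant doc."""
    return len(set(ranked[:k]) & relevant) / k

def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def score_model(results: dict[str, list[str]], golden: dict[str, set[str]], k: int = 5) -> dict[str, float]:
    """Average each metric over every query in the golden set."""
    return {
        f"recall@{k}": mean(recall_at_k(results[q], rel, k) for q, rel in golden.items()),
        f"precision@{k}": mean(precision_at_k(results[q], rel, k) for q, rel in golden.items()),
        "mrr": mean(reciprocal_rank(results[q], rel) for q, rel in golden.items()),
    }

# Toy golden set (query ID -> relevant doc IDs) and one model's ranked results.
golden = {"q1": {"d1", "d4"}, "q2": {"d7"}}
current_results = {"q1": ["d1", "d2", "d4"], "q2": ["d3", "d7", "d9"]}
print(score_model(current_results, golden, k=2))
# {'recall@2': 0.75, 'precision@2': 0.5, 'mrr': 0.75}
```

Running the same `score_model` call on the candidate model's results gives the side-by-side numbers a migration decision needs.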
The migration is blocked automatically if the new model regresses. It only proceeds when the new model wins (or ties within a margin you set) on your real queries — so you never waste a full re-embed on a worse model.
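The go/no-go rule can be sketched as follows, under one stated assumption: the candidate must beat the current model, or trail it by no more than the user-set margin, on every metric. Whether RAGForge checks each metric separately or a combined score isn't specified here, and `gate` is a hypothetical name. The numbers in the usage lines are made up for illustration.

```python
def gate(current: dict[str, float], candidate: dict[str, float],
         margin: float = 0.01) -> tuple[str, dict[str, float]]:
    """Return ("GO" | "NO_GO", {metric: change}) for metrics that regressed past the margin."""
    regressions = {
        name: candidate[name] - current[name]
        for name in current
        if candidate[name] < current[name] - margin
    }
    # Any metric that drops by more than the margin blocks the migration.
    return ("NO_GO" if regressions else "GO"), regressions

current = {"recall@5": 0.80, "precision@5": 0.20, "mrr": 0.70}
close_candidate = {"recall@5": 0.79, "precision@5": 0.20, "mrr": 0.72}
weak_candidate = {"recall@5": 0.70, "precision@5": 0.15, "mrr": 0.65}

print(gate(current, close_candidate, margin=0.02)[0])  # GO (ties within the margin)
decision, drops = gate(current, weak_candidate, margin=0.02)
print(decision, sorted(drops))  # NO_GO ['mrr', 'precision@5', 'recall@5']
```

Checking every metric independently is the conservative choice: a model that gains a little MRR but loses a lot of recall still gets blocked.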
Re-embed the chunks your queries actually hit first, then backfill the cold tail, instead of re-embedding everything up front.
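A sketch of the hot-first ordering, assuming you have a log listing which chunk IDs each real query retrieved. Chunks are ordered by hit count (busiest first) and never-hit chunks form the cold tail. `reembed_order` and the log format are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable

def reembed_order(all_chunk_ids: Iterable[str],
                  query_log: Iterable[list[str]]) -> tuple[list[str], list[str]]:
    """Split chunks into a hot list (most-retrieved first) and a cold tail."""
    hits = Counter(chunk_id for retrieved in query_log for chunk_id in retrieved)
    all_ids = list(all_chunk_ids)
    # Hot: chunks at least one logged query retrieved. sorted() is stable,
    # so chunks with equal hit counts keep their original order.
    hot = sorted((c for c in all_ids if hits[c] > 0), key=lambda c: -hits[c])
    # Cold tail: never retrieved, so it can be backfilled later.
    cold = [c for c in all_ids if hits[c] == 0]
    return hot, cold

hot, cold = reembed_order(["c1", "c2", "c3", "c4"],
                          [["c3", "c1"], ["c3"], ["c2", "c3"]])
print(hot, cold)  # ['c3', 'c1', 'c2'] ['c4']
```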
After cutover, a smoke test replays golden queries against the new index to confirm the migration actually worked — not just that a command returned OK.
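A post-cutover smoke test could look like this sketch. It assumes a hypothetical `search(query, k)` function that queries the live index and returns ranked doc IDs, and it measures hit rate as the fraction of golden queries with at least one relevant doc in the top k. It fails if any query comes back empty (a sign of an empty or misconfigured index) or if the live hit rate falls below the rate measured during evaluation, minus a tolerance.

```python
from collections.abc import Callable

def smoke_test(search: Callable[[str, int], list[str]],
               golden: dict[str, set[str]],
               expected_hit_rate: float,
               k: int = 5,
               tolerance: float = 0.02) -> bool:
    """Replay golden queries against the live index and sanity-check the results."""
    if not golden:
        raise ValueError("golden set is empty")
    hits = 0
    for query, relevant in golden.items():
        ranked = search(query, k)
        if not ranked:
            return False  # an empty result usually means a broken index
        if relevant & set(ranked[:k]):
            hits += 1
    # The live index should reproduce the hit rate seen during evaluation.
    return hits / len(golden) >= expected_hit_rate - tolerance

# A stand-in for the real index, for illustration only.
fake_index = {"How do refunds work?": ["refund-policy", "faq"],
              "Shipping times?": ["shipping"]}

def fake_search(query: str, k: int) -> list[str]:
    return fake_index.get(query, [])[:k]

golden = {"How do refunds work?": {"refund-policy"}, "Shipping times?": {"shipping"}}
print(smoke_test(fake_search, golden, expected_hit_rate=1.0))  # True
```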
Real run on SciFact (5,183 docs, 300 labelled queries). We compared all-MiniLM-L6-v2 against a smaller candidate (paraphrase-MiniLM-L3-v2). The candidate regressed on every metric — recall@5 fell 16 points — so RAGForge's gate returned NO_GO and blocked the migration before any full re-embed. That's the point: the gate turns "is this model actually better on our data?" into a measured, automatic decision.
Nine building blocks
Use what you need. Ignore the rest.
Parsing
Read PDFs, Word, HTML, and more.
Chunking
Split smartly; keep tables and code intact.
Retrieval
Hybrid search (dense + BM25) with reranking.
Answers
Grounded responses that cite sources and refuse to guess.
Evaluation
Score retrieval and answer quality; A/B compare setups.
Quantization
Shrink embeddings to cut storage and cost.
Migration
Swap embedding models safely, with quality validation.
Multi-Agent
Coordinate agents through shared state, not expensive direct messaging.
Dashboard
Local UI to trace pipelines, run evaluations, and chat with your KB.
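To make the Chunking card concrete, here is a minimal sketch of structure-aware chunking for Markdown-like text. It assumes blocks are separated by blank lines, never splits inside a fenced code block, and packs whole blocks into chunks up to a size budget; an oversized block becomes its own chunk rather than being cut. This illustrates the idea, not RAGForge's actual chunker.

```python
FENCE = "`" * 3  # Markdown code-fence marker

def split_blocks(text: str) -> list[str]:
    """Split text on blank lines, but never inside a fenced code block.

    Table rows have no blank lines between them, so a table stays one block.
    """
    blocks: list[str] = []
    current: list[str] = []
    in_code = False
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            in_code = not in_code  # entering or leaving a code block
        if not line.strip() and not in_code:
            if current:  # blank line outside code: close the current block
                blocks.append("\n".join(current))
                current = []
            continue
        current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks

def chunk(text: str, max_chars: int = 1000) -> list[str]:
    """Pack whole blocks into chunks of up to max_chars; an oversized block stands alone."""
    chunks: list[str] = []
    current = ""
    for block in split_blocks(text):
        candidate = f"{current}\n\n{block}" if current else block
        if not current or len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = block
    if current:
        chunks.append(current)
    return chunks

doc = "\n".join([
    "Intro paragraph.",
    "",
    FENCE + "python",
    "x = 1",
    "",
    "y = 2",
    FENCE,
    "",
    "| a | b |",
    "|---|---|",
    "| 1 | 2 |",
])
for piece in chunk(doc, max_chars=20):
    print(repr(piece))  # the code block and the table each arrive whole
```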
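The Retrieval card mentions hybrid dense + BM25 search, but not how the two result lists are merged. One common technique is reciprocal rank fusion (RRF), sketched here as an assumption: each document scores the sum of 1 / (k + rank) over the lists it appears in, with k = 60 as a commonly used default. A reranker would then reorder the fused top results.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

dense = ["d3", "d1", "d7"]  # e.g. results from a vector index
bm25 = ["d1", "d9", "d3"]   # e.g. results from a keyword index
print(reciprocal_rank_fusion([dense, bm25]))  # ['d1', 'd3', 'd9', 'd7']
```

RRF only uses ranks, so it sidesteps the problem that dense similarity scores and BM25 scores live on different scales.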
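For the Quantization card, one common scheme is scalar 8-bit quantization: each float dimension is mapped linearly onto 256 integer levels, so storage drops from 4 bytes to 1 byte per dimension, plus two floats per vector to undo the mapping. The source doesn't say which scheme RAGForge uses; this stdlib-only sketch just shows the idea.

```python
from array import array

def quantize(vec: list[float]) -> tuple[array, float, float]:
    """Map floats linearly onto 0..255 and keep (lo, scale) to undo it."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # avoid division by zero for constant vectors
    codes = array("B", (round((x - lo) / scale) for x in vec))
    return codes, lo, scale

def dequantize(codes: array, lo: float, scale: float) -> list[float]:
    """Approximately reconstruct the original floats."""
    return [lo + c * scale for c in codes]

vec = [0.12, -0.53, 0.88, 0.05, -0.27]
codes, lo, scale = quantize(vec)
approx = dequantize(codes, lo, scale)
print(codes.itemsize, array("f", vec).itemsize)  # 1 4 (bytes per dimension)
# Rounding to the nearest level bounds the error per dimension by scale / 2.
print(max(abs(a - b) for a, b in zip(vec, approx)) <= scale / 2 + 1e-12)  # True
```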
How agents stop wasting your money
Instead of agents repeating each other through chat, they read and write to one shared board.
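A minimal sketch of this shared-board (blackboard) pattern, assuming an in-process dictionary guarded by a lock. Each agent posts a compact result under a key and reads only the keys it needs, instead of forwarding whole conversations to other agents. `Board`, the keys, and both agent functions are hypothetical, and a real multi-process setup would need a shared store rather than process memory.

```python
import threading
from typing import Any

class Board:
    """A tiny thread-safe shared board that agents read from and write to."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}
        self._lock = threading.Lock()

    def write(self, key: str, value: Any) -> None:
        with self._lock:
            self._data[key] = value

    def read(self, key: str, default: Any = None) -> Any:
        with self._lock:
            return self._data.get(key, default)

def researcher(board: Board) -> None:
    # Posts only its conclusion (made-up example text), not its whole transcript.
    board.write("refund_policy", "refunds require a receipt")

def writer(board: Board) -> None:
    # Reads exactly the key it needs instead of replaying other agents' messages.
    policy = board.read("refund_policy", "unknown")
    board.write("draft_answer", f"Answer draft based on: {policy}")

board = Board()
researcher(board)
writer(board)
print(board.read("draft_answer"))  # Answer draft based on: refunds require a receipt
```

The saving comes from what gets sent to each agent's model: a short value read from the board rather than the full history of every other agent's messages.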
No secret algorithms
Just everything you need, made simple.
All-in-one
One install instead of six libraries to stitch together.
Any language
It's a simple HTTP API — call it from Python, JavaScript, Go, anything.
Local & private
Runs on your own machine; your documents never have to leave.
Free & open source
Apache-2.0. Use it, change it, build on top of it.
Plain HTTP. Any client.
It's just a JSON API. Call it from anywhere.
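import requests

# Ask the "my-kb" knowledge base a question on a local RAGForge server.
r = requests.post(
    "http://localhost:8000/query",
    json={
        "knowledge": "my-kb",
        "question": "How do refunds work?",
        "top_k": 5,        # number of results to retrieve
        "generate": True,  # ask for a generated answer (read below)
    },
    timeout=60,  # local generation can be slow; adjust as needed
)
r.raise_for_status()  # surface HTTP errors instead of a confusing KeyError
print(r.json()["answer"])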