Open-source RAG toolkit

Build AI that reads your documents — and answers honestly.

Parsing, chunking, retrieval, grounded answers, and evaluation — all in one place. Usable from any programming language. Runs on your own machine. Free and open source.

View on GitHub

Apache-2.0 · Runs locally · Any language via HTTP

RAGForge mascot — a robot blacksmith forging a glowing AI cube on an anvil

The problem

Building RAG is messy

You have a pile of documents. You want AI to answer questions from them — accurately. Doing it yourself means gluing together six fragile tools.

Documents get mangled

Tables and code get chopped in half, so the AI reads broken pieces and gives wrong answers.

You can't trust the answers

The AI sometimes makes things up, with no sources you can check.

AI agents burn money

When several agents talk to each other, every message costs — and it adds up fast.

What goes wrong

The broken pipeline most people end up with

Skip any one of these stages and the next one inherits the mess.

Bad parse

Tables broken

Wrong chunks

Context lost

Made-up answer

No sources

$$$ wasted

Tokens burned

How it works

One workshop for everything RAG

Watch your question travel through the forge — each stage hands clean data to the next.

Your documents

PDF · Word · HTML

Parse

Read files

Chunk

Split smart

Embed

Store vectors

Retrieve

Hybrid + rerank

Answer

With sources

Grounded answer

with cited sources

Everything is exposed over a simple HTTP API, so you can use it from any language — not just Python.

Featured

Don't migrate blind

When a better embedding model comes out, should you even switch? RAGForge answers that before you spend a fortune re-embedding. A model that wins on public benchmarks can still lose on your domain.

Test on your corpus

Freeze your real queries as a golden set, then score the new embedding model against your current one on recall@k, precision@k, and MRR — on YOUR corpus.

Gate the cutover

The migration is blocked automatically if the new model regresses. It only proceeds when the new model wins (or ties within a margin you set) on your real queries — so you never waste a full re-embed on a worse model.
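As a rough illustration of the scoring and gating described above, here is a minimal Python sketch. It is not RAGForge's implementation: the function names, the toy golden set, and the way the margin is applied are all assumptions made for the example.

```python
from statistics import mean

def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / k

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def gate(old_scores, new_scores, margin=0.0):
    """GO only if the candidate's mean score is at least the baseline's minus margin."""
    return "GO" if mean(new_scores) >= mean(old_scores) - margin else "NO_GO"

# Hypothetical golden set: query id -> ids of the chunks that should be retrieved.
golden = {"q1": ["a", "b"], "q2": ["c"]}
# Hypothetical ranked results from the current and the candidate embedding model.
old_run = {"q1": ["a", "b", "x"], "q2": ["c", "y", "z"]}
new_run = {"q1": ["x", "a", "y"], "q2": ["y", "z", "c"]}

old_recall = [recall_at_k(old_run[q], rel, 2) for q, rel in golden.items()]
new_recall = [recall_at_k(new_run[q], rel, 2) for q, rel in golden.items()]
decision = gate(old_recall, new_recall, margin=0.02)
print(decision)  # NO_GO: the candidate's mean recall@2 is 0.25 against 1.0
```

Averaging `reciprocal_rank` over all golden queries gives MRR, and the same `gate` function works on whichever metric you choose to gate on.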

Migrate the hot set first

Re-embed only the chunks your queries actually hit first, then backfill the cold tail — instead of re-embedding everything up front.
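One way to picture hot-set-first ordering is sketched below, assuming you have a log of which chunk ids each past query retrieved. The `plan_migration` helper and the sample data are hypothetical and only illustrate the idea. RAGForge may select the hot set differently.

```python
from collections import Counter

def plan_migration(all_chunk_ids, query_hits):
    """Split chunks into a hot list (most-retrieved first) and a cold tail."""
    hit_counts = Counter(cid for hits in query_hits for cid in hits)
    # sorted() is stable, so chunks with equal hit counts keep their original order.
    hot = sorted(
        (cid for cid in all_chunk_ids if hit_counts[cid] > 0),
        key=lambda cid: -hit_counts[cid],
    )
    cold = [cid for cid in all_chunk_ids if hit_counts[cid] == 0]
    return hot, cold

# Hypothetical corpus and retrieval log (one list of hit chunk ids per query).
chunks = ["c1", "c2", "c3", "c4", "c5"]
logged_hits = [["c3", "c1"], ["c3"], ["c5", "c3"]]

hot, cold = plan_migration(chunks, logged_hits)
print(hot)   # ['c3', 'c1', 'c5'] -> re-embed these first
print(cold)  # ['c2', 'c4']       -> backfill later
```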

How the safe migration flows

Your queries

Real logs

Golden set

Frozen truth

Compare

Old vs new

Hot-set first

Cheap cutover

After cutover, a smoke test replays golden queries against the new index to confirm the migration actually worked — not just that a command returned OK.

ragforge migrate gate my-kb golden.json --old default --new openai --metric recall_at_k

Real benchmark

NO_GO — migration blocked

Metric       all-MiniLM-L6-v2 (baseline)    paraphrase-MiniLM-L3-v2 (candidate)
recall@5     0.738                          0.575
recall@10    0.783                          0.649
MRR          0.600                          0.468

Real run on SciFact (5,183 docs, 300 labelled queries). We compared all-MiniLM-L6-v2 against a smaller candidate (paraphrase-MiniLM-L3-v2). The candidate regressed on every metric — recall@5 fell 16 points — so RAGForge's gate returned NO_GO and blocked the migration before any full re-embed. That's the point: the gate turns "is this model actually better on our data?" into a measured, automatic decision.
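The arithmetic behind that verdict can be checked in a few lines of Python using the published numbers. The rule used here (any regressed metric means NO_GO) is a simplifying assumption for illustration. The actual gate works on the metric and margin you configure.

```python
# Scores from the SciFact run reported above.
baseline = {"recall@5": 0.738, "recall@10": 0.783, "MRR": 0.600}
candidate = {"recall@5": 0.575, "recall@10": 0.649, "MRR": 0.468}

# Candidate minus baseline; a negative delta is a regression.
deltas = {metric: round(candidate[metric] - baseline[metric], 3) for metric in baseline}
regressed = [metric for metric, delta in deltas.items() if delta < 0]

# Assumed rule for this sketch: block the migration if anything regressed.
verdict = "NO_GO" if regressed else "GO"
print(deltas)
print(verdict)
```

The recall@5 delta comes out at -0.163, which is the 16-point drop quoted above.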

Everything inside

Nine building blocks

Use what you need. Ignore the rest.

Parsing

Read PDFs, Word, HTML, and more.

Chunking

Split smartly; keep tables and code intact.

Retrieval

Hybrid search (dense + BM25) with reranking.

Answers

Grounded responses that cite sources and refuse to guess.

Evaluation

Score retrieval and answer quality; A/B compare setups.

Quantization

Shrink embeddings to cut storage and cost.

Migration

Swap embedding models safely, with quality validation.

Multi-Agent

Coordinate agents through shared state, not expensive direct messaging.

Dashboard

Local UI to trace pipelines, run evaluations, and chat with your KB.
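The Retrieval block above combines dense and BM25 results. This page does not say how RAGForge merges the two lists, so the sketch below uses reciprocal rank fusion, a common fusion technique, purely as an illustration. The function name, the constant `k=60`, and the sample rankings are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: each list adds 1 / (k + rank) to an id's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical top-3 results from a dense index and from BM25.
dense_hits = ["d2", "d1", "d4"]
bm25_hits = ["d1", "d3", "d2"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)  # ['d1', 'd2', 'd3', 'd4']
```

Documents found by both retrievers (`d1`, `d2`) rise to the top. A reranker would then rescore this fused shortlist.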

Multi-agent, plain English

How agents stop wasting your money

Instead of agents repeating each other through chat, they read and write to one shared board.

Agents

Share a board

Blackboard

Common state

Reuse work

No re-asking

Lower cost

Fewer tokens
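The shared-board idea can be sketched in Python as a small key-value store that computes each result once and serves it to every agent afterwards. The `Blackboard` class, its method names, and the stub function are hypothetical and do not reflect RAGForge's actual API.

```python
class Blackboard:
    """Shared state that agents read from and write to."""

    def __init__(self):
        self._entries = {}
        self.computations = 0  # how many times work was actually done

    def get_or_compute(self, key, compute):
        """Return the stored value for key, computing it only on first request."""
        if key not in self._entries:
            self._entries[key] = compute()
            self.computations += 1
        return self._entries[key]

def expensive_summary():
    # Stand-in for an LLM call that would cost tokens.
    return "stub summary of the refund policy"

board = Blackboard()
# Two agents need the same intermediate result; only the first one pays for it.
researcher_view = board.get_or_compute("summary:refund-policy", expensive_summary)
writer_view = board.get_or_compute("summary:refund-policy", expensive_summary)
print(board.computations)  # 1
```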

Why RAGForge

No secret algorithms

Just everything you need, made simple.

All-in-one

One install instead of six libraries to stitch together.

Any language

It's a simple HTTP API — call it from Python, JavaScript, Go, anything.

Local & private

Runs on your own machine; your documents never have to leave.

Free & open source

Apache-2.0. Use it, change it, build on top of it.

Any language

Plain HTTP. Any client.

It's just a JSON API. Call it from anywhere.

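import requests

# Ask a question against the "my-kb" knowledge base on a local RAGForge server.
r = requests.post("http://localhost:8000/query", json={
    "knowledge": "my-kb",
    "question": "How do refunds work?",
    "top_k": 5,
    "generate": True,
})
r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
print(r.json()["answer"])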

Built. Working. Open source.

Try it. Break it. Tell us what's missing.

View on GitHub

Built by a developer who got tired of tool sprawl

RAGForge is built and maintained by Samsul Jahith, a developer working on open-source RAG tooling, to bring the messy parts of RAG into one place. Feedback and contributions are welcome.

GitHub · LinkedIn · Email