RedHop: A Simpler LlamaIndex Alternative for Document RAG
If you’re searching for a LlamaIndex alternative, you’re probably hitting one of these walls:
- The framework assumes you need a vector store. Even for one PDF, LlamaIndex’s default path is `VectorStoreIndex`: embed every chunk, store the vectors, query via an embedding model. Most document QA doesn’t need that.
- The mental model is its own thing to learn. Indexes, node parsers, query engines, response synthesizers, retrievers, post-processors. To answer one question about a contract.
- It’s Python-first. TypeScript port exists but trails behind; nothing for Rust services.
- No visibility into the decision. When the wrong chunk surfaces, you instrument LlamaIndex yourself.
RedHop is a focused alternative: an in-process retrieval + context library that does one thing — turn a document and a question into the right LLM prompt context — and tells you exactly what it kept, dropped, and why.
```python
import redhop

doc = redhop.Document.from_file("contract.pdf")
ctx = doc.context("What is the governing law?")
answer = llm.generate(ctx.text())  # llm stands in for whatever LLM client you use
print(ctx.report)  # what was kept, dropped, and why
```
That’s the whole surface. Three calls. No vector store. No query engine. Python, Node, and Rust over a Rust core, all in-process.
Should you switch from LlamaIndex to RedHop?
The honest answer: it depends on what you’re building.
| If you need… | Pick |
|---|---|
| Document QA with citations and a Decision Report | RedHop |
| In-process retrieval, no vector store, no infra | RedHop |
| The same API in Python, Node, and Rust | RedHop |
| Composable indices (Tree, KeywordTable, mixed) | LlamaIndex |
| Specialized query engines (sub-question, multi-step, citation) | LlamaIndex |
| Hosted / managed RAG with a dashboard | LlamaCloud (LlamaIndex’s offering) |
| Best-in-class legalese / contract parsing | LlamaIndex (measured edge — see below) |
| LlamaHub ecosystem of loaders / tools / readers | LlamaIndex |
LlamaIndex is a framework purpose-built for RAG. RedHop is a library that does the one bounded step — here’s the file, here’s the question, give me the right context with a decision report. If you need LlamaIndex’s composition layer, stay there. If you just need the three-call shape with observability, RedHop is simpler.
The same question, two ways
Same `contract.pdf`. Same question. RedHop first, then LlamaIndex.
RedHop:
```python
import redhop
from openai import OpenAI

query = "What is the governing law?"

ctx = redhop.Document.from_file("contract.pdf").context(query)
# parsed, chunked, retrieved, and token-budgeted internally

response = OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"{ctx.text()}\n\nQuestion: {query}"}],
)
print(response.choices[0].message.content)
```
What you stand up: nothing. Point it at the file and ask; parsing, chunking, retrieval, and token-budgeting happen inside, and every call returns a Decision Report explaining what it kept and why.
LlamaIndex:
```python
from llama_index.core import VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.readers.file import PyMuPDFReader
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

query = "What is the governing law?"

Settings.embed_model = OpenAIEmbedding()
Settings.llm = OpenAI(model="gpt-4o-mini")

docs = PyMuPDFReader().load(file_path="contract.pdf")

index = VectorStoreIndex.from_documents(
    docs,
    transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=50)],
)

engine = index.as_query_engine(similarity_top_k=4)
print(engine.query(query))
```
What you stand up: a node parser, an embedding model, a vector index, and a query engine. Cleaner than LangChain, but still an embed-and-index pipeline you own and pay for.
LlamaIndex is cleaner than LangChain — node parser, embedding model, vector index, query engine is a more linear mental model than loader, splitter, embedder, vector store, retriever, prompt, chain. But it’s still an embed-and-index pipeline. RedHop has one concept: document → context. Everything else is an implementation detail.
The full head-to-head benchmark (evidence retention + downstream answer quality on CUAD contracts and HotpotQA multi-hop) is on the Comparison page — same documents, same BM25 retriever for fairness, same token budget.
What LlamaIndex gives you that RedHop doesn’t
Be clear about this. LlamaIndex offers things RedHop doesn’t even try to provide:
- Composable indices. `TreeIndex` for hierarchical summarization, `KeywordTableIndex` for keyword routing, `VectorStoreIndex` for dense retrieval, `KnowledgeGraphIndex` for graph-shaped corpora, and you can compose them. RedHop is one path: chunk → BM25 (or hybrid) → assemble.
- Specialized query engines. `SubQuestionQueryEngine` breaks a complex question into sub-questions; `MultiStepQueryEngine` runs iterative retrieval; `CitationQueryEngine` enforces source citations in the response. RedHop returns context + citations; you compose any retrieval-shape logic outside.
- LlamaCloud. Hosted, managed RAG with a dashboard and a billing page. RedHop is OSS only, in-process; you run it.
- LlamaHub. A library of loaders (Notion, Slack, Confluence, S3, Postgres, …), tools, and prompt templates. RedHop has built-in parsers for PDF / DOCX / PPTX / XLSX / Markdown / code, and that’s it.
- Better on legalese contracts (measured). Our own benchmark on CUAD shows LlamaIndex edging RedHop on contract extraction (its node parsing seems to suit legal text). Read the benchmark. If your workload is heavy contract analysis specifically, LlamaIndex’s edge is real.
If you need any of the above, stay on LlamaIndex.
What RedHop gives you that LlamaIndex doesn’t
1. A Decision Report on every call
Every `doc.context(query)` returns a `ctx.report` describing exactly what happened: what was kept, what was dropped, whether the engine intervened, and why it chose what it chose.
```text
RedHop Decision Report
======================

Decision: Auto → passthrough (small context, no intervention needed)

Why:
  - 1,240 tokens — below the dilution gate (1,500 tokens)
  - pruning a small clean context risks dropping reasoning evidence

Result:
  - kept all 8 retrieved chunks
  - evidence retained 100%, second-hop links preserved
```
LlamaIndex returns a `Response` with `source_nodes`, but no structured report explaining the retrieval and assembly decision. With RedHop, the report is structured data on every call: `auto_decision`, `total_tokens`, `n_input_chunks`, `n_selected`, `retained_evidence_ratio`, `second_hop_rescue_count`. You can also run `doc.analyze(query)` to get the same diagnostics without assembling a context.
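The passthrough decision in the sample report can be pictured as a simple token gate. The sketch below is not RedHop's implementation: the `decide` function, the whitespace token count, and the `"prune"` label are assumptions made for illustration. Only the field names and the 1,500-token gate are taken from the text above.

```python
from dataclasses import dataclass

DILUTION_GATE_TOKENS = 1_500  # gate value taken from the sample report


@dataclass
class Report:
    # Field names mirror the ones listed above; the logic that fills them is assumed.
    auto_decision: str
    total_tokens: int
    n_input_chunks: int
    n_selected: int


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: whitespace-separated words.
    return len(text.split())


def decide(chunks: list[str], gate: int = DILUTION_GATE_TOKENS) -> Report:
    total = sum(count_tokens(c) for c in chunks)
    if total < gate:
        # Small context: keep everything rather than risk dropping evidence.
        return Report("passthrough", total, len(chunks), len(chunks))
    # Larger context: a real engine would rank and prune here. This sketch
    # simply keeps chunks in order until the next one would exceed the gate.
    kept, used = 0, 0
    for c in chunks:
        n = count_tokens(c)
        if used + n > gate:
            break
        kept, used = kept + 1, used + n
    return Report("prune", total, len(chunks), kept)
```

Calling `decide` on eight short chunks yields a `passthrough` report with every chunk selected, which is the situation the sample report describes.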
2. No vector store required
LlamaIndex’s default index assumes vectors: `VectorStoreIndex.from_documents()` embeds every chunk on construction. Even for one PDF, you’re paying the embed cost upfront and standing up a vector store.
RedHop’s default tier is BM25. Zero model download, zero embedding cost, sub-100ms warm queries. Most document QA — code, API references, runbooks, financial reports, handbooks — works on lexical alone, because the words in the question are usually the words in the answer.
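To see why lexical scoring goes a long way when the question shares words with the answer, here is a minimal BM25 ranking in pure Python. It is a textbook sketch, not RedHop's Rust implementation: the tokenizer, the `k1` and `b` defaults, and the sample chunks are all assumptions, and it expects a non-empty list of non-empty chunks.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, chunks: list[str], k1: float = 1.5, b: float = 0.75) -> list[int]:
    """Return chunk indices ordered from best to worst BM25 score."""
    docs = [tokenize(c) for c in chunks]
    avg_len = sum(len(d) for d in docs) / len(docs)
    # Document frequency: in how many chunks does each term appear?
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (len(docs) - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)


chunks = [
    "The governing law of this Agreement is the law of the State of Delaware.",
    "Either party may terminate with thirty days written notice.",
    "Fees are payable within forty-five days of invoice.",
]
best = bm25_rank("What is the governing law?", chunks)[0]  # index of the top chunk
```

No model is loaded and nothing is embedded; the ranking falls out of term counts alone.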
If you need semantic retrieval, opt into retrieval="hybrid" with a
small embedding model (bge-small, ~80MB, auto-downloaded). Even then,
retrieval is exact cosine over your in-memory chunks — no ANN index,
no vector store, no embedded service.
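"Exact cosine over in-memory chunks" means scoring every chunk vector against the query vector and sorting, with no approximate index in between. A minimal sketch of that idea, using toy 3-dimensional vectors in place of real embeddings (the function names here are illustrative, not RedHop's API):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    # Score every chunk (brute force, no ANN structure), then keep the k best.
    scored = sorted(
        ((cosine(query_vec, v), i) for i, v in enumerate(chunk_vecs)),
        reverse=True,
    )
    return [i for _, i in scored[:k]]


# Toy vectors standing in for embeddings of three chunks.
chunk_vecs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
nearest = top_k([0.9, 0.1, 0.0], chunk_vecs)
```

For the chunk counts of a single document or folder, a linear scan like this is typically cheap enough that an ANN index adds complexity without a meaningful speedup.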
3. Three calls cover the surface
Load. Ask. Read. That’s the API.
```python
doc = redhop.Document.from_file("contract.pdf")  # load (or .from_folder, .from_text, .from_bytes)
ctx = doc.context("What is the governing law?")  # ask
print(ctx.text())                                # the prompt for your LLM
for c in ctx.citations: ...                      # source / page / heading / line per chunk
print(ctx.report)                                # the decision
```
Compare to LlamaIndex’s load → parse → index → engine → query → response shape. Each piece is its own concept with its own config.
4. The same API in Python, Node, and Rust
LlamaIndex is primarily Python, with a TypeScript port (LlamaIndex.TS) that trails behind; nothing for Rust. RedHop ships the same surface in Python, Node, and Rust over a single Rust core. Prototype in Python, ship the same API in your Rust service or Electron app.
5. In-process, no SaaS, no network calls
RedHop runs in your process. No service to call, no hosted endpoint, no API key. The optional embedding model is downloaded once and runs locally via ONNX. Your documents never leave the box. For finance / legal / health teams with data residency requirements, this is the shape of the answer.
Migrating from LlamaIndex to RedHop
If you’ve got an existing LlamaIndex pipeline doing document QA, here’s the equivalent in RedHop.
Loading + indexing
LlamaIndex:
```python
from llama_index.core import VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.readers.file import PyMuPDFReader
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.embed_model = OpenAIEmbedding()
docs = PyMuPDFReader().load(file_path="contract.pdf")
index = VectorStoreIndex.from_documents(
    docs,
    transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=50)],
)
```
RedHop:
```python
import redhop

doc = redhop.Document.from_file("contract.pdf")
```
That’s it. PDF parsing, chunking, indexing: all behind the API. No embedding call (default tier is BM25). For semantic retrieval, add `retrieval="hybrid", model="bge-small"` to the constructor.
Querying
LlamaIndex:
```python
engine = index.as_query_engine(similarity_top_k=4)
response = engine.query("What is the governing law?")
print(response.response)
```
RedHop (LLM-agnostic, bring your own):
```python
from openai import OpenAI

ctx = doc.context("What is the governing law?")
answer = OpenAI().responses.create(
    model="gpt-4o-mini",
    input=f"{ctx.text()}\n\nQuestion: What is the governing law?",
).output_text
```
LlamaIndex bundles the LLM call in its query engine. RedHop hands you the prompt string and lets you call any provider, with no lock-in to a wrapper.
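Because the retrieval step hands back a plain string, the provider call can sit behind any callable. A minimal sketch of that seam, where `build_prompt`, `answer`, and `echo_llm` are hypothetical helper names and the prompt layout copies the querying example:

```python
from typing import Callable


def build_prompt(context_text: str, question: str) -> str:
    # Same layout as the querying example: context, blank line, question.
    return f"{context_text}\n\nQuestion: {question}"


def answer(context_text: str, question: str, call_llm: Callable[[str], str]) -> str:
    # call_llm wraps whichever provider SDK you use: prompt in, text out.
    return call_llm(build_prompt(context_text, question))


def echo_llm(prompt: str) -> str:
    # Stub provider for local testing; swap in a real client elsewhere.
    return f"[stub] received {len(prompt)} characters"


reply = answer("Section 12: Delaware law governs.", "What is the governing law?", echo_llm)
```

In real use, `context_text` would be the result of `ctx.text()` and `call_llm` a thin function around your provider's client, which also makes the LLM step easy to stub in tests.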
Citations / source nodes
LlamaIndex:
```python
for node in response.source_nodes:
    print(node.metadata, node.text)
```
RedHop:
```python
for c in ctx.citations:
    print(c["source"], c["page"], c["heading"], c["line"])
```
Same shape, simpler keys. `source` plus whichever of `page` / `heading` / `line` the format provides, with no separate metadata layer.
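Since a citation carries `source` plus only the locator fields its format provides, display code has to tolerate gaps. A minimal sketch, assuming citations behave like dicts (as the indexing in the loop suggests) and that a field the format lacks is either absent or `None`; `format_citation` is a hypothetical helper, not part of RedHop:

```python
def format_citation(c: dict) -> str:
    # "source" is always expected; the other keys depend on the file format.
    parts = [str(c["source"])]
    for key in ("page", "heading", "line"):
        value = c.get(key)
        if value is not None:
            parts.append(f"{key}={value}")
    return ", ".join(parts)


# Hand-written sample citations, not real RedHop output.
pdf_hit = {"source": "contract.pdf", "page": 12, "heading": None, "line": None}
md_hit = {"source": "README.md", "heading": "Install", "line": 40}
```

A PDF hit renders with its page number, a Markdown hit with heading and line, and neither needs format-specific branching.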
Folder of files
LlamaIndex:
```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

docs = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(docs)
```
RedHop:
```python
doc = redhop.Document.from_folder("./docs", persist=True)
```
`from_folder` honors `.gitignore`, accepts custom ignore patterns, and optionally writes an incremental on-disk index, so reload is O(changed files), not O(all files).
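One common way to get reload cost proportional to the changed files is a manifest of per-file signatures: every file is cheaply `stat`-ed, but only files whose signature moved are re-parsed. The sketch below illustrates that general technique under stated assumptions. The manifest format, the function names, and the choice of modification time plus size as the signature are all hypothetical and say nothing about how RedHop's on-disk index actually works.

```python
import json
from pathlib import Path


def signature(path: Path) -> list[int]:
    # Cheap change detector: modification time plus size, no file read needed.
    st = path.stat()
    return [st.st_mtime_ns, st.st_size]


def changed_files(folder: Path, manifest_path: Path) -> list[Path]:
    """Return files that are new or modified since the manifest was last written.

    The manifest is expected to live outside `folder`.
    """
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new, changed = {}, []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        key = path.relative_to(folder).as_posix()
        new[key] = signature(path)
        if old.get(key) != new[key]:
            changed.append(path)  # only these would be re-parsed and re-chunked
    manifest_path.write_text(json.dumps(new))
    return changed
```

On the first run every file is reported; on later runs only edited or added files are, and deleted files drop out of the manifest when it is rewritten.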
Multi-step / sub-question queries
LlamaIndex has dedicated query engines for these (`SubQuestionQueryEngine`, `MultiStepQueryEngine`). RedHop doesn’t: it returns context for one question per call. If your workload is genuinely sub-question / multi-step, LlamaIndex is the better tool and you can still call RedHop for the per-step retrieval inside it.
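The per-step shape is one retrieval call per sub-question, with the decomposition handled by whatever orchestrates the steps. A minimal sketch, where `multi_step_context` and the injected `retrieve` callable are hypothetical and the sub-questions are assumed to come from your orchestration layer:

```python
from typing import Callable


def multi_step_context(sub_questions: list[str], retrieve: Callable[[str], str]) -> str:
    # One retrieval call per sub-question; each section is labeled so the
    # LLM can tell which evidence belongs to which step.
    sections = []
    for i, q in enumerate(sub_questions, start=1):
        sections.append(f"Step {i}: {q}\n{retrieve(q)}")
    return "\n\n".join(sections)


# Canned retriever for demonstration only.
canned = {
    "Who are the parties?": "Acme and Globex.",
    "Which law governs?": "Delaware.",
}
combined = multi_step_context(list(canned), canned.__getitem__)
```

With RedHop as the retriever, `retrieve` could be `lambda q: doc.context(q).text()`, using the same two calls shown in the querying example.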
Pick the right tool
| Workload | RedHop | LlamaIndex |
|---|---|---|
| Document QA with one or many files | ✅ shorter, observable | ✅ flexible |
| Composable indices / multi-step retrieval | ❌ out of scope | ✅ flagship |
| Need to plug in a specific LLM provider | ✅ (any — you call it) | ✅ (built-in integration) |
| Hosted / managed RAG with a dashboard | ❌ | ✅ LlamaCloud |
| Visibility into retrieval decisions | ✅ Decision Report | ❌ DIY observability |
| Best on legalese contracts specifically | — | ✅ (measured edge) |
| In-process, no vector store, no infra | ✅ | ❌ |
| Same API in Python / Node / Rust | ✅ | ❌ Python + partial TS |
| Permissive open-source license, no commercial gating | ✅ Apache-2.0 | ✅ MIT (with LlamaCloud as a paid layer) |
If your workload sits firmly in document QA and you’ve been wondering why LlamaIndex’s index → query engine → response shape feels heavy for what you’re doing — RedHop is the alternative you’re looking for. If you’re doing composable retrieval, sub-question decomposition, or you specifically need contract extraction at LlamaIndex’s quality, stay on LlamaIndex.
Get started
```bash
pip install redhop                            # Python
cargo add redhop --features files,semantic    # Rust
npm install redhop                            # Node.js -- on npm
```
- Quickstart: the three-call surface
- Choosing a configuration: when to use which retrieval tier
- The full benchmark vs LangChain & LlamaIndex: same datasets, same retriever, head-to-head
- Other alternatives: per-framework deep-dives (LangChain, Haystack)
- llms.txt: single-file context for AI coding agents
Open source under Apache-2.0. Bug reports and use-case feedback welcome at github.com/vysakh0/redhop.