
RedHop: A Simpler LlamaIndex Alternative for Document RAG

If you’re searching for a LlamaIndex alternative, you’re probably hitting one of these walls:

  • The framework assumes you need a vector store. Even for one PDF, LlamaIndex’s default path is VectorStoreIndex — embed every chunk, store the vectors, query via an embedding model. Most document QA doesn’t need that.
  • The mental model is its own thing to learn. Indexes, node parsers, query engines, response synthesizers, retrievers, post-processors. To answer one question about a contract.
  • It’s Python-first. A TypeScript port exists but trails behind, and there is nothing for Rust services.
  • No visibility into the decision. When the wrong chunk surfaces, you instrument LlamaIndex yourself.

RedHop is a focused alternative: an in-process retrieval + context library that does one thing — turn a document and a question into the right LLM prompt context — and tells you exactly what it kept, dropped, and why.

import redhop
doc = redhop.Document.from_file("contract.pdf")
ctx = doc.context("What is the governing law?")
answer = llm.generate(ctx.text())
print(ctx.report) # what was kept, dropped, and why

That’s the whole surface. Three calls. No vector store. No query engine. Python, Node, and Rust over a Rust core — all in-process.


Should you switch from LlamaIndex to RedHop?


The honest answer: it depends on what you’re building.

| If you need… | Pick |
| --- | --- |
| Document QA with citations and a Decision Report | RedHop |
| In-process retrieval, no vector store, no infra | RedHop |
| The same API in Python, Node, and Rust | RedHop |
| Composable indices (Tree, KeywordTable, mixed) | LlamaIndex |
| Specialized query engines (sub-question, multi-step, citation) | LlamaIndex |
| Hosted / managed RAG with a dashboard | LlamaCloud (LlamaIndex’s offering) |
| Best-in-class legalese / contract parsing | LlamaIndex (measured edge, see below) |
| LlamaHub ecosystem of loaders / tools / readers | LlamaIndex |

LlamaIndex is a framework purpose-built for RAG. RedHop is a library that does the one bounded step: here’s the file, here’s the question, give me the right context with a decision report. If you need LlamaIndex’s composition layer, stay there. If you just need the three-call shape with observability, RedHop is simpler.


Same contract.pdf. Same question. The RedHop version:

import redhop
from openai import OpenAI
query = "What is the governing law?"
ctx = redhop.Document.from_file("contract.pdf").context(query)
# parsed, chunked, retrieved, and token-budgeted internally
response = OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"{ctx.text()}\n\nQuestion: {query}"}],
)
print(response.choices[0].message.content)

What you stand up: nothing. Point it at the file and ask; parsing, chunking, retrieval, and token-budgeting happen inside — and every call returns a Decision Report explaining what it kept and why.
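To make "token-budgeting" concrete, here is a minimal sketch of one common approach: walk the chunks in rank order and keep each one that still fits the budget. This is not RedHop's implementation. The function name `pack_to_budget`, the whitespace-based token count, and the budget of 16 are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: count whitespace-separated words."""
    return len(text.split())


def pack_to_budget(ranked_chunks: List[str], budget: int) -> Tuple[List[str], List[str]]:
    """Walk chunks in rank order, keeping each one that still fits the budget."""
    kept: List[str] = []
    dropped: List[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost <= budget:
            kept.append(chunk)
            used += cost
        else:
            dropped.append(chunk)
    return kept, dropped


# Chunks are assumed to arrive already ranked by the retriever.
chunks = [
    "This Agreement is governed by the laws of Delaware.",    # 9 words
    "Notices must be sent in writing to the address below.",  # 10 words
    "Governing law excludes conflict of laws rules.",         # 7 words
]
kept, dropped = pack_to_budget(chunks, budget=16)
```

With a budget of 16, the first and third chunks fit (9 + 7) and the second is dropped. Recording the dropped list alongside the kept list is what makes a kept/dropped report possible at all.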

LlamaIndex is cleaner than LangChain — node parser, embedding model, vector index, query engine is a more linear mental model than loader, splitter, embedder, vector store, retriever, prompt, chain. But it’s still an embed-and-index pipeline. RedHop has one concept: document → context. Everything else is an implementation detail.

The full head-to-head benchmark (evidence retention + downstream answer quality on CUAD contracts and HotpotQA multi-hop) is on the Comparison page — same documents, same BM25 retriever for fairness, same token budget.


What LlamaIndex gives you that RedHop doesn’t


Be clear about this. LlamaIndex has things RedHop doesn’t even try to be:

  • Composable indices. TreeIndex for hierarchical summarization, KeywordTableIndex for keyword routing, VectorStoreIndex for dense retrieval, KnowledgeGraphIndex for graph-shaped corpora — and you can compose them. RedHop is one path: chunk → BM25 (or hybrid) → assemble.
  • Specialized query engines. SubQuestionQueryEngine breaks a complex question into sub-questions; MultiStepQueryEngine runs iterative retrieval; CitationQueryEngine enforces source citations in the response. RedHop returns context + citations; you compose any retrieval-shape logic outside.
  • LlamaCloud. Hosted, managed RAG with a dashboard and a billing page. RedHop is OSS only, in-process; you run it.
  • LlamaHub. A library of loaders (Notion, Slack, Confluence, S3, Postgres, …), tools, and prompt templates. RedHop has built-in parsers for PDF / DOCX / PPTX / XLSX / Markdown / code, and that’s it.
  • Better on legalese contracts (measured). Our own benchmark on CUAD shows LlamaIndex edging RedHop on contract extraction (its node parsing seems to suit legal text). Read the benchmark. If your workload is heavy contract analysis specifically, LlamaIndex’s edge is real.

If you need any of the above, stay on LlamaIndex.


What RedHop gives you that LlamaIndex doesn’t


Every doc.context(query) returns a ctx.report describing exactly what happened — what was kept, what was dropped, whether the engine intervened, why it chose what it chose.

RedHop Decision Report
======================
Decision: Auto → passthrough (small context, no intervention needed)
Why:
- 1,240 tokens — below the dilution gate (1,500 tokens)
- pruning a small clean context risks dropping reasoning evidence
Result:
- kept all 8 retrieved chunks
- evidence retained 100%, second-hop links preserved

LlamaIndex returns a Response with source_nodes, but no structured report explaining the retrieval and assembly decision. With RedHop, the report is structured data on every call — auto_decision, total_tokens, n_input_chunks, n_selected, retained_evidence_ratio, second_hop_rescue_count. You can also run doc.analyze(query) to get the same diagnostics without assembling a context.
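Because the report is structured, it can feed an automated check. The sketch below uses the field names listed above, but the container is a hypothetical stand-in: whether RedHop exposes these as attributes or dict keys, their exact types, and the `"prune"` decision value are assumptions, as is the helper `needs_review`.

```python
from dataclasses import dataclass


@dataclass
class ReportStub:
    """Hypothetical stand-in carrying the report field names listed above."""
    auto_decision: str
    total_tokens: int
    n_input_chunks: int
    n_selected: int
    retained_evidence_ratio: float  # assumed to be a 0.0-1.0 fraction
    second_hop_rescue_count: int


def needs_review(report: ReportStub, min_retained: float = 0.9) -> bool:
    """Flag a call when chunks were dropped and retained evidence is below the threshold."""
    dropped = report.n_input_chunks - report.n_selected
    return dropped > 0 and report.retained_evidence_ratio < min_retained


# Values mirroring the sample report: passthrough, all 8 chunks kept.
passthrough = ReportStub("passthrough", 1240, 8, 8, 1.0, 0)
# Invented values for a call that pruned heavily.
pruned = ReportStub("prune", 6200, 30, 9, 0.72, 1)

flags = [needs_review(passthrough), needs_review(pruned)]
```

A check like this could run in CI or in production logging, so a retrieval regression shows up as a flagged call before it shows up as a wrong answer.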

LlamaIndex’s default index assumes vectors — VectorStoreIndex.from_documents() embeds every chunk on construction. Even for one PDF, you’re paying the embed cost upfront and standing up a vector store.

RedHop’s default tier is BM25. Zero model download, zero embedding cost, sub-100ms warm queries. Most document QA — code, API references, runbooks, financial reports, handbooks — works on lexical alone, because the words in the question are usually the words in the answer.
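For readers who have not met BM25, here is a minimal pure-Python sketch of the standard Okapi BM25 scoring formula. It illustrates lexical ranking in general, not RedHop's Rust implementation. The tokenizer, the `k1` and `b` defaults, and the sample chunks are assumptions.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumerics. No stemming, no stop words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


chunks = [
    "This Agreement shall be governed by the law of the State of Delaware.",
    "Either party may terminate with thirty days written notice.",
    "Fees are payable within forty five days of invoice.",
]
scores = bm25_scores("What is the governing law?", chunks)
best = max(range(len(chunks)), key=scores.__getitem__)
```

The first chunk wins because it shares "law" and "the" with the question. Note that "governing" does not match "governed" in this sketch, since it does no stemming. That is the kind of gap the hybrid tier is meant to cover.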

If you need semantic retrieval, opt into retrieval="hybrid" with a small embedding model (bge-small, ~80MB, auto-downloaded). Even then, retrieval is exact cosine over your in-memory chunks — no ANN index, no vector store, no embedded service.
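"Exact cosine over in-memory chunks" means scoring every chunk against the query vector, with no approximate index in between. A minimal sketch of that brute-force search follows. The toy 3-dimensional vectors stand in for real embeddings, and the names `cosine` and `top_k` are illustrative, not RedHop API.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], chunk_vecs: List[Sequence[float]], k: int = 2) -> List[int]:
    """Score every chunk (exact search, no ANN) and return the indices of the best k."""
    order = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return order[:k]


# Toy vectors standing in for real embedding output.
chunk_vecs = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
query_vec = [0.9, 0.1, 0.0]
hits = top_k(query_vec, chunk_vecs, k=2)
```

Brute force is linear in the number of chunks, which is usually acceptable at single-document or single-folder scale and avoids the recall loss of an approximate index.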

Load. Ask. Read. That’s the API.

doc = redhop.Document.from_file("contract.pdf") # load (or .from_folder, .from_text, .from_bytes)
ctx = doc.context("What is the governing law?") # ask
print(ctx.text()) # the prompt for your LLM
for c in ctx.citations: ... # source / page / heading / line per chunk
print(ctx.report) # the decision

Compare to LlamaIndex’s load → parse → index → engine → query → response shape. Each piece is its own concept with its own config.

LlamaIndex is primarily Python, with a TypeScript port (LlamaIndex.TS) that trails behind and nothing for Rust. RedHop ships the same surface in Python, Node, and Rust over a single Rust core. Prototype in Python, ship the same API in your Rust service or Electron app.

RedHop runs in your process. No service to call, no hosted endpoint, no API key. The optional embedding model is downloaded once and runs locally via ONNX. Your documents never leave the box. For finance / legal / health teams with data residency requirements, this is the shape of the answer.


If you’ve got an existing LlamaIndex pipeline doing document QA, here’s the equivalent in RedHop.

LlamaIndex:

from llama_index.core import VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.readers.file import PyMuPDFReader
from llama_index.embeddings.openai import OpenAIEmbedding
Settings.embed_model = OpenAIEmbedding()
docs = PyMuPDFReader().load(file_path="contract.pdf")
index = VectorStoreIndex.from_documents(
    docs,
    transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=50)],
)

RedHop:

import redhop
doc = redhop.Document.from_file("contract.pdf")

That’s it. PDF parsing, chunking, indexing — all behind the API. No embedding call (default tier is BM25). For semantic retrieval add retrieval="hybrid", model="bge-small" to the constructor.

LlamaIndex:

engine = index.as_query_engine(similarity_top_k=4)
response = engine.query("What is the governing law?")
print(response.response)

RedHop (LLM-agnostic — bring your own):

from openai import OpenAI

ctx = doc.context("What is the governing law?")
answer = OpenAI().responses.create(
    model="gpt-4o-mini",
    input=f"{ctx.text()}\n\nQuestion: What is the governing law?",
).output_text

LlamaIndex bundles the LLM call in its query engine. RedHop hands you the prompt string and lets you call any provider — no lock-in to a wrapper.

LlamaIndex:

for node in response.source_nodes:
    print(node.metadata, node.text)

RedHop:

for c in ctx.citations:
    print(c["source"], c["page"], c["heading"], c["line"])

Same shape, simpler keys. source plus whichever of page / heading / line the format provides — no separate metadata layer.
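Since only some of page / heading / line exist for a given format, a display helper should tolerate the rest being missing. The sketch below assumes citations are dict-like with the keys shown above. Whether an unavailable field is absent or set to `None` is not specified here, so the helper handles both. `format_citation` and its labels are hypothetical.

```python
from typing import Any, Dict


def format_citation(c: Dict[str, Any]) -> str:
    """Render 'source' plus whichever of page / heading / line is present."""
    parts = [str(c["source"])]
    for key, label in (("page", "p."), ("heading", "section"), ("line", "line")):
        value = c.get(key)  # tolerate both a missing key and an explicit None
        if value is not None:
            parts.append(f"{label} {value}")
    return ", ".join(parts)


# Sample citations with invented values.
pdf_cite = {"source": "contract.pdf", "page": 12, "heading": "Governing Law"}
code_cite = {"source": "main.py", "line": 48, "page": None}

rendered = [format_citation(pdf_cite), format_citation(code_cite)]
```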

LlamaIndex:

from llama_index.core import SimpleDirectoryReader
docs = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(docs)

RedHop:

doc = redhop.Document.from_folder("./docs", persist=True)

from_folder honors .gitignore, accepts custom ignore patterns, and optionally writes an incremental on-disk index — reload is O(changed files), not O(all files).
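The idea behind an incremental index can be sketched with a content-hash manifest: remember a fingerprint per file and re-index only the files whose fingerprint changed. This illustrates the general technique and is not how RedHop persists its index. The `changed_files` helper and the manifest layout are assumptions. This simple version still reads every file to hash it (only the re-index work shrinks) and does not handle deletions.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def fingerprint(path: Path) -> str:
    """Content hash used to detect modifications."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(folder: Path, manifest: Dict[str, str]) -> List[str]:
    """Return files that are new or modified since `manifest`, updating it in place."""
    changed = []
    for path in sorted(folder.rglob("*")):
        if not path.is_file():
            continue
        key = str(path.relative_to(folder))
        digest = fingerprint(path)
        if manifest.get(key) != digest:
            changed.append(key)      # this file would be re-parsed and re-indexed
            manifest[key] = digest
    return changed


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.md").write_text("alpha")
    (folder / "b.md").write_text("beta")
    manifest: Dict[str, str] = {}
    first = changed_files(folder, manifest)    # first load: everything is new
    (folder / "b.md").write_text("beta v2")
    second = changed_files(folder, manifest)   # reload: only b.md changed
```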

For sub-question and multi-step workloads, LlamaIndex has dedicated query engines (SubQuestionQueryEngine, MultiStepQueryEngine). RedHop doesn’t; it returns context for one question per call. If your workload is genuinely sub-question / multi-step, LlamaIndex is the better tool, and you can still call RedHop for the per-step retrieval inside it.
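The per-step pattern can be sketched independently of any orchestration framework: one retrieval call per sub-question, then one combined prompt. The retriever is passed in as a plain callable. Going by the calls shown earlier on this page, it could be something like `lambda q: doc.context(q).text()`. A stub is used here so the sketch runs on its own. `gather_contexts` and `build_prompt` are hypothetical helpers, and producing the sub-questions is left to whatever does the decomposition.

```python
from typing import Callable, Dict, List


def gather_contexts(sub_questions: List[str], get_context: Callable[[str], str]) -> Dict[str, str]:
    """Call the retriever once per sub-question and collect the contexts."""
    return {q: get_context(q) for q in sub_questions}


def build_prompt(question: str, contexts: Dict[str, str]) -> str:
    """Join the per-step contexts under their sub-questions, then append the real question."""
    sections = [f"### {q}\n{ctx}" for q, ctx in contexts.items()]
    return "\n\n".join(sections) + f"\n\nQuestion: {question}"


def fake_context(q: str) -> str:
    """Stub retriever so the sketch runs without any library installed."""
    return f"[context for: {q}]"


subs = ["Who are the parties?", "What is the governing law?"]
question = "Which party's home state law governs?"
prompt = build_prompt(question, gather_contexts(subs, fake_context))
```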


| Workload | RedHop | LlamaIndex |
| --- | --- | --- |
| Document QA with one or many files | ✅ shorter, observable | ✅ flexible |
| Composable indices / multi-step retrieval | ❌ out of scope | ✅ flagship |
| Need to plug in a specific LLM provider | ✅ (any; you call it) | ✅ (built-in integration) |
| Hosted / managed RAG with a dashboard | ❌ | ✅ LlamaCloud |
| Visibility into retrieval decisions | ✅ Decision Report | ❌ DIY observability |
| Best on legalese contracts specifically | ❌ | ✅ (measured edge) |
| In-process, no vector store, no infra | ✅ | ❌ |
| Same API in Python / Node / Rust | ✅ | ❌ Python + partial TS |
| Apache-2.0, no commercial gating | ✅ | ✅ (with LlamaCloud as a paid layer) |

If your workload sits firmly in document QA and you’ve been wondering why LlamaIndex’s index → query engine → response shape feels heavy for what you’re doing — RedHop is the alternative you’re looking for. If you’re doing composable retrieval, sub-question decomposition, or you specifically need contract extraction at LlamaIndex’s quality, stay on LlamaIndex.


pip install redhop # Python
cargo add redhop --features files,semantic # Rust
npm install redhop # Node.js -- on npm

Open source under Apache-2.0. Bug reports and use-case feedback welcome at github.com/vysakh0/redhop.