RedHop: A Simpler Haystack Alternative for Document RAG
If you’re searching for a Haystack alternative, you’re probably hitting one of these walls:
- The pipeline DAG is heavy for simple cases. Two pipelines (indexing + query), components for each step, and explicit socket wiring with connect(), all for one PDF and one question.
- Document store assumed from day one. Even the in-memory store is its own object you manage. For prototyping document QA, you want the file in and the answer out, without standing up infrastructure first.
- Verbose for a small surface. Twenty-plus lines for a basic RAG path that mirrors what other libraries do in five. Production-grade, but heavy for the common case.
- Python only. No TypeScript or Rust story; if you’re shipping to a non-Python service, you’re rewriting from scratch.
- No visibility into the retrieval decision. The pipeline runs and returns a result; when the wrong chunk surfaces, you instrument it yourself.
RedHop is a focused alternative: an in-process retrieval + context library that does one thing — turn a document and a question into the right LLM prompt context — and tells you exactly what it kept, dropped, and why.
import redhop

doc = redhop.Document.from_file("contract.pdf")
ctx = doc.context("What is the governing law?")
answer = llm.generate(ctx.text())
print(ctx.report)  # what was kept, dropped, and why

That’s the whole surface. Three calls. No pipelines, no components, no document store. Python, Node, and Rust over a Rust core, all in-process.
Should you switch from Haystack to RedHop?
The honest answer: it depends on what you’re building.
| If you need… | Pick |
|---|---|
| Document QA with citations and a Decision Report | RedHop |
| In-process retrieval, no document store, no infra | RedHop |
| The same API in Python, Node, and Rust | RedHop |
| Multi-step pipelines with branching, loops, conditionals | Haystack |
| Component reuse and swappable pieces in production | Haystack |
| Strong evaluation framework (haystack-experimental) | Haystack |
| deepset Cloud (hosted, managed) | Haystack (via deepset) |
| Mature production deployments at scale | Haystack |
Haystack is a production-grade pipeline framework built for composable NLP/RAG workflows. RedHop is a library that does the one bounded step — here’s the file, here’s the question, give me the right context with a decision report. If you need Haystack’s pipeline composition, stay there. If you just need the three-call shape with observability, RedHop is simpler.
The same question, two ways
Same contract.pdf. Same question. RedHop first, then Haystack.
RedHop:
import redhop
from openai import OpenAI

query = "What is the governing law?"

ctx = redhop.Document.from_file("contract.pdf").context(query)
# parsed, chunked, retrieved, and token-budgeted internally

response = OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"{ctx.text()}\n\nQuestion: {query}"}],
)
print(response.choices[0].message.content)

What you stand up: nothing. Point it at the file and ask; parsing, chunking, retrieval, and token-budgeting happen inside, and every call returns a Decision Report explaining what it kept and why.
Haystack:
from haystack import Pipeline
from haystack.components.converters import PyPDFToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.embedders import OpenAITextEmbedder, OpenAIDocumentEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

query = "What is the governing law?"
doc_store = InMemoryDocumentStore()

# Indexing pipeline
indexing = Pipeline()
indexing.add_component("converter", PyPDFToDocument())
indexing.add_component("splitter", DocumentSplitter(split_by="word", split_length=200))
indexing.add_component("embedder", OpenAIDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store=doc_store))
indexing.connect("converter", "splitter")
indexing.connect("splitter", "embedder")
indexing.connect("embedder", "writer")
indexing.run({"converter": {"sources": ["contract.pdf"]}})

# Query pipeline
template = [ChatMessage.from_user(
    "Answer using only the context.\n\n"
    "{% for d in documents %}{{d.content}}\n{% endfor %}\n"
    "Question: {{query}}"
)]

querying = Pipeline()
querying.add_component("embedder", OpenAITextEmbedder())
querying.add_component("retriever", InMemoryEmbeddingRetriever(document_store=doc_store))
querying.add_component("prompt", ChatPromptBuilder(template=template))
querying.add_component("llm", OpenAIChatGenerator(model="gpt-4o-mini"))
querying.connect("embedder.embedding", "retriever.query_embedding")
querying.connect("retriever.documents", "prompt.documents")
querying.connect("prompt.prompt", "llm.messages")
result = querying.run({"embedder": {"text": query}, "prompt": {"query": query}})

print(result["llm"]["replies"][0].text)

What you stand up: two pipelines (indexing + query), a converter, a splitter, two embedders (one for docs, one for the query), a writer, a document store, a retriever, a chat-prompt builder, and a generator, with explicit DAG wiring and every socket connection written out. Production-friendly, but verbose for one PDF.
Haystack’s component model is well-engineered — every step is a discrete piece with named input/output sockets, which makes it easy to swap pieces in production. But for one PDF and one question, that machinery is overhead. RedHop has one concept: document → context. Everything else is an implementation detail.
The broader head-to-head benchmark on the Comparison page covers LangChain and LlamaIndex specifically — Haystack isn’t in those numbers yet, so the comparison above is structural (code-vs-code) rather than measured retention / answer-quality scores.
What Haystack gives you that RedHop doesn’t
To be clear, Haystack offers things RedHop doesn’t even try to provide:
- Composable pipelines with arbitrary branching. Multi-step retrieval, conditional routing, loop-based agentic flows — Haystack’s pipeline graph supports all of it. RedHop is one path: chunk → BM25 (or hybrid) → assemble.
- A large component ecosystem. Many converters, preprocessors, embedders, retrievers, rankers, generators — for Postgres, Elasticsearch, Pinecone, Weaviate, OpenSearch, Qdrant, you name it. RedHop has built-in parsers for PDF / DOCX / PPTX / XLSX / Markdown / code, BM25 by default, optional ONNX embeddings. That’s it.
- deepset Cloud. Managed Haystack hosting with a UI, evaluation dashboards, prompt management. RedHop is OSS only, in-process; you run it.
- Strong evaluation framework. Haystack ships with eval harnesses (haystack-experimental) for retrieval and answer quality metrics across components. RedHop ships with the Decision Report on every call and benchmark scripts in the repo, but no formal eval harness yet.
- A mature production track record. deepset has been shipping Haystack since 2019; battle-tested at enterprise scale. RedHop is alpha.
If you need any of the above, stay on Haystack — or use the two together (RedHop as a component inside a Haystack pipeline for the document-context step).
What RedHop gives you that Haystack doesn’t
1. A Decision Report on every call
Every doc.context(query) returns a ctx.report describing exactly what happened: what was kept, what was dropped, whether the engine intervened, and why it chose what it chose.
RedHop Decision Report
======================

Decision: Auto → passthrough (small context, no intervention needed)

Why:
  - 1,240 tokens — below the dilution gate (1,500 tokens)
  - pruning a small clean context risks dropping reasoning evidence

Result:
  - kept all 8 retrieved chunks
  - evidence retained 100%, second-hop links preserved

Haystack returns a result dict with whatever the last pipeline component produced; observability is what you instrument yourself. With RedHop, the report is structured data on every call: auto_decision, total_tokens, n_input_chunks, n_selected, retained_evidence_ratio, second_hop_rescue_count. You can also run doc.analyze(query) to get the same diagnostics without assembling a context.
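One way to use those fields is as a guardrail in tests or logging. The sketch below uses a stand-in dataclass, `ReportFields`, that only mirrors the field names listed above; it is not RedHop's actual report type. The `flag_report` helper, its threshold, the string form of `auto_decision`, and the zero rescue count are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReportFields:
    """Stand-in mirroring the field names above; NOT RedHop's real report type."""
    auto_decision: str
    total_tokens: int
    n_input_chunks: int
    n_selected: int
    retained_evidence_ratio: float
    second_hop_rescue_count: int


def flag_report(report: ReportFields, min_retained: float = 0.9) -> list[str]:
    """Return human-readable warnings for a report (hypothetical policy)."""
    warnings = []
    if report.retained_evidence_ratio < min_retained:
        warnings.append(
            f"evidence retained {report.retained_evidence_ratio:.0%} "
            f"is below {min_retained:.0%}"
        )
    dropped = report.n_input_chunks - report.n_selected
    if dropped > 0:
        warnings.append(f"{dropped} of {report.n_input_chunks} chunks dropped")
    return warnings


# Token and chunk counts come from the sample report above; the
# auto_decision string and the rescue count of 0 are assumed values.
sample = ReportFields(
    auto_decision="passthrough",
    total_tokens=1240,
    n_input_chunks=8,
    n_selected=8,
    retained_evidence_ratio=1.0,
    second_hop_rescue_count=0,
)
print(flag_report(sample))  # prints []
```

A check like this could run in CI against a fixed set of questions, so that a change in chunking or retrieval settings that starts dropping evidence shows up as a failing test rather than a wrong answer.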
2. No document store, no pipeline graph
Haystack’s default in-memory document store is its own object that lives separately from the pipeline, and you wire components to it explicitly via DocumentWriter on the indexing side and InMemoryEmbeddingRetriever on the query side. Two pipelines, eight components, lots of connect() calls.
RedHop’s default tier is BM25 — no document store, no separate index object you manage, no pipeline DAG to wire. Zero model download, zero embedding cost, sub-100ms warm queries. Most document QA — code, API references, runbooks, financial reports, handbooks — works on lexical alone, because the words in the question are usually the words in the answer.
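To make the lexical argument concrete, here is a minimal, self-contained sketch of Okapi BM25 scoring. It is a textbook version with common default parameters (k1 = 1.5, b = 0.75) and naive whitespace tokenization, not RedHop's implementation, and the chunk texts are invented.

```python
import math
from collections import Counter


def bm25_scores(query: str, chunks: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each chunk against the query with textbook Okapi BM25."""
    docs = [c.lower().split() for c in chunks]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: in how many chunks does each term appear?
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Invented chunks; only the first shares words with the query.
chunks = [
    "the governing law of this agreement is the law of delaware",
    "payment is due within thirty days of invoice",
    "either party may terminate with ninety days notice",
]
scores = bm25_scores("governing law", chunks)
best = max(range(len(chunks)), key=scores.__getitem__)
print(chunks[best])
```

No model is loaded and no vectors are computed: the ranking falls out of term counts alone, which is why this tier has no download or embedding cost.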
If you need semantic retrieval, opt into retrieval="hybrid" with a
small embedding model (bge-small, ~80MB, auto-downloaded). Even then,
retrieval is exact cosine over your in-memory chunks — no ANN index,
no vector store, no embedded service.
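Exact cosine retrieval is simple enough to sketch in a few lines. The version below scores every chunk vector against the query vector and sorts, which is what "no ANN index" amounts to in practice. The three-dimensional vectors are toy values rather than real bge-small embeddings, and `top_k_cosine` is a hypothetical helper, not a RedHop function.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_cosine(query_vec, chunk_vecs, k=2):
    """Brute-force search: score every chunk, then keep the k best indices."""
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy vectors standing in for chunk embeddings.
chunk_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
print(top_k_cosine([1.0, 0.0, 0.0], chunk_vecs))  # prints [0, 1]
```

The cost is linear in the number of chunks, which is a reasonable trade for a single document or folder held in memory: there is no index to build, tune, or keep in sync.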
3. Three calls cover the surface
Load. Ask. Read. That’s the API.
doc = redhop.Document.from_file("contract.pdf")  # load (or .from_folder, .from_text, .from_bytes)
ctx = doc.context("What is the governing law?")  # ask
print(ctx.text())                                # the prompt for your LLM
for c in ctx.citations: ...                      # source / page / heading / line per chunk
print(ctx.report)                                # the decision

Compare that with Haystack’s Pipeline → components → DocumentStore → run() flow and its nested input dict. Each piece is its own concept with its own configuration surface.
4. The same API in Python, Node, and Rust
Haystack is Python-only: no official TypeScript port, no Rust. RedHop ships the same surface in Python, Node, and Rust over a single Rust core. Prototype in Python, ship the same API in your Rust service or Electron app.
5. In-process, no SaaS, no network calls
RedHop runs in your process. No service to call, no hosted endpoint, no API key. The optional embedding model is downloaded once and runs locally via ONNX. Your documents never leave the box. For finance / legal / health teams with data residency requirements, this is the shape of the answer.
Migrating from Haystack to RedHop
If you’ve got an existing Haystack RAG pipeline doing document QA, here’s the equivalent in RedHop.
Loading + indexing
Haystack:
from haystack import Pipeline
from haystack.components.converters import PyPDFToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.embedders import OpenAIDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

doc_store = InMemoryDocumentStore()
indexing = Pipeline()
indexing.add_component("converter", PyPDFToDocument())
indexing.add_component("splitter", DocumentSplitter(split_by="word", split_length=200))
indexing.add_component("embedder", OpenAIDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store=doc_store))
indexing.connect("converter", "splitter")
indexing.connect("splitter", "embedder")
indexing.connect("embedder", "writer")
indexing.run({"converter": {"sources": ["contract.pdf"]}})

RedHop:
import redhop
doc = redhop.Document.from_file("contract.pdf")

That’s it. PDF parsing, chunking, indexing: all behind the API. No embedding call (default tier is BM25). For semantic retrieval add retrieval="hybrid", model="bge-small" to the constructor.
Querying
Haystack:
from haystack.components.embedders import OpenAITextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

# Pipeline and doc_store come from the indexing snippet above
query = "What is the governing law?"
template = [ChatMessage.from_user(
    "Answer using only the context.\n\n"
    "{% for d in documents %}{{d.content}}\n{% endfor %}\n"
    "Question: {{query}}"
)]
querying = Pipeline()
querying.add_component("embedder", OpenAITextEmbedder())
querying.add_component("retriever", InMemoryEmbeddingRetriever(document_store=doc_store))
querying.add_component("prompt", ChatPromptBuilder(template=template))
querying.add_component("llm", OpenAIChatGenerator(model="gpt-4o-mini"))
querying.connect("embedder.embedding", "retriever.query_embedding")
querying.connect("retriever.documents", "prompt.documents")
querying.connect("prompt.prompt", "llm.messages")
answer = querying.run({"embedder": {"text": query}, "prompt": {"query": query}})["llm"]["replies"][0].text

RedHop (LLM-agnostic, bring your own):
from openai import OpenAI

ctx = doc.context("What is the governing law?")
answer = OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"{ctx.text()}\n\nQuestion: What is the governing law?"}],
).choices[0].message.content

Haystack wraps the LLM call in its pipeline. RedHop hands you the prompt string and lets you call any provider directly, with no component wrapping and no socket wiring.
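Because the context is just a string, provider-specific code can sit behind a small helper. The sketch below builds a chat-style message list from a context string and a question. `build_messages` is a hypothetical helper, not part of RedHop, and the context text is a placeholder standing in for `ctx.text()`.

```python
def build_messages(context_text: str, question: str) -> list[dict]:
    """Build a chat-style message list from context and question (hypothetical helper)."""
    return [
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"{context_text}\n\nQuestion: {question}"},
    ]


# Placeholder standing in for ctx.text(); not real RedHop output.
context_text = "Section 12. This Agreement is governed by the laws of Delaware."
messages = build_messages(context_text, "What is the governing law?")
print(messages[1]["content"])
```

The resulting list can be passed as `messages` to an OpenAI-style chat call, or reshaped for a provider with a different request format, without touching the retrieval code.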
Citations / source documents
Haystack:
# include_outputs_from is needed because the retriever's output is consumed by the prompt builder
retrieved = querying.run(..., include_outputs_from={"retriever"})["retriever"]["documents"]
for d in retrieved:
    print(d.meta, d.content)

RedHop:
for c in ctx.citations:
    print(c["source"], c["page"], c["heading"], c["line"])

Same shape, simpler keys: source plus whichever of page / heading / line the format provides, with no separate metadata layer.
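Since not every format provides every field, display code should probably tolerate missing values. The sketch below formats citations from plain dicts shaped like the keys above. The sample values are invented, and whether RedHop omits absent keys or sets them to None is an assumption; this hypothetical `format_citation` helper handles either case.

```python
def format_citation(citation: dict) -> str:
    """Render 'source (page 7)' style text using only the fields present."""
    parts = []
    for key in ("page", "heading", "line"):
        value = citation.get(key)  # tolerate a missing key or a None value
        if value is not None:
            parts.append(f"{key} {value}")
    source = citation["source"]
    return f"{source} ({', '.join(parts)})" if parts else source


# Invented examples: a PDF chunk with a page, a Markdown chunk with heading and line.
pdf_cite = format_citation({"source": "contract.pdf", "page": 7})
md_cite = format_citation({"source": "README.md", "heading": "Install", "line": 12, "page": None})
print(pdf_cite)  # prints contract.pdf (page 7)
print(md_cite)   # prints README.md (heading Install, line 12)
```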
Folder of files
Haystack:
from pathlib import Path
indexing.run({"converter": {"sources": list(Path("./docs").glob("**/*.pdf"))}})

RedHop:
doc = redhop.Document.from_folder("./docs", persist=True)

from_folder honors .gitignore, accepts custom ignore patterns, and optionally writes an incremental on-disk index, so reload is O(changed files), not O(all files).
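The "O(changed files)" idea can be illustrated with a generic change-detection sketch. This is not RedHop's on-disk format or algorithm; it simply fingerprints files by content hash and reports which ones would need re-indexing. `changed_files` and the manifest dict are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Content hash of one file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are new or modified, updating the manifest in place."""
    changed = []
    for path in sorted(folder.rglob("*")):
        if not path.is_file():
            continue
        key = str(path.relative_to(folder))
        digest = fingerprint(path)
        if manifest.get(key) != digest:
            manifest[key] = digest
            changed.append(key)
    return changed


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.md").write_text("alpha")
    (folder / "b.md").write_text("beta")
    manifest: dict[str, str] = {}
    first = changed_files(folder, manifest)   # both files are new
    (folder / "b.md").write_text("beta v2")
    second = changed_files(folder, manifest)  # only b.md changed
    print(first, second)
```

Only the files returned by `changed_files` would need to be parsed and chunked again. Note that this sketch still reads every file to hash it and does not handle deletions; a real implementation might compare modification times and sizes first so unchanged files are skipped cheaply.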
Pick the right tool
| Workload | RedHop | Haystack |
|---|---|---|
| Document QA with one or many files | ✅ shorter, observable | ✅ verbose but flexible |
| Multi-step pipelines / conditional flows | ❌ out of scope | ✅ flagship |
| Production deployment at enterprise scale | ⚠️ alpha | ✅ mature |
| Hosted / managed RAG with a dashboard | ❌ | ✅ deepset Cloud |
| Visibility into retrieval decisions | ✅ Decision Report | ❌ DIY observability |
| In-process, no document store, no infra | ✅ | ❌ |
| Same API in Python / Node / Rust | ✅ | ❌ Python only |
| Strong evaluation harness | ⚠️ benchmark scripts | ✅ haystack-experimental |
| Apache-2.0, no commercial gating | ✅ | ✅ (deepset Cloud is paid) |
If your workload sits firmly in document QA and you’ve been wondering why Haystack’s pipeline model feels heavy for a file-in-answer-out flow — RedHop is the alternative you’re looking for. If you’re building multi-step RAG, branching flows, conditional routing, or deploying to enterprise infrastructure — Haystack’s pipeline composition is the better tool.
Get started
pip install redhop                          # Python
cargo add redhop --features files,semantic  # Rust
npm install redhop                          # Node.js -- on npm

- Quickstart: the three-call surface
- Choosing a configuration: when to use which retrieval tier
- Comparison: RedHop vs LangChain vs LlamaIndex benchmarks
- Other alternatives: per-framework deep-dives (LangChain, LlamaIndex)
- llms.txt: single-file context for AI coding agents
Open source under Apache-2.0. Bug reports and use-case feedback welcome at github.com/vysakh0/redhop.