RedHop: a Haystack alternative for document RAG

If you’re searching for a Haystack alternative, you’re probably hitting one of these walls:

The pipeline DAG is heavy for simple cases. Two pipelines (indexing + query), components for each step, explicit socket wiring with connect(), all for one PDF and one question.
Document store assumed from day one. Even the in-memory store is its own object you manage. For prototyping document QA, you want the file in, the answer out, without standing up infrastructure first.
Verbose for a small surface. Twenty-plus lines for a basic RAG path that mirrors what other libraries do in five. Production-grade, but heavy for the common case.
Python only. No TypeScript or Rust story. If you’re shipping to a non-Python service, you’re rewriting from scratch.
No visibility into the retrieval decision. The pipeline runs and returns a result, and when the wrong chunk surfaces, you instrument it yourself.

RedHop is a focused alternative: an in-process retrieval + context library that does one thing (turn a document and a question into the right LLM prompt context) and tells you exactly what it kept, dropped, and why.

import redhop

doc = redhop.Document.from_file("contract.pdf")
ctx = doc.context("What is the governing law?")
answer = llm.generate(ctx.text())   # llm is a placeholder for whichever LLM client you use

print(ctx.report)   # what was kept, dropped, and why

That’s the whole surface. Three calls. No pipelines, no components, no document store. Python, Node, and Rust over a Rust core, all in-process.

Should you switch from Haystack to RedHop?

The honest answer: it depends on what you’re building.

If you need…	Pick
Document QA with citations and a Decision Report	RedHop
In-process retrieval, no document store, no infra	RedHop
The same API in Python, Node, and Rust	RedHop
Multi-step pipelines with branching, loops, conditionals	Haystack
Component reuse and swappable pieces in production	Haystack
Strong evaluation framework (haystack-experimental)	Haystack
deepset Cloud (hosted, managed)	Haystack (via deepset)
Mature production deployments at scale	Haystack

Haystack is a production-grade pipeline framework built for composable NLP/RAG workflows. RedHop is a library that does one bounded step: here’s the file, here’s the question, give me the right context with a Decision Report. If you need Haystack’s pipeline composition, stay there. If you just need the three-call shape with observability, RedHop is simpler.

The same question, two ways

Same contract.pdf. Same question. RedHop first, then Haystack.

RedHop:

import redhop
from openai import OpenAI

query = "What is the governing law?"

ctx = redhop.Document.from_file("contract.pdf").context(query)
#  parsed, chunked, retrieved, and token-budgeted internally

response = OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"{ctx.text()}\n\nQuestion: {query}"}],
)
print(response.choices[0].message.content)

What you stand up: nothing. Point it at the file and ask; parsing, chunking, retrieval, and token-budgeting happen inside, and every call returns a Decision Report explaining what it kept and why.

Haystack:

from haystack import Pipeline
from haystack.components.converters import PyPDFToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.embedders import OpenAITextEmbedder, OpenAIDocumentEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

query = "What is the governing law?"
doc_store = InMemoryDocumentStore()

# Indexing pipeline
indexing = Pipeline()
indexing.add_component("converter", PyPDFToDocument())
indexing.add_component("splitter", DocumentSplitter(split_by="word", split_length=200))
indexing.add_component("embedder", OpenAIDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store=doc_store))
indexing.connect("converter", "splitter")
indexing.connect("splitter", "embedder")
indexing.connect("embedder", "writer")
indexing.run({"converter": {"sources": ["contract.pdf"]}})

# Query pipeline
template = [ChatMessage.from_user(
    "Answer using only the context.\n\n"
    "{% for d in documents %}{{d.content}}\n{% endfor %}\n"
    "Question: {{query}}"
)]

querying = Pipeline()
querying.add_component("embedder", OpenAITextEmbedder())
querying.add_component("retriever", InMemoryEmbeddingRetriever(document_store=doc_store))
querying.add_component("prompt", ChatPromptBuilder(template=template))
querying.add_component("llm", OpenAIChatGenerator(model="gpt-4o-mini"))
querying.connect("embedder.embedding", "retriever.query_embedding")
querying.connect("retriever.documents", "prompt.documents")
querying.connect("prompt.prompt", "llm.messages")
result = querying.run({"embedder": {"text": query}, "prompt": {"query": query}})

print(result["llm"]["replies"][0].text)

What you stand up: two pipelines (indexing + query), a converter, a splitter, two embedders (one for docs, one for the query), a writer, a document store, a retriever, a chat-prompt builder, and a generator, with explicit DAG wiring and every socket connection written out. Production-friendly, but verbose for one PDF.

Haystack’s component model is well-engineered: every step is a discrete piece with named input/output sockets, which makes it easy to swap pieces in production. But for one PDF and one question, that machinery is overhead. RedHop has one concept: document → context. Everything else is an implementation detail.
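
As a rough picture of the token-budgeting step that RedHop handles internally, here is a minimal greedy packer. It is a sketch under stated assumptions, not RedHop's algorithm: the pack_to_budget function, the whitespace word count used as a token proxy, and the sample chunks are all hypothetical.

```python
def pack_to_budget(ranked_chunks: list[str], budget_tokens: int) -> list[str]:
    """Keep the highest-ranked chunks that fit within the token budget."""
    kept: list[str] = []
    used = 0
    for chunk in ranked_chunks:          # assumed to be sorted best-first
        cost = len(chunk.split())        # whitespace word count as a token proxy
        if used + cost > budget_tokens:
            continue                     # would overflow; a later, smaller chunk may still fit
        kept.append(chunk)
        used += cost
    return kept


ranked = [
    "This Agreement is governed by the law of the State of Delaware.",  # 12 words
    "Disputes are resolved by binding arbitration in Wilmington.",      # 8 words
    "Fees are payable within forty-five days of invoice.",              # 8 words
]
kept = pack_to_budget(ranked, budget_tokens=20)
context = "\n\n".join(kept)
print(len(kept), "chunks kept")
```

With a budget of 20 the first two chunks fit exactly and the third is dropped. A real tokenizer would replace the word count.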

The broader head-to-head benchmark on the Comparison page covers LangChain and LlamaIndex specifically. Haystack isn’t in those numbers yet, so the comparison above is structural (code-vs-code) rather than based on measured retention or answer-quality scores.

What Haystack gives you that RedHop doesn’t

To be clear about this, Haystack offers things RedHop doesn’t even try to provide:

Composable pipelines with arbitrary branching. Multi-step retrieval, conditional routing, loop-based agentic flows: Haystack’s pipeline graph supports all of it. RedHop is one path: chunk → BM25 (or hybrid) → assemble.
A large component ecosystem. Many converters, preprocessors, embedders, retrievers, rankers, and generators, plus integrations for Postgres, Elasticsearch, Pinecone, Weaviate, OpenSearch, Qdrant, and more. RedHop has built-in parsers for PDF / DOCX / PPTX / XLSX / Markdown / code, BM25 by default, and optional ONNX embeddings. That’s it.
deepset Cloud. Managed Haystack hosting with a UI, evaluation dashboards, prompt management. RedHop is OSS only, in-process. You run it.
Strong evaluation framework. Haystack ships with eval harnesses (haystack-experimental) for retrieval and answer quality metrics across components. RedHop ships with the Decision Report on every call and benchmark scripts in the repo, but no formal eval harness yet.
A mature production track record. deepset has been shipping Haystack since 2019, battle-tested at enterprise scale. RedHop is alpha.

If you need any of the above, stay on Haystack, or use the two together (RedHop as a component inside a Haystack pipeline for the document-context step).

What RedHop gives you that Haystack doesn’t

1. A Decision Report on every call

Every doc.context(query) returns a ctx.report describing exactly what happened: what was kept, what was dropped, whether the engine intervened, why it chose what it chose.

RedHop Decision Report
======================

Decision: Auto → passthrough (small context, no intervention needed)

  Why:
    - 1,240 tokens — below the dilution gate (1,500 tokens)
    - pruning a small clean context risks dropping reasoning evidence
  Result:
    - kept all 8 retrieved chunks
    - evidence retained 100%, second-hop links preserved

Haystack returns a result dict with whatever the last pipeline component produced. Observability is what you instrument yourself. With RedHop, the report is structured data on every call: auto_decision, total_tokens, n_input_chunks, n_selected, retained_evidence_ratio, second_hop_rescue_count. You can also run doc.analyze(query) to get the same diagnostics without assembling a context.
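
To make the passthrough decision in the sample report concrete, here is a minimal sketch of a size gate. It is not RedHop's implementation: the GateDecision class, the gate() function, the "prune" label, and the whitespace token count are assumptions for illustration. Only the 1,500-token gate value and the field names are taken from the report and text above.

```python
from dataclasses import dataclass
from typing import Optional

# Gate value copied from the sample report; treat it as illustrative.
DILUTION_GATE_TOKENS = 1500


@dataclass
class GateDecision:
    auto_decision: str
    total_tokens: int
    n_input_chunks: int
    n_selected: Optional[int]   # None when selection is left to a later pruning step


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def gate(chunks: list[str], gate_tokens: int = DILUTION_GATE_TOKENS) -> GateDecision:
    """Pass small contexts through untouched; flag large ones for pruning."""
    total = sum(count_tokens(c) for c in chunks)
    if total < gate_tokens:
        # Small context: keep every chunk rather than risk dropping evidence.
        return GateDecision("passthrough", total, len(chunks), len(chunks))
    # Large context: a real engine would rank and prune here; this sketch only flags it.
    return GateDecision("prune", total, len(chunks), None)


small = gate(["governing law is Delaware"] * 8)
print(small.auto_decision, small.total_tokens, small.n_selected)
```

The point of the gate is the asymmetry the report describes: below the threshold, pruning has little to gain and can lose evidence, so the cheapest correct action is to keep everything.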

2. No document store, no pipeline graph

Haystack’s default in-memory document store is its own object that lives separately from the pipeline, and you wire components to it explicitly: DocumentWriter on the indexing side, InMemoryEmbeddingRetriever on the query side. In the example above that comes to two pipelines, eight components, and six connect() calls.

RedHop’s default tier is BM25: no document store, no separate index object you manage, no pipeline DAG to wire. Zero model download, zero embedding cost, sub-100ms warm queries. Most document QA (code, API references, runbooks, financial reports, handbooks) works on lexical retrieval alone, because the words in the question are usually the words in the answer.
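
For readers who have not met BM25, the sketch below scores chunks with the textbook Okapi BM25 formula. It illustrates lexical ranking in general and is not RedHop's code: the tokenizer, the k1 and b values, and the sample chunks are assumptions.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase, split on whitespace, and strip basic punctuation.
    words = (w.strip(".,;:?!()\"'").lower() for w in text.split())
    return [w for w in words if w]


def bm25_scores(query: str, chunks: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each chunk against the query with Okapi BM25."""
    docs = [tokenize(c) for c in chunks]
    if not docs:
        return []
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of chunks that contain each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


chunks = [
    "This Agreement is governed by the law of the State of Delaware.",
    "Either party may terminate with thirty days written notice.",
    "Fees are payable within forty-five days of invoice.",
]
scores = bm25_scores("What is the governing law?", chunks)
best = max(range(len(chunks)), key=scores.__getitem__)
print(chunks[best])
```

Only the first chunk shares terms with the question, so it ranks first. Note that "governing" does not match "governed" in this sketch because it does no stemming; that kind of gap is where a hybrid tier can help.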

If you need semantic retrieval, opt into retrieval="hybrid" with a small embedding model (bge-small, ~80MB, auto-downloaded). Even then, retrieval is exact cosine over your in-memory chunks: no ANN index, no vector store, no embedded service.
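
"Exact cosine over in-memory chunks" means brute-force scoring: compare the query vector with every chunk vector and sort. A minimal sketch follows; the cosine() and top_k() helpers and the toy 3-dimensional vectors are assumptions standing in for real embeddings, and this is not RedHop's code.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[tuple[int, float]]:
    """Score every chunk against the query and return the k best (index, score) pairs."""
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scored.sort(reverse=True)
    return [(i, s) for s, i in scored[:k]]


# Toy vectors standing in for embedding-model output.
chunk_vecs = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
query_vec = [0.8, 0.6, 0.0]
ranked = top_k(query_vec, chunk_vecs)
print(ranked)
```

The cost is linear in the number of chunks, which is usually acceptable for a single document or folder and avoids the recall loss and build step of an approximate index.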

3. Three calls cover the surface

Load. Ask. Read. That’s the API.

doc = redhop.Document.from_file("contract.pdf")   # load (or .from_folder, .from_text, .from_bytes)
ctx = doc.context("What is the governing law?")   # ask
print(ctx.text())                                 # the prompt for your LLM
for c in ctx.citations: ...                        # source / page / heading / line per chunk
print(ctx.report)                                 # the decision

Compare that with Haystack’s Pipeline → components → DocumentStore → run(), called with a nested input dict. Each piece is its own concept with its own configuration surface.

4. The same API in Python, Node, and Rust

Haystack is Python-only: no official TypeScript port, no Rust. RedHop ships the same surface in Python, Node, and Rust over a single Rust core. Prototype in Python, ship the same API in your Rust service or Electron app.

5. In-process, no SaaS, no network calls

RedHop runs in your process. No service to call, no hosted endpoint, no API key. The optional embedding model is downloaded once and runs locally via ONNX. Your documents never leave the box. For finance / legal / health teams with data residency requirements, this is the shape of the answer.

Migrating from Haystack to RedHop

If you’ve got an existing Haystack RAG pipeline doing document QA, here’s the equivalent in RedHop.

Loading + indexing

Haystack:

from haystack import Pipeline
from haystack.components.converters import PyPDFToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.embedders import OpenAIDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

doc_store = InMemoryDocumentStore()
indexing = Pipeline()
indexing.add_component("converter", PyPDFToDocument())
indexing.add_component("splitter", DocumentSplitter(split_by="word", split_length=200))
indexing.add_component("embedder", OpenAIDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store=doc_store))
indexing.connect("converter", "splitter")
indexing.connect("splitter", "embedder")
indexing.connect("embedder", "writer")
indexing.run({"converter": {"sources": ["contract.pdf"]}})

RedHop:

import redhop
doc = redhop.Document.from_file("contract.pdf")

That’s it. PDF parsing, chunking, indexing: all behind the API. No embedding call (default tier is BM25). For semantic retrieval add retrieval="hybrid", model="bge-small" to the constructor.

Querying

Haystack:

from haystack.components.embedders import OpenAITextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
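from haystack.dataclasses import ChatMessage

# Pipeline and doc_store carry over from the indexing snippet above.
query = "What is the governing law?"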

template = [ChatMessage.from_user(
    "Answer using only the context.\n\n"
    "{% for d in documents %}{{d.content}}\n{% endfor %}\n"
    "Question: {{query}}"
)]
querying = Pipeline()
querying.add_component("embedder", OpenAITextEmbedder())
querying.add_component("retriever", InMemoryEmbeddingRetriever(document_store=doc_store))
querying.add_component("prompt", ChatPromptBuilder(template=template))
querying.add_component("llm", OpenAIChatGenerator(model="gpt-4o-mini"))
querying.connect("embedder.embedding", "retriever.query_embedding")
querying.connect("retriever.documents", "prompt.documents")
querying.connect("prompt.prompt", "llm.messages")
answer = querying.run({"embedder": {"text": query}, "prompt": {"query": query}})["llm"]["replies"][0].text

RedHop (LLM-agnostic, bring your own):

from openai import OpenAI

ctx = doc.context("What is the governing law?")
answer = OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"{ctx.text()}\n\nQuestion: What is the governing law?"}],
).choices[0].message.content

Haystack wraps the LLM call in its pipeline. RedHop hands you the prompt string and lets you call any provider directly, no component wrapping, no socket wiring.
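
Because the context arrives as a plain string, the provider-specific part can be kept to a single call. The helper below is a hypothetical convenience, not part of RedHop: build_messages and the placeholder context string are assumptions, and the role/content message layout is the one used in the OpenAI call above.

```python
def build_messages(context: str, question: str) -> list[dict[str, str]]:
    """Combine retrieved context and the question into one user message."""
    prompt = f"{context}\n\nQuestion: {question}"
    return [{"role": "user", "content": prompt}]


question = "What is the governing law?"
# Placeholder string standing in for ctx.text().
context = "This Agreement is governed by the law of the State of Delaware."
messages = build_messages(context, question)
print(messages[0]["content"])
```

The resulting list can be passed as the messages argument of the chat completion call shown above, or adapted to another provider's request format.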

Citations / source documents

Haystack:

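# Outputs consumed by a downstream component are omitted by default; request the retriever's explicitly.
retrieved = querying.run(..., include_outputs_from={"retriever"})["retriever"]["documents"]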
for d in retrieved:
    print(d.meta, d.content)

RedHop:

for c in ctx.citations:
    print(c["source"], c["page"], c["heading"], c["line"])

Same shape, simpler keys. source plus whichever of page / heading / line the format provides, no separate metadata layer.
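
Since only some of page / heading / line are present for a given format, display code may want to tolerate missing keys. The sketch below assumes citations behave like plain dicts and that an unavailable field is either absent or None; both are assumptions, as are the format_citation helper and the sample data.

```python
def format_citation(c: dict) -> str:
    """Render one citation, skipping location fields the source format lacks."""
    parts = [str(c["source"])]
    for key in ("page", "heading", "line"):
        value = c.get(key)
        if value is not None:
            parts.append(f"{key}={value}")
    return ", ".join(parts)


# Hypothetical citation dicts: a PDF has pages, a Markdown file has lines.
citations = [
    {"source": "contract.pdf", "page": 12, "heading": "Governing Law"},
    {"source": "notes.md", "heading": "Jurisdiction", "line": 40},
]
for c in citations:
    print(format_citation(c))
```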

Folder of files

Haystack:

from pathlib import Path
indexing.run({"converter": {"sources": list(Path("./docs").glob("**/*.pdf"))}})

RedHop:

doc = redhop.Document.from_folder("./docs", options=redhop.FolderOptions(persist=True))

from_folder honors .gitignore, accepts custom ignore patterns, and optionally writes an incremental on-disk index: reload is O(changed files), not O(all files).
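
The incremental reload idea can be illustrated with a small manifest of file fingerprints. RedHop's on-disk index format is not described here, so this is a generic sketch: the fingerprint() and changed_files() functions, the manifest.json name, and the use of modification time plus size as the change signal are all assumptions.

```python
import json
import tempfile
from pathlib import Path


def fingerprint(path: Path) -> list[int]:
    # Cheap change detector: modification time and size, no file read needed.
    st = path.stat()
    return [st.st_mtime_ns, st.st_size]


def changed_files(folder: Path, manifest_path: Path) -> list[str]:
    """Return relative paths that are new or modified since the last run, then save the manifest."""
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new: dict[str, list[int]] = {}
    changed: list[str] = []
    for path in sorted(folder.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(folder).as_posix()
        new[rel] = fingerprint(path)
        if old.get(rel) != new[rel]:
            changed.append(rel)
    manifest_path.write_text(json.dumps(new))
    return changed


with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "docs"
    docs.mkdir()
    manifest = Path(tmp) / "manifest.json"    # kept outside the scanned folder
    (docs / "a.md").write_text("alpha")
    (docs / "b.md").write_text("beta")
    first = changed_files(docs, manifest)      # cold start: every file is new
    (docs / "b.md").write_text("beta, revised")
    second = changed_files(docs, manifest)     # warm reload: only b.md differs
    print(first, second)
```

The scan still stats every file, but the expensive work (parsing, chunking, indexing) would only need to run for the paths returned, which is where the O(changed files) behaviour comes from.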

Pick the right tool

Workload	RedHop	Haystack
Document QA with one or many files	✅ shorter, observable	✅ verbose but flexible
Multi-step pipelines / conditional flows	❌ out of scope	✅ flagship
Production deployment at enterprise scale	⚠️ alpha	✅ mature
Hosted / managed RAG with a dashboard	❌	✅ deepset Cloud
Visibility into retrieval decisions	✅ Decision Report	❌ DIY observability
In-process, no document store, no infra	✅	❌
Same API in Python / Node / Rust	✅	❌ Python only
Strong evaluation harness	⚠️ benchmark scripts	✅ haystack-experimental
Apache-2.0, no commercial gating	✅	✅ (deepset Cloud is paid)

If your workload sits firmly in document QA and you’ve been wondering why Haystack’s pipeline model feels heavy for a file-in-answer-out flow, RedHop is the alternative you’re looking for. If you’re building multi-step RAG, branching flows, conditional routing, or deploying to enterprise infrastructure, Haystack’s pipeline composition is the better tool.

Get started

pip install redhop                            # Python
cargo add redhop --features files,semantic    # Rust
npm install redhop                            # Node.js -- on npm

Quickstart: the three-call surface
Choosing a configuration: when to use which retrieval tier
Comparison: RedHop vs LangChain vs LlamaIndex benchmarks
Other alternatives: per-framework deep-dives (LangChain, LlamaIndex)
llms.txt: single-file context for AI coding agents

Open source under Apache-2.0. Bug reports and use-case feedback welcome at github.com/vysakh0/redhop.