
RedHop: A Simpler LangChain Alternative for Document RAG

If you’re searching for a LangChain alternative, you’re probably hitting one of three walls:

  • Too much surface area. Chains, agents, retrievers, embedders, vector stores, output parsers, callbacks — all to answer questions about a PDF.
  • A vector DB you don’t need. Most document QA workloads don’t actually need Pinecone or Weaviate; lexical BM25 handles them fine. LangChain’s default examples push you toward one anyway.
  • No visibility into the retrieval decision. When the answer is wrong, you don’t know if the retriever missed the chunk, the chain dropped it, or the reranker pruned it.

RedHop is a focused alternative: an in-process retrieval + context library that does one thing — turn a document and a question into the right LLM prompt context — and tells you exactly what it kept, dropped, and why.

import redhop
doc = redhop.Document.from_file("contract.pdf")
ctx = doc.context("What is the governing law?")
answer = llm.generate(ctx.text()) # llm is whichever LLM client you already use
print(ctx.report) # what was kept, dropped, and why

That’s the whole surface. Three calls. No vector DB. No chains. Python, Node, and Rust over a Rust core — all in-process.


Should you switch from LangChain to RedHop?


The honest answer: it depends on what you’re building.

| If you need… | Pick |
| --- | --- |
| Document QA with citations and a Decision Report | RedHop |
| In-process retrieval, no vector DB, no infra | RedHop |
| The same API in Python, Node, and Rust | RedHop |
| Agents that call tools and choose actions | LangChain |
| Chain / DAG / multi-step orchestration | LangChain |
| 100+ LLM provider integrations out of the box | LangChain |
| Per-user conversational memory | LangChain (or a dedicated memory product) |
| Production-tested ecosystem with many integrations | LangChain |

This isn’t an attack on LangChain. LangChain is a framework that does many things; RedHop is a library that does one. If you want the framework, LangChain is the right choice. If you want just the bit between your documents and the LLM — with the decision exposed — RedHop is simpler.


Same contract.pdf, same question. The RedHop version is below; the LangChain equivalent appears step by step in the migration walkthrough further down.

import redhop
from openai import OpenAI

query = "What is the governing law?"
ctx = redhop.Document.from_file("contract.pdf").context(query)
# parsed, chunked, retrieved, and token-budgeted internally
response = OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"{ctx.text()}\n\nQuestion: {query}"}],
)
print(response.choices[0].message.content)

What you stand up: nothing. Point it at the file and ask; parsing, chunking, retrieval, and token-budgeting happen inside — and every call returns a Decision Report explaining what it kept and why.

Notice the difference in shape — not just length. LangChain makes you wire a loader, a splitter, an embedder, a vector store, a retriever, a prompt, a chain. RedHop has one concept: document → context. Everything else is an implementation detail behind the API.

The full head-to-head benchmark (evidence retention + downstream answer quality on CUAD contracts and HotpotQA multi-hop) is on the Comparison page — same documents, same BM25 retriever for fairness, same token budget. TL;DR: RedHop ties or edges LangChain on answer quality at a fraction of the code surface.


What LangChain gives you that RedHop doesn’t


To be clear, LangChain covers ground that RedHop does not attempt:

  • Agents and tool-use. ReAct, OpenAI Functions, structured tool calling, agent executors, custom tools. RedHop is stateless per-query — it has no agent loop.
  • Chains and workflow orchestration. SequentialChain, RouterChain, multi-step DAGs. RedHop does one step: assemble context.
  • Conversational memory. ConversationBufferMemory, summary memory, vector memory. RedHop doesn’t track conversation state — that’s a different problem (try Supermemory or LangChain’s memory for that).
  • Massive integration surface. Hundreds of LLM providers, vector stores, document loaders, tools. RedHop is opinionated and small: built-in PDF / DOCX / PPTX / XLSX / Markdown / code parsers, BM25 by default, optional ONNX-backed embeddings.
  • An established ecosystem. LangSmith for observability, LangServe for serving, LangGraph for graphs. RedHop is alpha — useful, but young.

If you need any of the above, stay on LangChain or use the two together (RedHop for the document-context step, LangChain for the chain / agent that wraps it).


What RedHop gives you that LangChain doesn’t


Every doc.context(query) returns a ctx.report describing exactly what happened — what was kept, what was dropped, whether the engine intervened, why it chose what it chose.

RedHop Decision Report
======================
Decision: Auto → passthrough (small context, no intervention needed)
Why:
- 1,240 tokens — below the dilution gate (1,500 tokens)
- pruning a small clean context risks dropping reasoning evidence
Result:
- kept all 8 retrieved chunks
- evidence retained 100%, second-hop links preserved

LangChain’s retrieval is opaque. When the answer is wrong, you instrument the retriever yourself. With RedHop, the report is structured data on every single call — auto_decision, total_tokens, n_input_chunks, n_selected, retained_evidence_ratio, second_hop_rescue_count. You can also run doc.analyze(query) to get the same diagnostics without assembling a context — pure observability before you act.
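Because the report is structured, it can feed an automated check rather than only being read by eye. The following is a minimal sketch, assuming the fields listed above are readable as attributes on ctx.report (the access style is an assumption). A stand-in object with illustrative values is used so the snippet runs without RedHop installed; the helper name retrieval_warnings and the 0.9 threshold are hypothetical.

```python
from types import SimpleNamespace


def retrieval_warnings(report, min_retention=0.9):
    """Return human-readable warnings for one report-like object."""
    warnings = []
    if report.retained_evidence_ratio < min_retention:
        warnings.append(
            f"evidence retention {report.retained_evidence_ratio:.0%} "
            f"is below {min_retention:.0%}"
        )
    if report.n_selected < report.n_input_chunks:
        dropped = report.n_input_chunks - report.n_selected
        warnings.append(f"{dropped} of {report.n_input_chunks} chunks were dropped")
    return warnings


# Stand-in for ctx.report; the field names come from the text above,
# the values are illustrative and loosely mirror the sample report.
sample_report = SimpleNamespace(
    auto_decision="passthrough",
    total_tokens=1240,
    n_input_chunks=8,
    n_selected=8,
    retained_evidence_ratio=1.0,
    second_hop_rescue_count=0,
)

print(retrieval_warnings(sample_report))
```

In a test suite, a check like this could run over a fixed set of questions and fail the build when retention falls, which is one way to catch retrieval regressions before they show up as wrong answers.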

The default tier is BM25. Zero model download, zero ONNX runtime, fully offline, sub-100ms warm queries. Most document QA — code, API references, runbooks, financial reports, handbooks — works on lexical alone, because the words in the question are usually the words in the answer.
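To make "lexical" concrete, here is a small, self-contained sketch of Okapi BM25 scoring in pure Python. It is not RedHop's implementation: the whitespace tokenizer, the k1 and b defaults, and the sample chunks are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer: lowercase and split on whitespace (no stemming).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


chunks = [
    "this agreement is governed by the laws of the state of delaware",
    "the supplier shall deliver the goods within thirty days",
    "either party may terminate this agreement with written notice",
]
scores = bm25_scores("governing law delaware", chunks)
best = max(range(len(chunks)), key=scores.__getitem__)
print(best, scores)
```

Only the exact token "delaware" matches here, because this toy tokenizer does not stem "governing" to "governed" or "law" to "laws". That is the trade-off of pure lexical retrieval: it needs no model, but it depends on the question and the document sharing vocabulary.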

If you need semantic retrieval, opt into retrieval="hybrid" with a small embedding model (bge-small, ~80MB, auto-downloaded). Even then, retrieval is exact cosine over your in-memory chunks — no ANN index, no vector store, no embedded service. LangChain’s hybrid retriever requires you to stand up FAISS / Chroma / Weaviate / Qdrant / Pinecone or similar.
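"Exact cosine over in-memory chunks" means scoring the query against every chunk vector and sorting, with no approximate index in between. The sketch below shows that idea with toy three-dimensional vectors standing in for real embeddings; the function names are hypothetical and this is not RedHop's code.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Brute-force search: score every chunk, return indices of the k best."""
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy vectors; a real embedding model produces hundreds of dimensions.
chunk_vecs = [
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [0.9, 0.1, 0.0]

print(top_k(query_vec, chunk_vecs))
```

The cost is linear in the number of chunks per query. That is usually acceptable for a single document or a folder of documents, and it is the scale at which an ANN index starts to pay for itself that a dedicated vector store becomes worth standing up.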

Load. Ask. Read. That’s the whole API.

doc = redhop.Document.from_file("contract.pdf") # load (or .from_folder, .from_text, .from_bytes)
ctx = doc.context("What is the governing law?") # ask
print(ctx.text()) # the prompt for your LLM
for c in ctx.citations: ... # source / page / heading / line per chunk
print(ctx.report) # the decision

Compare to a typical LangChain RAG: a loader + a splitter + an embedder + a vector store + a retriever + a prompt + a chain. Each piece has its own config surface. The cognitive overhead compounds.

LangChain ships for Python (langchain) and JavaScript (LangChain.js), and the JS port doesn’t fully mirror the Python one. RedHop ships the same surface in Python, Node, and Rust over a single Rust core. Build a prototype in Python, then use the same API in your Rust service or Electron desktop app.

RedHop runs in your process. No service to call, no hosted endpoint, no API key. The optional embedding model is downloaded once (cached locally) and runs locally via ONNX. Your documents never leave the box. For finance / legal / health teams with data residency requirements, this is the shape of the answer.


If you’ve got an existing LangChain RAG pipeline doing document QA, here’s the equivalent in RedHop.

LangChain:

from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
pages = PyMuPDFLoader("contract.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(pages)
store = FAISS.from_documents(chunks, OpenAIEmbeddings())
retriever = store.as_retriever(search_kwargs={"k": 4})

RedHop:

import redhop
doc = redhop.Document.from_file("contract.pdf")

That’s it. PDF parsing, chunking, indexing — all behind the API. No embedding call (default tier is BM25). For semantic retrieval add retrieval="hybrid", model="bge-small" to the constructor.

LangChain:

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
prompt = ChatPromptTemplate.from_template("Answer using only the context.\n\n{context}\n\nQuestion: {input}")
chain = (
    {"context": retriever, "input": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)
answer = chain.invoke("What is the governing law?")

RedHop:

from openai import OpenAI
ctx = doc.context("What is the governing law?")
answer = OpenAI().responses.create(
    model="gpt-4o-mini",
    input=f"{ctx.text()}\n\nQuestion: What is the governing law?",
).output_text

Bring your own LLM client. RedHop hands you a prompt string — no chain abstraction, no lock-in to a provider, no callback machinery.
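Since the context is a plain string, swapping providers can be as simple as passing a different callable. This is a minimal sketch of that pattern; answer_with_context and fake_llm are hypothetical names, and the fake model exists only so the snippet runs without network access or RedHop installed.

```python
from typing import Callable


def answer_with_context(
    context: str,
    question: str,
    complete: Callable[[str], str],
) -> str:
    """Build the prompt from a context string and delegate to any LLM callable."""
    prompt = f"{context}\n\nQuestion: {question}"
    return complete(prompt)


def fake_llm(prompt: str) -> str:
    # Stand-in for a real provider call; reports what it received.
    return f"(model received {len(prompt)} characters)"


# In real use, `context` would be ctx.text() and `complete` would wrap
# whichever provider SDK you prefer.
print(answer_with_context("Clause 14: Delaware law governs.", "What is the governing law?", fake_llm))
```

Keeping the provider behind a single callable also makes the retrieval step easy to test in isolation, because the LLM can be replaced with a deterministic stub.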

LangChain: thread document metadata through the chain, parse it back out from the retrieved docs, hope the field names match.

RedHop:

for c in ctx.citations:
    print(c["source"], c["page"], c["heading"])

Citations come for free with ctx.citations: source, page, heading, line, and text per surviving chunk, in reading order.
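One common use is turning citations into numbered footnotes to append to the model's answer. The sketch below assumes each citation is a dict with the keys named above, as in the loop shown earlier; the sample entries and the helper name format_citations are made up for illustration.

```python
def format_citations(citations):
    """Render citation dicts as numbered footnote lines."""
    lines = []
    for number, c in enumerate(citations, start=1):
        lines.append(f'[{number}] {c["source"]}, p. {c["page"]}, "{c["heading"]}"')
    return "\n".join(lines)


# Made-up entries shaped like the fields described above.
sample_citations = [
    {
        "source": "contract.pdf",
        "page": 12,
        "heading": "Governing Law",
        "line": 4,
        "text": "This Agreement is governed by the laws of Delaware.",
    },
    {
        "source": "contract.pdf",
        "page": 13,
        "heading": "Dispute Resolution",
        "line": 1,
        "text": "Disputes are resolved by binding arbitration.",
    },
]

print(format_citations(sample_citations))
```

Whether every field is populated for every file type (for example, page for Markdown sources) is not stated here, so production code may want to handle missing values.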

LangChain: loop over files, load each, split, index into one store.

RedHop:

doc = redhop.Document.from_folder("./docs", persist=True)
ctx = doc.context("Where is the refund policy?")

from_folder honors .gitignore, accepts custom ignore patterns, and optionally writes an incremental on-disk index — reload is O(changed files), not O(all files).
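The idea behind an incremental index can be illustrated with a content-hash manifest: remember a digest per file, and on reload re-process only the files whose digest changed. This is a sketch of the general technique under that assumption, not a description of RedHop's on-disk format; changed_files and the manifest layout are hypothetical, and .gitignore handling is left out.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def changed_files(folder, manifest_path):
    """Return relative paths that are new or modified since the last call,
    then save an updated manifest of content hashes."""
    folder = Path(folder)
    manifest_path = Path(manifest_path)
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}

    new = {}
    changed = []
    for path in sorted(folder.rglob("*")):
        if not path.is_file():
            continue
        key = path.relative_to(folder).as_posix()
        new[key] = hashlib.sha256(path.read_bytes()).hexdigest()
        if old.get(key) != new[key]:
            changed.append(key)

    manifest_path.write_text(json.dumps(new))
    return changed


with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "docs"
    docs.mkdir()
    manifest = Path(tmp) / "manifest.json"  # kept outside the scanned folder

    (docs / "a.md").write_text("refund policy: 30 days")
    (docs / "b.md").write_text("shipping policy: 5 days")
    first = changed_files(docs, manifest)   # both files are new

    (docs / "b.md").write_text("shipping policy: 3 days")
    second = changed_files(docs, manifest)  # only b.md changed

    print(first, second)
```

This sketch still reads every file to hash it; only the expensive parse-and-index work would be skipped for unchanged files. Comparing modification time and size first is a common way to avoid even the read.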


| Workload | RedHop | LangChain |
| --- | --- | --- |
| Document QA with one or many files | ✅ shorter, observable | ✅ flexible |
| Agentic workflows / tool use | ❌ out of scope | ✅ flagship feature |
| Need to plug in a specific LLM provider | ✅ (any, since you call it) | ✅ (built-in integration) |
| Conversational memory across sessions | ❌ stateless per query | ✅ |
| Production-tested at billion-vector scale | ❌ (in-memory) | ✅ (with a vector DB) |
| Visibility into retrieval decisions | ✅ Decision Report | ❌ DIY observability |
| Apache-2.0, no commercial gating | ✅ | |
| Same API in Python / Node / Rust | ✅ | ❌ (Python + partial JS) |

If your workload sits firmly in document QA and you’ve been wondering why LangChain feels like 10 imports to do one thing — RedHop is the alternative you’re looking for. If you’ve graduated to agents and tool-use, stay on LangChain (and consider using RedHop inside an agent for the document-context step).


pip install redhop # Python
cargo add redhop --features files,semantic # Rust
npm install redhop # Node.js -- on npm

Open source under Apache-2.0. Bug reports and use-case feedback welcome at github.com/vysakh0/redhop.