RedHop: A Simpler LangChain Alternative for Document RAG
If you’re searching for a LangChain alternative, you’re probably hitting one of three walls:
- Too much surface area. Chains, agents, retrievers, embedders, vector stores, output parsers, callbacks — all to answer questions about a PDF.
- A vector DB you don’t need. Most document QA workloads don’t actually need Pinecone or Weaviate; lexical BM25 handles them fine. LangChain’s default examples push you to one anyway.
- No visibility into the retrieval decision. When the answer is wrong, you don’t know if the retriever missed the chunk, the chain dropped it, or the reranker pruned it.
RedHop is a focused alternative: an in-process retrieval + context library that does one thing — turn a document and a question into the right LLM prompt context — and tells you exactly what it kept, dropped, and why.
import redhop

doc = redhop.Document.from_file("contract.pdf")
ctx = doc.context("What is the governing law?")
answer = llm.generate(ctx.text())  # llm: whichever LLM client you already use
print(ctx.report)                  # what was kept, dropped, and why

That’s the whole surface. Three calls. No vector DB. No chains. Python, Node, and Rust over a Rust core, all in-process.
Should you switch from LangChain to RedHop?
The honest answer: it depends on what you’re building.
| If you need… | Pick |
|---|---|
| Document QA with citations and a Decision Report | RedHop |
| In-process retrieval, no vector DB, no infra | RedHop |
| The same API in Python, Node, and Rust | RedHop |
| Agents that call tools and choose actions | LangChain |
| Chain / DAG / multi-step orchestration | LangChain |
| 100+ LLM provider integrations out of the box | LangChain |
| Per-user conversational memory | LangChain (or a dedicated memory product) |
| Production-tested ecosystem with many integrations | LangChain |
This isn’t an attack on LangChain. LangChain is a framework that does many things; RedHop is a library that does one. If you want the framework, LangChain is the right choice. If you want just the bit between your documents and the LLM — with the decision exposed — RedHop is simpler.
The same question, two ways
Same contract.pdf. Same question. RedHop first, then LangChain.
import redhop
from openai import OpenAI

query = "What is the governing law?"

ctx = redhop.Document.from_file("contract.pdf").context(query)  # parsed, chunked, retrieved, and token-budgeted internally

response = OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"{ctx.text()}\n\nQuestion: {query}"}],
)
print(response.choices[0].message.content)

What you stand up: nothing. Point it at the file and ask; parsing, chunking, retrieval, and token-budgeting happen inside, and every call returns a Decision Report explaining what it kept and why.
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

query = "What is the governing law?"

pages = PyMuPDFLoader("contract.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200,
).split_documents(pages)

store = FAISS.from_documents(chunks, OpenAIEmbeddings())
retriever = store.as_retriever(search_kwargs={"k": 4})

prompt = ChatPromptTemplate.from_template(
    "Answer using only the context.\n\n{context}\n\nQuestion: {input}"
)

def format_docs(docs):
    # Join the retrieved chunks into plain text for the {context} slot
    return "\n\n".join(d.page_content for d in docs)

chain = (
    {"context": retriever | format_docs, "input": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

print(chain.invoke(query))

What you stand up: a splitter (you choose chunk_size/overlap), an embedding model, a FAISS vector store, a retriever, a prompt template, and a retrieval chain. That is six wired pieces, and every chunk has to be embedded through the OpenAI API (billed per token) before the first question can be answered.
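To make chunk_size and chunk_overlap concrete, here is a simplified fixed-window chunker. It is an illustration only: LangChain’s RecursiveCharacterTextSplitter additionally tries to break on paragraph, sentence, and word separators, which this sketch does not. The point is the arithmetic: each window starts chunk_size - chunk_overlap characters after the previous one, and every window it returns is one more thing to embed.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into overlapping fixed-size character windows (simplified model)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    if not text:
        return []
    stride = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks, start = [], 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end
            break
        start += stride
    return chunks


if __name__ == "__main__":
    doc = "x" * 5000  # stand-in for ~5,000 characters of extracted PDF text
    print(len(chunk_text(doc)))  # 6 chunks, each needing its own embedding
```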
Notice the difference in shape — not just length. LangChain makes you wire a loader, a splitter, an embedder, a vector store, a retriever, a prompt, a chain. RedHop has one concept: document → context. Everything else is an implementation detail behind the API.
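The "one concept" shape can be illustrated with a toy pipeline: chunk, score against the question, fit to a budget, and report what was kept and dropped. This is a hypothetical sketch of the shape of document → context, not RedHop’s implementation (which uses BM25 or hybrid retrieval, real token counts, and its own decision logic); build_context, ToyContext, and the word-count budget are made up for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToyContext:
    text: str                                         # what would go to the LLM
    kept: list[int] = field(default_factory=list)     # indices of chunks kept
    dropped: list[int] = field(default_factory=list)  # indices of chunks dropped


def tokenize(s: str) -> list[str]:
    return re.findall(r"\w+", s.lower())


def build_context(document: str, question: str, budget_tokens: int = 50) -> ToyContext:
    # 1. Chunk: one chunk per blank-line-separated paragraph.
    chunks = [p.strip() for p in document.split("\n\n") if p.strip()]
    # 2. Score: number of distinct question terms that appear in each chunk.
    q_terms = set(tokenize(question))
    scores = [len(q_terms & set(tokenize(c))) for c in chunks]
    # 3. Budget: take the best-scoring relevant chunks while they still fit.
    kept, used = [], 0
    for i in sorted(range(len(chunks)), key=lambda j: scores[j], reverse=True):
        cost = len(tokenize(chunks[i]))  # crude "token" count: words
        if scores[i] > 0 and used + cost <= budget_tokens:
            kept.append(i)
            used += cost
    # 4. Restore reading order so the prompt reads like the source document.
    kept.sort()
    dropped = [i for i in range(len(chunks)) if i not in kept]
    return ToyContext("\n\n".join(chunks[i] for i in kept), kept, dropped)


if __name__ == "__main__":
    doc = ("This Agreement is governed by the laws of Delaware.\n\n"
           "Payment is due in 30 days.\n\n"
           "Notices must be in writing.")
    ctx = build_context(doc, "What law governs this agreement?")
    print(ctx.text)               # This Agreement is governed by the laws of Delaware.
    print(ctx.kept, ctx.dropped)  # [0] [1, 2]
```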
The full head-to-head benchmark (evidence retention + downstream answer quality on CUAD contracts and HotpotQA multi-hop) is on the Comparison page — same documents, same BM25 retriever for fairness, same token budget. TL;DR: RedHop ties or edges LangChain on answer quality at a fraction of the code surface.
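For intuition about "evidence retention": one plausible way to score it is the share of annotated evidence spans (CUAD and HotpotQA both ship gold evidence annotations) that survive into the assembled context. This is an assumed definition for illustration; the Comparison page’s exact metric may differ, and evidence_retention is a hypothetical helper.

```python
def normalize(s: str) -> str:
    # Collapse whitespace and case so line wrapping in the PDF doesn't matter.
    return " ".join(s.split()).lower()


def evidence_retention(context: str, gold_evidence: list[str]) -> float:
    """Share of annotated evidence spans that appear in the assembled context."""
    if not gold_evidence:
        return 1.0  # nothing to lose
    ctx = normalize(context)
    found = sum(1 for span in gold_evidence if normalize(span) in ctx)
    return found / len(gold_evidence)


if __name__ == "__main__":
    context = "Section 12. This Agreement is governed by\nthe laws of Delaware."
    gold = ["governed by the laws of Delaware", "venue shall be Wilmington"]
    print(evidence_retention(context, gold))  # 0.5
```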
What LangChain gives you that RedHop doesn’t
To be clear, LangChain does things RedHop doesn’t even try to do:
- Agents and tool-use. ReAct, OpenAI Functions, structured tool calling, agent executors, custom tools. RedHop is stateless per-query — it has no agent loop.
- Chains and workflow orchestration. SequentialChain, RouterChain, multi-step DAGs. RedHop does one step: assemble context.
- Conversational memory. ConversationBufferMemory, summary memory, vector memory. RedHop doesn’t track conversation state — that’s a different problem (try Supermemory or LangChain’s memory for that).
- Massive integration surface. Hundreds of LLM providers, vector stores, document loaders, tools. RedHop is opinionated and small: built-in PDF / DOCX / PPTX / XLSX / Markdown / code parsers, BM25 by default, optional ONNX-backed embeddings.
- An established ecosystem. LangSmith for observability, LangServe for serving, LangGraph for graphs. RedHop is alpha — useful, but young.
If you need any of the above, stay on LangChain or use the two together (RedHop for the document-context step, LangChain for the chain / agent that wraps it).
What RedHop gives you that LangChain doesn’t
1. A Decision Report on every call
Every doc.context(query) returns a ctx.report describing exactly what happened: what was kept, what was dropped, whether the engine intervened, and why it chose what it chose.
RedHop Decision Report
======================

Decision: Auto → passthrough (small context, no intervention needed)

Why:
  - 1,240 tokens — below the dilution gate (1,500 tokens)
  - pruning a small clean context risks dropping reasoning evidence

Result:
  - kept all 8 retrieved chunks
  - evidence retained 100%, second-hop links preserved

LangChain’s retrieval doesn’t explain itself: when the answer is wrong, you instrument the retriever yourself (callbacks or LangSmith tracing). With RedHop, the report is structured data on every single call: auto_decision, total_tokens, n_input_chunks, n_selected, retained_evidence_ratio, second_hop_rescue_count. You can also run doc.analyze(query) to get the same diagnostics without assembling a context, which gives you pure observability before you act.
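Because the report is structured data, the passthrough decision in the sample can be modeled as a simple gate. This sketch is hypothetical: the 1,500-token threshold and the field names auto_decision, total_tokens, n_input_chunks, and n_selected come from the text, but the Report class, the decide() function, and the "keep the top four" pruning rule are illustrative stand-ins, not RedHop’s logic.

```python
from dataclasses import dataclass

DILUTION_GATE_TOKENS = 1_500  # threshold quoted in the sample report


@dataclass
class Report:
    auto_decision: str
    total_tokens: int
    n_input_chunks: int
    n_selected: int


def decide(chunk_tokens: list[int], keep_top: int = 4) -> Report:
    """Hypothetical auto gate over retrieved chunks (token counts, best-ranked first)."""
    total = sum(chunk_tokens)
    n = len(chunk_tokens)
    if total < DILUTION_GATE_TOKENS:
        # Small, clean context: pruning would only risk dropping evidence.
        return Report("passthrough", total, n, n)
    # Large context: keep the top-ranked chunks (a stand-in for real pruning).
    return Report("prune", total, n, min(keep_top, n))


if __name__ == "__main__":
    print(decide([155] * 8))  # passthrough: 1,240 tokens across 8 chunks
    print(decide([400] * 8))  # prune: 3,200 tokens, 4 chunks selected
```

Returning a dataclass rather than a formatted string is what makes the decision loggable and testable: you can assert on auto_decision in CI instead of parsing text.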
2. No vector database required
The default tier is BM25. Zero model download, zero ONNX runtime, fully offline, sub-100ms warm queries. Most document QA (code, API references, runbooks, financial reports, handbooks) works on lexical retrieval alone, because the words in the question are usually the words in the answer.
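To see why lexical retrieval works when the question and the answer share words, here is a textbook Okapi BM25 scorer with the common defaults k1=1.5 and b=0.75 and a Lucene-style IDF. It is a minimal sketch of the standard formula, not RedHop’s Rust implementation, whose tokenizer and parameters may differ.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every doc for the query (standard formula)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: how many docs contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in set(tokenize(query)):  # each distinct query term
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))  # always > 0
            # Term-frequency saturation (k1) and document-length normalization (b).
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores


if __name__ == "__main__":
    chunks = [
        "The governing law of this Agreement is the law of Delaware.",
        "Payment terms: net 30 days from invoice.",
        "Either party may terminate with 60 days notice.",
    ]
    scores = bm25_scores("What is the governing law?", chunks)
    print(max(range(len(chunks)), key=scores.__getitem__))  # 0
```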
If you need semantic retrieval, opt into retrieval="hybrid" with a small embedding model (bge-small, ~80MB, auto-downloaded). Even then, retrieval is exact cosine over your in-memory chunks: no ANN index, no vector store, no embedded service. In LangChain, semantic and hybrid retrieval (for example an EnsembleRetriever over BM25 plus a vector retriever) go through a vector store integration such as FAISS, Chroma, Weaviate, Qdrant, or Pinecone.
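Exact cosine search over in-memory chunks is just a loop: score every chunk vector against the query vector and keep the best k. The sketch below shows that brute-force idea in plain Python, with toy 3-dimensional vectors standing in for real embedding-model output. How RedHop fuses these scores with BM25 in hybrid mode isn’t described here, so fusion is deliberately left out.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Brute-force search: score every chunk, sort, keep the k best. No ANN index."""
    scored = sorted(
        ((cosine(query_vec, v), i) for i, v in enumerate(chunk_vecs)),
        reverse=True,
    )
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real embeddings.
    chunk_vecs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(exact_top_k([1.0, 0.0, 0.0], chunk_vecs, k=2))  # [0, 1]
```

The cost is linear in the number of chunks. For one document or one folder that is cheap and the results are exact; at millions of vectors, an approximate index starts to pay for itself.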
3. Three calls cover the surface
Load. Ask. Read. That’s the whole API.
import redhop

doc = redhop.Document.from_file("contract.pdf")  # load (or .from_folder, .from_text, .from_bytes)
ctx = doc.context("What is the governing law?")   # ask
print(ctx.text())                                 # the prompt for your LLM
for c in ctx.citations: ...                       # source / page / heading / line per chunk
print(ctx.report)                                 # the decision

Compare that to a typical LangChain RAG: a loader + a splitter + an embedder + a vector store + a retriever + a prompt + a chain. Each piece has its own config surface, and the cognitive overhead compounds.
4. The same API in Python, Node, and Rust
LangChain ships for Python and for JavaScript/TypeScript (LangChain.js), and the JS port doesn’t fully mirror the Python one. RedHop ships the same surface in Python, Node, and Rust over a single Rust core. Build a prototype in Python, then ship the same API in your Rust service or Electron desktop app.
5. In-process, no SaaS, no network calls
RedHop runs in your process. No service to call, no hosted endpoint, no API key. The optional embedding model is downloaded once, cached locally, and run locally via ONNX. Your documents never leave the box. For finance, legal, and health teams with data-residency requirements, that is usually the deployment shape they need.
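"Downloaded once, cached locally" is the usual cache-or-fetch pattern: check for the file, fetch only if it is missing, and move it into place atomically. This is a hypothetical sketch; ensure_model, the .onnx naming, and the fetch callback are illustrative and do not describe RedHop’s cache layout or download code.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_model(cache_dir: Path, name: str, fetch: Callable[[str], bytes]) -> Path:
    """Return a local path to the model file, calling fetch() only on first use."""
    path = cache_dir / f"{name}.onnx"
    if not path.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        tmp = path.with_name(path.name + ".part")
        tmp.write_bytes(fetch(name))  # the only network step, done once
        tmp.replace(path)             # rename into place: no half-written file at the final path
    return path


if __name__ == "__main__":
    downloads = []

    def fake_fetch(name: str) -> bytes:
        downloads.append(name)  # stands in for an HTTP download
        return b"model-weights"

    with tempfile.TemporaryDirectory() as d:
        for _ in range(3):
            ensure_model(Path(d), "bge-small", fake_fetch)
    print(downloads)  # ['bge-small']: fetched once, then served from the cache
```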
Migrating from LangChain to RedHop
If you’ve got an existing LangChain RAG pipeline doing document QA, here’s the equivalent in RedHop.
Loading + chunking + retrieval
LangChain:
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

pages = PyMuPDFLoader("contract.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(pages)
store = FAISS.from_documents(chunks, OpenAIEmbeddings())
retriever = store.as_retriever(search_kwargs={"k": 4})

RedHop:
import redhop

doc = redhop.Document.from_file("contract.pdf")

That’s it. PDF parsing, chunking, and indexing all sit behind the API. No embedding call (the default tier is BM25). For semantic retrieval, add retrieval="hybrid", model="bge-small" to the constructor.
Getting the context for the LLM
LangChain:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Answer using only the context.\n\n{context}\n\nQuestion: {input}")

def format_docs(docs):
    # Join the retrieved chunks into plain text for the {context} slot
    return "\n\n".join(d.page_content for d in docs)

chain = (
    {"context": retriever | format_docs, "input": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)
answer = chain.invoke("What is the governing law?")

RedHop:
from openai import OpenAI

ctx = doc.context("What is the governing law?")
answer = OpenAI().responses.create(
    model="gpt-4o-mini",
    input=f"{ctx.text()}\n\nQuestion: What is the governing law?",
).output_text

Bring your own LLM client. RedHop hands you a prompt string: no chain abstraction, no lock-in to a provider, no callback machinery.
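Because the hand-off is a plain string, swapping providers means swapping one function. A minimal sketch, assuming you wrap whatever client you use as a str -> str callable; build_prompt and answer are hypothetical helpers, and ctx.text() appears only in a comment showing where RedHop’s output would plug in.

```python
from typing import Callable


def build_prompt(context: str, question: str) -> str:
    # Same layout as the examples above: context first, then the question.
    return f"{context}\n\nQuestion: {question}"


def answer(context: str, question: str, llm: Callable[[str], str]) -> str:
    """Send the assembled prompt to any LLM client wrapped as a str -> str function."""
    return llm(build_prompt(context, question))


# Wrapping the OpenAI SDK would look like this (needs OPENAI_API_KEY):
#   from openai import OpenAI
#   client = OpenAI()
#   openai_llm = lambda p: client.responses.create(model="gpt-4o-mini", input=p).output_text
#   answer(ctx.text(), "What is the governing law?", openai_llm)

if __name__ == "__main__":
    echo_llm = lambda prompt: prompt.splitlines()[-1]  # fake client for offline tests
    print(answer("Governed by Delaware law.", "What is the governing law?", echo_llm))
    # Question: What is the governing law?
```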
Adding citations
LangChain: thread document metadata through the chain, parse it back out from the retrieved docs, hope the field names match.
RedHop:
for c in ctx.citations:
    print(c["source"], c["page"], c["heading"])

Citations come for free with ctx.citations: source, page, heading, line, and text per surviving chunk, in reading order.
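Since each citation is a plain dict, turning them into footnotes for the final answer takes a few lines. The sketch assumes the keys listed above, and additionally assumes (unconfirmed) that page may be missing or None for formats without pages, such as Markdown. format_citations is a hypothetical helper and the sample values are made up.

```python
def format_citations(citations: list[dict]) -> str:
    """Render citation dicts as numbered footnotes, one line per chunk."""
    lines = []
    for n, c in enumerate(citations, start=1):
        parts = [c["source"]]
        if c.get("page") is not None:  # assumed absent/None for unpaginated formats
            parts.append(f"p. {c['page']}")
        if c.get("heading"):
            parts.append(c["heading"])
        if c.get("line") is not None:
            parts.append(f"line {c['line']}")
        lines.append(f"[{n}] " + ", ".join(parts))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [  # made-up values in the shape described above
        {"source": "contract.pdf", "page": 12, "heading": "Governing Law", "line": 3, "text": "..."},
        {"source": "policy.md", "page": None, "heading": "Refunds", "line": 40, "text": "..."},
    ]
    print(format_citations(sample))
    # [1] contract.pdf, p. 12, Governing Law, line 3
    # [2] policy.md, Refunds, line 40
```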
Folder-of-files RAG (the common case)
LangChain: loop over files, load each, split, index into one store.
RedHop:
doc = redhop.Document.from_folder("./docs", persist=True)
ctx = doc.context("Where is the refund policy?")

from_folder honors .gitignore, accepts custom ignore patterns, and optionally writes an incremental on-disk index, so a reload is O(changed files), not O(all files).
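The O(changed files) reload is typically done by keeping a manifest of content hashes and re-indexing only what differs. A minimal sketch under stated assumptions: scan and diff are hypothetical helpers, fnmatch patterns are a rough stand-in for .gitignore semantics (real gitignore rules add negation and directory anchoring), and nothing here reflects RedHop’s on-disk index format.

```python
import fnmatch
import hashlib
import tempfile
from pathlib import Path


def scan(root: Path, ignore: list[str]) -> dict[str, str]:
    """Map each non-ignored file (relative POSIX path) to a SHA-256 of its bytes."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        if path.is_file() and not any(fnmatch.fnmatch(rel, pat) for pat in ignore):
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def diff(old: dict[str, str], new: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (files to re-index, files to drop from the index)."""
    changed = [p for p, h in new.items() if old.get(p) != h]
    removed = [p for p in old if p not in new]
    return changed, removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "a.md").write_text("refund policy v1")
        (root / "b.md").write_text("shipping")
        (root / "debug.log").write_text("noise")
        before = scan(root, ignore=["*.log"])
        (root / "a.md").write_text("refund policy v2")  # edited
        (root / "b.md").unlink()                         # deleted
        (root / "c.md").write_text("returns")            # added
        print(diff(before, scan(root, ignore=["*.log"])))
        # (['a.md', 'c.md'], ['b.md'])
```

This sketch hashes every file on each scan for simplicity; a real index would usually compare size and modification time first so unchanged files aren’t even read. Either way, only the changed set gets re-parsed and re-indexed.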
Pick the right tool
| Workload | RedHop | LangChain |
|---|---|---|
| Document QA with one or many files | ✅ shorter, observable | ✅ flexible |
| Agentic workflows / tool use | ❌ out of scope | ✅ flagship feature |
| Need to plug in a specific LLM provider | ✅ (any — you call it) | ✅ (built-in integration) |
| Conversational memory across sessions | ❌ stateless per query | ✅ |
| Production-tested at billion-vector scale | ❌ (in-memory) | ✅ (with a vector DB) |
| Visibility into retrieval decisions | ✅ Decision Report | ❌ DIY observability |
| Permissive open-source license, no commercial gating | ✅ (Apache-2.0) | ✅ (MIT) |
| Same API in Python / Node / Rust | ✅ | ❌ (Python + partial JS) |
If your workload sits firmly in document QA and you’ve been wondering why LangChain feels like 10 imports to do one thing — RedHop is the alternative you’re looking for. If you’ve graduated to agents and tool-use, stay on LangChain (and consider using RedHop inside an agent for the document-context step).
Get started
pip install redhop                           # Python
cargo add redhop --features files,semantic   # Rust
npm install redhop                           # Node.js (on npm)

- Quickstart: the three-call surface
- Choosing a configuration: when to use which retrieval tier
- The full benchmark vs LangChain & LlamaIndex: same datasets, same retriever, head-to-head
- Other alternatives: per-framework deep-dives (LlamaIndex, Haystack)
- llms.txt: single-file context for AI coding agents
Open source under Apache-2.0. Bug reports and use-case feedback welcome at github.com/vysakh0/redhop.