RedHop: a LangChain alternative for document RAG

If you’re searching for a LangChain alternative, you’re probably hitting one of three walls:

Too much surface area. Chains, agents, retrievers, embedders, vector stores, output parsers, callbacks, all to answer questions about a PDF.
A vector DB you don’t need. Most document QA workloads don’t actually need Pinecone or Weaviate. Lexical BM25 handles them fine. LangChain’s default examples push you to one anyway.
No visibility into the retrieval decision. When the answer is wrong, you don’t know if the retriever missed the chunk, the chain dropped it, or the reranker pruned it.

RedHop is a focused alternative: an in-process retrieval + context library that does one thing (turn a document and a question into the right LLM prompt context) and tells you exactly what it kept, dropped, and why.

import redhop

doc = redhop.Document.from_file("contract.pdf")
ctx = doc.context("What is the governing law?")
answer = llm.generate(ctx.text())   # llm: whatever LLM client you already use

print(ctx.report)   # what was kept, dropped, and why

That’s the whole surface. Three calls. No vector DB. No chains. Python, Node, and Rust over a Rust core, all in-process.

Should you switch from LangChain to RedHop?

The honest answer: it depends on what you’re building.

If you need…	Pick
Document QA with citations and a Decision Report	RedHop
In-process retrieval, no vector DB, no infra	RedHop
The same API in Python, Node, and Rust	RedHop
Agents that call tools and choose actions	LangChain
Chain / DAG / multi-step orchestration	LangChain
100+ LLM provider integrations out of the box	LangChain
Per-user conversational memory	LangChain (or a dedicated memory product)
Production-tested ecosystem with many integrations	LangChain

This isn’t an attack on LangChain. LangChain is a framework that does many things. RedHop is a library that does one. If you want the framework, LangChain is the right choice. If you want just the bit between your documents and the LLM, with the decision exposed, RedHop is simpler.

The same question, two ways

Same contract.pdf. Same question. RedHop first, then LangChain.

RedHop:

import redhop
from openai import OpenAI

query = "What is the governing law?"

ctx = redhop.Document.from_file("contract.pdf").context(query)
#  parsed, chunked, retrieved, and token-budgeted internally

response = OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"{ctx.text()}\n\nQuestion: {query}"}],
)
print(response.choices[0].message.content)

What you stand up: nothing. Point it at the file and ask; parsing, chunking, retrieval, and token-budgeting happen inside, and every call returns a Decision Report explaining what it kept and why.

LangChain:

from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

query = "What is the governing law?"

pages = PyMuPDFLoader("contract.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200,
).split_documents(pages)

store = FAISS.from_documents(chunks, OpenAIEmbeddings())
retriever = store.as_retriever(search_kwargs={"k": 4})

prompt = ChatPromptTemplate.from_template(
    "Answer using only the context.\n\n{context}\n\nQuestion: {input}"
)

chain = (
    {"context": retriever, "input": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

print(chain.invoke(query))

What you stand up: a splitter (you choose chunk_size/overlap), an embedding model, a FAISS vector store, a retriever, a prompt template, and a retrieval chain. That is six wired pieces, and every chunk has to be embedded through the OpenAI API before you can ask the first question.

Notice the difference in shape, not just length. LangChain makes you wire a loader, a splitter, an embedder, a vector store, a retriever, a prompt, a chain. RedHop has one concept: document → context. Everything else is an implementation detail behind the API.

The full head-to-head benchmark (evidence retention + downstream answer quality on CUAD contracts and HotpotQA multi-hop) is on the Comparison page: same documents, same BM25 retriever for fairness, same token budget. In short: RedHop ties or edges LangChain on answer quality at a fraction of the code surface.

What LangChain gives you that RedHop doesn’t

Be clear about this. LangChain does things RedHop doesn’t even try to do:

Agents and tool-use. ReAct, OpenAI Functions, structured tool calling, agent executors, custom tools. RedHop is stateless per-query. It has no agent loop.
Chains and workflow orchestration. SequentialChain, RouterChain, multi-step DAGs. RedHop does one step: assemble context.
Conversational memory. ConversationBufferMemory, summary memory, vector memory. RedHop doesn’t track conversation state. That’s a different problem (try Supermemory or LangChain’s memory for that).
Massive integration surface. Hundreds of LLM providers, vector stores, document loaders, tools. RedHop is opinionated and small: built-in PDF / DOCX / PPTX / XLSX / Markdown / code parsers, BM25 by default, optional ONNX-backed embeddings.
An established ecosystem. LangSmith for observability, LangServe for serving, LangGraph for graphs. RedHop is alpha: useful, but young.

If you need any of the above, stay on LangChain or use the two together (RedHop for the document-context step, LangChain for the chain / agent that wraps it).

What RedHop gives you that LangChain doesn’t

1. A Decision Report on every call

Every doc.context(query) returns a ctx.report describing exactly what happened: what was kept, what was dropped, whether the engine intervened, why it chose what it chose.

RedHop Decision Report
======================

Decision: Auto → passthrough (small context, no intervention needed)

  Why:
    - 1,240 tokens — below the dilution gate (1,500 tokens)
    - pruning a small clean context risks dropping reasoning evidence
  Result:
    - kept all 8 retrieved chunks
    - evidence retained 100%, second-hop links preserved

LangChain’s retrieval is opaque. When the answer is wrong, you instrument the retriever yourself. With RedHop, the report is structured data on every single call: auto_decision, total_tokens, n_input_chunks, n_selected, retained_evidence_ratio, second_hop_rescue_count. You can also run doc.analyze(query) to get the same diagnostics without assembling a context: pure observability before you act.
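Because the report is structured data, it can feed an automated check. The sketch below is illustrative only. It assumes the report has already been converted to a plain dict keyed by the field names listed above; how ctx.report actually exposes those fields is not shown here, and review_report, the 0.9 threshold, and the sample values are hypothetical.

```python
from typing import Any


def review_report(report: dict[str, Any], min_retained: float = 0.9) -> list[str]:
    """Return warnings for one retrieval decision (hypothetical helper)."""
    warnings = []
    if report["n_selected"] == 0:
        warnings.append("no chunks selected")
    if report["retained_evidence_ratio"] < min_retained:
        warnings.append(
            f"evidence retained {report['retained_evidence_ratio']:.0%}, "
            f"below {min_retained:.0%}"
        )
    return warnings


# Sample values mirror the passthrough report shown above. The dict shape and
# the "passthrough" string are assumptions, not RedHop's documented output.
sample = {
    "auto_decision": "passthrough",
    "total_tokens": 1240,
    "n_input_chunks": 8,
    "n_selected": 8,
    "retained_evidence_ratio": 1.0,
    "second_hop_rescue_count": 0,
}

print(review_report(sample))  # this sample triggers no warnings
```

A check like this could run in CI or in a logging hook, so a drop in retained evidence shows up before a user reports a wrong answer.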

2. No vector database required

The default tier is BM25. Zero model download, zero ONNX runtime, fully offline, sub-100ms warm queries. Most document QA (code, API references, runbooks, financial reports, handbooks) works on lexical alone, because the words in the question are usually the words in the answer.
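To make "lexical alone" concrete, here is a minimal Okapi BM25 scorer in plain Python. It is a textbook sketch, not RedHop's implementation: the tokenizer, the k1 and b defaults, and the sample chunks are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


chunks = [
    "Governing Law. This Agreement is governed by the law of the State of Delaware.",
    "Termination. Either party may terminate this Agreement with thirty days written notice.",
    "Payment. Fees are payable within thirty days of the invoice date.",
]
scores = bm25_scores("What is the governing law?", chunks)
best = max(range(len(chunks)), key=scores.__getitem__)
print(best, chunks[best])
```

The governing-law chunk ranks first because it shares the rare terms "governing" and "law" with the question, while the termination chunk shares no terms and scores zero. This tokenizer does no stemming, so "governed" alone would not match "governing".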

If you need semantic retrieval, opt into retrieval="hybrid" with a small embedding model (bge-small, ~80MB, auto-downloaded). Even then, retrieval is exact cosine over your in-memory chunks: no ANN index, no vector store, no embedded service. LangChain’s hybrid retriever requires you to stand up FAISS / Chroma / Weaviate / Qdrant / Pinecone or similar.
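"Exact cosine over your in-memory chunks" amounts to a brute-force scan. The sketch below shows the idea with hand-written toy vectors. A real setup would get the vectors from an embedding model, and the function names here are hypothetical rather than part of RedHop's API.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Score every chunk (no ANN index) and return the k best as (index, score)."""
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:k]]


# Toy 3-dimensional "embeddings", one per chunk.
chunk_vecs = [
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
]
hits = top_k([1.0, 0.1, 0.0], chunk_vecs, k=2)
print(hits)
```

A full scan is linear in the number of chunks, which is why it suits a single document or folder held in memory and not a billion-vector corpus.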

3. Three calls cover the surface

Load. Ask. Read. That’s the whole API.

doc = redhop.Document.from_file("contract.pdf")   # load (or .from_folder, .from_text, .from_bytes)
ctx = doc.context("What is the governing law?")   # ask
print(ctx.text())                                 # the prompt for your LLM
for c in ctx.citations: ...                        # source / page / heading / line per chunk
print(ctx.report)                                 # the decision

Compare to a typical LangChain RAG: a loader + a splitter + an embedder + a vector store + a retriever + a prompt + a chain. Each piece has its own config surface. The cognitive overhead compounds.

4. The same API in Python, Node, and Rust

LangChain ships for Python (langchain) and JavaScript/TypeScript (LangChain.js), and the JS port doesn’t fully mirror the Python one. RedHop ships the same surface in Python, Node, and Rust over a single Rust core. Build a prototype in Python, ship the same API in your Rust service or Electron desktop app.

5. In-process, no SaaS, no network calls

RedHop runs in your process. No service to call, no hosted endpoint, no API key. The optional embedding model is downloaded once (cached locally) and runs locally via ONNX. Your documents never leave the box. For finance / legal / health teams with data residency requirements, that is often the deciding property.

Migrating from LangChain to RedHop

If you’ve got an existing LangChain RAG pipeline doing document QA, here’s the equivalent in RedHop.

Loading + chunking + retrieval

LangChain:

from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

pages = PyMuPDFLoader("contract.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(pages)
store = FAISS.from_documents(chunks, OpenAIEmbeddings())
retriever = store.as_retriever(search_kwargs={"k": 4})

RedHop:

import redhop
doc = redhop.Document.from_file("contract.pdf")

That’s it. PDF parsing, chunking, indexing: all behind the API. No embedding call (default tier is BM25). For semantic retrieval add retrieval="hybrid", model="bge-small" to the constructor.

Getting the context for the LLM

LangChain:

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Answer using only the context.\n\n{context}\n\nQuestion: {input}")

chain = (
    {"context": retriever, "input": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)
answer = chain.invoke("What is the governing law?")

RedHop:

from openai import OpenAI

ctx = doc.context("What is the governing law?")
answer = OpenAI().responses.create(model="gpt-4o-mini",
    input=f"{ctx.text()}\n\nQuestion: What is the governing law?").output_text

Bring your own LLM client. RedHop hands you a prompt string: no chain abstraction, no lock-in to a provider, no callback machinery.
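Since the hand-off is a plain string, the LLM call can sit behind any callable. A minimal sketch, assuming the context text has already been produced (for example by ctx.text()); answer_with_context and fake_llm are hypothetical names used only for illustration.

```python
from typing import Callable


def answer_with_context(
    context_text: str,
    question: str,
    generate: Callable[[str], str],
) -> str:
    """Build the prompt and pass it to whichever LLM callable is supplied."""
    prompt = f"{context_text}\n\nQuestion: {question}"
    return generate(prompt)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real provider call: echoes the last prompt line."""
    return prompt.splitlines()[-1]


reply = answer_with_context(
    "This Agreement is governed by the law of the State of Delaware.",
    "What is the governing law?",
    fake_llm,
)
print(reply)
```

Swapping providers then means swapping the generate callable; nothing upstream of the prompt string changes.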

Adding citations

LangChain: thread document metadata through the chain, parse it back out from the retrieved docs, hope the field names match.

RedHop:

for c in ctx.citations:
    print(c["source"], c["page"], c["heading"])

Citations come for free with ctx.citations: source, page, heading, line, and text per surviving chunk, in reading order.
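With those fields, rendering a reference list for the end user takes a few lines. The sketch below works on a hand-written list of dicts shaped like the fields named above; the sample values, the integer page type, and format_citations are assumptions for illustration.

```python
def format_citations(citations):
    """Render one numbered reference line per cited chunk."""
    return "\n".join(
        f"[{n}] {c['source']}, p. {c['page']}: {c['heading']}"
        for n, c in enumerate(citations, start=1)
    )


# Made-up sample standing in for ctx.citations.
sample_citations = [
    {
        "source": "contract.pdf",
        "page": 12,
        "heading": "Governing Law",
        "line": 4,
        "text": "This Agreement is governed by the law of the State of Delaware.",
    },
]
references = format_citations(sample_citations)
print(references)
```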

Folder-of-files RAG (the common case)

LangChain: loop over files, load each, split, index into one store.

RedHop:

doc = redhop.Document.from_folder("./docs", options=redhop.FolderOptions(persist=True))
ctx = doc.context("Where is the refund policy?")

from_folder honors .gitignore, accepts custom ignore patterns, and optionally writes an incremental on-disk index: reload is O(changed files), not O(all files).
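One way an incremental on-disk index can keep reloads proportional to the change set is a manifest of content hashes. The sketch below illustrates that general technique and is not RedHop's on-disk format: changed_files and the JSON manifest are hypothetical, .gitignore handling is omitted, and it still reads every file to hash it, so only the expensive parse-and-chunk step would be limited to changed files.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def changed_files(folder: Path, manifest_path: Path) -> list[str]:
    """Return relative paths whose content differs from the saved manifest,
    then rewrite the manifest to match the folder's current state."""
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new = {}
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        rel = path.relative_to(folder).as_posix()
        new[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    manifest_path.write_text(json.dumps(new))
    # New or modified files need re-parsing; deleted files simply drop out.
    return [rel for rel, digest in new.items() if old.get(rel) != digest]


with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "docs"
    docs.mkdir()
    manifest = Path(tmp) / "manifest.json"  # kept outside the scanned folder
    (docs / "a.md").write_text("alpha")
    (docs / "b.md").write_text("beta")
    first = changed_files(docs, manifest)   # first run: everything is new
    second = changed_files(docs, manifest)  # nothing touched since
    (docs / "b.md").write_text("beta v2")
    third = changed_files(docs, manifest)   # only b.md was modified

print(first, second, third)
```

Comparing modification time and size instead of hashing would avoid reading unchanged files at all, at the cost of missing edits that preserve both.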

Pick the right tool

Workload	RedHop	LangChain
Document QA with one or many files	✅ shorter, observable	✅ flexible
Agentic workflows / tool use	❌ out of scope	✅ flagship feature
Need to plug in a specific LLM provider	✅ (any, you call it)	✅ (built-in integration)
Conversational memory across sessions	❌ stateless per query	✅
Production-tested at billion-vector scale	❌ (in-memory)	✅ (with a vector DB)
Visibility into retrieval decisions	✅ Decision Report	❌ DIY observability
Apache-2.0, no commercial gating	✅	✅
Same API in Python / Node / Rust	✅	❌ (Python + partial JS)

If your workload sits firmly in document QA and you’ve been wondering why LangChain feels like 10 imports to do one thing, RedHop is the alternative you’re looking for. If you’ve graduated to agents and tool-use, stay on LangChain (and consider using RedHop inside an agent for the document-context step).

Get started

pip install redhop                            # Python
cargo add redhop --features files,semantic    # Rust
npm install redhop                            # Node.js -- on npm

Quickstart: the three-call surface
Choosing a configuration: when to use which retrieval tier
The full benchmark vs LangChain & LlamaIndex: same datasets, same retriever, head-to-head
Other alternatives: per-framework deep-dives (LlamaIndex, Haystack)
llms.txt: single-file context for AI coding agents

Open source under Apache-2.0. Bug reports and use-case feedback welcome at github.com/vysakh0/redhop.