RedHop makes RAG easy. Hand it your documents and a question, and it pulls just the sections that matter, returns them ready to pass to your LLM, and explains every decision. The Python, Node, and Rust APIs share one Rust core: chunking, retrieval, and token-budgeting run in-process in milliseconds, with nothing to wire and no services to run.
Python

import redhop
from openai import OpenAI

query = "What is the governing law?"
doc = redhop.Document.from_file("contract.pdf")  # parsed + indexed
ctx = doc.context(query)                         # just the sections that matter
resp = OpenAI().responses.create(
    model="gpt-4o-mini",
    input=f"{ctx.text()}\n\nQuestion: {query}",
)

Node

const { Document } = require("redhop");
const OpenAI = require("openai");

const query = "What is the governing law?";
const doc = Document.fromFile("contract.pdf"); // parsed + indexed
const ctx = doc.context(query);                // just the sections that matter
const resp = await new OpenAI().responses.create({
  model: "gpt-4o-mini",
  input: `${ctx.text}\n\nQuestion: ${query}`,
});

Rust

use redhop::read_file;

let query = "What is the governing law?";
let mut doc = read_file("contract.pdf")?; // parsed + indexed
let ctx = doc.context(query)?;            // just the sections that matter
// hand ctx.text() to any LLM client, no lock-in:
let prompt = format!(
    "{}\n\nQuestion: {query}", ctx.text(),
);
let answer = llm.complete(&prompt).await?;

Load a doc, or a folder. Ask. Read the decision and the citations off the returned context. That’s the whole surface.
import redhop

# 1 · Load: a single file or a whole directory in one index.
doc = redhop.Document.from_file("contract.pdf")
# doc = redhop.Document.from_folder("./policies")  # multi-file → same Document

# 2 · Ask: chunking, retrieval, token-budgeting happen in-process.
ctx = doc.context("What is the governing law?")
prompt = ctx.text()  # hand to any LLM, no lock-in

# 3 · Show your work: provenance and the decision, on the same context object.
for c in ctx.citations:
    print(c["source"], c["page"], c["heading"])  # contract.pdf 12 "9.1 Governing Law"

print(ctx.report)  # the Decision Report, see below

The Decision Report is the thing you don’t get anywhere else. Every call returns one, stating what it kept, what it dropped, and why it chose not to intervene:

RedHop Decision Report
══════════════════════
Decision: Auto → passthrough (small context, no intervention needed)
Why:
  - 1,240 tokens — below the dilution gate (1,500 tokens)
  - pruning a small clean context risks dropping reasoning evidence
Result:
  - kept all 8 retrieved chunks
  - evidence retained 100%, second-hop links preserved

Read fields off the object: ctx.report.auto_decision, ctx.report.total_tokens,
ctx.report.retained_evidence_ratio. Or call doc.analyze(query) to get the
report without assembling the context.
You have a contract.pdf and one question: “What is the governing law?” Here is
the code path that gets the LLM the right context in each library. Answer
quality is the same across the three; the full head-to-head benchmark is on the
comparison page.
RedHop

import redhop
from openai import OpenAI

query = "What is the governing law?"

ctx = redhop.Document.from_file("contract.pdf").context(query)
# parsed, chunked, retrieved, and token-budgeted internally

response = OpenAI().responses.create(
    model="gpt-4o-mini",
    input=f"{ctx.text()}\n\nQuestion: {query}",
)
print(response.output_text)

What you stand up: nothing. Point it at the file and ask; parsing, chunking, retrieval, and token-budgeting happen inside, and every call returns a Decision Report explaining what it kept and why.
LangChain

from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

query = "What is the governing law?"

pages = PyMuPDFLoader("contract.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
).split_documents(pages)

store = FAISS.from_documents(chunks, OpenAIEmbeddings())
retriever = store.as_retriever(search_kwargs={"k": 4})

prompt = ChatPromptTemplate.from_template(
    "Answer using only the context.\n\n{context}\n\nQuestion: {input}"
)
combine = create_stuff_documents_chain(ChatOpenAI(model="gpt-4o-mini"), prompt)
chain = create_retrieval_chain(retriever, combine)

print(chain.invoke({"input": query})["answer"])

What you stand up: a splitter (you choose chunk_size/overlap), an embedding
model, a FAISS vector store, a retriever, a prompt template, and a retrieval
chain. That is six wired pieces, and every chunk has to be embedded through the
OpenAI API before the first query.
LlamaIndex

from llama_index.core import VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.readers.file import PyMuPDFReader
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

query = "What is the governing law?"

Settings.embed_model = OpenAIEmbedding()
Settings.llm = OpenAI(model="gpt-4o-mini")

docs = PyMuPDFReader().load(file_path="contract.pdf")

index = VectorStoreIndex.from_documents(
    docs,
    transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=50)],
)

engine = index.as_query_engine(similarity_top_k=4)
print(engine.query(query))

What you stand up: a node parser, an embedding model, a vector index, and a query engine. Cleaner than LangChain, but still an embed-and-index pipeline you own and pay for.
Three configurations cover the practical space. Pick by what your docs look like, not by what feels sophisticated. Most RAG libraries push you to a vector DB before you have a reason to need one — RedHop’s defaults assume you don’t.
Default — for most docs. Code, API refs, internal docs, runbooks, financial reports, handbooks, mixed folders: the words in the question are the words in the answer.
doc = redhop.Document.from_file("contract.pdf")
ctx = doc.context("What is the governing law?")

No model download, no ONNX runtime, ~50ms warm queries.
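Why the default works when "the words in the question are the words in the answer" can be pictured with a tiny lexical scorer. This is a sketch of the general idea only, not RedHop's scoring (which this page does not specify); `tokenize`, `lexical_score`, and the sample sections are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_score(query: str, passage: str) -> int:
    """Hypothetical scorer: total passage occurrences of each distinct query term."""
    passage_terms = Counter(tokenize(passage))
    return sum(passage_terms[term] for term in set(tokenize(query)))


# Illustrative sections, not taken from a real contract.
sections = {
    "9.1 Governing Law": "This Agreement is governed by the law of England and Wales.",
    "4.2 Payment Terms": "Invoices are payable within thirty days of receipt.",
}

query = "What is the governing law?"
ranked = sorted(
    sections,
    key=lambda heading: lexical_score(query, sections[heading]),
    reverse=True,
)
best = ranked[0]
print(best)
```

The payment section shares no terms with the question, so it scores zero. When queries and answers stop sharing words, a scorer like this has nothing to hold on to, which is where the other two configurations come in.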
Structured docs with parallel clauses. A contract with “EU override of §X”, “UK override of §X”; a policy with per-region sub-sections. Heading awareness disambiguates them; a small embedding model handles the semantic mapping.
doc = redhop.Document.from_file("msa.pdf", retrieval="hybrid", model="bge-small")
ctx = doc.context("What law applies in the UK?", include_heading=True, neighbors=1)

~80MB embedding model on first run, then cached.
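To see why headings matter for parallel clauses, here is a small sketch in plain Python. It assumes nothing about how RedHop implements `include_heading`; the `overlap` helper and the two clauses are hypothetical, and the embedding half of the hybrid setup is left out.

```python
import re


def terms(text: str) -> set[str]:
    """Distinct lowercase alphanumeric terms in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(query: str, text: str) -> int:
    """Number of distinct query terms that also occur in the text."""
    return len(terms(query) & terms(text))


# Two parallel clauses whose bodies match the query equally well (illustrative).
chunks = [
    {
        "heading": "EU override of §9.1",
        "body": "The law of the member state where the customer is established applies.",
    },
    {
        "heading": "UK override of §9.1",
        "body": "The law of the country where the customer is established applies.",
    },
]

query = "What law applies in the UK?"

# Scoring bodies alone cannot tell the clauses apart.
body_only = [overlap(query, c["body"]) for c in chunks]

# Scoring heading + body lets the "UK" in the heading break the tie.
with_heading = [overlap(query, c["heading"] + " " + c["body"]) for c in chunks]

print(body_only, with_heading)
```

The bodies tie; only the heading carries the region, so including it is what separates the UK clause from the EU one.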
Synonym-heavy corpora. Support FAQs, HR KBs, anywhere queries and
answers reliably share no surface words. A cross-encoder reads each
(query, passage) pair jointly, at the cost of 5–10× query latency.
Verify it helps on your corpus before adopting; it isn’t always worth it.
doc = redhop.Document.from_file(
    "support.md",
    retrieval="hybrid",
    model="bge-small",
    rerank="cross-encoder",
)

Full decision guide with trade-offs and query-writing tips: Choosing a configuration →
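One way to check whether reranking earns its latency on your corpus is to score both setups against a handful of questions whose correct passage you already know. The sketch below is independent of RedHop: `hit_rate_at_k` is a hypothetical helper, and the two lookup tables stand in for real retrieval runs with and without the reranker.

```python
from typing import Callable

Retriever = Callable[[str], list[str]]  # query -> ranked passage ids


def hit_rate_at_k(retrieve: Retriever, labeled: dict[str, str], k: int = 1) -> float:
    """Fraction of queries whose known-correct passage id is in the top k results."""
    hits = sum(1 for query, gold in labeled.items() if gold in retrieve(query)[:k])
    return hits / len(labeled)


# Hand-labeled questions mapped to the passage that answers them (made-up ids).
labeled = {
    "How do I reset my password?": "faq-12",
    "Can I get my money back?": "faq-30",
}

# Stand-ins for recorded results from each configuration.
baseline_runs = {
    "How do I reset my password?": ["faq-12", "faq-03", "faq-07"],
    "Can I get my money back?": ["faq-05", "faq-09", "faq-30"],
}
reranked_runs = {
    "How do I reset my password?": ["faq-12", "faq-07", "faq-03"],
    "Can I get my money back?": ["faq-30", "faq-05", "faq-09"],
}

baseline_rate = hit_rate_at_k(lambda q: baseline_runs[q], labeled, k=1)
reranked_rate = hit_rate_at_k(lambda q: reranked_runs[q], labeled, k=1)
print(baseline_rate, reranked_rate)
```

If the two rates come out the same on your own labeled set, the extra query latency buys nothing and the simpler configuration is the better pick.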
Reads your files, out of the box
from_file parses PDF, DOCX, PPTX, XLSX, Markdown, and code natively —
no parser to wire up. Or point from_folder at a whole directory,
indexed once and reloaded incrementally.
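"Indexed once and reloaded incrementally" can be pictured as remembering a fingerprint per file and redoing work only where it changed. This is a minimal sketch under that assumption; RedHop's actual mechanism is not described here, and `file_digest` and `refresh` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used as a change fingerprint."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def refresh(folder: Path, seen: dict[str, str]) -> list[str]:
    """Return names of new or changed files since the last call, updating `seen`."""
    changed = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        digest = file_digest(path)
        if seen.get(path.name) != digest:
            seen[path.name] = digest
            changed.append(path.name)  # a real indexer would re-parse this file here
    return changed


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.md").write_text("alpha")
    (folder / "b.md").write_text("beta")

    seen: dict[str, str] = {}
    first = refresh(folder, seen)   # everything is new on the first pass
    second = refresh(folder, seen)  # nothing changed, nothing to redo

    (folder / "b.md").write_text("beta v2")
    third = refresh(folder, seen)   # only the edited file comes back

print(first, second, third)
```

The sketch ignores deleted files and subdirectories; a fuller version would also drop entries whose files have disappeared.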
A Rust core, in-process
One install (pip / npm / cargo) gives you the same Rust engine.
Chunking, indexing, retrieval, and token-budgeting run in-process in
milliseconds — no service, no network round-trip. A whole contract is
query-ready in about a millisecond; see the numbers.
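Token-budgeting, in its simplest form, means keeping the best-ranked chunks that still fit under a token limit. The sketch below shows a greedy version of that idea; it is not RedHop's algorithm, `pack_to_budget` is a hypothetical helper, and word counts stand in for a real tokenizer.

```python
def approx_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (real tokenizers differ)."""
    return len(text.split())


def pack_to_budget(ranked_chunks: list[str], budget: int) -> list[str]:
    """Walk chunks in rank order, keeping each one that still fits in the budget."""
    kept: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = approx_tokens(chunk)
        if used + cost <= budget:
            kept.append(chunk)
            used += cost
    return kept


# Illustrative chunks, already ordered best-first.
ranked = [
    "This Agreement is governed by the law of England and Wales.",
    "Each party submits to the exclusive jurisdiction of the English courts.",
    "Notices must be sent in writing.",
]

kept = pack_to_budget(ranked, budget=18)
print(kept)
```

With a budget of 18 the second chunk no longer fits after the first, but the shorter third one does, so the loop skips ahead instead of stopping at the first miss.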
Conditional & measured
It prunes only when the context is large and diluted, and leaves small ones alone. Every default traces to a benchmark in docs/findings/.
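The size half of that condition can be sketched as a simple gate. The 1,500-token threshold is the one quoted in the sample Decision Report; everything else here (`Decision`, `decide`, the wording of the reasons) is hypothetical, and how RedHop measures dilution is not covered.

```python
from dataclasses import dataclass

DILUTION_GATE_TOKENS = 1_500  # threshold quoted in the sample Decision Report


@dataclass
class Decision:
    action: str  # "passthrough" or "prune"
    reason: str


def decide(total_tokens: int, gate: int = DILUTION_GATE_TOKENS) -> Decision:
    """Leave small contexts alone; flag larger ones as pruning candidates."""
    if total_tokens < gate:
        return Decision(
            "passthrough",
            f"{total_tokens:,} tokens is below the {gate:,}-token gate",
        )
    return Decision(
        "prune",
        f"{total_tokens:,} tokens meets or exceeds the {gate:,}-token gate",
    )


small = decide(1_240)
large = decide(4_000)
print(small.action, "|", small.reason)
print(large.action, "|", large.reason)
```

A size check alone would over-prune large but clean contexts, which is presumably why the real decision also looks at dilution before intervening.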