
Quickstart

RedHop has the same API in three languages; the examples on this page use Python.

pip install redhop

One package, no services, no vector DB. Document parsing (PDF/DOCX/PPTX/XLSX) and the optional semantic model are built in.

Point RedHop at a file. It parses, chunks, and indexes it, then hands you back just the context your question needs — which you give to any LLM:

import redhop
from openai import OpenAI

doc = redhop.Document.from_file("contract.pdf")  # parse + chunk + index
question = "What is the governing law of this contract?"
ctx = doc.context(question)

# Hand ctx.text() to any provider; no lock-in.
resp = OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"Use only this context:\n\n{ctx.text()}\n\nQ: {question}"}],
)
print(resp.choices[0].message.content)
print(ctx.report)  # the Decision Report ↓

Every call explains itself — including when RedHop deliberately does nothing:

RedHop Decision Report
══════════════════════
Decision: Auto → passthrough (left the context intact)
Why:
- input is small: 91 tokens ≤ 1500 gate
- under headroom, pruning is measured to be wash-to-harmful
- intervention predicted to add no signal density here
Result:
- kept all retrieved chunks — full evidence preserved
- avoided unnecessary intervention
Economics retrieved / final tokens, savings, density, retained evidence
Diagnostics chunks, distractor ratio, second-hop rescues, …
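
The size gate named in the report can be pictured as a single comparison. This is an illustrative sketch only, not RedHop's implementation: the function name `gate_decision` and its `gate` parameter are hypothetical, and the real decision may weigh more than input size. The 1500-token threshold and the two outcome names are the ones this page shows.

```python
def gate_decision(input_tokens: int, gate: int = 1500) -> str:
    """Hypothetical stand-in for a size gate (not RedHop's actual logic)."""
    # At or under the gate there is headroom, so the context is left intact.
    if input_tokens <= gate:
        return "passthrough"
    # Over the gate, the context may be diluted enough for pruning to help.
    return "prune"


print(gate_decision(91))  # the 91-token input from the report
```

The comparison is inclusive to match the report's "91 tokens ≤ 1500 gate" wording.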

The decision is also available programmatically:

ctx.report.auto_decision # "passthrough" | "prune"
ctx.report.total_tokens
ctx.report.retained_evidence_ratio
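
One way to use these fields is to log a one-line summary per call. The sketch below is a minimal example under assumptions: `describe` is a hypothetical helper, the field types (string, integer, and a 0 to 1 float) are guesses, and a `SimpleNamespace` with made-up values stands in for `ctx.report` so the snippet runs without RedHop installed.

```python
from types import SimpleNamespace


def describe(report) -> str:
    """Build a one-line log message from the three report fields."""
    action = "left intact" if report.auto_decision == "passthrough" else "pruned"
    return (
        f"context {action}: {report.total_tokens} tokens, "
        f"{report.retained_evidence_ratio:.0%} of evidence retained"
    )


# Stand-in for ctx.report; the values here are invented for illustration.
fake_report = SimpleNamespace(
    auto_decision="passthrough",
    total_tokens=91,
    retained_evidence_ratio=1.0,
)
print(describe(fake_report))
```

In real use you would pass `ctx.report` instead of the stand-in.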

Every selected chunk remembers where it came from, so you can show the model’s evidence trail, not just paste it:

for c in ctx.citations:
    print(c["source"], c["page"])  # e.g. contract.pdf 3 → "from contract.pdf, p.3"
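
To show that trail to a user, the citations can be collapsed into short labels. The following is a rough sketch, assuming each citation is a dict with the `"source"` and `"page"` keys used in the loop; `evidence_trail` is a hypothetical helper and the sample list is made up so the snippet runs on its own.

```python
def evidence_trail(citations) -> list[str]:
    """Turn citation dicts into unique 'from <source>, p.<page>' labels, in order."""
    seen = set()
    labels = []
    for c in citations:
        key = (c["source"], c["page"])
        if key in seen:  # several chunks can come from the same page
            continue
        seen.add(key)
        labels.append(f'from {c["source"]}, p.{c["page"]}')
    return labels


# Made-up stand-in for ctx.citations.
sample = [
    {"source": "contract.pdf", "page": 3},
    {"source": "contract.pdf", "page": 3},
    {"source": "contract.pdf", "page": 7},
]
print(evidence_trail(sample))
```

Deduplicating on the (source, page) pair keeps the list short when several selected chunks share a page.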

Loading a file is the quickest start, but it’s one of several on-ramps — all return a Document:

# Text you already have (your own parser/OCR, a DB field).
doc = redhop.Document.from_text(open("notes.md").read())
# Already chunked it yourself.
doc = redhop.Document.from_chunks(["clause one …", "clause two …"])
# A whole folder — one combined index, citations per file.
doc = redhop.Document.from_folder("./docs")
# Bytes from S3 / Azure / GCS / HTTP.
doc = redhop.Document.from_bytes(s3_object_bytes, source="contract.pdf")

See all the loaders, including a persistent, incremental on-disk index over thousands of files.

doc = redhop.Document.from_file(
    "contract.pdf",
    chunk_size=128,   # index-time: how the doc is split
    strategy="auto",  # size-gated: prune only under dilution
)
ctx = doc.context(query, budget=2000)  # query-time: vary freely, no re-indexing

chunk_size is fixed at construction (it’s how the index is built); the per-query budget is free to vary. Every parameter has a default — see Options for the full list.

Next: Loaders — every way to get documents in · Overview — the one idea, and how it works · Retrieval options — when BM25 isn’t enough.