Build a RAG app with RedHop
You’re going to build a small program that loads a PDF, takes a question, and sends an LLM the context it needs to answer well. Three calls do the work. The whole script is around thirty lines.
You’ll need Python 3.9+, Node 18+, or Rust 1.75+; an API key for any LLM provider (the examples below use OpenAI); and a PDF to ask questions about. A plain-text or Markdown file works too, and the code path is the same.
Install
```sh
pip install redhop openai
```

```sh
npm install redhop openai
```

The Node binding is a native addon, so it doesn’t run on Cloudflare Workers or Vercel Edge. See the Node.js library for RAG guide if your target is an edge runtime.

```sh
cargo add redhop --features files,semantic
cargo add tokio --features macros,rt-multi-thread
cargo add async-openai anyhow
```

The `files` feature pulls in the document parsers, and `semantic` adds the optional ONNX embedder for the dense retrieval tier. A lean, BM25-only build omits both.
Set your API key in the shell:
```sh
export OPENAI_API_KEY="sk-..."
```

Load a document
```python
import redhop

doc = redhop.Document.from_file("contract.pdf")
print(f"indexed {doc.n_chunks} chunks across {doc.n_files} file(s)")
```

```js
import { Document } from "redhop";

const doc = Document.fromFile("contract.pdf");
console.log(`indexed ${doc.chunkCount} chunks`);
```

```rust
use redhop::read_file;

let mut doc = read_file("contract.pdf")?;
println!("indexed {} chunks", doc.n_chunks());
```

`from_file` handles parsing, sentence-aware chunking, and building an in-memory BM25 index. A 50-page PDF is ready in a millisecond or two, and there is no vector database to provision.
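To make those two indexing steps concrete, here is a minimal, self-contained sketch of sentence-aware chunking feeding an Okapi BM25 index. It is not RedHop's implementation: the helper names (`sentence_chunks`, `BM25Index`), the regex sentence splitter, the chunk size, the invented sample text, and the common `k1=1.5`, `b=0.75` defaults are all assumptions for illustration.

```python
import math
import re
from collections import Counter


def sentence_chunks(text: str, max_chars: int = 400) -> list[str]:
    """Split on sentence-ending punctuation, then pack whole sentences into chunks."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Start a new chunk rather than cut a sentence in half. A single
        # sentence longer than max_chars becomes its own oversized chunk.
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def tokenize(text: str) -> list[str]:
    # Lowercased alphanumeric runs; no stemming, so "law" != "laws".
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Okapi BM25 over an in-memory list of chunks (hypothetical sketch)."""

    def __init__(self, chunks: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(c)) for c in chunks]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avgdl = sum(self.lengths) / len(self.lengths) if chunks else 0.0
        n = len(chunks)
        df = Counter(term for d in self.docs for term in d)
        # The +1 inside the log keeps idf positive even for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def search(self, query: str, k: int = 3) -> list[tuple[float, int]]:
        """Return up to k (score, chunk_index) pairs with a positive score."""
        scored = []
        for i, doc in enumerate(self.docs):
            score = 0.0
            for term in tokenize(query):
                tf = doc.get(term, 0)
                if tf == 0:
                    continue
                norm = self.k1 * (1 - self.b + self.b * self.lengths[i] / self.avgdl)
                score += self.idf[term] * tf * (self.k1 + 1) / (tf + norm)
            scored.append((score, i))
        scored.sort(reverse=True)
        return [(s, i) for s, i in scored[:k] if s > 0]


contract = (
    "This Agreement is governed by the laws of the State of New York. "
    "Either party may terminate on thirty days notice. "
    "Payment is due within fifteen days of invoice."
)
chunks = sentence_chunks(contract, max_chars=80)
index = BM25Index(chunks)
for score, i in index.search("which laws govern this agreement", k=1):
    print(f"{score:.2f}  {chunks[i]}")
```

Because the index is only term counts held in memory, building it is one pass over the chunks, which is why keyword retrieval needs no external database. The exact-match tokenizer also shows BM25's blind spot: `law` would not match `laws` here, a gap that stemming or a dense retrieval tier can cover.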
Ask a question
```python
ctx = doc.context("What is the governing law of this contract?")
print(ctx.text())
```

```js
const ctx = doc.context("What is the governing law of this contract?");
console.log(ctx.text);
```

```rust
let ctx = doc.context("What is the governing law of this contract?")?;
println!("{}", ctx.text());
```

`context()` runs retrieval, budgets the result against the model’s context window, and returns the assembled string. The output is around a kilobyte of relevant clauses rather than the whole 50-page document.
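The budgeting step can be pictured as greedy packing: walk the chunks in ranked order and keep each one that still fits. The sketch below is hypothetical, not RedHop's algorithm; the `assemble_context` helper is invented, and it assumes a rough four-characters-per-token estimate where a real budgeter would count with the target model's tokenizer.

```python
# Hypothetical sketch of budgeting ranked chunks against a context window.
# Token counts use a rough four-characters-per-token estimate (an assumption).

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)


def assemble_context(ranked_chunks: list[str], budget_tokens: int,
                     separator: str = "\n\n") -> str:
    """Greedily pack chunks in rank order without exceeding the budget."""
    picked: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        # Every chunk after the first also pays for the separator before it.
        cost = estimate_tokens(chunk) + (estimate_tokens(separator) if picked else 0)
        if used + cost > budget_tokens:
            continue  # too big; a smaller, lower-ranked chunk may still fit
        picked.append(chunk)
        used += cost
    return separator.join(picked)


ranked = ["A" * 400, "B" * 800, "C" * 200]  # roughly 100, 200, and 50 tokens
prompt_context = assemble_context(ranked, budget_tokens=160)
print(len(prompt_context))  # 602: chunks A and C plus one separator
```

Skipping an oversized chunk instead of stopping lets a shorter, lower-ranked chunk use the remaining budget; a stricter design would stop at the first miss to preserve rank order exactly.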
Show the sources
```python
for c in ctx.citations:
    print(f" {c['source']} p{c['page']} {c['heading']}")

print()
print(ctx.report)
```

```js
for (const c of ctx.citations) {
  console.log(` ${c.source} p${c.page ?? "?"} ${c.heading ?? ""}`);
}

console.log();
console.log(ctx.report.rendered);
```

```rust
for c in &ctx.citations {
    println!(" {} p{:?} {:?}", c.source, c.page, c.heading);
}

println!();
println!("{}", ctx.report.rendered);
```

`ctx.citations` is a list with one entry per chunk that made it into the context. Each entry carries `source`, `page`, `heading`, `line`, and the raw `text`; render them however your UI needs.
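As one example of rendering, the sketch below turns citation records into numbered footnote lines and, like the Node example, falls back to `?` when a page is missing. The record shape mirrors the fields listed above, but the `format_citations` helper and the sample values are hypothetical.

```python
# Hypothetical helper; the sample records are invented and only mirror the
# citation fields described above (source, page, heading, line, text).

def format_citations(citations: list[dict]) -> list[str]:
    """Render one numbered footnote line per citation."""
    lines = []
    for n, c in enumerate(citations, start=1):
        page = c.get("page")
        where = f"p{page}" if page is not None else "p?"
        heading = c.get("heading") or ""
        # rstrip() drops the trailing space left when there is no heading.
        lines.append(f"[{n}] {c['source']} {where} {heading}".rstrip())
    return lines


sample = [
    {"source": "contract.pdf", "page": 12, "heading": "Governing Law",
     "line": 4, "text": "Sample clause text."},
    {"source": "notes.md", "page": None, "heading": None,
     "line": 1, "text": "Sample note text."},
]
for line in format_citations(sample):
    print(line)
```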
The `report` on the same object describes the assembly decision. For a small, clean context it passes the input through unchanged; for a large, diluted one it prunes. Either way, the report shows which path it took:
```text
RedHop Decision Report
══════════════════════
Decision: Auto → passthrough (small context, no intervention needed)
Why:
  - 1,240 tokens, below the dilution gate (1,500 tokens)
  - pruning a small clean context risks dropping reasoning evidence
Result:
  - kept all 8 retrieved chunks
  - evidence retained 100%, second-hop links preserved
```

Call the LLM
```python
from openai import OpenAI

query = "What is the governing law of this contract?"
response = OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": f"{ctx.text()}\n\nQuestion: {query}",
    }],
)
print(response.choices[0].message.content)
```

```js
import OpenAI from "openai";

const query = "What is the governing law of this contract?";
const response = await new OpenAI().chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: `${ctx.text}\n\nQuestion: ${query}` }],
});
console.log(response.choices[0].message.content);
```

```rust
use async_openai::{Client, types::{
    CreateChatCompletionRequestArgs, ChatCompletionRequestUserMessageArgs,
}};

let query = "What is the governing law of this contract?";
let req = CreateChatCompletionRequestArgs::default()
    .model("gpt-4o-mini")
    .messages([ChatCompletionRequestUserMessageArgs::default()
        .content(format!("{}\n\nQuestion: {}", ctx.text(), query))
        .build()?
        .into()])
    .build()?;

let response = Client::new().chat().create(req).await?;
println!("{}", response.choices[0].message.content.as_deref().unwrap_or(""));
```

The prompt string is yours to send anywhere: OpenAI, Anthropic, a local Ollama instance, or your own model. RedHop never makes the LLM call itself, which keeps the library single-purpose and lets you change providers without touching retrieval.
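One way to keep that separation in your own code is to treat the model as any function from prompt string to reply. In this sketch, `build_prompt` reproduces the prompt layout used above, `echo_llm` is a made-up stand-in for tests, and `openai_llm` is a hypothetical adapter around the OpenAI Python client; none of these names come from RedHop.

```python
from typing import Callable

# Any provider fits if it can be wrapped as "prompt in, reply out".
LLM = Callable[[str], str]


def build_prompt(context: str, question: str) -> str:
    # Same layout as the examples above: context, a blank line, then the question.
    return f"{context}\n\nQuestion: {question}"


def answer(context: str, question: str, llm: LLM) -> str:
    return llm(build_prompt(context, question))


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in provider for tests: echoes the last prompt line."""
    return prompt.splitlines()[-1]


def openai_llm(prompt: str) -> str:
    """Hypothetical adapter around the OpenAI Python client (not called here)."""
    from openai import OpenAI  # imported lazily so the sketch runs without it

    response = OpenAI().chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content or ""


reply = answer("Clause 14: New York law applies.", "What is the governing law?", echo_llm)
print(reply)  # Question: What is the governing law?
```

Swapping providers means passing `openai_llm` or any other adapter in place of `echo_llm`; the retrieval side never changes.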
The whole script
```python
import redhop
from openai import OpenAI

QUERY = "What is the governing law of this contract?"

doc = redhop.Document.from_file("contract.pdf")

ctx = doc.context(QUERY)
for c in ctx.citations:
    print(f" {c['source']} p{c['page']} {c['heading']}")

response = OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"{ctx.text()}\n\nQuestion: {QUERY}"}],
)
print(response.choices[0].message.content)

print()
print(ctx.report)
```

```sh
python rag.py
```
```js
import { Document } from "redhop";
import OpenAI from "openai";

const QUERY = "What is the governing law of this contract?";

const doc = Document.fromFile("contract.pdf");

const ctx = doc.context(QUERY);
for (const c of ctx.citations) {
  console.log(` ${c.source} p${c.page ?? "?"} ${c.heading ?? ""}`);
}

const response = await new OpenAI().chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: `${ctx.text}\n\nQuestion: ${QUERY}` }],
});
console.log(response.choices[0].message.content);

console.log();
console.log(ctx.report.rendered);
```

```sh
node rag.mjs
```
```rust
use redhop::read_file;
use async_openai::{Client, types::{
    CreateChatCompletionRequestArgs, ChatCompletionRequestUserMessageArgs,
}};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let query = "What is the governing law of this contract?";

    let mut doc = read_file("contract.pdf")?;

    let ctx = doc.context(query)?;
    for c in &ctx.citations {
        println!(" {} p{:?} {:?}", c.source, c.page, c.heading);
    }

    let req = CreateChatCompletionRequestArgs::default()
        .model("gpt-4o-mini")
        .messages([ChatCompletionRequestUserMessageArgs::default()
            .content(format!("{}\n\nQuestion: {}", ctx.text(), query))
            .build()?
            .into()])
        .build()?;
    let response = Client::new().chat().create(req).await?;
    println!("{}", response.choices[0].message.content.as_deref().unwrap_or(""));

    println!("\n{}", ctx.report.rendered);
    Ok(())
}
```

```sh
cargo run
```
What to try next
- A folder of files instead of a single document: `Document.from_folder("./docs", persist=True)`. The persistent index makes subsequent loads sub-second on large corpora.
- Hybrid retrieval for queries the keyword tier misses, such as HR FAQs or support knowledge bases where users phrase things differently from the docs: pass `retrieval="hybrid", model="bge-small"`. The choice between tiers is covered in Choosing a configuration.
- Wrapping this behind an HTTP service: Deploy to production.
- More patterns by use case: Examples.
Tested against RedHop 0.3.x.
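For the hybrid option listed under What to try next, one common way to merge a keyword ranking with an embedding ranking is reciprocal rank fusion (RRF). The sketch below assumes RRF with the conventional constant `k=60` and made-up chunk IDs; it illustrates the idea only, and RedHop's hybrid tier may combine results differently.

```python
# Reciprocal rank fusion: each ranking adds 1 / (k + rank) for every item it
# returns, and items are re-sorted by their summed score.

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


keyword_ranking = ["faq-7", "faq-2", "faq-9"]  # BM25 order (made-up IDs)
dense_ranking = ["faq-2", "faq-4", "faq-7"]    # embedding order (made-up IDs)
print(reciprocal_rank_fusion([keyword_ranking, dense_ranking]))
# ['faq-2', 'faq-7', 'faq-4', 'faq-9']
```

RRF uses only rank positions, so it needs no calibration between BM25 scores and embedding similarities, which live on different scales.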