
Speed

RedHop runs in your process over a Rust core — no network round-trip, no service to call — so the numbers below are dominated by real work (parsing, indexing, scoring), not overhead. All measurements are CPU-only on a single machine with a warm index; absolute milliseconds drift ~10–15% run-to-run, so read the shape, not the last digit.
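Because absolute numbers drift between runs, a warm-query measurement is usually taken as a median over repeated calls after a few warm-up calls. The harness below is a minimal sketch of that idea, not the benchmark script used for this page. The function name `time_warm_queries` and its parameters are hypothetical, and `run_query` stands in for whatever call issues one query.

```python
import statistics
import time


def time_warm_queries(run_query, warmup=3, repeats=20):
    """Return (median_ms, relative_spread) for a callable that runs one query."""
    # Warm-up calls populate caches so the timed runs reflect warm behaviour.
    for _ in range(warmup):
        run_query()

    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    median_ms = statistics.median(samples_ms)
    # (max - min) / median: a rough stand-in for run-to-run drift.
    spread = (max(samples_ms) - min(samples_ms)) / median_ms if median_ms > 0 else 0.0
    return median_ms, spread


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with a real query call.
    median_ms, spread = time_warm_queries(lambda: sum(range(10_000)))
    print(f"median {median_ms:.3f} ms, spread {spread:.0%}")
```

Reporting the median rather than the mean keeps a single slow outlier from skewing the headline number.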

The default lexical tier (BM25) needs no model and no embedding step, so a document is queryable almost immediately:

| contract.pdf path, ~189k tokens | RedHop (BM25) |
| --- | --- |
| time to first answer | 0.02s |
| warm per-query | ~1ms |

Each query also prunes to budget and emits a Decision Report, so it does more than a bare retriever and still answers in about a millisecond.

Reproduce: cargo run -p redhop-examples --example eval_cuad_documents --release
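To show why a lexical tier needs no model or embedding step, here is a minimal sketch of textbook Okapi BM25 in pure Python. It is an illustration under assumptions, not RedHop's Rust implementation: the whitespace tokenizer, the `k1` and `b` defaults, and the function name `bm25_scores` are all chosen for the example, and the sketch scans every document rather than using an index.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with textbook Okapi BM25."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalisation: longer-than-average documents are damped.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "termination notice period is thirty days",
    "governing law is the state of delaware",
    "either party may give notice of termination",
]
scores = bm25_scores("termination notice", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

Everything here is counting and arithmetic over tokens, which is why nothing has to be downloaded or embedded before the first query.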

Semantic — a one-time cost, then fast forever


The opt-in semantic / hybrid tiers embed your chunks once (cached), then score every query by exact cosine similarity over those cached vectors. The cost is paid once at setup; queries stay fast afterward:

| corpus | embed-all (one-time setup) | warm per-query |
| --- | --- | --- |
| ~13k tokens (1 contract) | ~2s | ~6ms |
| ~38k tokens (5 contracts) | ~7s | ~6ms |
| ~189k tokens (15 contracts) | ~17s | ~6ms |

Warm queries land at ~6ms: the query embedding dominates, and exact cosine over the cached vectors is cheap. The only real cost is embedding everything up front, and you pay it only if you opt into a dense tier; the lexical default skips it entirely. With from_folder(persist=True) the embeddings are written to disk, so the embed-all cost is paid once and the vectors are reloaded on every later run.

Reproduce: bench/.venv/bin/python bench/speed_compare.py
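The embed-once, reload-later pattern described above can be sketched in a few lines. This is an illustration under assumptions, not RedHop's code: `toy_embed` is a hypothetical stand-in for a real embedding model (a hashed bag of words), the JSON cache file is an invented format, and `load_or_embed` and `rank_by_cosine` are names made up for the example.

```python
import json
import math
import os
import tempfile
import zlib


def toy_embed(text, dim=64):
    """Hypothetical stand-in for an embedding model: hashed bag of words, unit length."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def load_or_embed(chunks, cache_path):
    """Embed every chunk once; later runs reload the vectors from disk."""
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    vectors = [toy_embed(chunk) for chunk in chunks]  # the one-time embed-all
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(vectors, fh)
    return vectors


def rank_by_cosine(query, vectors):
    """Exact cosine against every cached vector; unit-length vectors make it a dot product."""
    q = toy_embed(query)  # per-query work: one embedding plus one linear pass
    sims = [sum(a * b for a, b in zip(q, v)) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: sims[i], reverse=True)


chunks = ["payment due", "governing law delaware"]
cache = os.path.join(tempfile.mkdtemp(), "vectors.json")
vectors = load_or_embed(chunks, cache)   # first call embeds and writes the cache
vectors = load_or_embed(chunks, cache)   # second call only reads the file
print(rank_by_cosine("payment due", vectors))
```

The per-query path touches only the query embedding and the cached vectors, which matches the shape of the table: setup time grows with the corpus while warm query time stays flat.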

The most important property for interactive use: per-query time barely moves as the document gets bigger. Warm BM25 queries hold at ~2ms from 1,000 to 4,000 pages in the table below, so a 4,000-page PDF answers as fast as a 1-page one once it’s loaded. Time-to-first-answer is dominated by parsing the PDF (~2.5ms/page, roughly linear: 2.3 to 2.9ms/page across the rows below), with chunking, indexing, and the query negligible on top:

| Pages | Chunks | Time to first answer | Warm query |
| --- | --- | --- | --- |
| 1,000 | 1,000 | 2.3s | ~2ms |
| 2,000 | 2,000 | 5.0s | ~2ms |
| 4,000 | 4,000 | 11.5s | ~2ms |

A document of several thousand pages is fully interactive after its one-time load. (Adding the semantic tier adds the embed-all, ~11s per 1,000 chunks, which persist=True makes a one-time cost.) These figures were measured on synthetic PDFs via from_file on the lexical default; they measure latency (parse + index + query), not answer quality.

Reproduce: bench/.venv/bin/python bench/large_pdf.py · bench/.venv/bin/python bench/large_pdf.py --semantic
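As a back-of-envelope check, the two rates quoted above (~2.5ms/page for parsing, ~11s per 1,000 chunks for the embed-all) can be combined into a rough load-time estimate. This is only arithmetic on those quoted figures, not a model of RedHop's behaviour. The function name `estimate_first_answer_seconds` is hypothetical, and the one-chunk-per-page default is an assumption taken from the table.

```python
PARSE_MS_PER_PAGE = 2.5        # "~2.5ms/page" parse rate quoted in the text
EMBED_S_PER_1000_CHUNKS = 11   # "~11s per 1,000 chunks" for the semantic tier


def estimate_first_answer_seconds(pages, chunks=None, semantic=False):
    """Rough time-to-first-answer: parse cost, plus the embed-all if semantic."""
    if chunks is None:
        chunks = pages  # assumption: one chunk per page, as in the table
    seconds = pages * PARSE_MS_PER_PAGE / 1000.0
    if semantic:
        seconds += chunks / 1000.0 * EMBED_S_PER_1000_CHUNKS
    return seconds


for pages in (1_000, 2_000, 4_000):
    lexical = estimate_first_answer_seconds(pages)
    with_embed = estimate_first_answer_seconds(pages, semantic=True)
    print(f"{pages} pages: ~{lexical:.1f}s lexical, ~{with_embed:.1f}s with embed-all")
```

The lexical estimates come out at 2.5s, 5.0s, and 10.0s against measured values of 2.3s, 5.0s, and 11.5s, so the flat per-page rate is a reasonable first approximation that slightly underestimates the largest document.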


Speed is one axis; answer quality and evidence retention are the others. Those, along with the head-to-head against LangChain and LlamaIndex, live on the Benchmarks page.