Rust library for RAG: the landscape, what's mature, and how to choose

If you’re building RAG and you’d rather not pull in Python, this guide is for you. It’s a tour of the Rust RAG ecosystem as of 2026: what’s mature, what’s half-built, what’s missing entirely, and which library to pick for which job. We’ll cover end-to-end RAG libraries, the building blocks (embeddings, ML frameworks, tokenizers), and the vector databases that work natively from Rust.

We’ll be honest about the tradeoffs. Rust isn’t strictly better than Python for RAG. It’s better for specific reasons, in specific deployments. By the end of this guide you should know whether yours is one of them.

Why pick Rust for RAG?

Three real reasons, plus one honest pushback:

1. You already ship Rust. This is the most common case. You have a Rust service (an Axum API, a Tauri desktop app, a CLI, an edge worker) and you need to add document QA or semantic search. Calling out to a Python sidecar is a deployment headache. A Rust crate fits the existing build.

2. Latency and resource footprint matter. Rust RAG runtimes can answer warm queries in single-digit milliseconds with a small memory footprint. For in-process retrieval at the edge (Cloudflare Workers via WASM, on-device mobile, embedded), Python is rarely a practical option, and Rust is a natural default.

3. Static binary deployment. Ship one file with no Python interpreter, no pip dependencies, no system Python version to manage. For finance / legal / health / air-gapped environments, this is often the shape that gets the security review through.

The honest pushback: Python’s RAG ecosystem is far deeper. LangChain, LlamaIndex, and Haystack have hundreds of integrations, mature evaluation harnesses, and the bulk of the community. If your stack is already Python and you’re not constrained by latency or deployment, picking Rust is usually trading away ecosystem breadth for a marginal win. Pick Rust when the deployment / integration / footprint reasons are real, not because “Rust is faster” in a benchmark.

The Rust RAG landscape (2026)

The ecosystem has three layers. Most people only need the top one.

Layer 1: End-to-end RAG libraries

These are the “drop in and go” libraries. You hand them documents and a query, and they handle chunking, retrieval, and context assembly.

Library	Shape	Best for
RedHop	3-call library (`load → ask → read`)	Document QA, smallest surface, in-process
rig	LLM framework (agents + RAG)	Agents, tool-use, multi-provider LLM apps
swiftide	Async indexing pipeline + RAG	Production batch ingestion at scale
anyrag	Lightweight RAG with adapters	Smaller projects, pick-and-mix backends

Each one is taking a different bet about what RAG-in-Rust should look like.

RedHop is the smallest surface. Three calls: Document::from_file(path), doc.context(query), then read the assembled prompt + Decision Report off the returned context. BM25 by default (no model download, no ONNX runtime), optional dense retrieval via a small embedding model. Designed for document context optimization: turning a file and a question into the right LLM prompt, with full observability into the decision. The same library is also published to PyPI and npm, so the API is identical across Python, Node, and Rust services. Best when you want one bounded step you can drop into a larger app.

rig is broader. It’s the closest thing to a “LangChain for Rust”: agents, tool-use, multi-step workflows, provider integrations (OpenAI, Anthropic, Cohere, Gemini, etc.). RAG is one capability among many. Best when you’re building an agent app where RAG is one tool the agent can call. Heavier surface than RedHop, closer in spirit to LangChain.

swiftide is an async indexing-and-RAG pipeline framework, focused on high-throughput document ingestion. It composes nicely with Qdrant, LanceDB, Redis, FastEmbed, and Tree-sitter for code. The pipeline DAG model is similar to Haystack’s. Best when you’re ingesting a large corpus on a schedule and want explicit pipeline composition.

anyrag and similar smaller libraries fill the long tail: minimal, pick-and-mix shapes for narrow use cases. They’re worth knowing about, but they have smaller communities and a smaller API surface.
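BM25, the lexical scorer mentioned above as RedHop’s default, needs no model at all, which is why it can run without a download or an ONNX runtime. As a rough illustration, here is a std-only sketch of BM25 over an in-memory inverted index. The names (`Bm25Index`, `tokenize`) and the naive tokenizer are assumptions made for this example; this is not how RedHop or any other library above implements it.

```rust
use std::collections::HashMap;

/// Naive tokenizer: lowercase, split on non-alphanumerics.
/// Real systems usually add stemming, stopwords, and so on.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Hypothetical in-memory index for this sketch.
struct Bm25Index {
    /// term -> list of (doc id, term frequency in that doc)
    postings: HashMap<String, Vec<(usize, f64)>>,
    doc_len: Vec<f64>,
    avg_len: f64,
}

impl Bm25Index {
    fn build(docs: &[&str]) -> Self {
        let mut postings: HashMap<String, Vec<(usize, f64)>> = HashMap::new();
        let mut doc_len = Vec::with_capacity(docs.len());
        for (id, doc) in docs.iter().enumerate() {
            let tokens = tokenize(doc);
            doc_len.push(tokens.len() as f64);
            let mut tf: HashMap<String, f64> = HashMap::new();
            for t in tokens {
                *tf.entry(t).or_insert(0.0) += 1.0;
            }
            for (term, f) in tf {
                postings.entry(term).or_default().push((id, f));
            }
        }
        let avg_len = doc_len.iter().sum::<f64>() / doc_len.len().max(1) as f64;
        Self { postings, doc_len, avg_len }
    }

    /// Returns (doc id, score) pairs, best first. Only docs that share
    /// at least one term with the query are scored.
    fn search(&self, query: &str) -> Vec<(usize, f64)> {
        const K1: f64 = 1.2; // term-frequency saturation
        const B: f64 = 0.75; // length normalization strength
        let n = self.doc_len.len() as f64;
        let mut scores: HashMap<usize, f64> = HashMap::new();
        for term in tokenize(query) {
            let Some(list) = self.postings.get(&term) else { continue };
            let df = list.len() as f64;
            let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
            for &(id, f) in list {
                let norm = 1.0 - B + B * self.doc_len[id] / self.avg_len;
                *scores.entry(id).or_insert(0.0) += idf * f * (K1 + 1.0) / (f + K1 * norm);
            }
        }
        let mut ranked: Vec<(usize, f64)> = scores.into_iter().collect();
        ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
        ranked
    }
}

fn main() {
    let docs = [
        "This Agreement is governed by the laws of Delaware.",
        "Payment is due within thirty days of invoice.",
        "Either party may terminate with ninety days notice.",
    ];
    let index = Bm25Index::build(&docs);
    for (id, score) in index.search("laws of Delaware") {
        println!("{score:.3}  {}", docs[id]);
    }
}
```

Everything here is hash maps and arithmetic, so build and query cost scale with the text you index rather than with a model.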

Layer 2: Building blocks (embeddings, ML, tokenizers)

If you’re building your own RAG, or one of the layer-1 libraries doesn’t quite fit, these are the components you’ll wire together.

Crate	What it is
candle	HuggingFace’s Rust ML framework. Runs many HF model architectures (embedders, rerankers, LLMs) in Rust
ort	Rust bindings for ONNX Runtime. The fastest way to run small embedding models in Rust
fastembed-rs	Rust port of fastembed: opinionated, easy ONNX embeddings for common models
embed_anything	Multi-modal embeddings library, supports text, images, audio
tokenizers	HuggingFace’s Rust tokenizer (the Rust core under the Python `tokenizers` package)
tiktoken-rs	Rust port of OpenAI’s `tiktoken`

candle vs ort is the big decision in this layer:

ort wraps the C++ ONNX Runtime, which is maintained by Microsoft. Faster on most CPUs for small models (BGE, MiniLM, etc.) and built on a more mature runtime. Slightly heavier deps (needs the ONNX Runtime shared library or the download-binaries feature).
candle is pure Rust, no native runtime. Lighter deps, runs anywhere Rust runs (including WASM-friendly targets). Smaller-to-mid model performance is comparable. Large-model performance is improving but ort is generally faster today.

If you’re running 384-dim or 768-dim sentence embedders (BGE-small, all-MiniLM, etc.), ort is the typical pick. If you need maximum portability or want to avoid any non-Rust deps, candle is the answer.

Layer 3: Vector databases (Rust-native or Rust-callable)

If you need persistent vector storage at scale, you’ll reach for one of these. (As we’ll discuss below, you often don’t need one for document QA.)

DB	Shape	Best for
LanceDB	Embedded columnar vector DB, Rust-native	Single-process, embedded use, mid-scale
Qdrant (server + Rust client)	Standalone vector DB written in Rust	Production at scale, cloud or self-host
Chroma (Rust client)	Open-source vector DB	Prototyping, Python-first but works from Rust
usearch	Embedded HNSW index (C++ core with Rust bindings)	Lightweight ANN over in-memory or mmap vectors
pgvector + sqlx/diesel	Postgres extension via Rust DB libs	Already have Postgres, want vectors in the same DB

LanceDB is the most idiomatic Rust choice for an embedded vector DB today: Rust-native, columnar, handles updates and filters well, supports both flat scan and ANN. Qdrant is the production-grade server when you’ve outgrown embedded.

Worth saying explicitly: for document QA, you often don’t need a vector DB at any tier. BM25 with an in-memory inverted index handles most keyword-dense corpora (contracts, API references, runbooks, handbooks) just fine, and exact cosine over an in-memory chunk array handles moderate-sized semantic queries without ANN. The “you need a vector DB” assumption comes from Python tutorials defaulting to one. It doesn’t follow from the math. (RedHop’s whole pitch is built on this observation. See the Comparison page for measured numbers across real corpora.)
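To make the “exact cosine over an in-memory chunk array” point concrete, here is a minimal std-only sketch. It assumes you already have one embedding per chunk from whichever embedder you use; `cosine` and `top_k` are names made up for this example, and the 3-dimensional vectors are toy data.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Exact top-k: score every chunk, sort best-first, keep k.
/// Cost is O(n * d) per query for n chunks of dimension d.
fn top_k(query: &[f32], chunks: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = chunks
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn main() {
    // Toy embeddings; real ones would be 384- or 768-dimensional.
    let chunks: Vec<Vec<f32>> = vec![
        vec![1.0, 0.0, 0.0],
        vec![0.7, 0.7, 0.0],
        vec![0.0, 0.0, 1.0],
    ];
    let query: [f32; 3] = [0.9, 0.1, 0.0];
    for (idx, score) in top_k(&query, &chunks, 2) {
        println!("chunk {idx}: {score:.3}");
    }
}
```

For scale: 100,000 chunks at 384 dimensions is about 38 million multiply-adds per query. Whether that fits your latency budget is something to measure on your own hardware, but there is no index to build, persist, or keep in sync.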

A working example: RAG in 30 lines of Rust

Here’s a complete, runnable RAG pipeline with RedHop. From cargo new to asking a question about a contract.

Cargo.toml:

[package]
name = "my-rag"
version = "0.1.0"
edition = "2021"

[dependencies]
redhop = { version = "0.2", features = ["files", "semantic"] }
# tokio is only needed once you add an async LLM call; main.rs below is synchronous.
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }

src/main.rs:

use redhop::read_file;

fn main() -> redhop::Result<()> {
    // 1. Load -- a single file (PDF / DOCX / PPTX / XLSX / Markdown / text /
    //    code). Parsing, chunking, and indexing all happen here.
    let mut doc = read_file("contract.pdf")?;

    // 2. Ask -- retrieval and token-budgeting happen in-process.
    let ctx = doc.context("What is the governing law?")?;

    // 3. Read the assembled prompt + the Decision Report.
    println!("Prompt for the LLM:\n{}", ctx.text());

    for c in &ctx.citations {
        println!("  cited: {} p{:?} {:?}",
            c.source, c.page, c.heading);
    }

    println!("\nDecision: {}", ctx.report.auto_decision);
    println!("Tokens: {}, evidence retained: {:.0}%",
        ctx.report.total_tokens,
        ctx.report.retained_evidence_ratio * 100.0);

    Ok(())
}

Run cargo run, and that’s the whole thing. The files feature pulls in the built-in PDF/DOCX/PPTX/XLSX/text/code parsers. The semantic feature adds the optional ONNX embedder needed for hybrid retrieval (off by default, so the lean build pulls in only BM25 + Tantivy).
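If “token-budgeting” in step 2 sounds abstract, the general idea can be sketched as a greedy pack: walk the ranked chunks best-first and keep each one that still fits. This is only an illustration under stated assumptions, not RedHop’s algorithm. `pack_context` and `estimate_tokens` are hypothetical names, and counting whitespace-separated words is a crude stand-in for a real tokenizer such as tiktoken-rs.

```rust
/// Crude token estimate: whitespace-separated words.
/// Assumption for this sketch; a real tokenizer will count differently.
fn estimate_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Greedily keep chunks (already sorted best-first) that fit in `budget`.
/// A chunk that does not fit is skipped, so a smaller, lower-ranked
/// chunk can still use the remaining space.
fn pack_context<'a>(ranked: &[&'a str], budget: usize) -> (Vec<&'a str>, usize) {
    let mut kept = Vec::new();
    let mut used = 0;
    for &chunk in ranked {
        let cost = estimate_tokens(chunk);
        if used + cost > budget {
            continue;
        }
        used += cost;
        kept.push(chunk);
    }
    (kept, used)
}

fn main() {
    let ranked = [
        "governing law is Delaware",
        "this agreement may be amended only in writing by both parties",
        "notices go to legal",
    ];
    let (kept, used) = pack_context(&ranked, 8);
    println!("kept {} of {} chunks, {} tokens", kept.len(), ranked.len(), used);
    for chunk in kept {
        println!("  {chunk}");
    }
}
```

Skipping an oversized chunk instead of stopping is a design choice: it fills the budget more fully, at the cost of sometimes passing over a highly ranked chunk.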

If you want to pass the prompt to an LLM:

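// `openai_client` is a placeholder for your LLM client; `.await` needs an async context (e.g. #[tokio::main]).
let response = openai_client.chat_completion(&ctx.text()).await?;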

ctx.text() is just a String: you can hand it to any LLM provider’s Rust crate (async-openai, anthropic-sdk, the OpenAI-compatible endpoints exposed by many providers, etc.). RedHop doesn’t bundle the LLM call. It owns the context step.
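Because the context step hands you a plain string, one way to keep the provider swappable is a small trait at the seam. The sketch below is an assumption about how you might structure your own code: `Llm`, `EchoLlm`, and `answer` are hypothetical names and are not part of RedHop or any provider crate. It is synchronous so it stays std-only; a real implementation would more likely be async and wrap an HTTP client.

```rust
/// Hypothetical seam between the context step and the LLM call.
trait Llm {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Stand-in provider for tests and local runs. A real implementation
/// would send the prompt to a provider and return its reply.
struct EchoLlm;

impl Llm for EchoLlm {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("(stub) received {} chars", prompt.len()))
    }
}

/// Takes the assembled prompt (for example, the string from `ctx.text()`)
/// and forwards it to whichever provider was passed in.
fn answer(llm: &dyn Llm, context_prompt: &str) -> Result<String, String> {
    if context_prompt.trim().is_empty() {
        return Err("empty prompt".into());
    }
    llm.complete(context_prompt)
}

fn main() {
    let prompt = "Context: ...\n\nQuestion: What is the governing law?";
    match answer(&EchoLlm, prompt) {
        Ok(reply) => println!("{reply}"),
        Err(e) => eprintln!("LLM call failed: {e}"),
    }
}
```

With this shape, swapping providers or stubbing the model in tests means writing another `impl Llm`, and the retrieval code never changes.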

What about hybrid / semantic retrieval?

Add retrieval and model to the loader. The first run downloads the embedding model (~80MB for bge-small). Subsequent runs hit the cache.

use redhop::{read_file_with, LoadOptions};

let mut doc = read_file_with("contract.pdf", &LoadOptions {
    retrieval: Some("hybrid".into()),
    model: Some("bge-small".into()),
    ..Default::default()
})?;

For structured docs with parallel clauses (regional overrides etc.), pair hybrid with context_with(.., include_heading: true, neighbors: 1). There’s a decision guide covering when each configuration is the right pick.
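If you’re wondering how a lexical ranking and a dense ranking get combined at all, one common technique is reciprocal rank fusion (RRF): each chunk scores the sum of 1 / (k + rank) across the lists it appears in. The sketch below shows that technique in general. Whether RedHop’s hybrid mode fuses results this way isn’t stated here, so treat it as background; `rrf` is a hypothetical name, and k = 60 is the constant commonly used with RRF.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion over several rankings of chunk ids.
/// Each ranking is best-first; ranks start at 1.
fn rrf(rankings: &[Vec<usize>], k: f64) -> Vec<(usize, f64)> {
    let mut scores: HashMap<usize, f64> = HashMap::new();
    for list in rankings {
        for (pos, &id) in list.iter().enumerate() {
            *scores.entry(id).or_insert(0.0) += 1.0 / (k + (pos + 1) as f64);
        }
    }
    let mut fused: Vec<(usize, f64)> = scores.into_iter().collect();
    // Best score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}

fn main() {
    let bm25_ranking = vec![3, 1, 2]; // chunk ids from a lexical search
    let dense_ranking = vec![1, 4, 3]; // chunk ids from an embedding search
    for (id, score) in rrf(&[bm25_ranking, dense_ranking], 60.0) {
        println!("chunk {id}: {score:.5}");
    }
}
```

RRF only looks at ranks, so it avoids having to normalize BM25 scores and cosine similarities onto a shared scale.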

A working example with rig (for comparison)

If you’re building agent-shaped applications and want RAG as one piece of a larger flow, rig is the more natural fit. Rough shape:

use rig::completion::Prompt; // trait that provides `agent.prompt(..)`
use rig::providers::openai;
use rig::vector_store::in_memory_store::InMemoryVectorStore;
use rig::embeddings::EmbeddingsBuilder;

let openai_client = openai::Client::from_env();
let embedding_model = openai_client.embedding_model("text-embedding-3-small");

let embeddings = EmbeddingsBuilder::new(embedding_model.clone())
    .documents(documents)?
    .build()
    .await?;

let store = InMemoryVectorStore::from_documents(embeddings);
let agent = openai_client.agent("gpt-4o-mini")
    .preamble("You are a helpful assistant.")
    .dynamic_context(4, store.index(embedding_model))
    .build();

let response = agent.prompt("What is the governing law?").await?;

Compared to RedHop: rig is heavier on the LLM-provider integration side and lighter on the document-context side. You bring your own loader, chunker, and vector store. In return you get agent + tool-calling shapes out of the box. Roughly, rig is to RedHop what LangChain is to LlamaIndex in Python.

A working example with swiftide (for comparison)

If you’re ingesting a corpus at scale and want an explicit indexing pipeline, swiftide is the right shape:

use swiftide::indexing::{Pipeline, transformers, loaders};
use swiftide::integrations::{qdrant::Qdrant, fastembed::FastEmbed};

Pipeline::from_loader(loaders::FileLoader::new("./docs"))
    .then_chunk(transformers::ChunkMarkdown::default())
    .then_in_batch(50, transformers::Embed::new(FastEmbed::default()))
    .then_store_with(Qdrant::try_from_url("http://localhost:6334")?.build()?)
    .run()
    .await?;

Then a query pipeline does retrieval + answer assembly. Best when the ingestion side is the heavy lift: large corpora, scheduled refreshes, integration with Qdrant / LanceDB / Redis.
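To make the chunking step in that pipeline less of a black box, here is a std-only sketch of what a heading-based Markdown chunker does conceptually: start a new chunk at every heading line. This is not swiftide’s ChunkMarkdown implementation; `Chunk` and `chunk_markdown` are hypothetical names, and the sketch ignores edge cases such as `#` lines inside fenced code blocks.

```rust
/// One chunk: the heading it sits under (if any) and its body text.
#[derive(Debug, PartialEq)]
struct Chunk {
    heading: Option<String>,
    body: String,
}

/// Split Markdown into chunks at heading lines (lines starting with '#').
/// Simplification: '#' lines inside fenced code blocks are treated as
/// headings too, which a production chunker would need to handle.
fn chunk_markdown(src: &str) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut heading: Option<String> = None;
    let mut body = String::new();

    for line in src.lines() {
        if line.starts_with('#') {
            // Flush whatever was collected under the previous heading.
            if heading.is_some() || !body.trim().is_empty() {
                chunks.push(Chunk {
                    heading: heading.take(),
                    body: body.trim().to_string(),
                });
            }
            body.clear();
            heading = Some(line.trim_start_matches('#').trim().to_string());
        } else {
            body.push_str(line);
            body.push('\n');
        }
    }
    // Flush the final chunk.
    if heading.is_some() || !body.trim().is_empty() {
        chunks.push(Chunk { heading, body: body.trim().to_string() });
    }
    chunks
}

fn main() {
    let doc = "intro text\n# Setup\nInstall the crate.\n## Usage\nCall the loader.\n";
    for chunk in chunk_markdown(doc) {
        println!("{:?}: {}", chunk.heading, chunk.body);
    }
}
```

Keeping the heading alongside each body is what later lets a retriever cite the section a chunk came from.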

Where the Rust RAG ecosystem is still weaker than Python

We promised honesty, so:

Evaluation harnesses. Python has TruLens, RAGAS, deepeval, Haystack’s eval framework, LlamaIndex evaluators. Rust has… benchmark scripts in individual repos. If you need rigorous RAG eval out of the box, Python still wins.
Document loader breadth. LlamaIndex’s LlamaHub and LangChain’s community loaders cover hundreds of sources (Notion, Slack, S3, Confluence, ServiceNow, …). Rust libraries mostly cover the basics (files, folders, URLs). Anything exotic, you’ll write yourself.
LLM provider integration breadth. rig leads here in Rust with ~10 providers. LangChain has 100+. For obscure providers, expect to wire the HTTP client yourself.
Multi-step / agentic flows. LangGraph and LlamaIndex’s agent frameworks are far more mature than rig’s agent layer. If your problem needs sophisticated tool-calling + planning, the Python ecosystem is still the right place.
Notebooks / experimentation. Iterating in Jupyter is faster than iterating in a Rust crate. For prompt engineering and config tuning, many teams prototype in Python even if production is Rust.

If any of those gaps matters to you, the right answer is often a Rust-Python hybrid: Rust for the latency-sensitive service path, Python for evaluation / experimentation. Or just pick Python.

When to pick Rust for RAG (and when not to)

A quick decision table:

Your situation	Pick
Existing Rust service that needs document QA	Rust (RedHop or rig)
Edge / WASM / on-device deployment	Rust
Need single-static-binary deployment for security review	Rust
Sub-10ms warm-query budget	Rust
Prototyping or experimenting	Python (LangChain / LlamaIndex)
Need 100+ LLM provider integrations	Python (LangChain)
Need exotic document loaders (Notion, ServiceNow, …)	Python (LlamaHub)
Sophisticated agent / tool-calling flows	Python (LangGraph / LlamaIndex)
Doing serious RAG evaluation	Python (RAGAS / TruLens)
Stack is already Python and there’s no specific reason to switch	Python

Quick library-by-library decision

Among Rust RAG libraries specifically:

Document QA with a small API surface and observability → RedHop. Three calls, BM25 default, Decision Report, in-process. Best for “add document QA to an existing Rust service.”
Agents that call tools and choose actions, with RAG as one capability → rig. Closer to LangChain in scope, and multi-provider LLM support is its strongest suit.
Production indexing pipelines for large corpora → swiftide. Async pipelines, batch ingestion, integration with Qdrant / LanceDB / Redis. Heaviest of the three but production-shaped.

Resources

RedHop quickstart (Rust): the three-call surface
Choosing a configuration: when to use which retrieval tier
llms.txt: single-file context for AI coding agents
RedHop on crates.io
rig docs
swiftide docs
candle and ort: embedding model runtimes
LanceDB: embedded vector DB

If you’re new to RAG itself (not just the Rust side), start with the Intro to RAG guide and the Retrieval & context tips for the parts that aren’t language-specific. If you’re weighing Node.js against Rust for your RAG service, the Node.js library for RAG guide covers that side with the same shape.