RAG Over Live Web Pages (2026 Guide)

Ground your LLM in the live web, not a stale vector store — search, fetch clean markdown, and answer with citations through one Auxiliar API key.

Updated 2026-06-30 · Auxiliar

Classic RAG retrieves from a vector store you built ahead of time. That’s perfect for your own documents — and useless for anything that changed this morning. Live-web RAG flips it: retrieve from the open web at query time, so the model answers from current sources. This guide builds it with one Auxiliar key handling both the retrieval (search) and the fetch (scrape-to-markdown).

Why fetch the page, not just the snippet?

Search APIs return short snippets. For grounding, snippets are thin — they drop the numbers, caveats and context that make an answer correct. So the reliable pattern is: search to find the right pages, then fetch those pages as clean markdown and feed the real text to the model. The fetch step is where scrape quality matters; garbled HTML poisons the answer.

The pipeline

import os, requests
from anthropic import Anthropic

AUX = "https://api.auxiliar.ai"
H = {"Authorization": f"Bearer {os.environ['AUXILIAR_API_KEY']}"}
llm = Anthropic()

def retrieve(question, k=4):
    # Tavily is agent-native: returns relevance-scored results in one call.
    r = requests.post(f"{AUX}/tavily/search", headers=H,
                      json={"query": question, "max_results": k}, timeout=30)
    r.raise_for_status()
    return [hit["url"] for hit in r.json().get("results", [])]

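def to_markdown(url):
    # Firecrawl returns clean, LLM-ready markdown (top markdown quality in our benchmark).
    try:
        r = requests.post(f"{AUX}/firecrawl/v1/scrape", headers=H,
                          json={"url": url, "formats": ["markdown"]}, timeout=60)
        r.raise_for_status()
        return (r.json().get("data") or {}).get("markdown") or ""
    except (requests.RequestException, ValueError):
        return ""  # timeout, block, or bad body: skip this page instead of failing the answer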

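def answer(question):
    # Keep only pages that scraped successfully so the [n] labels stay contiguous.
    docs = [(u, md) for u in retrieve(question) if (md := to_markdown(u))]
    if not docs:
        return "No sources could be fetched for this question."
    # Cap each page at 5,000 characters to bound the prompt size.
    context = "\n\n".join(f"[{i}] {u}\n{md[:5000]}" for i, (u, md) in enumerate(docs, 1))
    msg = llm.messages.create(
        model="claude-sonnet-5", max_tokens=800,
        messages=[{"role": "user", "content":
            "Using only the sources below, answer and cite with [n]. If unsure, say so.\n\n"
            f"Q: {question}\n\n{context}"}])
    return msg.content[0].text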

print(answer("What's the current status of the EU AI Act's rules for general-purpose models?"))

No vector database, no embeddings, no re-indexing job — just fresh retrieval at query time. Add embeddings later if you want to rank or cache; for many agents, live retrieval alone is enough.
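
If the same URLs keep coming back across questions, a small cache in front of the fetch step can save repeated scrapes. The sketch below is one possible shape, not part of Auxiliar: `TTLCache`, its `ttl_seconds` default and the injectable `clock` are all hypothetical names chosen for illustration, and it assumes the fetch function returns an empty string on failure, as `to_markdown` does.

```python
import time

class TTLCache:
    """Tiny in-memory cache that forgets entries after ttl_seconds (illustrative only)."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self.fetch = fetch        # any url -> markdown function, e.g. to_markdown
        self.ttl = ttl_seconds    # assumed freshness window; tune per use case
        self.clock = clock        # injectable so expiry can be tested without sleeping
        self._store = {}          # url -> (expires_at, markdown)

    def get(self, url):
        now = self.clock()
        hit = self._store.get(url)
        if hit and hit[0] > now:
            return hit[1]         # still fresh: skip the network call
        markdown = self.fetch(url)
        if markdown:              # never cache a failed (empty) fetch
            self._store[url] = (now + self.ttl, markdown)
        return markdown
```

To try it, wrap the reader once (`cached = TTLCache(to_markdown, ttl_seconds=600)`) and call `cached.get(u)` wherever the pipeline calls `to_markdown(u)`. A short TTL keeps the "live" property; a long one drifts back toward a stale store.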

Getting the retrieval right

The quality ceiling of live-web RAG is set by two choices, and both are one-line swaps on the gateway:

  • The index. Tavily is agent-native; Exa is neural/semantic; Serper is raw Google. They surface different pages for the same query. Compare them in "Best search API for AI agents".
  • The reader. Markdown quality varies widely between scrapers, and on protected sites the fetch can fail entirely. See "Firecrawl vs Jina" for the two leading URL-to-markdown options, and "Scrape without getting blocked" for hard targets.
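
Since different indexes surface different pages, one option is to query more than one and merge the URL lists before fetching. The sketch below assumes each index call has already produced a plain list of URLs (as `retrieve` does); `normalize` and `merge_results` are hypothetical helpers, and the normalization rules are a rough heuristic rather than a full canonicalization.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Collapse trivial variants: scheme, host case, leading www., trailing slash, fragment."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")  # Python 3.9+
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{host}{path}{query}"

def merge_results(url_lists, k=4):
    """Round-robin across indexes so each contributes its top hits, skipping duplicates."""
    seen, merged = set(), []
    depth = max((len(urls) for urls in url_lists), default=0)
    for rank in range(depth):
        for urls in url_lists:
            if rank >= len(urls):
                continue
            key = normalize(urls[rank])
            if key in seen:
                continue
            seen.add(key)
            merged.append(urls[rank])
            if len(merged) == k:
                return merged
    return merged
```

Round-robin is only one merging policy. It favors diversity across indexes over any single index's ranking, which may or may not suit a given workload.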

Because retrieval and fetch share one Auxiliar key, you can mix and match — Tavily to find, Firecrawl to read, a stealth scraper as fallback — without a single extra signup.
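
The "reader with a fallback" idea can be sketched as a small loop over reader functions. Everything here is illustrative: `read_with_fallback` is a hypothetical helper, each reader is assumed to be a `url -> markdown` callable like `to_markdown`, and the `min_chars` threshold is a guess at what separates a real page from a block or challenge stub.

```python
def read_with_fallback(url, readers, min_chars=200):
    """Try each (name, reader) in order; return (name, markdown) for the first usable page."""
    for name, reader in readers:
        try:
            markdown = reader(url)
        except Exception:
            continue  # one failing reader should not sink the whole request
        if markdown and len(markdown.strip()) >= min_chars:
            return name, markdown
    return None, ""  # every reader failed or returned too little text
```

Returning the reader's name alongside the text makes it easy to log which provider actually served each page, which helps when deciding whether the fallback is earning its cost.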

One key. Every provider on this page.

Stop juggling signups and invoices. One Auxiliar API key calls all of them — upstream keys injected server-side, usage billed to a single balance. Swap the base URL and go.

curl https://api.auxiliar.ai/tavily/search \
  -H "Authorization: Bearer $AUXILIAR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "latest changes to the eu ai act"}'
