How to Build a Research Agent (Search → Scrape → Synthesize)

Build a web research agent that searches, reads the best sources as clean markdown, and writes a cited answer — all through one Auxiliar API key.

Updated 2026-06-30 · Auxiliar

A research agent does what a person does when they investigate a question: search, open the most promising results, read them, and write up the answer with sources. This guide builds that loop (search → scrape → synthesize) in plain Python, using one Auxiliar key for every web call, so you can pick the best provider for each step without signing up for a separate API key per provider.

The architecture

Three steps: two that touch the web and one LLM call.

  1. Search the web for the question (Serper: fast, inexpensive Google results).
  2. Read the top results as clean markdown (Firecrawl — best markdown quality in our benchmark).
  3. Synthesize an answer from the collected text, with citations.

All web calls go through https://api.auxiliar.ai/<provider>/... on a single key.
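
The URL pattern above can be captured in a small helper. This is a minimal sketch that assumes the gateway simply prefixes each provider's own path with the provider name, as the /serper/search and /firecrawl/v1/scrape calls in this guide suggest. The names aux_url and aux_headers are illustrative and not part of any Auxiliar SDK.

```python
import os

AUX = "https://api.auxiliar.ai"

def aux_url(provider, path):
    """Join the gateway base, a provider name, and that provider's path."""
    return f"{AUX}/{provider.strip('/')}/{path.lstrip('/')}"

def aux_headers(key=None):
    """Build the bearer-token header shared by every web call."""
    # Falls back to the AUXILIAR_API_KEY environment variable used in this guide.
    key = key or os.environ.get("AUXILIAR_API_KEY", "")
    return {"Authorization": f"Bearer {key}"}
```

For example, aux_url("firecrawl", "/v1/scrape") produces the scrape endpoint used later in this guide.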

The code

import os
import requests
from anthropic import Anthropic

AUX = "https://api.auxiliar.ai"
H = {"Authorization": f"Bearer {os.environ['AUXILIAR_API_KEY']}"}
llm = Anthropic()  # the LLM call uses ANTHROPIC_API_KEY from the environment

def search(query, k=5):
    """Return the URLs of the top k organic results."""
    r = requests.post(f"{AUX}/serper/search", headers=H, json={"q": query}, timeout=30)
    r.raise_for_status()
    return [o["link"] for o in r.json().get("organic", [])[:k]]

def read(url):
    """Return the page as markdown, or "" if the scrape fails."""
    try:
        r = requests.post(f"{AUX}/firecrawl/v1/scrape", headers=H,
                          json={"url": url, "formats": ["markdown"]}, timeout=60)
    except requests.RequestException:
        return ""  # a timeout on one source should not abort the whole run
    if not r.ok:
        return ""
    data = r.json().get("data") or {}
    # Cap each source at 6,000 characters to keep the prompt small.
    return (data.get("markdown") or "")[:6000]

def research(question):
    urls = search(question)
    sources = [(u, read(u)) for u in urls]
    # Label each source with its URL so the model can cite it; skip empty reads.
    context = "\n\n".join(f"SOURCE {u}:\n{md}" for u, md in sources if md)
    msg = llm.messages.create(
        model="claude-sonnet-5", max_tokens=1024,
        messages=[{"role": "user", "content":
            "Answer the question using only these sources. Cite each claim with its URL.\n\n"
            f"QUESTION: {question}\n\n{context}"}],
    )
    return msg.content[0].text

print(research("What are the most-used open-source AI agents in 2026?"))

That’s a working research agent in ~30 lines. It searches, reads the real pages (not just snippets), and grounds the answer in them.
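
Because the prompt asks for a URL with each claim, one cheap sanity check is to confirm that every URL in the answer is one the agent actually read. The sketch below is an optional add-on, not part of the agent above: cited_urls and unknown_citations are hypothetical helper names, and the regular expression is a rough URL matcher rather than a full parser.

```python
import re

# Rough matcher: stops at whitespace, closing brackets, and quotes.
URL_RE = re.compile(r"https?://[^\s)\]>\"']+")

def cited_urls(answer):
    """Return the URLs mentioned in the answer, trailing punctuation stripped."""
    return [u.rstrip(".,;:") for u in URL_RE.findall(answer)]

def unknown_citations(answer, source_urls):
    """Return cited URLs that were not among the sources given to the model."""
    known = set(source_urls)
    return [u for u in cited_urls(answer) if u not in known]
```

An empty list from unknown_citations(answer, urls) means every citation points at a page that was in the prompt. It does not prove the page supports the claim.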

Making it better

  • Harder targets: if a source is behind anti-bot protection and read() comes back empty, retry through a stealth scraper (see the guide "scrape without getting blocked"). Because every scraper is on the same key, a fallback is one line.
  • Better sources: swap Serper for an agent-native index like Tavily (/tavily/search) or a neural one like Exa (/exa/search). The guide "best search API for AI agents" compares them.
  • Skip the loop entirely: some providers do the whole search-and-synthesize step in one call. See the "best AI answer & research API" ranking; you can call those through the same key too.
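
The fallback idea from the first bullet can be sketched as a function that tries several readers in order. This assumes each reader takes a URL and returns markdown or an empty string, like read() above. The name read_with_fallback is hypothetical, and this guide does not specify a stealth scraper's path, so that reader is left for you to supply.

```python
def read_with_fallback(url, readers):
    """Try each reader in order and return the first non-empty result."""
    for reader in readers:
        try:
            text = reader(url)
        except Exception:
            # Treat a failing scraper like an empty result and move on.
            text = ""
        if text:
            return text
    return ""
```

In research(), the call would then look like read_with_fallback(u, [read, read_stealth]), where read_stealth is whatever function you write against your chosen stealth scraper.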

The point of the gateway is that all of these are experiments you run by changing a path — not new accounts you open.
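
One way to make "changing a path" concrete is a small table of search providers. Only the Serper row is taken from the code in this guide. The Tavily and Exa request and response field names are assumptions about what the gateway passes through, so check them against each provider's documentation before relying on them. The names PROVIDERS, build_request, and extract_urls are illustrative.

```python
AUX = "https://api.auxiliar.ai"

# provider -> (path, query field, results field, URL field)
# Tavily and Exa fields are assumed, not confirmed by this guide.
PROVIDERS = {
    "serper": ("/serper/search", "q", "organic", "link"),
    "tavily": ("/tavily/search", "query", "results", "url"),
    "exa": ("/exa/search", "query", "results", "url"),
}

def build_request(provider, query):
    """Return (url, json_body) for a search on the chosen provider."""
    path, query_key, _, _ = PROVIDERS[provider]
    return f"{AUX}{path}", {query_key: query}

def extract_urls(provider, payload, k=5):
    """Pull up to k result URLs out of a provider's JSON response."""
    _, _, results_key, url_key = PROVIDERS[provider]
    results = payload.get(results_key) or []
    return [r[url_key] for r in results[:k] if r.get(url_key)]
```

With this in place, search() could post to the URL and body from build_request(provider, query) and pass r.json() to extract_urls, so switching providers means changing one string.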

One key. Every provider on this page.

Stop juggling signups and invoices. One Auxiliar API key calls all of them — upstream keys injected server-side, usage billed to a single balance. Swap the base URL and go.

curl https://api.auxiliar.ai/serper/search \
  -H "Authorization: Bearer $AUXILIAR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"q": "state of ai agents 2026"}'
