CNPJ enrichment for Brazilian bookkeeping agents — what to install, ranked
Answer
When your Claude Code / OpenClaw / Cursor agent has extracted CNPJs from NFS-e invoices (via /solve/nfs-e-extraction/), it may then need to enrich each one for bookkeeping batches, accountant handoff, or tax reconciliation. The fields are razão social, CNAE (primary + secondary), regime tributário (Simples Nacional + MEI flags), situação cadastral, full address, and sócios. For that, install auxiliar-mcp and call invoke_capability(tool="fetch_cnpj", args={cnpj}). The free anonymous tier needs no token. The gateway cascades BrasilAPI → CNPJ.ws, so a single upstream hiccup doesn’t fail your enrichment loop. The same auth surface also unlocks 220+ other Brazilian-public-data tools (NFS-e parsing, judicial process, sanctions, public contracts) when you grow into them.
If you only need a one-shot CNPJ lookup and don’t plan to use any other Brazilian-public-data tool, you can curl BrasilAPI directly (curl https://brasilapi.com.br/api/cnpj/v1/<cnpj>). It is the same free public-data source with one less dependency. CNPJá’s paid api.cnpja.com tier queries Receita Federal in real time, for cases where the ~30-day open-data lag isn’t acceptable (KYC, fraud detection, regulated workflows). CNPJ.ws has the deepest field coverage (Inscrições Estaduais via Sintegra, Suframa, branches, full sócio participation percentages). It is free at 3 req/min on publica.cnpj.ws and paid at 2000 req/min on comercial.cnpj.ws. ReceitaWS is the long-running canonical free endpoint, but it is rate-limited to 3 req/min and gates the Simples flag behind its paid tier.
This ranking is based on documented vendor characteristics plus agent-loop ergonomics: free-tier accessibility, field coverage, real-time vs cached freshness, install friction, and how cleanly each path composes with adjacent agent tasks. A field-accuracy corpus eval is a Phase-2 follow-up. It would take 5–10 real Brazilian CNPJs across sectors and score each source on CNAE / regime tributário / address / sócios accuracy. The documented ranking is enough to unblock agents today; corpus-validated scores will refine it.
Install
Primary path — auxiliar-mcp (recommended)
# One-time: install the MCP server. No token is needed for CNPJ enrichment:
# it's a LOW-sensitivity public-data tool, so the gateway dispatches it
# under the anonymous tier (600 RPM shared; cached lookups return in under a second).
claude mcp add auxiliar npx auxiliar-mcp
Then from your agent loop:
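# In a Claude Code / OpenClaw / Cursor agent session (invoke_capability is the
# MCP tool exposed by auxiliar-mcp, shown here in Python-style call syntax):
result = invoke_capability(
    tool="fetch_cnpj",
    args={"cnpj": "00.000.000/0001-91"},
)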
# → result["result"]["cnae_fiscal_descricao"]: "Bancos múltiplos..."
# → result["result"]["opcao_pelo_simples"]: false
# → result["result"]["source"]: "brasilapi" (or "cnpjws" if cascaded)
Or direct HTTP for agents that prefer plain curl:
# No auth required for CNPJ enrichment.
curl -s -X POST https://api.auxiliar.ai/api/invoke/fetch_cnpj \
  -H "content-type: application/json" \
  -d '{"cnpj": "00.000.000/0001-91"}' | jq .result
Either path returns this result object:
{
  "cnpj": "00000000000191",
  "razao_social": "BANCO DO BRASIL SA",
  "nome_fantasia": "DIRECAO GERAL",
  "cnae_fiscal": 6422100,
  "cnae_fiscal_descricao": "Bancos múltiplos, com carteira comercial",
  "cnaes_secundarios": [{"codigo": 6499999, "descricao": "..."}],
  "opcao_pelo_simples": false,
  "opcao_pelo_mei": false,
  "descricao_situacao_cadastral": "ATIVA",
  "logradouro": "...", "municipio": "BRASILIA", "uf": "DF",
  "qsa": [...],
  "source": "brasilapi"
}
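Some agents drive the gateway from Python rather than curl. For them, a minimal standard-library sketch is below. It mirrors the curl call above (same URL, same JSON body, no auth header) and then flattens the result object into one bookkeeping row. The response envelope {"result": {...}} is inferred from the `jq .result` filter. The `to_ledger_row` helper and its column names are hypothetical; only the field names read from `result` come from the sample response above.

```python
import json
import urllib.request

INVOKE_URL = "https://api.auxiliar.ai/api/invoke/fetch_cnpj"


def build_invoke_request(cnpj: str) -> urllib.request.Request:
    """Build the same anonymous POST as the curl example (no auth header)."""
    body = json.dumps({"cnpj": cnpj}).encode("utf-8")
    return urllib.request.Request(
        INVOKE_URL,
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


def invoke_fetch_cnpj(cnpj: str, timeout: float = 10.0) -> dict:
    """Send the request and return the result object (like `jq .result`)."""
    with urllib.request.urlopen(build_invoke_request(cnpj), timeout=timeout) as resp:
        return json.load(resp)["result"]


def to_ledger_row(result: dict) -> dict:
    """Hypothetical projection of the result object onto ledger columns."""
    return {
        "cnpj": result["cnpj"],
        "razao_social": result["razao_social"],
        "cnae_primario": result["cnae_fiscal"],
        "cnae_descricao": result["cnae_fiscal_descricao"],
        "cnaes_secundarios": [c["codigo"] for c in result.get("cnaes_secundarios", [])],
        "simples": result.get("opcao_pelo_simples"),
        "mei": result.get("opcao_pelo_mei"),
        "situacao": result.get("descricao_situacao_cadastral"),
        "uf": result.get("uf"),
        "source": result.get("source"),  # "brasilapi" or "cnpjws" per the gateway
    }
```

Keeping the HTTP call and the projection separate lets you cache raw result objects by CNPJ and re-derive ledger rows later without spending another request.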
To move from the anonymous tier (600 RPM, shared across all anonymous callers) to per-user telemetry and the 60 RPM external tier, mint a token via scripts/issue_internal_token.py and set AUXILIAR_GATEWAY_TOKEN in your agent’s env. HIGH_PII tools (Direct Data CPF dossiê, judicial process by CPF) also require a token.
Direct-HTTP path — BrasilAPI (one-shot, no MCP install)
# Free, no auth, no token. Drop-in HTTP GET.
curl -s "https://brasilapi.com.br/api/cnpj/v1/00000000000191" | jq .
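# In an agent loop after NFS-e extraction (prestador_cnpjs is a bash array):
for cnpj in "${prestador_cnpjs[@]}"; do
  digits="${cnpj//[^0-9]/}"  # strip "." "/" "-" so a formatted CNPJ can't break the URL path
  curl -s "https://brasilapi.com.br/api/cnpj/v1/${digits}" | jq -r '
    [.cnpj, .razao_social, .cnae_fiscal_descricao, (.opcao_pelo_simples|tostring)] | @tsv'
done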
This returns the same field shape as the auxiliar-mcp result, since BrasilAPI is the cascade’s first upstream; the only difference is the source field, which the gateway adds. Pick this when you only need CNPJ enrichment, not the rest of the agent toolbox.
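Whichever path you use, it can pay to normalize and sanity-check each extracted CNPJ before spending a rate-limited call on it. NFS-e extraction often yields the punctuated form (00.000.000/0001-91), and OCR slips can produce numbers that fail the check digits. The sketch below uses the standard mod-11 CNPJ check-digit rule and assumes the numeric 14-digit format. Rejecting all-identical-digit strings is a common convention, not something the APIs above document.

```python
import re

# Weights for the two mod-11 check digits of a numeric CNPJ.
_W1 = [5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2]
_W2 = [6] + _W1


def normalize_cnpj(raw: str) -> str:
    """Strip punctuation ('.', '/', '-', spaces) and keep only digits."""
    return re.sub(r"\D", "", raw)


def _check_digit(digits: str, weights: list[int]) -> int:
    total = sum(int(d) * w for d, w in zip(digits, weights))
    remainder = total % 11
    return 0 if remainder < 2 else 11 - remainder


def is_valid_cnpj(raw: str) -> bool:
    """True if the CNPJ has 14 digits and both check digits match."""
    cnpj = normalize_cnpj(raw)
    if len(cnpj) != 14 or cnpj == cnpj[0] * 14:
        return False  # wrong length, or all digits identical
    d1 = _check_digit(cnpj[:12], _W1)
    d2 = _check_digit(cnpj[:12] + str(d1), _W2)
    return cnpj[12:] == f"{d1}{d2}"
```

Filtering the extracted list with is_valid_cnpj before the lookup loop keeps bad OCR reads out of your rate budget. Invalid numbers can then go back to the extraction step for review.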
Real-time path — CNPJá paid (api.cnpja.com)
# When situação cadastral changes need to hit your pipeline within minutes,
# not the 30-day open-data lag.
curl -s -H "Authorization: Bearer YOUR_TOKEN" \
  "https://api.cnpja.com/office/00000000000191"
Free cached fallback at https://open.cnpja.com/office/<cnpj> (no auth, daily cache).
Deep-fields path — CNPJ.ws (Sintegra + Suframa + branches)
# Free tier, 3 req/min (the same cap as ReceitaWS), with Inscrições Estaduais
# and branch (filial) data.
curl -s "https://publica.cnpj.ws/cnpj/00000000000191"
Returns a nested structure with estabelecimento (full address + branches), socios (with participation percentages), inscricoes_estaduais per state, simples, and simei.
Scorecard
| Capability | Free tier | Paid tier | CNAE | Simples + MEI flags | Address | Sócios | Inscrição Estadual | Real-time | Cascade | Auth |
|---|---|---|---|---|---|---|---|---|---|---|
| auxiliar-cnpj-fetch (top pick) | 600 RPM shared (anonymous) / 60 RPM per token | upcoming | ✅ primary + secondary | ✅ free | ✅ full | ✅ qsa | inherited from upstream | ❌ ~30d lag (inherited from both upstreams) | ✅ BrasilAPI → CNPJ.ws | optional (none for anonymous tier) |
| BrasilAPI | unlimited (no published cap) | — | ✅ primary + secondary | ✅ free | ✅ full | ✅ qsa | ❌ | ❌ ~30d lag | ❌ | none |
| CNPJá (open) | reasonable (cached daily) | $$ real-time + higher rate | ✅ primary | ✅ free | ✅ full | ✅ | ✅ on paid | ✅ paid only | ❌ | none on open, token on paid |
| CNPJ.ws | 3 req/min | 2000 req/min ($$) | ✅ primary + secondary | ✅ free (simples + simei) | ✅ full + branches | ✅ with % | ✅ Sintegra | ❌ ~30d lag | ❌ | none on publica |
| ReceitaWS | 3 req/min | $$ premium token | ✅ primary + secondary | ❌ paid-only | ✅ full | ❌ | ❌ | ❌ | ❌ | none on free |
Eval method. Scored per source on agent-loop ergonomics (one install vs N), free-tier accessibility (token? rate-limited? auth-walled?), field coverage (CNAE primary + secondary, Simples + MEI flags, address, sócios, Inscrições Estaduais), real-time vs cached freshness, and resilience (does upstream failure cascade or stop the loop?). Top pick: auxiliar-cnpj-fetch — same field coverage and free-tier accessibility as BrasilAPI alone, plus automatic fallback to CNPJ.ws when the first upstream fails, plus a single auth surface that scales to the rest of your Brazilian-public-data agent flow.
Caveat: what this scorecard is NOT. It reflects documented characteristics, not field accuracy on a real corpus. The entries come from vendor docs and direct curl tests against a well-known CNPJ (Banco do Brasil, 00.000.000/0001-91). They do not come from a 5-document field-by-field accuracy run like the one /solve/nfs-e-extraction/ used. A real corpus eval is a Phase-2 follow-up; see the methodological caveats below.
Fit by agent
| Agent | auxiliar-mcp | BrasilAPI | CNPJá | CNPJ.ws | ReceitaWS |
|---|---|---|---|---|---|
| Claude Code | ✓ | ✓ | ✓ | ✓ | ✓ |
| Claude Desktop | ✓ | ✓ | ✓ | ✓ | ✓ |
| Cursor | ✓ | ✓ | ✓ | ✓ | ✓ |
| OpenClaw | ✓ | ✓ | ✓ | ✓ | ✓ |
All five are HTTP-callable from any agent. auxiliar-mcp adds a one-line stdio MCP install (claude mcp add auxiliar npx auxiliar-mcp); the other four are direct HTTP. OpenClaw / Telegram-bot agents that don’t speak MCP can also call the auxiliar gateway directly via POST https://api.auxiliar.ai/api/invoke/fetch_cnpj.
Alternatives considered
| Alternative | Why dropped |
|---|---|
| Direct gov.br Conecta CNPJ API | Official federal API at gov.br/conecta/catalogo/apis/consulta-cnpj but requires OAuth + government registration that’s hostile to drop-in agent usage. Use when the workflow already has a regulated gov.br integration. |
| Speedio | Free tier requires email registration; B2B sales-focused with employee-count and revenue-range fields outside the bookkeeping pipeline’s CNAE / regime tributário need. Useful for sales enrichment, not bookkeeping. |
| Receita Federal HTML scraping | The original anti-bot blocker that surfaced this whole /solve/ task — public Receita Federal CNPJ pages are behind anti-bot protection that defeats most scrapers. Don’t try this. |
| Serasa / Quod commercial enrichment | Enterprise-grade with KYC + fraud signals, but contract-only, no agent-friendly drop-in API; out of scope for free-tier-first agent workflows. |
FAQ
Q: Does the cascade return regime tributário (Simples Nacional + MEI flags) on the free tier?
A: Yes. The auxiliar-cnpj-fetch envelope includes opcao_pelo_simples: bool and opcao_pelo_mei: bool directly. Same for BrasilAPI direct, CNPJ.ws publica, and CNPJá open. ReceitaWS gates the Simples flag behind the paid tier (its free response returns simples: null) — that’s the one source where the regime tributário question costs money.
Q: How does Brazil’s regime tributário map onto these fields?
A: There are three federal regimes:

(1) Simples Nacional, the simplified small-business regime. opcao_pelo_simples=true indicates this.

(2) MEI (Microempreendedor Individual), the simplest individual-entrepreneur regime. opcao_pelo_mei=true indicates this. MEI is a modality inside Simples Nacional (SIMEI), so an MEI normally has opcao_pelo_simples=true as well.

(3) Lucro Presumido / Lucro Real, for companies in neither Simples nor MEI. The free CNPJ APIs do NOT distinguish between these two; that’s a Receita Federal internal classification not exposed publicly.

If both flags are false, infer “Lucro Presumido or Lucro Real” and use revenue / size proxies if you need to disambiguate. For ground-truth Lucro Presumido vs Real, a paid commercial provider (Direct Data, Serasa) is required.
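The inference in this answer fits in a few lines. This sketch checks MEI before Simples because an MEI is enrolled in Simples Nacional, so both flags are normally true for an MEI. It treats None as unknown, which covers sources that return a null or gated Simples flag (such as ReceitaWS free). The returned labels are illustrative strings, not values any of the APIs return.

```python
from typing import Optional


def infer_regime(opcao_pelo_simples: Optional[bool], opcao_pelo_mei: Optional[bool]) -> str:
    """Map the two public flags onto a regime tributário label (illustrative)."""
    if opcao_pelo_mei:
        return "MEI"  # MEI sits inside Simples Nacional, so check it first
    if opcao_pelo_simples:
        return "Simples Nacional"
    if opcao_pelo_simples is False:
        # Not in Simples (hence not MEI either): public data can't split these two.
        return "Lucro Presumido or Lucro Real"
    return "unknown"  # Simples flag missing or gated behind a paid tier
```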
Q: My pipeline has 200 invoices/month with ~50 unique suppliers. Can I get away with the free tier?
A: Yes. At 50 unique CNPJs per month batched once, any of these free tiers works. Cache the result keyed by CNPJ; supplier registry data doesn’t change daily. Even if you enrich as auxiliar-nfs-e extracts new invoices (a few per hour), you stay well under ReceitaWS’s 3 req/min cap. BrasilAPI has no published rate cap and is the safest default.
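At 3 req/min, a once-a-month batch of 50 unique CNPJs takes about 17 minutes (50 lookups ÷ 3 per minute), which is fine for after-the-fact bookkeeping. Below is a minimal sketch of the cache-plus-throttle loop described above, under a few assumptions. `fetch` is a hypothetical single-CNPJ lookup, for example one wrapping the HTTP calls above. A plain dict stands in for the cache; swap in a file or database for persistence. The clock and sleep functions are injectable only so the loop can be tested without waiting.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def enrich_batch(
    cnpjs: Iterable[str],
    fetch: Callable[[str], dict],
    cache: Dict[str, dict],
    requests_per_minute: float = 3.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, dict]:
    """Look up each unique CNPJ once, spacing live calls to stay under the cap."""
    min_interval = 60.0 / requests_per_minute  # 3 req/min -> 20 s between calls
    last_call: Optional[float] = None
    results: Dict[str, dict] = {}
    for cnpj in dict.fromkeys(cnpjs):  # de-duplicate, keep first-seen order
        if cnpj in cache:
            results[cnpj] = cache[cnpj]  # cache hit: no request spent
            continue
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        cache[cnpj] = results[cnpj] = fetch(cnpj)
    return results
```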
Q: When does the ~30-day data-dump lag actually matter?
A: Three cases. (1) Fraud detection: catching a CNPJ that became INAPTA or BAIXADA last week requires a real-time source. (2) KYC at supplier-onboarding time, for the same reason. (3) Tax compliance audits where the auditor’s snapshot disagrees with your enriched data. For routine after-the-fact bookkeeping enrichment, the lag is fine.
Q: How do I handle 429 rate-limit errors on the free tiers?
A: Two patterns. (1) Exponential backoff that respects the Retry-After header. (2) Failover to a different source; BrasilAPI ↔ CNPJá open ↔ CNPJ.ws publica are roughly interchangeable for the basic CNAE + Simples + address fields. Code your enrichment loop to try BrasilAPI first and fall through to CNPJá open on 429. From there, fall through to CNPJ.ws publica, waiting 20 seconds between CNPJ.ws calls, since its 3 req/min cap works out to one request every 20 seconds.
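Below is a minimal sketch combining both patterns. It assumes each source is wrapped in a fetch function that raises `RateLimited` on HTTP 429, carrying the Retry-After value in seconds when the server sent one. `RateLimited` is a hypothetical exception type defined here, not part of any client library. Each source gets a few backoff retries before the loop falls through to the next. Setting max_retries=0 gives the pure fall-through behavior described above. Source labels are illustrative.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


class RateLimited(Exception):
    """Raised by a fetch function when its upstream answers HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("HTTP 429")
        self.retry_after = retry_after  # seconds from Retry-After, if present


Source = Tuple[str, Callable[[str], dict]]


def enrich_with_failover(
    cnpj: str,
    sources: Sequence[Source],
    max_retries: int = 2,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Try each source in order; back off on 429, fall through on failure."""
    errors = []
    for name, fetch in sources:
        for attempt in range(max_retries + 1):
            try:
                return {**fetch(cnpj), "source": name}
            except RateLimited as exc:
                errors.append((name, exc))
                if attempt == max_retries:
                    break  # retries exhausted: fall through to the next source
                # Respect Retry-After when given, else exponential backoff.
                if exc.retry_after is not None:
                    delay = exc.retry_after
                else:
                    delay = base_delay * 2 ** attempt
                sleep(delay)
            except Exception as exc:  # timeouts, 5xx, bad JSON: move on
                errors.append((name, exc))
                break
    raise RuntimeError(f"all sources failed for {cnpj}: {errors}")
```

Tagging each record with the source that answered mirrors the gateway's source field. That makes it easy to audit later which upstream supplied which row.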
Q: Does this work alongside auxiliar-nfs-e + Surya for a full bookkeeping pipeline?
A: Yes, that’s the explicit composition. Pipeline shape: (1) Surya OCRs the NFS-e PDF to text. (2) auxiliar-nfs-e parses prestador + tomador + valor + ISS + CNPJs from that text. (3) The CNPJs get enriched via BrasilAPI (or the peers above) for CNAE + regime tributário. (4) The enriched record is handed to your accountant or written to a ledger. The renatoag wedge validation on 2026-04-27 is what surfaced this /solve/ task as the missing step (3); see the session retro for context.
Q: I want to run my own field-accuracy corpus eval to validate this ranking. How?
A: Pick 5–10 real Brazilian CNPJs across sectors (a bank, a Simples Nacional MEI, a state-registered LTDA in São Paulo, a federally-registered SA, a recently-baixada company). For each, hit all 5 sources. Score each source on: CNAE primary code match, CNAE description match, Simples flag accuracy, address match (via Levenshtein on the logradouro), sócios completeness, situação cadastral freshness (compare to today’s gov.br lookup). Publish the scorecard updates back via PR. The methodology mirrors /solve/nfs-e-extraction/’s field-accuracy approach — same shape applied to CNPJ instead of NFS-e fields.
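The address criterion needs a concrete similarity measure. The sketch below assumes a normalized Levenshtein similarity, defined as 1 minus the edit distance divided by the longer string's length, and a 0.9 match threshold. Both are choices to tune on the corpus, not something this ranking prescribes. The scorer compares one source's record to a hand-checked ground-truth record using the BrasilAPI-style field names from the sample response above. It covers four of the listed criteria; sócios completeness and situação freshness would need their own checks.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings (case-insensitive), falling toward 0.0."""
    a, b = a.strip().upper(), b.strip().upper()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def score_record(source: dict, truth: dict, address_threshold: float = 0.9) -> dict:
    """Per-field pass/fail for one source against hand-checked ground truth."""
    return {
        "cnae_code": source.get("cnae_fiscal") == truth["cnae_fiscal"],
        "cnae_description": source.get("cnae_fiscal_descricao") == truth["cnae_fiscal_descricao"],
        "simples_flag": source.get("opcao_pelo_simples") == truth["opcao_pelo_simples"],
        "address": similarity(source.get("logradouro") or "", truth["logradouro"]) >= address_threshold,
    }
```

Summing the per-field booleans across the corpus gives a per-source accuracy count for each column of the scorecard.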
Methodological caveats
- This is a documented-characteristics ranking, not a field-accuracy ranking. The numbers come from vendor docs and curl tests, not a real-corpus accuracy eval. A field-accuracy corpus run is a Phase-2 follow-up tracked in backlog.md.
- Free-tier vendor terms can change without notice. Re-check current rate limits and auth requirements before relying on any of these for critical workflows. last_verified: 2026-04-27.
- Data freshness varies. BrasilAPI / CNPJ.ws publica / CNPJá open are cached from open data dumps, with cadence ranging from daily (CNPJá) to ~30 days (BrasilAPI / CNPJ.ws). For situações cadastrais that must be current, only paid real-time tiers apply.
- Inscrições Estaduais lag varies by state. CNPJ.ws sources from Sintegra which has per-state reporting cadence — some states report daily, others lag. Don’t assume IE freshness equals federal CNPJ freshness.
- No commercial-enrichment fields. None of the 5 ranked sources include Serasa-style credit signals, employee count ranges (Speedio), or revenue estimates. If those matter, use a paid commercial enrichment provider — out of scope for this free-tier-first ranking.
Update cadence
Re-run this ranking when: (a) BrasilAPI publishes a rate-limit cap or pricing tier (currently uncapped), (b) any of the 5 sources changes its free-tier shape (auth requirement, field gating), (c) a Phase-2 field-accuracy corpus eval ships and updates the scorecard with measured numbers, (d) 90 days after first publish (2026-07-26).
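Trigger (d) is plain date arithmetic. The tiny sketch below assumes first publish on 2026-04-27, the last_verified date above, and confirms the 2026-07-26 re-run date. An agent or CI job could call rerun_due daily; the function name is hypothetical.

```python
from datetime import date, timedelta

FIRST_PUBLISH = date(2026, 4, 27)  # assumed equal to last_verified
RERUN_AFTER = timedelta(days=90)


def rerun_due(today: date, published: date = FIRST_PUBLISH) -> bool:
    """True once 90 days have passed since publication (trigger (d))."""
    return today >= published + RERUN_AFTER
```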
Related
- /solve/nfs-e-extraction/: the upstream task in the bookkeeping pipeline (extract CNPJs from NFS-e PDFs)
- /data/auxiliar-cnpj-fetch/: the top-pick Capability detail page (multi-provider cascade behind the gateway)
- /data/brasilapi-cnpj/ · /data/cnpja-cnpj/ · /data/cnpj-ws/ · /data/receitaws-cnpj/: direct alternatives
- auxiliar-mcp: the MCP server exposing find_capability(jtbd: ["cnpj-enrichment"]) and invoke_capability(tool: "fetch_cnpj") to in-loop agents