CADE SEI: process-file document retrieval

Stateful, Playwright-driven retrieval of process documents from CADE's Sistema Eletrônico de Informações (SEI). Input: `process_number` (e.g. "08700.007894/2023-88") and `target_doc_id` (an SEI document ID such as "1735573"). Output: PDF bytes plus extracted text. The reference fixture, validated on 2026-05-01, yields ≈116,562 extracted characters.

Verified: 2026-05-01 (live-prod-retrieval-2026-05-01)
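The input contract above can be checked on the client before a Playwright run is spent on a malformed request. Below is a minimal Python sketch. It assumes process numbers always have the shape of the example (`NNNNN.NNNNNN/YYYY-NN`) and that document IDs are plain digit strings. Both patterns are inferred from the examples on this page, the trailing check digits are not verified, and the helper names are hypothetical.

```python
import re

# Assumption: the shape of the example "08700.007894/2023-88":
# 5 digits, dot, 6 digits, slash, 4-digit year, dash, 2 check digits.
# The check digits themselves are NOT validated here.
PROCESS_NUMBER_RE = re.compile(r"[0-9]{5}\.[0-9]{6}/[0-9]{4}-[0-9]{2}")
# Assumption: SEI document IDs are plain digit strings such as "1735573".
DOC_ID_RE = re.compile(r"[0-9]+")


def is_valid_process_number(value: str) -> bool:
    return PROCESS_NUMBER_RE.fullmatch(value.strip()) is not None


def is_valid_doc_id(value: str) -> bool:
    return DOC_ID_RE.fullmatch(value.strip()) is not None


def build_args(process_number: str, target_doc_id: str) -> dict:
    """Return the invoke payload, or raise before any browser work is spent."""
    if not is_valid_process_number(process_number):
        raise ValueError(f"unexpected process_number format: {process_number!r}")
    if not is_valid_doc_id(target_doc_id):
        raise ValueError(f"unexpected target_doc_id format: {target_doc_id!r}")
    return {"process_number": process_number.strip(),
            "target_doc_id": target_doc_id.strip()}
```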

When to use CADE SEI: process-file document retrieval

Choose if

Your agent needs the actual document file (PDF bytes plus extracted text) for a specific CADE administrative process, such as an antitrust merger filing, a cartel investigation, or a market-power assessment. Pick this when you already know the `process_number` (e.g. from `br_cade_search`) and a `target_doc_id`, and you need verbatim text rather than just metadata. This is the only path that preserves session cookies through the SEI popup. Fetching the document URL directly over HTTP fails because SEI binds that URL to the navigation session.

Avoid if

You only need process metadata (parties, status, type) and not the document body. `br_cade_search` already returns those fields in under 500 ms with no Playwright dependency. Also avoid it if you have access to CADE's bulk data export, which is obtained through direct partnerships negotiated by legal teams. For systematic backfills, the bulk export uses far less bandwidth than per-call portal scraping.

Risk Flags

  • MEDIUM regulatory: CADE may change its SEI portal's HTML structure, navigation flow, or rate limits at any time. Auxiliar.ai owns the ToS conversation (per migration proposal Q4). If CADE reaches out, we respond within 48h and disable the affected capability if requested. Jornalista's local fallback (their own `backend/services/br_cade_sei.py`) provides graceful degradation during any incident.
  • LOW operational: PDF byte equality is not guaranteed across retrievals, because CADE may inject download-time watermarks or page-number metadata. Equality of the extracted text is the correctness criterion (reference fixture: 116,562 chars ± 5%), so tests assert on text, not bytes.
  • LOW cold_start: Lambda Container Image cold starts take 8–15s on Chromium-bearing images. Combined with ~30s of mean navigation, cold p99 lands near 50s. Budget a 90s timeout (with 10s headroom). If p99 stays above 50s for a sustained period, an ECS Fargate fallback is documented in `docs/proposals/stateful-capability-runtime.md`.
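PDF bytes can differ between retrievals, so a regression test compares extracted text instead. Below is a minimal sketch of that approach. The NFC normalization and whitespace collapsing in `normalize` are assumptions about what counts as extraction noise. The ±5% length check mirrors the reference-fixture tolerance quoted above. All names are hypothetical.

```python
import unicodedata

REFERENCE_CHARS = 116_562  # reference fixture size from the 2026-05-01 validation
TOLERANCE = 0.05           # ±5%, as quoted in the risk flag


def normalize(text: str) -> str:
    # Assumption: NFC plus collapsed whitespace removes only extraction noise.
    return " ".join(unicodedata.normalize("NFC", text).split())


def same_document_text(a: str, b: str) -> bool:
    # Compare extracted text, never PDF bytes: download-time watermarks or
    # page metadata can change the bytes without changing the content.
    return normalize(a) == normalize(b)


def length_within_tolerance(text: str,
                            reference: int = REFERENCE_CHARS,
                            tol: float = TOLERANCE) -> bool:
    # The raw character count must be within tol * reference of the fixture.
    return abs(len(text) - reference) <= tol * reference
```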

Cost

Type: Free · Free tier: free upstream, because CADE's SEI public consultation portal does not charge for process-file access. Auxiliar.ai passes calls through at cost. The estimated infrastructure cost is ~$0.001/call, covering Lambda Container Image runtime, amortized Chromium cold starts, and outbound bandwidth for the PDF download. Calls are rate-limited to ≤1 req/3s, a ToS-friendly cadence for SEI.
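When one client drives many calls, pacing requests locally keeps it within the ≤1 req/3s cadence. Below is a minimal single-threaded sketch. The `Pacer` class is hypothetical and not part of auxiliar-mcp. The clock and sleep functions are injectable so the logic can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class Pacer:
    """Space call starts at least `interval` seconds apart (hypothetical helper)."""

    def __init__(self, interval: float = 3.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None  # earliest start for the next call

    def wait(self) -> None:
        """Block until the next call may start, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.interval
```

Call `pacer.wait()` immediately before each invoke. The first call goes out at once, and every later call starts at least 3s after the previous one.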

Hidden costs

  • Cold-start tax on the Chromium-bearing Lambda Container Image: 8–15s p99 on the first invocation after a deploy or an idle period. Subsequent invocations stay warm for ~10–15 min in most Lambda regions.
  • Per-capability rate limit of ≤1 req/3s: a burst of 100 retrievals serializes to ≥5 min of wall-clock time (100 × 3s). Plan backfills accordingly.
  • Snapshot capture on failures writes ~1–10 MB to S3 (HTML+PNG+console+HAR). 30-day lifecycle expiry; counts against the auxiliar-portal-snapshots bucket budget.
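The hidden costs above reduce to simple planning arithmetic. Below is a sketch with illustrative defaults taken from this page: 3s minimum spacing, ~30s mean navigation, up to 10 MB per failure snapshot, and 30-day expiry. The function names are hypothetical.

```python
def backfill_floor_s(n_calls: int, interval_s: float = 3.0, call_s: float = 30.0) -> float:
    """Lower bound with unlimited concurrency: only the rate limit spaces the calls."""
    if n_calls <= 0:
        return 0.0
    # The last call cannot start before (n - 1) * interval, then needs call_s to finish.
    return (n_calls - 1) * interval_s + call_s


def sequential_backfill_s(n_calls: int, interval_s: float = 3.0, call_s: float = 30.0) -> float:
    """One request in flight at a time: starts are spaced by max(interval, call_s)."""
    if n_calls <= 0:
        return 0.0
    return (n_calls - 1) * max(interval_s, call_s) + call_s


def retained_snapshot_mb(failures_per_day: float,
                         mb_per_snapshot: float = 10.0,
                         retention_days: int = 30) -> float:
    """Steady-state S3 footprint of failure snapshots, using the worst-case snapshot size."""
    return failures_per_day * mb_per_snapshot * retention_days
```

With the defaults, 100 retrievals need at least 327s even with unlimited concurrency: 297s before the last call can start, plus ~30s for it to finish. Run strictly one at a time, they take about 3,000s (50 min), because the ~30s navigation, not the 3s spacing, becomes the bottleneck.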

Install

Default

# Universal HTTP path — works for any agent. The auxiliar gateway
# routes to the dedicated `br_portal` Lambda Container Image which
# walks the SEI navigation and returns PDF bytes + extracted text.
curl -s -X POST https://api.auxiliar.ai/api/invoke/br_cade_sei_process \
  -H "content-type: application/json" \
  -d '{"process_number":"08700.007894/2023-88","target_doc_id":"1735573"}'
# Returns:
# {
#   "tool": "br_cade_sei_process",
#   "source_module": "backend.sources.br.portal.cade_sei",
#   "elapsed_ms": 28412,
#   "result": {
#     "process_number": "08700.007894/2023-88",
#     "document_id": "1735573",
#     "document_url": "https://sei.cade.gov.br/sei/modulos/pesquisa/...",
#     "document_title": "Anexo - SG Nota Técnica",
#     "pdf_size_bytes": 11534821,
#     "text": "...116562 chars...",
#     "retrieval_status": "fetched_verbatim",
#     "source_tool": "br_cade_sei_process"
#   }
# }
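Below is the same call from Python, using only the standard library, as a minimal sketch. The endpoint and payload come from the curl example above, and the 90s timeout comes from the cold-start risk flag. The page documents only one `retrieval_status` value, `fetched_verbatim`, so treating any other value as a failure is an assumption. The helper names are hypothetical.

```python
import json
import urllib.request

ENDPOINT = "https://api.auxiliar.ai/api/invoke/br_cade_sei_process"
TIMEOUT_S = 90  # matches the 90s budget in the cold-start risk flag


def build_request(process_number: str, target_doc_id: str) -> urllib.request.Request:
    body = json.dumps({"process_number": process_number,
                       "target_doc_id": target_doc_id}).encode("utf-8")
    return urllib.request.Request(ENDPOINT, data=body, method="POST",
                                  headers={"content-type": "application/json"})


def extract_text(envelope: dict) -> str:
    """Return result.text; any status other than fetched_verbatim is treated as a failure (assumption)."""
    result = envelope["result"]
    status = result.get("retrieval_status")
    if status != "fetched_verbatim":
        raise RuntimeError(f"retrieval_status={status!r} for document {result.get('document_id')}")
    return result["text"]


def fetch_document_text(process_number: str, target_doc_id: str) -> str:
    req = build_request(process_number, target_doc_id)
    with urllib.request.urlopen(req, timeout=TIMEOUT_S) as resp:
        return extract_text(json.load(resp))
```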

Estimated time to first success: ~1 min

Claude Code

claude mcp add auxiliar -- npx -y auxiliar-mcp
# Then, inside a session, the agent composes the search capability with this one (tool calls, not shell):
invoke_capability(tool="br_cade_search", args={"query": "Itaú-Unibanco merger"})
# → pick a process_number from the result
invoke_capability(tool="br_cade_sei_process",
                  args={"process_number": "...", "target_doc_id": "..."})

Claude Desktop

On macOS, add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "auxiliar": { "command": "npx", "args": ["-y", "auxiliar-mcp"] }
  }
}
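Instead of pasting the block by hand, a short script can merge the entry into an existing config without overwriting other servers. Below is a sketch that assumes the macOS path above. The helper name `add_auxiliar_server` is hypothetical.

```python
import json
from pathlib import Path

# macOS location from the instructions above; other platforms use a different path.
CONFIG_PATH = (Path.home() / "Library" / "Application Support"
               / "Claude" / "claude_desktop_config.json")


def add_auxiliar_server(path: Path = CONFIG_PATH) -> dict:
    """Add (or overwrite) mcpServers.auxiliar while keeping every other key."""
    raw = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(raw) if raw.strip() else {}
    config.setdefault("mcpServers", {})["auxiliar"] = {
        "command": "npx",
        "args": ["-y", "auxiliar-mcp"],
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Run `add_auxiliar_server()` once, then restart Claude Desktop so it rereads the file.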

OpenClaw

# Free-text → process_number → document
# 1) Resolve via the search capability
curl -s -X POST https://api.auxiliar.ai/api/invoke/br_cade_search \
  -H "content-type: application/json" \
  -d '{"query":"Itaú-Unibanco merger"}'
# 2) Fetch the document
curl -s -X POST https://api.auxiliar.ai/api/invoke/br_cade_sei_process \
  -H "content-type: application/json" \
  -d '{"process_number":"08700.007894/2023-88","target_doc_id":"1735573"}'
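Below is a Python sketch of the two-step flow above. This page does not document the response shape of `br_cade_search`, so choosing the `process_number` and `target_doc_id` is left to a caller-supplied `pick` function. `pick`, `invoke`, and `search_then_fetch` are all hypothetical names. The `call` parameter lets tests substitute a stub for the network.

```python
import json
import urllib.request
from typing import Callable, Tuple

GATEWAY = "https://api.auxiliar.ai/api/invoke/"


def invoke(tool: str, args: dict, timeout_s: float = 90.0) -> dict:
    """POST args to the gateway route for `tool` and decode the JSON envelope."""
    req = urllib.request.Request(GATEWAY + tool,
                                 data=json.dumps(args).encode("utf-8"),
                                 method="POST",
                                 headers={"content-type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)


def search_then_fetch(query: str,
                      pick: Callable[[dict], Tuple[str, str]],
                      call: Callable[[str, dict], dict] = invoke) -> dict:
    """Run br_cade_search, let `pick` choose (process_number, target_doc_id), then fetch."""
    hits = call("br_cade_search", {"query": query})
    process_number, target_doc_id = pick(hits)
    return call("br_cade_sei_process",
                {"process_number": process_number, "target_doc_id": target_doc_id})
```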

Dependencies

Minimum runtime: Lambda Container Image (Python 3.11 + Playwright 1.40 + Chromium); not directly installable client-side.

Composes with: CADE process search (br_cade_search)

Distribution

Repository
https://github.com/Tlalvarez/Auxiliar-ai
License
MIT (auxiliar.ai gateway code); upstream content is subject to the CADE SEI public-consultation ToS