
Launching indonesia-civic-stack: 40 MCP Tools for Indonesian Government Data

How I built an open-source Python SDK and MCP server that connects AI agents to 11 Indonesian government portals — and what I learned along the way.

Technology · AI · MCP · Open Source · Python · Civic Tech

Indonesia has over a dozen government portals publishing public data — business registrations, halal certifications, drug safety, earthquake alerts, election results, wealth declarations. The data is there, but accessing it programmatically is a nightmare. Every portal has its own quirks: different HTML structures, inconsistent APIs, geo-blocking, rate limits, and the occasional reCAPTCHA.

I’ve been scraping these portals for various projects over the past few months. HalalKah needed BPJPH data. LegalKah needed OJK data. Each time, I was writing the same boilerplate — HTTP clients, error handling, response normalisation. So I extracted the common patterns into a single package.

The result is indonesia-civic-stack: a Python SDK that wraps 11 Indonesian government portals into a unified interface, plus 40 MCP tools that let AI agents query them directly.

🚀 What It Does

The package covers 11 portals:

| Module | Portal | What it does |
| --- | --- | --- |
| BPOM | pom.go.id | Drug & food safety registry |
| BPJPH | halal.go.id | Halal product certification |
| AHU | ahu.go.id | Company registration lookup |
| OJK | ojk.go.id | Financial institution legality |
| OSS | oss.go.id | Business licensing (NIB) |
| LPSE | lpse.go.id | Government procurement |
| KPU | kpu.go.id | Election data & candidates |
| BPS | bps.go.id | National statistics API |
| BMKG | bmkg.go.id | Earthquakes & weather |
| LHKPN | kpk.go.id | Official wealth declarations |
| SIMBG | simbg.pu.go.id | Building permits |

Every module returns a consistent CivicStackResponse object — same structure whether you’re querying drug registrations or earthquake data. No more parsing HTML soup differently for each portal.
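To illustrate what "same structure" buys you, here is a minimal sketch of a unified response type fed by two very different sources: a JSON feed and a row scraped from an HTML table. The field names, the `from_bmkg_json`/`from_bpom_row` helpers, and the input layouts are all hypothetical (the JSON keys are loosely modelled on BMKG's public feed); only the class name and the four status values come from this post.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Any, Literal

# Field names below are illustrative assumptions, not the SDK's real schema.
Status = Literal["found", "not_found", "error", "degraded"]


@dataclass
class CivicStackResponse:
    module: str                            # which portal module answered, e.g. "bmkg"
    status: Status                         # outcome of the lookup
    result: dict[str, Any] | None = None   # normalised record, if one was found
    errors: list[str] = field(default_factory=list)


def from_bmkg_json(payload: dict[str, Any]) -> CivicStackResponse:
    """Normalise a JSON earthquake feed (key names assumed, modelled on BMKG's feed)."""
    quake = payload.get("Infogempa", {}).get("gempa")
    if not quake:
        return CivicStackResponse(module="bmkg", status="not_found")
    return CivicStackResponse(
        module="bmkg",
        status="found",
        result={"magnitude": float(quake["Magnitude"]), "region": quake["Wilayah"]},
    )


def from_bpom_row(cells: list[str]) -> CivicStackResponse:
    """Normalise one row scraped from an HTML results table (column order assumed)."""
    if len(cells) < 3:
        return CivicStackResponse(module="bpom", status="not_found")
    reg_no, product, holder = cells[:3]
    return CivicStackResponse(
        module="bpom",
        status="found",
        result={"registration_number": reg_no, "product": product, "holder": holder},
    )


quake = from_bmkg_json({"Infogempa": {"gempa": {"Magnitude": "5.2", "Wilayah": "Laut Banda"}}})
drug = from_bpom_row([])
# Both responses expose the same keys, whichever portal or format they came from.
print(sorted(asdict(quake)) == sorted(asdict(drug)))
```

The design point is that format differences stop at the normaliser: everything downstream, including an AI agent, only ever sees one shape.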

🤖 AI-Agent-First Design

The interesting part isn’t the scraping; it’s the MCP integration. The Model Context Protocol (MCP) is an open protocol that lets AI assistants discover and call external tools. Instead of asking a human to look something up on a government website, an AI agent can query the data directly.

```shell
# Zero-install remote server
claude mcp add --transport http civic-stack \
  https://mcp-server-production-d1a2.up.railway.app/mcp

# Or install locally
pip install "indonesia-civic-stack[mcp]"
claude mcp add civic-stack -- civic-stack-mcp
```

Once connected, you can ask Claude things like:

  • “Is this BPOM registration number still active?”
  • “Search for companies named ‘Maju Bersama’ in the AHU registry”
  • “What was the latest earthquake in Indonesia?”
  • “Cross-reference this business across OJK, AHU, and OSS”

The agent figures out which tools to call, chains them together, and synthesises the results. Multi-portal queries that would take a human 30 minutes of tab-switching take seconds.
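Under the hood, each tool the agent picks becomes an MCP `tools/call` request: a JSON-RPC 2.0 message naming the tool and its arguments. The sketch below builds the three requests that the cross-reference prompt above might turn into. The tool names and argument keys are hypothetical; the real civic-stack tools may be named and parameterised differently.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids should be unique within a session


def tools_call(name: str, arguments: dict) -> str:
    """Serialise an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Hypothetical tool names and arguments, one per portal being cross-referenced.
plan = [
    ("ahu_search_company", {"query": "Maju Bersama"}),
    ("ojk_check_legality", {"name": "Maju Bersama"}),
    ("oss_lookup_nib", {"name": "Maju Bersama"}),
]
messages = [tools_call(name, args) for name, args in plan]
for message in messages:
    print(message)
```

The client sends these over the configured transport (stdio for the local server, HTTP for the remote one) and hands each result back to the model, which decides whether another call is needed before answering.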

⚙️ Architecture Decisions

A few choices worth noting:

Unified server over per-module servers. Early versions had separate MCP servers for each portal. Managing 11 server processes was painful. The unified server loads all 40 tools lazily — you pay the import cost only when a tool is actually called.

Consistent error envelopes. Government portals go down. A lot. Every response wraps the result in a status envelope (found, not_found, error, degraded) so agents can handle failures gracefully instead of crashing on unexpected HTML.
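A minimal sketch of the envelope pattern, assuming each portal fetch is a callable that returns a record, or `None` when nothing matches. `Envelope`, `PortalDegraded`, and `enveloped` are hypothetical names for illustration, not the SDK's API; only the four status values come from the post.

```python
from __future__ import annotations

import urllib.error
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Envelope:
    status: str                # "found" | "not_found" | "error" | "degraded"
    result: Any | None = None
    detail: str | None = None


class PortalDegraded(Exception):
    """Hypothetical: the portal answered, but a feature is blocked (e.g. by a CAPTCHA)."""


def enveloped(fetch: Callable[[], Any | None]) -> Envelope:
    """Run a portal fetch and map every outcome onto one of four statuses."""
    try:
        result = fetch()
    except PortalDegraded as exc:
        return Envelope("degraded", detail=str(exc))
    except (urllib.error.URLError, TimeoutError, ValueError) as exc:
        # Network failures, timeouts and unparseable responses become data,
        # not exceptions, so an agent can report them instead of crashing.
        return Envelope("error", detail=f"{type(exc).__name__}: {exc}")
    if result is None:
        return Envelope("not_found")
    return Envelope("found", result=result)
```

A caller then branches on `status` alone: retry or report on `error`, say plainly that nothing matched on `not_found`, and warn about partial coverage on `degraded`.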

Proxy support for geo-blocking. Most .go.id portals are only reliably accessible from Indonesian IPs. The SDK reads a PROXY_URL environment variable so requests can be routed through a Cloudflare Worker or a similar proxy. Not ideal, but practical.
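How the SDK hands traffic to the proxy isn't spelled out here, so the sketch below assumes a simple relay: a Worker that receives the destination URL in a query parameter and fetches it on the caller's behalf. The `route_through_proxy` helper and the `url` parameter name are hypothetical; match whatever your own Worker expects.

```python
import os
from urllib.parse import urlencode


def route_through_proxy(target_url: str) -> str:
    """Return the URL to fetch, relaying via PROXY_URL when it is set.

    Assumption: PROXY_URL points at a relay (e.g. a Cloudflare Worker) that
    reads the destination from a `url` query parameter (hypothetical name).
    """
    proxy = os.environ.get("PROXY_URL", "").rstrip("/")
    if not proxy:
        return target_url  # direct request, e.g. when running from an Indonesian IP
    return f"{proxy}/?{urlencode({'url': target_url})}"
```

If PROXY_URL instead pointed at a standard HTTP forward proxy, the same value would go into the HTTP client's proxy setting (for example `urllib.request.ProxyHandler({"https": proxy})`) rather than being used to rewrite URLs.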

Hatchling build system. Modern Python packaging with pyproject.toml, optional dependency groups ([mcp], [api], [browser], [all]), and CLI entry points. No setup.py archaeology.

🔧 The Hard Parts

Portal instability. Government websites change without notice. URLs shift, auth requirements appear, entire endpoints vanish. The LHKPN module (wealth declarations from the anti-corruption commission) went from a public search API to requiring reCAPTCHA v3 — discovered mid-development. VCR cassettes for tests help, but you’re always one portal update away from a broken module.

Geo-restrictions. Testing from Sydney means most portals return 403s or timeouts. I deployed a Cloudflare Worker as a proxy, but CF-to-CF routing (many .go.id sites use Cloudflare too) creates its own problems. The test suite uses VCR cassettes recorded from Indonesian IPs.
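Cassettes are what make a geo-blocked test suite workable: responses are recorded once from a network that can reach the portal, committed, and replayed anywhere, including CI. The project uses VCR cassettes; the sketch below only illustrates the record/replay idea with the standard library. The `Cassette` class is hypothetical and is not vcrpy's interface.

```python
import json
from pathlib import Path
from typing import Callable


class Cassette:
    """Minimal record/replay cache keyed by URL (illustrates the VCR idea only)."""

    def __init__(self, path: Path, record: bool = False) -> None:
        self.path = path
        self.record = record
        # Load previously recorded responses, if the cassette file exists.
        self.episodes: dict[str, str] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        )

    def fetch(self, url: str, live_fetch: Callable[[str], str]) -> str:
        if url in self.episodes:
            return self.episodes[url]  # replay: no network access at all
        if not self.record:
            raise LookupError(f"no recorded response for {url}")
        body = live_fetch(url)  # record mode: hit the real portal once
        self.episodes[url] = body
        self.path.write_text(json.dumps(self.episodes, indent=2), encoding="utf-8")
        return body
```

Run once in record mode from an Indonesian IP, then construct the cassette with `record=False` everywhere else so an unrecorded request fails loudly instead of silently going to the network.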

Inconsistent data formats. One portal returns JSON, another returns server-rendered HTML, another requires Playwright for client-side rendering. The OSS and SIMBG modules need a real browser. The SDK abstracts this away, but each module’s scraper is genuinely different code.
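One way to keep those genuinely different scrapers behind one interface is a per-module parsing strategy. In this sketch the module-to-strategy mapping is hypothetical except for OSS, which the post says needs a real browser; the HTML path uses only the standard library's `html.parser`, and the browser path is a stub rather than a Playwright integration.

```python
import json
from html.parser import HTMLParser
from typing import Any, Callable


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:  # tolerate a <td> that appears before any <tr>
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data  # text may arrive in several chunks


def parse_json(body: str) -> list[Any]:
    data = json.loads(body)
    return data if isinstance(data, list) else [data]


def parse_html_table(body: str) -> list[list[str]]:
    collector = CellCollector()
    collector.feed(body)
    collector.close()
    # Drop header-only rows (<th> cells are ignored) and trim whitespace.
    return [[cell.strip() for cell in row] for row in collector.rows if row]


def needs_browser(_body: str) -> Any:
    raise NotImplementedError("client-side rendered: render with a real browser first")


# Hypothetical module -> strategy mapping (only OSS is confirmed to need a browser).
PARSERS: dict[str, Callable[[str], Any]] = {
    "bps": parse_json,
    "bpom": parse_html_table,
    "oss": needs_browser,
}


def parse(module: str, body: str) -> Any:
    return PARSERS[module](body)
```

Callers only ever call `parse(module, body)`; whether that means `json.loads`, walking an HTML table, or driving a browser stays a per-module detail.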

📊 Current Status

  • ✅ 40 MCP tools across 11 modules
  • ✅ Published on PyPI (pip install indonesia-civic-stack)
  • ✅ Listed on the MCP Registry
  • ✅ Hosted MCP server on Railway (zero-install remote access)
  • ✅ 63 tests passing, CI green
  • 🔴 LHKPN module degraded (reCAPTCHA v3)
  • ⚠️ Most portals need Indonesian IP or proxy for reliable access

The landing page at datarakyat.id has full documentation, module-by-module API references, and example prompts.

💭 Why Civic Tech + MCP Matters

Government data should be easy to access. These portals exist because Indonesian law mandates transparency — business registrations, halal certifications, official wealth declarations are all public record. But “public” often means “technically available if you know which website to visit and how to navigate it.”

MCP bridges that gap. An AI agent with civic-stack tools can answer questions about Indonesian public data as naturally as it answers questions about the weather. That’s not a technical achievement — it’s an accessibility one.

The code is MIT-licensed. If you’re building something for Indonesian civic data, I hope it saves you the weeks of portal-spelunking it took me.
