A statewide policy and advocacy group was monitoring hundreds of California public agencies by hand. Every Friday and Saturday for years, its researcher opened agenda packets and read through them, looking for the specific regulatory and policy language the group cared about. The work got done, but it consumed every weekend.
Hundreds of packets per week, many of them over 1,000 pages, all needing to be opened and read through.
A dozen different agency platforms (Legistar, BoardDocs, eScribe, AgendaCenter, and others), each with its own URL conventions and its own ways of hiding the actual packet behind metadata.
Slow downloads, slower skims, and a pace that was impossible to sustain at the volume the group needed.
12 platforms · No API · Packets over 1,000 pages
What we built
A bespoke automated platform, not a scraper, built from four subsystems. Each one is designed to keep running when the public web does what the public web does.
01
Per-platform handlers
Twelve agency platforms, one handler each
The handlers cover Legistar, BoardDocs, eScribe, CivicPlus AgendaCenter, PrimeGov, Granicus, OnBase, IQM2, Simbli, CivicWeb, Highbond, and Diligent. Each handler knows its platform's URL conventions, render quirks, and recovery paths, so supporting a new platform means adding a handler rather than rewriting the system.
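One way to picture the one-handler-per-platform design is a small registry that picks a handler by hostname. This is a minimal sketch, not the production code: the class names, the `register` decorator, the hostname suffixes, and the URL paths are all illustrative assumptions.

```python
from abc import ABC, abstractmethod
from typing import List, Optional
from urllib.parse import urlparse


class PlatformHandler(ABC):
    """Base class: one subclass per agency platform (hypothetical design)."""

    name: str = ""
    host_suffix: str = ""  # assumed way of recognizing a platform

    def matches(self, url: str) -> bool:
        host = urlparse(url).hostname or ""
        return host == self.host_suffix or host.endswith("." + self.host_suffix)

    @abstractmethod
    def agenda_listing_url(self, site_url: str) -> str:
        """Return the page where this platform lists upcoming agendas."""


_REGISTRY: List[PlatformHandler] = []


def register(cls):
    """Class decorator: adding a platform means adding one class."""
    _REGISTRY.append(cls())
    return cls


@register
class LegistarHandler(PlatformHandler):
    name = "Legistar"
    host_suffix = "legistar.com"

    def agenda_listing_url(self, site_url: str) -> str:
        # Illustrative path only; real conventions vary by deployment.
        return site_url.rstrip("/") + "/Calendar.aspx"


@register
class BoardDocsHandler(PlatformHandler):
    name = "BoardDocs"
    host_suffix = "boarddocs.com"

    def agenda_listing_url(self, site_url: str) -> str:
        # Placeholder: this sketch returns the site URL unchanged.
        return site_url


def handler_for(url: str) -> Optional[PlatformHandler]:
    """Return the first matching handler, or None for unsupported sites."""
    for handler in _REGISTRY:
        if handler.matches(url):
            return handler
    return None
```

In this sketch, a URL that matches no handler returns `None`, which is the kind of case a manual review queue would pick up.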
02
Two-stage AI analysis
From raw packet to verbatim quote
The first stage finds the right agenda URL for the upcoming meeting. The second stage opens that document (PDF or HTML, sometimes hundreds of megabytes), scans it for the exact language the group cares about, and returns verbatim excerpts, each with a confidence score attached. The Anthropic Claude API drives the analysis throughout.
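Because the second stage promises verbatim excerpts, one plausible safeguard is to check every returned quote against the source text before it reaches a reviewer. The sketch below assumes such a check exists; the source does not say so. The `Finding` type, the `analyze` callable (a stand-in for the model call), and the `min_confidence` threshold are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Finding:
    quote: str         # excerpt the analysis stage claims is verbatim
    confidence: float  # score in [0, 1] attached by the analysis stage


def _normalize(text: str) -> str:
    # Collapse whitespace so line breaks in a PDF don't defeat the match.
    return re.sub(r"\s+", " ", text).strip()


def verified_findings(
    packet_text: str,
    analyze: Callable[[str], List[Finding]],
    min_confidence: float = 0.5,  # assumed cutoff, not from the source
) -> List[Finding]:
    """Keep findings that are confident enough AND appear in the packet."""
    haystack = _normalize(packet_text)
    kept = []
    for finding in analyze(packet_text):
        needle = _normalize(finding.quote)
        if needle and finding.confidence >= min_confidence and needle in haystack:
            kept.append(finding)
    return kept
```

Passing the analyzer in as a callable keeps the check independent of any particular API client, so it can be exercised with a fake analyzer in tests.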
03
Large-file pipeline
Memory-safe at any size
Packets over 200 MB are split by page before analysis, so the system never runs out of memory on a board book. An auto-recovery layer deactivates entities that have already been processed and resumes from where the system stopped, so a single failure doesn't cost a whole run.
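The two ideas here, splitting by page and resuming after a failure, can be sketched with the standard library alone. This is an illustration under assumptions: the chunk size, the JSON checkpoint file, and the function names are hypothetical, and "deactivating" a processed entity is modeled simply as recording its ID so later runs skip it.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Iterator, List, Tuple


def page_ranges(total_pages: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) page ranges covering the document."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, total_pages, chunk_size):
        yield start, min(start + chunk_size, total_pages)


def run_with_checkpoint(
    entity_ids: Iterable[str],
    process: Callable[[str], None],
    checkpoint: Path,
) -> List[str]:
    """Process entities in order, skipping any recorded as done.

    The checkpoint is rewritten after every success, so a crash loses at
    most the entity that was in flight.
    """
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    processed = []
    for entity_id in entity_ids:
        if entity_id in done:
            continue  # already handled in an earlier run
        process(entity_id)  # may raise; the checkpoint stays consistent
        done.add(entity_id)
        checkpoint.write_text(json.dumps(sorted(done)))
        processed.append(entity_id)
    return processed
```

With these definitions, a 1,000-page packet and a chunk size of 400 yields three ranges, and rerunning after a failure picks up at the first entity not yet recorded in the checkpoint.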
04
Reviewer surface
How findings reach the team
Weekly digest emails go directly to the group's reviewers with every flagged item, source link, and verbatim quote attached. A residual queue catches the handful of agencies the system cannot reach automatically, formatted for one-click review by a non-technical user.
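As a rough illustration of the digest, the sketch below assembles a plain-text email with Python's standard `email.message.EmailMessage`. The `FlaggedItem` fields, the subject line, and the body layout are assumptions made for the example; the actual digest format is not described in this write-up.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from typing import List


@dataclass
class FlaggedItem:
    agency: str
    source_url: str
    quote: str  # verbatim excerpt from the agenda packet


def build_digest(
    items: List[FlaggedItem], sender: str, recipients: List[str]
) -> EmailMessage:
    """Build one plain-text message listing every flagged item."""
    msg = EmailMessage()
    msg["Subject"] = f"Weekly agenda digest: {len(items)} flagged item(s)"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)

    lines = []
    for number, item in enumerate(items, start=1):
        lines += [
            f"{number}. {item.agency}",
            f"   Source: {item.source_url}",
            f'   Quote: "{item.quote}"',
            "",
        ]
    # Fall back to a short notice when nothing was flagged.
    msg.set_content("\n".join(lines) or "No items flagged this week.")
    return msg
```

Sending is left out on purpose; the returned message could be handed to `smtplib` or any mail service.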
What changed
356
public agencies scanned per weekly run
~3 hr
to scan all of them
~95%
accuracy on flagged items
40%
cost reduction vs the manual approach
“My wife is going to be so happy. We can finally go out on the weekends!”