civicsim.xyz
UC Berkeley School of Information
Demographically grounded simulation
CivicSim
Ground it before you simulate it.

Demographically grounded LLM simulations for public-policy testing.

Anagha
MIMS · Capstone
Aratrik Paul
MIMS · Capstone
Minkush
MIMS · Capstone
Sushanti
MIMS · Capstone
Vikram
MIMS · Capstone
Team CivicSim · Capstone 2026 · Faculty Panel · civicsim.xyz
01 · The opportunity

AI tests ideas everywhere.
Except where it matters most.

Synthetic personas now de-risk decisions across industries. Public policy still ships untested.

Where simulation already works

Industries using AI today.

  • Product design
    Synthetic users test interfaces before launch.
  • Market research
    Synthetic respondents stand in at scale.
  • Pharma, ads, finance
    Stress-test decisions virtually before reality.
Where it has not landed yet

Public policy is missing.

  • Slow panels
    Narrow, expensive survey cohorts.
  • Untested legislation
    Ships without virtual electorate testing.
  • Reactive learning
    Outcomes known only after the law passes.
Public policy is the next frontier. And the hardest, because representation is not optional.
02 · The problem

Policy simulations fail
before generation begins.

01

Stereotypes,
not populations.

Real example
Ask GPT to "simulate a Texan voter." It returns the average of every Texan stereotype it has read.
02

Survey panels
are not populations.

Real example
Young Black Americans land in the wrong income bracket 32% of the time in survey panels.
03

Wrong variables.
Confidently.

Real example
On tech policy, geography matters more than age. Most simulations skip it.

The model is not the problem. The who being simulated is.

03 · The landscape

Persona platforms are growing fast.
None hit the policy bar.

$50M+
Raised by the leading synthetic-persona platforms combined. None ground their personas in census reality. Each was built for commerce, not consequential decisions.
Simile · ~$8M raised · Behavior simulation
Ditto · ~$2M seed · AI personas
Aaru · ~$10M Series A · Reaction modeling
Gap 01

Demographic grounding is shallow.

Personas synthesized from generic priors. Civic populations are not faithfully represented at the joint level.

Gap 02

Built for research, not for decisions.

Designed for marketing studies and qualitative work. Not built to defend a policy choice in a hearing.

Gap 03

Closed, black-box methodology.

Personas and reasoning are opaque. No auditable subgroup breakdown. No inspection.

04 · The solution

Introducing CivicSim.

Pick a U.S. location. Choose a policy question. Run a synthetic electorate that is actually representative.

01

Representative
Population.

Agents sampled from ACS Census microdata. 2.5M records. Distributions match the location.
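One way to make an agent sample match a location's distribution is a largest-remainder allocation, the draw named on the architecture slide. The sketch below is a minimal illustration, not CivicSim's actual sampler: the cell definitions and weights are made up, and `largest_remainder` is a hypothetical helper name.

```python
from collections.abc import Hashable, Mapping
from math import floor


def largest_remainder(weights: Mapping[Hashable, float], n: int) -> dict:
    """Split n agents across cells in proportion to weights; counts sum to n."""
    total = sum(weights.values())
    quotas = {cell: n * w / total for cell, w in weights.items()}
    counts = {cell: floor(q) for cell, q in quotas.items()}
    leftover = n - sum(counts.values())
    # Give the unassigned agents to the cells with the largest fractional parts.
    ranked = sorted(quotas, key=lambda cell: quotas[cell] - counts[cell], reverse=True)
    for cell in ranked[:leftover]:
        counts[cell] += 1
    return counts


# Hypothetical (age band, housing tenure) cells with made-up person weights.
cell_weights = {
    ("18-29", "renter"): 17,
    ("18-29", "owner"): 5,
    ("30-64", "renter"): 26,
    ("30-64", "owner"): 34,
    ("65+", "renter"): 6,
    ("65+", "owner"): 12,
}
agents_per_cell = largest_remainder(cell_weights, 25)
print(agents_per_cell)
```

Each cell ends up within one agent of its exact proportional quota, which is why this kind of draw keeps small samples close to the target distribution.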

02

Policy-Specific
Grounding.

Each agent carries an empirical opinion prior from 38,449 Pew respondents. Not LLM guesswork.
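An empirical opinion prior of this kind can be read as a lookup table of P(answer | demographics). The sketch below builds one from a handful of made-up rows. It is an assumption-laden illustration: the rows are not Pew data, survey weights are ignored, and the fallback to a pooled distribution for empty cells is a choice made here, not something the deck specifies.

```python
from collections import Counter, defaultdict

# Hypothetical survey rows: (age band, census division, answer). Not real Pew data.
rows = [
    ("18-29", "Pacific", "support"),
    ("18-29", "Pacific", "support"),
    ("18-29", "Pacific", "oppose"),
    ("65+", "Pacific", "oppose"),
    ("65+", "Pacific", "support"),
    ("65+", "Mountain", "oppose"),
]


def normalize(counter: Counter) -> dict:
    """Turn answer counts into answer probabilities."""
    total = sum(counter.values())
    return {answer: n / total for answer, n in counter.items()}


def build_prior(rows):
    """Return (per-cell answer distributions, pooled answer distribution)."""
    cells = defaultdict(Counter)
    pooled = Counter()
    for age, division, answer in rows:
        cells[(age, division)][answer] += 1
        pooled[answer] += 1
    return {cell: normalize(c) for cell, c in cells.items()}, normalize(pooled)


def lookup(table, pooled, age, division):
    # Assumption: a cell with no respondents falls back to the pooled distribution.
    return table.get((age, division), pooled)


table, pooled = build_prior(rows)
young_pacific = lookup(table, pooled, "18-29", "Pacific")
unseen_cell = lookup(table, pooled, "30-44", "Mountain")
print(young_pacific, unseen_cell)
```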

03

Transparent
by Design.

Every persona inspectable. Demographics, prior, stance, rationale streamed live and saved.
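To make "inspectable and streamed live" concrete, here is a minimal sketch of one persona record serialized as a Server-Sent Events frame. The field names, the `persona` event name, and the sample values are assumptions for illustration; the deck does not publish CivicSim's event schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PersonaRecord:
    """Hypothetical shape of one inspectable persona."""
    agent_id: int
    demographics: dict
    prior: dict
    stance: str
    rationale: str


def sse_frame(event: str, record: PersonaRecord) -> str:
    """Serialize one record as an SSE frame: event line, data line, blank line."""
    return f"event: {event}\ndata: {json.dumps(asdict(record))}\n\n"


record = PersonaRecord(
    agent_id=1,
    demographics={"age": "18-29", "tenure": "renter"},
    prior={"support": 0.67, "oppose": 0.33},
    stance="support",
    rationale="Rent takes most of my paycheck.",
)
frame = sse_frame("persona", record)
print(frame, end="")
```

Because `json.dumps` emits a single line by default, the payload fits in one `data:` field, and the same JSON can be written to disk for later audit.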

The goal is not perfect prediction. It is representative, transparent simulation before real-world rollout.
05 · Architecture & technical deep dive

How the system is built.

CLIENT
Browser · React UI · SSE consumer

FRONTEND · VERCEL
Next.js 15 · /api proxy · TS · Tailwind

HTTPS /api/* · SSE

API SERVER · FLY.IO
FastAPI Backend · Pydantic validation · CORS · Server-Sent Events streaming

ORCHESTRATES
01 · SAMPLER · Agent Generator · largest-remainder draw
02 · LOOKUP · Opinion Prior · P(answer | demographics)
03 · CLIENT · LLM Client · provider-agnostic

ACS · 2.5M rows
Pew ATP · parquet
OpenAI · Anthropic
Client tier
API server
Core engine
Data sources
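The "provider-agnostic" LLM client in the core engine can be pictured as a small interface that any provider adapter satisfies. The sketch below is one possible shape under stated assumptions: `LLMClient`, `CannedClient`, `build_prompt`, and `ask` are hypothetical names, the prompt wording is invented, and the canned client stands in for a real OpenAI or Anthropic adapter so the example runs offline.

```python
import json
from typing import Protocol


class LLMClient(Protocol):
    """Anything with a complete(prompt) -> str method can serve as a provider."""
    def complete(self, prompt: str) -> str: ...


class CannedClient:
    """Offline stand-in for a real provider adapter; always returns the same JSON."""
    def complete(self, prompt: str) -> str:
        return json.dumps({"stance": "support", "rationale": "Rent takes most of my paycheck."})


def build_prompt(demographics: dict, prior: dict, question: str) -> str:
    """Combine sampled traits and the empirical prior into one prompt."""
    traits = ", ".join(f"{k}: {v}" for k, v in demographics.items())
    leanings = ", ".join(f"{answer} {p:.0%}" for answer, p in prior.items())
    return (
        f"You are a resident with these traits: {traits}.\n"
        f"People like you typically answer: {leanings}.\n"
        f"Question: {question}\n"
        'Reply as JSON with keys "stance" and "rationale".'
    )


def ask(client: LLMClient, demographics: dict, prior: dict, question: str):
    """Send the prompt through whichever client was supplied and parse the reply."""
    reply = json.loads(client.complete(build_prompt(demographics, prior, question)))
    return reply["stance"], reply["rationale"]


demographics = {"county": "Alameda County", "age": "18-29", "tenure": "renter"}
prior = {"support": 0.67, "oppose": 0.33}
prompt = build_prompt(demographics, prior, "Should the city raise minimum wage to $20?")
stance, rationale = ask(CannedClient(), demographics, prior, "Should the city raise minimum wage to $20?")
print(stance, "-", rationale)
```

Swapping providers then means writing another class with the same `complete` method; the sampler and the prior lookup never need to know which one is in use.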
06 · At a glance

From policy to insight in seconds.

A complete user journey, end to end.

① Your Policy

Pick a question.

"$20 Federal Minimum Wage"
Economy Labor

What will different demographics think?

② CivicSim Analyzes

Run the simulation.

  • 5,000 census-grounded agents
  • Demographics + opinion priors
  • AI-powered simulation
~5 sec
vs. weeks of polling
③ Instant Insights

See the breakdown.

Support 59% · Oppose 27% · Unsure 14%
By age: 18–29: 75% · 65+: 38%
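A breakdown like this is an aggregation over per-agent answers. The sketch below shows one minimal way to compute overall and per-group shares and the gap between groups. The answers are made up and do not reproduce the figures on this slide.

```python
from collections import Counter, defaultdict

# Made-up agent answers: (age band, stance). Not CivicSim output.
answers = [
    ("18-29", "support"), ("18-29", "support"), ("18-29", "support"), ("18-29", "oppose"),
    ("65+", "support"), ("65+", "oppose"), ("65+", "oppose"), ("65+", "unsure"),
]


def shares(stances) -> dict:
    """Fraction of answers falling in each stance."""
    counts = Counter(stances)
    total = sum(counts.values())
    return {stance: n / total for stance, n in counts.items()}


overall = shares(stance for _, stance in answers)

grouped = defaultdict(list)
for age, stance in answers:
    grouped[age].append(stance)
by_age = {age: shares(stances) for age, stances in grouped.items()}

# Divergence here is simply the difference in support share between two groups.
support_gap = by_age["18-29"].get("support", 0.0) - by_age["65+"].get("support", 0.0)
print(overall, by_age, support_gap)
```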
07 · Live demo
Demo time.
Overall Results · Min Wage $15/hr
Strongly support: 47% of simulated population
Somewhat support: 27%
Somewhat oppose: 13%
Strongly oppose: 7%
Unsure: 6%
Breakdown · By Race / Ethnicity
[Chart: shares of strongly support, somewhat support, somewhat oppose, and strongly oppose for White, Hispanic, Black, Asian, and Other]
Breakdown · By Age Group
[Chart: the same four stance shares for ages 18–29, 30–44, 45–64, and 65+; younger cohorts skew strongly supportive]
08 · How it works

Four steps. End to end.

01

Sample

Build N agents matching the location's demographics.

Real example
"Give me 200 people who actually live in Alameda County."
02

Prime

Attach an empirical opinion prior to each agent.

Real example
"How do people like you usually answer this question?"
03

Simulate

Each agent answers with stance plus rationale.

Real example
"Should the city raise minimum wage to $20? Here is why I would say yes."
04

Aggregate

Stream results live. Surface divergence by group.

Real example
"68% support overall. Renters and homeowners diverge sharply."
09 · The data foundation

Two gold-standard datasets
do the heavy lifting.

Population

American Community Survey

U.S. Census microdata · via IPUMS USA
~2.5M
Adult records / year

Why: the only public source with joint demographic distributions at population scale. Marginal-only sampling collapses intersectional groups. ACS preserves them.
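The point about marginal-only sampling can be checked with a few lines of arithmetic. The sketch below uses a made-up two-variable joint distribution, not ACS figures, and compares each cell's true share with what you would get by sampling age and tenure independently from their marginals.

```python
from collections import defaultdict

# Made-up joint shares over (age band, housing tenure); they sum to 1.
joint = {
    ("18-29", "renter"): 0.30,
    ("18-29", "owner"): 0.05,
    ("30+", "renter"): 0.15,
    ("30+", "owner"): 0.50,
}

# Collapse the joint table into one marginal per variable.
age_marginal = defaultdict(float)
tenure_marginal = defaultdict(float)
for (age, tenure), p in joint.items():
    age_marginal[age] += p
    tenure_marginal[tenure] += p

# Marginal-only sampling implicitly assumes the two variables are independent.
independent = {
    (age, tenure): age_marginal[age] * tenure_marginal[tenure]
    for age in age_marginal
    for tenure in tenure_marginal
}

for cell, true_share in joint.items():
    print(cell, f"joint={true_share:.4f}", f"marginal-only={independent[cell]:.4f}")
```

In this toy table, young owners are 5% of the population but would be drawn about 19% of the time from marginals alone, which is the kind of intersectional distortion the slide describes.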

Opinion

Pew American Trends Panel

Probability-based panel · waves 2021 to 2024
38,449
Validated respondents

Why: consistent methodology across 80+ waves, broad topical coverage. Compiled into a compact opinion-prior lookup. No PII.

Census for the who. Pew for what they think. Our contribution is in how we combine them.

10 · Experimentation & results

The demographics we assume matter
are not the ones that shape opinion.

10.6%
Conventional
Textbook variables (age, income, education). The status quo.
25.2%
Empirical selection
Variables chosen empirically per domain. Our approach.
= 2.4×
★ Improvement factor
Across 1,426 opinion items. Same models. Better grounding.
Census division (omitted by textbooks) drives a 53.5% signal drop when removed. The variable we would skip is the variable we most need.
11 · Evaluations · 10,000 runs

CivicSim consistently outperforms
both naive baselines at scale.

Metric 1: Total Variation Distance (TVD)
Metric 2: Wasserstein Distance

Condition                             TVD ↓       Wasserstein ↓
Pew ATP (ground truth)                reference   reference
★ CivicSim (best)                     0.101       0.058
Naive OpenAI (GPT-4o mini)            0.188       0.201
Naive Anthropic (Claude Haiku 4.5)    0.311       0.215

Thresholds: excellent < 0.15 · moderate 0.15–0.30 · poor > 0.30
20 questions · 50 demographic slices · 10,000 runs per condition
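Both metrics compare a simulated answer distribution with a reference distribution over the same answer options. The sketch below is a minimal illustration rather than the team's evaluation code: the two distributions are made up, the Wasserstein variant assumes ordered, evenly spaced answer options and is normalized by the number of steps (a choice made here), and `label` treats the 0.15 and 0.30 boundaries as "moderate" because the slide does not say which side they fall on.

```python
def tvd(p: list, q: list) -> float:
    """Total variation distance: half the summed absolute differences."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def wasserstein_ordinal(p: list, q: list) -> float:
    """1-Wasserstein distance on ordered categories, scaled to the range [0, 1]."""
    running = 0.0   # difference between the two cumulative distributions so far
    total = 0.0
    for a, b in zip(p[:-1], q[:-1]):
        running += a - b
        total += abs(running)
    return total / (len(p) - 1)


def label(score: float) -> str:
    """Map a score to the deck's threshold bands."""
    if score < 0.15:
        return "excellent"
    if score <= 0.30:
        return "moderate"
    return "poor"


# Made-up distributions over four ordered options (strongly support ... strongly oppose).
reference = [0.40, 0.30, 0.15, 0.15]
simulated = [0.50, 0.25, 0.15, 0.10]

print(f"TVD={tvd(reference, simulated):.3f}",
      f"Wasserstein={wasserstein_ordinal(reference, simulated):.3f}")
print(label(0.101), label(0.188), label(0.311))
```

Applied to the TVD column above, these bands place CivicSim in "excellent", the naive OpenAI baseline in "moderate", and the naive Anthropic baseline in "poor".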
12 · Voice of users

What policy researchers
are telling us.

Tested with

I wouldn't use this to replace public opinion polling, but I would absolutely use it to explore and pressure-test policies before taking them into the real world.

Grad Student · Goldman School of Public Policy

I see this as a strong starting point for mixed-methods policy research — not a replacement for consultation, but a way to test and refine ideas before investing in large-scale public engagement.

Policy Lab Advisor · Stanford

What's interesting here is not replacing public engagement, but creating a faster way to explore policy directions, identify blind spots, and ask better questions before going to communities directly.

Project Lead · Possibility Lab · UC Berkeley

13 · Limits & what's next

Honest about what it is.
Deliberate about where it goes.

Current scope

What we're working with

  • Directional realism, not outcome prediction.
  • Thin cells in rare intersectional subgroups.
  • Synthetic opinion is not lived experience.
→ What's next

Where we're heading

  • Custom datasets for granular policies — e.g. climate impact on low-income Hispanic populations in CA.
  • Real surveys to validate sparse subgroup opinions, inspired by Stanford HCI Lab.
  • Larger evaluations — expanding beyond Pew ATP to CCES.
Representative simulations can support better decisions. They must never replace the voices of real communities.
CivicSim · 2026

Thank you.

Ground it before you simulate it.
Try it live
Special thanks to Joshua Blumenstock, our advisor.
...and the collaborators who couldn't make it tonight 😅
ChatGPT
Claude
Cursor
Scan to visit civicsim.xyz
Scan to try it