CivicSim · Pitch Deck

civicsim.xyz

Demographically grounded simulation

CivicSim

Ground it before you simulate it.

Demographically grounded LLM simulations for public-policy testing.

Anagha

MIMS · Capstone

Aratrik Paul

MIMS · Capstone

Minkush

MIMS · Capstone

Sushanti

MIMS · Capstone

Vikram

MIMS · Capstone

Team CivicSim · Capstone 2026 · Faculty Panel · civicsim.xyz

01 · The opportunity

AI tests ideas everywhere.
Except where it matters most.

Synthetic personas now de-risk decisions across industries. Public policy still ships untested.

✓

Where simulation already works

Industries using AI today.

Product design

Synthetic users test interfaces before launch.
Market research

Synthetic respondents stand in at scale.
Pharma, ads, finance

Stress-test decisions virtually before reality.

✕

Where it has not landed yet

Public policy is missing.

Slow panels

Narrow, expensive survey cohorts.
Untested legislation

Ships without virtual electorate testing.
Reactive learning

Outcomes known only after the law passes.

Public policy is the next frontier. And the hardest, because representation is not optional.

~50 SEC · OPPORTUNITY

AI simulation is already standard practice across industries. Product teams test on synthetic users before launch. Market researchers run synthetic survey panels instead of real ones, saving weeks and thousands of dollars. Pharma, finance, and advertising are all stress-testing with simulated populations.

But look at public policy. We still pass laws the old way. Important legislation ships without ever being tested against a synthetic version of the people it affects. We only find out whether it worked after it's already law.

Policy is the natural next frontier for this technology, and also the hardest, because representation isn't optional. If your simulation misses a community, you've automated a bias and called it data. That's the bar we built for.

02 · The problem

Policy simulations fail
before generation begins.

01

Stereotypes,
not populations.

Real example

Ask GPT to "simulate a Texan voter." It returns the average of every Texan stereotype it has read.

02

Survey panels
are not populations.

Real example

Young Black Americans land in the wrong income bracket 32% of the time in survey panels.

03

Wrong variables.
Confidently.

Real example

On tech policy, geography matters more than age. Most simulations skip it.

The model is not the problem. The who being simulated is.

~60 SEC · PROBLEM

There are three failure modes, and all three happen before the LLM generates a single word.

First: stereotype, not population. Prompting "simulate a 35-year-old voter" gives you the model's average caricature with no real demographic distribution behind it. Fine for ad copy, disqualifying for policy.

Second: survey panels aren't populations. They weren't designed to be representative. They were designed to answer specific research questions. We measured this: for young Black Americans, panels misplace 32 percent into the wrong income bracket.

Third: the wrong variables, confidently. Age, income, education sound important. But geography often explains more. Almost everyone skips it.

The bottom line on this slide: the model isn't the problem. The "who" is.

03 · The landscape

Persona platforms are growing fast.
None hit the policy bar.

$50M+

Raised by the leading synthetic-persona platforms combined. None ground their personas in census reality. Each was built for commerce, not consequential decisions.

Simile · ~$8M raised · Behavior simulation

Ditto · ~$2M seed · AI personas

Aaru · ~$10M Series A · Reaction modeling

Three structural gaps for policy use

✕

Gap 01

Demographic grounding is shallow.

Personas synthesized from generic priors. Civic populations are not faithfully represented at the joint level.

✕

Gap 02

Built for research, not for decisions.

Designed for marketing studies and qualitative work. Not built to defend a policy choice in a hearing.

✕

Gap 03

Closed, black-box methodology.

Personas and reasoning are opaque. No auditable subgroup breakdown. No inspection.

~50 SEC · LANDSCAPE

We're not the first to think about this. Simile, Ditto, and Aaru are serious players. Between them, over 50 million dollars raised. Real products, real momentum.

But none of them are actually trying to do what we're doing. They were built for commerce: product research, qualitative personas, business messaging. Useful tools for those use cases, just optimized for a completely different problem.

Through a policy lens, three gaps show up consistently. Shallow demographic grounding: personas synthesized from model priors, not census data. Built for research, not decisions: you can't cite an AI persona in a policy hearing. And closed methodology: no auditability of how a persona was built or why it reasoned a certain way. For policy work, that's a non-starter.

We saw that gap. That's what CivicSim is built to close.

04 · The solution

Introducing CivicSim.

Pick a U.S. location. Choose a policy question. Run a synthetic electorate that is actually representative.

01

Representative
Population.

Agents sampled from ACS Census microdata. 2.5M records. Distributions match the location.

02

Policy-Specific
Grounding.

Each agent carries an empirical opinion prior from 38,449 Pew respondents. Not LLM guesswork.

03

Transparent
by Design.

Every persona inspectable. Demographics, prior, stance, rationale streamed live and saved.

The goal is not perfect prediction. It is representative, transparent simulation before real-world rollout.

~45 SEC · INTRODUCING CIVICSIM

So this is CivicSim. The idea is simple: you pick any U.S. location, you choose any policy question, and you get back a synthetic electorate that actually reflects who lives there.

And it's built on three pillars, each one a direct answer to the three failure modes I just described.

First: representative population. Every agent is sampled from real U.S. Census microdata, 2.5 million records. So the people you're simulating actually match the demographics of the place you picked.

Second: policy-specific grounding. Each agent carries a real opinion prior drawn from 38,449 Pew survey respondents. The LLM isn't guessing what someone might think. It's conditioned on what people like them have actually said.

Third: fully inspectable. Every persona's demographics, prior, stance, and reasoning are saved. You can drill into any agent and see exactly why they responded the way they did. Nothing is a black box.

We're not promising perfect prediction. The goal is representative, transparent simulation before a policy ever goes into the real world.

05 · Architecture & technical deep dive

How the system is built.

Client tier

API server

Core engine

Data sources

~50 SEC · ARCHITECTURE

Let me walk through this top to bottom. When a user opens CivicSim, they hit our Next.js frontend. That's what handles the UI, the inputs, and the live visualization. It's a thin layer that passes requests down to our backend API.

The backend is a FastAPI server that orchestrates everything. When a simulation request comes in, it kicks off three things in sequence.

First, the agent sampler builds your synthetic population. It uses a largest-remainder algorithm to draw people from ACS census data so the demographics of your simulated group actually match the real location.

Second, the prior lookup attaches an opinion prior to each agent. Given this person's demographics, here's how people like them have historically answered this kind of question. That comes from Pew ATP data, not the LLM's imagination.

Third, the LLM client runs each agent through the model, conditioned on both their demographics and their prior. It works with OpenAI, Anthropic, or local models.

The responses stream back using Server-Sent Events, so you see agents arriving one by one in real time. That's not just a UI choice. It's part of the transparency story: you watch the system think rather than waiting for a black-box result.
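The streaming step in these notes can be sketched roughly as follows. This is a minimal illustration of Server-Sent Events framing only, not CivicSim's actual server code: `AgentResult`, `format_sse`, and `stream_agents` are hypothetical names, and the payload fields are assumptions. In a FastAPI app, a generator like this would typically be wrapped in a streaming response with the `text/event-stream` media type.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Iterator


@dataclass
class AgentResult:
    # Hypothetical per-agent payload; the field names are assumptions.
    agent_id: int
    stance: str
    rationale: str


def format_sse(data: dict, event: str = "agent") -> str:
    """Frame one payload as a Server-Sent Events message."""
    # An SSE message is a set of "field: value" lines ended by a blank line.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_agents(results: Iterable[AgentResult]) -> Iterator[str]:
    """Yield one SSE message per agent, then a final 'done' event."""
    count = 0
    for result in results:
        count += 1
        yield format_sse(asdict(result))
    yield format_sse({"agents": count}, event="done")


# Invented example agents, for illustration only.
demo = [
    AgentResult(1, "support", "Rent takes most of my paycheck."),
    AgentResult(2, "oppose", "My shop cannot absorb the payroll cost."),
]
messages = list(stream_agents(demo))
```

Because each agent is its own event, the client can render personas as they arrive instead of waiting for the whole batch.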

06 · At a glance

From policy to insight in seconds.

A complete user journey, end to end.

① Your Policy

Pick a question.

"$20 Federal Minimum Wage"

Economy · Labor

What will different demographics think?

② CivicSim Analyzes

Run the simulation.

5,000 census-grounded agents
Demographics + opinion priors
AI-powered simulation

~5 sec

vs. weeks of polling

③ Instant Insights

See the breakdown.

Support 59% · Oppose 27% · Unsure 14%

By age: 18–29, 75% · 65+, 38%

07 · Live demo

Demo time.

Overall Results · Min Wage $15/hr

47% strongly support (of simulated population)

Strongly support 47% · Somewhat support 27% · Somewhat oppose 13% · Strongly oppose 7% · Unsure 6%

[Chart] Breakdown · By Race / Ethnicity. Groups: White, Hispanic, Black, Asian, Other. Series: strongly support, somewhat support, somewhat oppose, strongly oppose.

[Chart] Breakdown · By Age Group. Groups: 18–29, 30–44, 45–64, 65+. Series: strongly support, somewhat support, somewhat oppose, strongly oppose. Annotation: younger cohorts skew strongly supportive.

~2 MIN · DEMO WALKTHROUGH

This slide shows a real simulation run: a $15 federal minimum wage question against Alameda County.

Top card: overall result. 47% strongly support. Walk the audience through the stacked bar: roughly three-quarters of the simulated population land in some form of support. That number came from 5,000 census-grounded agents, not a poll.

Middle card: by race and ethnicity. Point out the divergence. Black respondents sit at 62% strong support. Asian respondents at 38%. That's a 24-point gap on a single demographic axis, and it's the kind of subgroup signal that aggregate polling buries.

Bottom card: by age group. Younger cohorts skew strongly supportive; the 65+ bar shifts noticeably toward opposition. This is the structure of disagreement, not just a headline number.

The point: in under 5 seconds you get not just who supports, but which communities diverge and why. That's what we hand to a policymaker before they go into a hearing.

If demoing live: run the same question live and point to the streaming panel. Drill into one dissenting agent and show the grounded reasoning chain. Wrap back to this slide.

08 · How it works

Four steps. End to end.

01

Sample

Build N agents matching the location's demographics.

Real example

"Give me 200 people who actually live in Alameda County."

02

Prime

Attach an empirical opinion prior to each agent.

Real example

"How do people like you usually answer this question?"

03

Simulate

Each agent answers with stance plus rationale.

Real example

"Should the city raise minimum wage to $20? Here is why I would say yes."

04

Aggregate

Stream results live. Surface divergence by group.

Real example

"68% support overall. Renters and homeowners diverge sharply."


~55 SEC · HOW IT WORKS (CLICK THROUGH)

Four phases. Reveal each as you go.

(Step 1.) Sample. Build N agents matching the location's demographics. Largest-remainder algorithm, same method used in election apportionment. Each group's agent count lands within one agent of its exact census share.

(Step 2.) Prime. This is the step everyone else skips. We attach a real opinion prior to each agent: what people with these demographics have historically said about this type of question. The LLM doesn't guess. It conditions.

(Step 3.) Simulate. LLM generates stance and rationale for each agent, conditioned on both demographics and prior. You get a reasoning chain, not just a vote.

(Step 4.) Aggregate. Results stream live, group divergence surfaced. Not just "68% support" but renters vs. homeowners, urban vs. rural, wherever the real signal is.
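Step 1's largest-remainder allocation can be sketched as below. This is a minimal illustration, not CivicSim's sampler: the function name and the age shares are made up, and the real system allocates over joint ACS cells rather than a single variable.

```python
from math import floor


def largest_remainder(shares: dict[str, float], n: int) -> dict[str, int]:
    """Allocate n agents across groups in proportion to their shares."""
    total = sum(shares.values())
    # Exact (fractional) number of agents each group is entitled to.
    quotas = {g: n * s / total for g, s in shares.items()}
    # Every group first gets the whole-number part of its quota.
    counts = {g: floor(q) for g, q in quotas.items()}
    leftover = n - sum(counts.values())
    # Remaining agents go to the groups with the largest fractional parts.
    by_remainder = sorted(shares, key=lambda g: quotas[g] - counts[g], reverse=True)
    for g in by_remainder[:leftover]:
        counts[g] += 1
    return counts


# Made-up age shares for illustration, not real ACS figures.
shares = {"18-29": 0.22, "30-44": 0.29, "45-64": 0.31, "65+": 0.18}
counts = largest_remainder(shares, 200)
```

The counts always sum to exactly N, and no group ends up more than one agent away from its proportional share, which is the property the sampling step relies on.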

09 · The data foundation

Two gold-standard datasets
do the heavy lifting.

Population

American Community Survey

U.S. Census microdata · via IPUMS USA

~2.5M

Adult records / year

Why: the only public source with joint demographic distributions at population scale. Marginal-only sampling collapses intersectional groups. ACS preserves them.

Opinion

Pew American Trends Panel

Probability-based panel · waves 2021 to 2024

38,449

Validated respondents

Why: consistent methodology across 80+ waves, broad topical coverage. Compiled into a compact opinion-prior lookup. No PII.

Census for the who. Pew for what they think. Our contribution is in how we combine them.
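The "compact opinion-prior lookup" on this slide could look roughly like the sketch below. It is an illustration under stated assumptions, not CivicSim's actual table: the key layout, the `MIN_CELL` threshold, the backoff order, and every number are invented. It also shows one way to handle thin intersectional cells: a cell with too few respondents backs off to a coarser one.

```python
from typing import Optional

Cell = tuple[str, ...]      # demographic values, most important variable first
Prior = dict[str, float]    # stance -> probability

# Hypothetical lookup: (topic, cell) -> (respondent count, stance distribution).
# Shorter cells are coarser. All numbers are invented for illustration.
PRIORS: dict[tuple[str, Cell], tuple[int, Prior]] = {
    ("min_wage", ("Pacific", "18-29", "renter")): (12, {"support": 0.83, "oppose": 0.17}),
    ("min_wage", ("Pacific", "18-29")): (240, {"support": 0.71, "oppose": 0.29}),
    ("min_wage", ("Pacific",)): (1900, {"support": 0.62, "oppose": 0.38}),
    ("min_wage", ()): (5000, {"support": 0.58, "oppose": 0.42}),
}

MIN_CELL = 30  # assumed minimum respondent count before a cell is trusted


def lookup_prior(topic: str, cell: Cell) -> Optional[tuple[Cell, Prior]]:
    """Return (cell_used, prior), dropping trailing variables until a cell is thick enough."""
    for depth in range(len(cell), -1, -1):
        entry = PRIORS.get((topic, cell[:depth]))
        if entry is not None and entry[0] >= MIN_CELL:
            return cell[:depth], entry[1]
    return None  # unknown topic, or no cell met the threshold


used, prior = lookup_prior("min_wage", ("Pacific", "18-29", "renter"))
```

Returning the cell that was actually used keeps the backoff visible, so a persona's record can show when its prior came from a coarser group.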

11 · Experimentation & results

The demographics we assume matter
are not the ones that shape opinion.

10.6%

Conventional

Textbook variables (age, income, education). The status quo.

→

25.2%

Empirical selection

Variables chosen empirically per domain. Our approach.

=

2.4×

★ Improvement factor

Across 1,426 opinion items. Same models. Better grounding.

Census division (omitted by textbooks) drives a 53.5% signal drop when removed. The variable we would skip is the variable we most need.

~60 SEC · RESULTS · ★ KEY SLIDE

If you remember one slide from this talk, this is it.

We ran 1,426 opinion items across 14 policy domains, testing every combination of 7 demographic variables, about 180,000 mini-experiments total.

10.6%: variance explained by the textbook demographic variables. Age, income, education. What every research team starts with.

25.2%: variance explained when we let the data tell us which variables actually matter per policy domain. Same models. Same compute. Smarter conditioning.

2.4×: not a marginal improvement. We're more than doubling explanatory power without touching the LLM.

And the punchline: Census division, which region of the country you live in, drives a 53.5% signal drop when removed. The most important variable. The one the textbook approach throws out. That's the finding we want to leave with you.
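The headline numbers can be sanity-checked with a few lines of arithmetic. This sketch assumes "every combination of 7 demographic variables" means every non-empty subset of the seven; that reading is an assumption, though it does reproduce the "about 180,000" figure.

```python
from itertools import combinations

VARIABLES = 7   # demographic variables tested
ITEMS = 1426    # opinion items

# Count every non-empty subset of the 7 variables: 2**7 - 1 = 127.
subsets = sum(
    1
    for k in range(1, VARIABLES + 1)
    for _ in combinations(range(VARIABLES), k)
)

experiments = subsets * ITEMS   # 127 * 1426 = 181,102, i.e. about 180,000
improvement = 25.2 / 10.6       # ratio of variance explained, about 2.38
```

Rounded to one decimal place, the ratio is the 2.4× shown on the slide.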

12 · Evaluations · 10,000 runs

CivicSim consistently outperforms
both naive baselines at scale.

Metric 1: Total Variation Distance (TVD)

Metric 2: Wasserstein Distance

Condition                             TVD ↓       Wasserstein ↓
Pew ATP (ground truth)                reference
★ CivicSim (best)                     0.101       0.058
Naive OpenAI (gpt4o-mini)             0.188       0.201
Naive Anthropic (claude haiku 4.5)    0.311       0.215

Thresholds: excellent < 0.15 · moderate 0.15–0.30 · poor > 0.30

20 questions · 50 demographic slices · 10,000 runs per condition
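For readers unfamiliar with the two metrics, here is a minimal sketch of how each can be computed for one question. It assumes both distributions are over the same ordered answer options, and that the Wasserstein distance treats the options as equally spaced on a 0-to-1 scale; the deck does not state its exact normalization, so that scaling is an assumption, and the example distributions are invented.

```python
from itertools import accumulate


def tvd(p: list[float], q: list[float]) -> float:
    """Total variation distance: half the summed absolute differences."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def wasserstein_ordinal(p: list[float], q: list[float]) -> float:
    """1-Wasserstein distance over ordered categories spaced evenly on [0, 1]."""
    # In one dimension, the distance is the area between the two CDFs.
    cdf_gaps = [abs(a - b) for a, b in zip(accumulate(p), accumulate(q))]
    # The final CDF gap is always zero (both reach 1), so it is dropped;
    # dividing by (k - 1) applies the assumed unit-interval spacing.
    return sum(cdf_gaps[:-1]) / (len(p) - 1)


# Invented distributions over four ordered stances, strongest support first.
reference = [0.40, 0.30, 0.20, 0.10]   # stand-in for a survey result
simulated = [0.50, 0.25, 0.15, 0.10]   # stand-in for a simulation result

tvd_score = tvd(reference, simulated)
w_score = wasserstein_ordinal(reference, simulated)
```

For these invented inputs the scores come out at about 0.10 and 0.05. TVD only measures how much probability mass is misplaced, while the Wasserstein distance also penalizes how far along the support-to-oppose scale it moved.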

12 · Voice of users

What policy researchers
are telling us.

Tested with

I wouldn't use this to replace public opinion polling, but I would absolutely use it to explore and pressure-test policies before taking them into the real world.

Grad Student · Goldman School of Public Policy

I see this as a strong starting point for mixed-methods policy research — not a replacement for consultation, but a way to test and refine ideas before investing in large-scale public engagement.

Policy Lab Advisor · Stanford

What's interesting here is not replacing public engagement, but creating a faster way to explore policy directions, identify blind spots, and ask better questions before going to communities directly.

Project Lead · Possibility Lab · UC Berkeley

13 · Limits & what's next

Honest about what it is.
Deliberate about where it goes.

Current scope

What we're working with

Directional realism, not outcome prediction.
Thin cells in rare intersectional subgroups.
Synthetic opinion is not lived experience.

→ What's next

Where we're heading

Custom datasets for granular policies — e.g. climate impact on low-income Hispanic populations in CA.
Real surveys to validate sparse subgroup opinions, inspired by Stanford HCI Lab.
Larger evaluations — expanding beyond Pew ATP to CCES.

Representative simulations can support better decisions. They must never replace the voices of real communities.

~40 SEC · LIMITS & NEXT

Two cards. Left is what we're working with today. Right is where we're going.

On the left: this is directional realism, not prediction. We surface the structure of opinion, not exact outcome percentages. Rare intersectional subgroups are thin and we use backoff, which is honest but not perfect.

On the right, three concrete next steps. First, letting policymakers bring custom datasets for granular questions, things like climate impact on low-income Hispanic populations in California. Second, running real surveys to validate what the model says about sparse subgroups, an approach inspired by Stanford HCI Lab. Third, scaling evaluations beyond Pew ATP to CCES for broader coverage.

And the callout at the bottom is the one we mean: simulations support decisions. They never replace real communities.

CivicSim · 2026

Thank you.

Ground it before you simulate it.

Try it live

civicsim.xyz

Special thanks to Joshua Blumenstock, our advisor.

...and the collaborators who couldn't make it tonight 😅

ChatGPT

Claude

Cursor

Scan to try it
