UC BERKELEY · CAPSTONE 2026

Ground it before you simulate it.

What if you could predict public opinion on any policy in seconds? CivicSim stress-tests policy ideas against a population that actually looks like America.

Launch simulator → · View findings

38,449 Real Respondents

2.5M Census Records

94% Demographic Coverage

2.4× Better Signal

Interviewing & collaborating with · Ongoing user testing

Possibility Lab

Goldman School of Public Policy

01 Overview

A simulated population is only useful if it's the right one.

If the modeled people are off, the policy readout is off. CivicSim closes that gap.

TRADITIONAL APPROACH

Generic AI prompts like “simulate a 35-year-old voter.”
Survey-only data that systematically excludes rural and intersectional groups.
One-size-fits-all demographics for every policy domain.
Captures only 10.6% of available opinion signal.

CIVICSIM APPROACH

Samples from 2.5M ACS Census records to match real U.S. demographics.
Conditions AI agents with validated Pew ATP opinion priors.
Domain-specific conditioning: geography matters for international affairs, not tech.
Captures 25.2% of opinion signal with empirically selected variables.

02 Architecture

Two grounding streams. One simulated voter.

One stream answers who a person is. The other answers what they tend to believe. Together they produce agents that behave like real populations, not stereotypes.

Structural stream

Who is represented

Draws agents directly from 2.5M ACS Census records so every demographic combination reflects the real U.S. population.

Behavioral stream

How opinions are seeded

Seeds each agent with Pew ATP survey priors and empirically chosen variables, so opinions follow observed patterns.

Figure: CivicSim architecture diagram. The structural stream builds the person. The behavioral stream gives them an opinion shape. Both feed every simulated response.

03 Empirical Findings

Three failures, three corrections.

STUDY 01

The survey is not the population.

Post-stratification weighting corrects marginal demographic distributions, but not joint distributions in sparse, systematically excluded subgroups. Rural and intersectional populations face the largest representation gaps.

0.321 TVD · Young Black Americans (18-29): nearly one-third of the probability mass lands in the wrong income bracket

0.303 TVD · Rural × low-income geographic gap (Census Division)

14× baseline · Rural geographic misalignment vs. full-sample baseline

TVD is the total variation distance between the survey and the real population. 0 means a perfect match; higher means bigger error. Baseline compares each subgroup's gap to the gap across the whole sample, so 14× means rural respondents are 14 times more misaligned than average.
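As a rough illustration of the TVD definition above, here is a minimal Python sketch. The function name and the income shares are made up for the example; they are not CivicSim code or results.

```python
def total_variation_distance(p, q):
    """Total variation distance between two discrete distributions.

    p and q map category -> probability; a missing category counts as 0.
    """
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Made-up income shares for illustration only (not CivicSim results).
survey_shares = {"low": 0.50, "middle": 0.30, "high": 0.20}
census_shares = {"low": 0.30, "middle": 0.40, "high": 0.30}

tvd = total_variation_distance(survey_shares, census_shares)
print(f"TVD = {tvd:.3f}")  # prints "TVD = 0.200"
```

In this toy case a TVD of 0.2 means 20% of the survey's probability mass would have to move to a different bracket to match the census shares, which is the sense in which a TVD of 0.321 puts nearly one-third of mass in the wrong place.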

Figure: Income distribution, real U.S. population (ACS Census) vs. weighted survey respondents (Pew ATP), across nine income brackets from <$30k to >$150k.

Weighting fixes the easy gaps but still over-represents low-income (~$30k) respondents by nearly 9 points. The mismatch concentrates in groups that are hardest to recruit.

STUDY 02

Marginal rankings are the wrong selection tool.

We tested every possible combination of 7 demographic variables across 1,426 opinion items. The usual set that researchers pick (age, income, education) misses most of the signal. The best set isn't what ranking variables one-by-one suggests.

CONVENTIONAL {age, income, education}

10.6% of full 7-variable joint signal

vs.

GREEDY OPTIMAL {race, location, age}

25.2% of full 7-variable joint signal

Ablation: how fast does coverage grow?

Add one variable at a time. The greedy path picks the most useful variable next; the conventional path picks textbook variables. After three variables, the greedy path captures about 2.4× the signal of the conventional path (25.2% vs. 10.6%).

Figure: signal coverage vs. number of variables added, for the greedy optimal path (race → location → age) and the conventional path (age → income → education).
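The greedy path described above can be sketched as forward selection on an information-gain score. This is an illustrative sketch, not the CivicSim implementation: it assumes "signal" is measured as information gain over the joint cell of the chosen variables, and the four toy rows are invented so that `education` is a perfect proxy for `income`.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, variables, target):
    """H(target) minus H(target | joint cell defined by `variables`)."""
    base = entropy([r[target] for r in rows])
    cells = {}
    for r in rows:
        cells.setdefault(tuple(r[v] for v in variables), []).append(r[target])
    conditional = sum(len(ys) / len(rows) * entropy(ys) for ys in cells.values())
    return base - conditional


def greedy_select(rows, candidates, target, k):
    """Add, k times, the variable that most increases joint information gain."""
    chosen = []
    for _ in range(k):
        remaining = [v for v in candidates if v not in chosen]
        best = max(remaining, key=lambda v: information_gain(rows, chosen + [v], target))
        chosen.append(best)
    return chosen


# Toy rows, invented for illustration (not survey data).
rows = [
    {"income": "low", "education": "hs", "region": "east", "opinion": "oppose"},
    {"income": "low", "education": "hs", "region": "west", "opinion": "oppose"},
    {"income": "high", "education": "ba", "region": "east", "opinion": "neutral"},
    {"income": "high", "education": "ba", "region": "west", "opinion": "support"},
]
candidates = ["income", "education", "region"]

# Marginal ranking: score each variable alone, keep the top two.
marginal_top2 = sorted(
    candidates, key=lambda v: information_gain(rows, [v], "opinion"), reverse=True
)[:2]
greedy_top2 = greedy_select(rows, candidates, "opinion", k=2)

print(marginal_top2)  # ['income', 'education']
print(greedy_top2)    # ['income', 'region']
```

In the toy data `region` ranks last on its own, yet it is the only variable that adds information once `income` is in the set, so the greedy path and the one-by-one ranking disagree.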

Leave-one-out: which variable hurts most when removed?

Pull each variable out of the full set and see how much signal disappears. The biggest drop is the most essential variable.

Census division -53.5%
Race -38.2%
Age -27.1%
Income -18.4%
Education -12.6%
Gender -7.1%
Religion -4.3%

-53.5%

The structural failure. Drop Census division from the full set and opinion signal collapses by 53.5%, the biggest hit of any variable. Yet on a marginal ranking it's only third, so the textbook approach leaves it out entirely.
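A leave-one-out check of this kind can be sketched as follows. As an assumption, the sketch scores "signal" as information gain over the joint cell of the variables and reports each drop relative to the full-set score; the toy rows and function names are hypothetical, not the study's data or code.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, variables, target):
    """H(target) minus H(target | joint cell defined by `variables`)."""
    base = entropy([r[target] for r in rows])
    cells = {}
    for r in rows:
        cells.setdefault(tuple(r[v] for v in variables), []).append(r[target])
    conditional = sum(len(ys) / len(rows) * entropy(ys) for ys in cells.values())
    return base - conditional


def leave_one_out_drops(rows, variables, target):
    """Relative change in joint signal when each variable is removed in turn."""
    full = information_gain(rows, variables, target)
    drops = {}
    for v in variables:
        rest = [u for u in variables if u != v]
        drops[v] = (information_gain(rows, rest, target) - full) / full
    return drops


# Toy rows, invented for illustration: education duplicates income.
rows = [
    {"income": "low", "education": "hs", "region": "east", "opinion": "oppose"},
    {"income": "low", "education": "hs", "region": "west", "opinion": "oppose"},
    {"income": "high", "education": "ba", "region": "east", "opinion": "neutral"},
    {"income": "high", "education": "ba", "region": "west", "opinion": "support"},
]

drops = leave_one_out_drops(rows, ["income", "education", "region"], "opinion")
for name, change in sorted(drops.items(), key=lambda kv: kv[1]):
    print(f"{name}: {change:+.1%}")
```

In this toy set, removing `income` or `education` costs nothing because each covers for the other, while removing `region` costs a third of the signal even though `region` is the weakest variable on its own. That is the same pattern the Census division result shows.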

STUDY 03

Geography is domain-specific, not universal.

Once you condition on demographics, do opinions still vary by region? For most domains, no. People's views pool nationally and you can skip geography. International affairs is the exception. And young respondents need geography almost everywhere.

The numbers below are Jensen-Shannon distance after demographic conditioning. Lower means opinions transfer cleanly across regions. Higher means a Texan and a Vermonter still disagree even after accounting for demographics.

not needed

Pool nationally after demographic conditioning.

Technology 0.119
Environment / Climate 0.119

optional

Include for sensitive analyses; required for young agents.

Health 0.129
Family & Society 0.134
Economy 0.135
Religion 0.141
Politics & Government 0.142
Race & Inequality 0.145
Immigration 0.150

required

Geographic conditioning is mandatory. Views covary with local immigrant community composition.

International 0.174
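For readers unfamiliar with the metric in the tiers above, here is a minimal sketch of Jensen-Shannon distance between two regional answer distributions. It assumes the common convention of a base-2 logarithm and the square root of the JS divergence, which bounds the distance between 0 and 1; the page does not state which convention the study used, and the example distributions are invented.

```python
from math import log2, sqrt


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance: sqrt of the JS divergence (base 2, range 0..1)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


# Invented answer shares (support / neutral / oppose) for two regions.
region_a = [0.50, 0.30, 0.20]
region_b = [0.40, 0.30, 0.30]

print(f"JS distance = {js_distance(region_a, region_b):.3f}")
```

Identical distributions give 0 and fully disjoint ones give 1, so values around 0.12 to 0.17 sit near the low end of the scale.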

+70%

The age modifier. For 18-29 year olds, geographic variation is up to 70% higher than for older cohorts, in every income tier and every domain. If you're simulating young Americans, tier geography up by one regardless of what the domain table says.

04 Built For Policymakers

Built for policymakers.

Understand public opinion before you propose, test messaging before you communicate, and identify support gaps before they become problems.

Pre-Proposal Testing

Test policy ideas before committing resources. Understand which demographics support or oppose your proposal.

Example: Universal healthcare, minimum wage increase, housing reform

Message Testing

Compare different framings of the same policy to find messaging that resonates across demographic groups.

Example: Climate policy as "jobs program" vs. "environmental protection"

Coalition Building

Identify which demographic groups are most supportive and where you need to address concerns or build awareness.

Example: Finding unexpected allies or opposition segments

TRUSTED BY

Municipal, State, and Federal Agencies

CivicSim's demographically grounded approach means you're not just getting an AI's opinion. You're getting predictions based on empirical census data and validated opinion research.

94.2%

Demographic Coverage

±1.4%

Margin of Error

38k+

Validated Respondents

How CivicSim differs

Census-grounded sampling vs. survey-only or generic prompting
Empirically selected variables vs. conventional demographics only
Domain-specific conditioning vs. one-size-fits-all approach

▶ Try the Simulator

05 Framework

Three corrective steps.

CivicSim operationalizes a single principle: the decision of who to simulate, and along which demographic axes, is an empirical question, not a design preference.

STEP 01

Draw agents from census microdata

Sample synthetic agents from ACS PUMS (~2.5M adult records per year) rather than survey sample data. Population representativeness becomes a property of the data, not a research question.

→ Validated by Study 01
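Step 01 can be sketched as weighted sampling over microdata rows. The sketch assumes each record carries the ACS PUMS person weight (`PWGTP`) and age (`AGEP`); the `division` and `race` keys, the function name, and the four toy records are hypothetical and stand in for a real PUMS extract.

```python
import random


def sample_agents(records, n, weight_key="PWGTP", min_age=18, seed=0):
    """Draw n adult agents with replacement, proportional to person weight."""
    adults = [r for r in records if r["AGEP"] >= min_age]
    weights = [r[weight_key] for r in adults]
    rng = random.Random(seed)  # seeded so a draw can be reproduced
    return rng.choices(adults, weights=weights, k=n)


# Toy records, invented for illustration (not real PUMS rows).
records = [
    {"AGEP": 24, "PWGTP": 120, "division": "Pacific", "race": "Black"},
    {"AGEP": 51, "PWGTP": 80, "division": "Mountain", "race": "White"},
    {"AGEP": 16, "PWGTP": 95, "division": "Pacific", "race": "Asian"},
    {"AGEP": 67, "PWGTP": 40, "division": "New England", "race": "White"},
]

agents = sample_agents(records, n=1000, seed=42)
print(len(agents), "agents drawn;", sum(a["AGEP"] < 30 for a in agents), "under 30")
```

Because each draw keeps a full record, every agent carries a joint demographic profile that actually occurs in the data, rather than a profile assembled from independent marginals.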

STEP 02

Select conditioning variables empirically

Run a leave-one-out or greedy information-gain (IG) ablation over the survey corpus for the target domains. Always include race and Census division, because interaction-dominated signal cannot be recovered from marginal rankings.

→ Validated by Study 02

STEP 03

Apply tiered geographic conditioning

Use the domain classification from Study 03 to determine whether geography is required, optional, or unnecessary. For young agents (18-29), tier up by one level regardless of domain.

→ Validated by Study 03
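The tiering rule in Step 03 can be written as a small lookup. The domain-to-tier mapping below copies the Study 03 classification; the function name is hypothetical, and capping at "required" when a domain is already in the top tier is an assumption, since the page does not say what tiering up means there.

```python
TIERS = ["not needed", "optional", "required"]

# Domain classification as listed in Study 03.
DOMAIN_TIER = {
    "Technology": "not needed",
    "Environment / Climate": "not needed",
    "Health": "optional",
    "Family & Society": "optional",
    "Economy": "optional",
    "Religion": "optional",
    "Politics & Government": "optional",
    "Race & Inequality": "optional",
    "Immigration": "optional",
    "International": "required",
}


def geography_tier(domain, age):
    """Return the geographic conditioning tier for an agent in a domain."""
    level = TIERS.index(DOMAIN_TIER[domain])
    if 18 <= age <= 29:
        # Young agents tier up by one level; capped at the top tier (assumption).
        level = min(level + 1, len(TIERS) - 1)
    return TIERS[level]


print(geography_tier("Technology", age=45))   # not needed
print(geography_tier("Technology", age=22))   # optional
print(geography_tier("Immigration", age=22))  # required
```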

06 Paper

Read the full work.

Ground It Before You Simulate It: The Case for Demographically Grounded LLM Simulations

We argue that current LLM-based public opinion simulations are not approximations of a representative population but consistent, predictable distortions at the input level, and that fixing this is methodologically prior to all other concerns about LLM agent quality.

CivicSim Team · UC Berkeley

Paper → · GitHub repository