UC BERKELEY · CAPSTONE 2026

Ground it before
you simulate it.

What if you could predict public opinion on any policy in seconds? CivicSim stress-tests policy ideas against a population that actually looks like America.

38,449 Real Respondents
2.5M Census Records
94% Demographic Coverage
2.4× Better Signal
CivicSim simulator demo

Interviewing & collaborating with · Ongoing user testing

01 Overview

A simulated population is only useful if it's the right one.

If the modeled people are off, the policy readout is off. CivicSim closes that gap.

TRADITIONAL APPROACH
  • Generic AI prompts like “simulate a 35-year-old voter.”
  • Survey-only data that systematically excludes rural and intersectional groups.
  • One-size-fits-all demographics for every policy domain.
  • Captures only 10.6% of available opinion signal.
CIVICSIM APPROACH
  • Samples from 2.5M ACS Census records to match real U.S. demographics.
  • Conditions AI agents with validated Pew ATP opinion priors.
  • Domain-specific conditioning: geography matters for international affairs, not tech.
  • Captures 25.2% of opinion signal with empirically selected variables.
2.4×

more opinion signal captured vs. conventional methods

94%

demographic coverage including systematically excluded groups

38k

validated survey respondents informing AI agents

02 Architecture

Two grounding streams. One simulated voter.

One stream answers who a person is. The other answers what they tend to believe. Together they produce agents that behave like real populations, not stereotypes.

Structural stream

Who is represented

Draws agents directly from 2.5M ACS Census records so every demographic combination reflects the real U.S. population.

Behavioral stream

How opinions are seeded

Seeds each agent with Pew ATP survey priors and empirically chosen variables, so opinions follow observed patterns.

CivicSim architecture diagram
The structural stream builds the person. The behavioral stream gives them an opinion shape. Both feed every simulated response.
03 Empirical Findings

Three failures, three corrections.

STUDY 01

The survey is not the population.

Post-stratification weighting corrects marginal demographic distributions, but not joint distributions in sparse, systematically excluded subgroups. Rural and intersectional populations face the largest representation gaps.
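A minimal sketch of why matching marginals does not fix the joint distribution. The four cell percentages and the `marginal` helper are invented for illustration and are not CivicSim data. Both tables have the same rural share and the same low-income share, yet the rural, low-income cell is off by 10 points.

```python
from collections import defaultdict

# Hypothetical joint shares in percent: (location, income) -> share.
population = {("rural", "low"): 20, ("rural", "high"): 10,
              ("urban", "low"): 10, ("urban", "high"): 60}
weighted_survey = {("rural", "low"): 10, ("rural", "high"): 20,
                   ("urban", "low"): 20, ("urban", "high"): 50}

def marginal(joint, axis):
    """Sum a joint table down to one variable (axis 0 = location, 1 = income)."""
    totals = defaultdict(int)
    for cell, share in joint.items():
        totals[cell[axis]] += share
    return dict(totals)

# Marginals agree on both variables...
same_location = marginal(population, 0) == marginal(weighted_survey, 0)
same_income = marginal(population, 1) == marginal(weighted_survey, 1)
# ...but the joint cell for rural, low-income respondents does not.
cell_gap = population[("rural", "low")] - weighted_survey[("rural", "low")]
print(same_location, same_income, cell_gap)
```

Weighting on location and income separately would leave this toy survey untouched, because both marginals already match.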

0.321 TVD · Young Black Americans (18-29): nearly 1/3 of mass lands in the wrong income bracket
0.303 TVD · Rural × low-income geographic gap (Census Division)
14× baseline · Rural geographic misalignment vs. full-sample baseline

TVD is the total variation distance between the survey and the real population. 0 means a perfect match; higher means bigger error. Baseline compares each subgroup's gap to the gap across the whole sample, so 14× means rural respondents are 14 times more misaligned than average.
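As a rough illustration of the calculation, the sketch below computes TVD for the rounded full-sample income shares shown in this study's income chart. It assumes the first number in each bracket is the weighted survey and the second is the ACS, following the legend order. The `tvd` helper is hypothetical, and the result is not one of the subgroup figures reported above.

```python
def tvd(p, q):
    """Total variation distance: half the summed absolute gap between shares."""
    total_p, total_q = sum(p), sum(q)
    return 0.5 * sum(abs(a / total_p - b / total_q) for a, b in zip(p, q))

# Rounded percentages per income bracket, from <$30k through >$150k.
survey = [28, 10, 9, 7, 7, 5, 4, 13, 17]   # Pew ATP, weighted (assumed series order)
census = [19, 7, 8, 8, 8, 7, 6, 18, 19]    # ACS

overall_income_tvd = tvd(survey, census)
print(round(overall_income_tvd, 2))
```

Identical distributions give 0, and distributions with no overlap give 1.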

Income distribution: real U.S. (ACS) vs. weighted survey (Pew ATP)

Share of each group by income bracket (%)

Income bracket   Survey respondents (Pew ATP, weighted)   Real U.S. population (ACS Census)
<$30k            28                                       19
$30-40k          10                                       7
$40-50k          9                                        8
$50-60k          7                                        8
$60-70k          7                                        8
$70-80k          5                                        7
$80-90k          4                                        6
$90-150k         13                                       18
>$150k           17                                       19

Weighting fixes the easy gaps but still over-represents low-income (<$30k) respondents by nearly 9 points. The mismatch concentrates in groups that are hardest to recruit.
STUDY 02

Marginal rankings are the wrong selection tool.

We tested every possible combination of 7 demographic variables (127 non-empty subsets) across 1,426 opinion items. The usual set that researchers pick (age, income, education) misses most of the signal. The best set isn't what ranking variables one-by-one suggests.

CONVENTIONAL {age, income, education}: 10.6% of full 7-variable joint signal
vs.
GREEDY OPTIMAL {race, location, age}: 25.2% of full 7-variable joint signal

Ablation: how fast does coverage grow?

Add one variable at a time. The greedy path picks the most useful variable next; the conventional path picks textbook variables. At three variables, greedy already covers 25.2% of the joint signal against 10.6% for the conventional path, about 2.4× as much.

Chart: coverage (0% to 30%) by variables added →
Greedy optimal (race → location → age): +race, +location, +age, +income, +education
Conventional (age → income → education): +age, +income, +education, +gender, +religion
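The difference between ranking variables one by one and choosing them greedily can be sketched as follows. Everything here is a toy: the scores in `MAIN` and `PAIR` are invented, and `signal` is a stand-in for whatever joint-signal estimate the study uses. It is not meant to reproduce the race → location → age path, only to show that the two procedures can pick different sets once variables overlap or interact.

```python
VARIABLES = ["age", "income", "education", "race", "division"]

# Invented scores: standalone signal per variable, plus pairwise adjustments.
MAIN = {"age": 60, "income": 50, "education": 50, "race": 40, "division": 30}
PAIR = {frozenset(["income", "education"]): -40,   # largely redundant pair
        frozenset(["race", "division"]): 50}       # interaction bonus

def signal(subset):
    """Toy joint signal of a variable set (stand-in for a real estimate)."""
    chosen = set(subset)
    score = sum(MAIN[v] for v in chosen)
    score += sum(adj for pair, adj in PAIR.items() if pair <= chosen)
    return score

def marginal_top_k(k):
    """Rank variables one by one and keep the k best."""
    return sorted(VARIABLES, key=lambda v: -signal([v]))[:k]

def greedy_top_k(k):
    """Repeatedly add the variable with the largest gain given those already chosen."""
    chosen = []
    for _ in range(k):
        rest = [v for v in VARIABLES if v not in chosen]
        chosen.append(max(rest, key=lambda v: signal(chosen + [v])))
    return chosen

conventional, greedy = marginal_top_k(3), greedy_top_k(3)
print(conventional, signal(conventional))
print(greedy, signal(greedy))
```

In this toy, the one-by-one ranking keeps both income and education even though they mostly duplicate each other, while the greedy path swaps education for race and ends with a higher joint score.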

Leave-one-out: which variable hurts most when removed?

Pull each variable out of the full set and see how much signal disappears. The biggest drop is the most essential variable.

  • Census division -53.5%
  • Race -38.2%
  • Age -27.1%
  • Income -18.4%
  • Education -12.6%
  • Gender -7.1%
  • Religion -4.3%
-53.5%
The structural failure. Drop Census division from the full set and opinion signal collapses by 53.5%, the biggest hit of any variable. Yet on a marginal ranking it's only third, so the textbook approach leaves it out entirely.
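A leave-one-out pass can be sketched like this. The scores are invented and `signal` is a hypothetical stand-in for the study's joint-signal measure. The toy is built so that the variable with the weakest standalone score causes the largest drop when removed, because most of its value comes from interactions.

```python
# Invented standalone scores and interaction bonuses.
MAIN = {"age": 30, "race": 20, "division": 10}
PAIR = {frozenset(["race", "division"]): 40,
        frozenset(["age", "division"]): 20}

def signal(subset):
    """Toy joint signal: standalone scores plus bonuses for interacting pairs."""
    chosen = set(subset)
    return (sum(MAIN[v] for v in chosen)
            + sum(bonus for pair, bonus in PAIR.items() if pair <= chosen))

def leave_one_out(variables):
    """Relative signal lost when each variable is removed from the full set."""
    full = signal(variables)
    return {v: (signal([u for u in variables if u != v]) - full) / full
            for v in variables}

drops = leave_one_out(list(MAIN))
marginal_rank = sorted(MAIN, key=lambda v: -signal([v]))  # best standalone first
loo_rank = sorted(drops, key=drops.get)                   # biggest drop first
print(marginal_rank)
print(loo_rank)
```

Here `division` ranks last on its own but first by leave-one-out, which is the pattern the study describes for Census division.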
STUDY 03

Geography is domain-specific, not universal.

Once you condition on demographics, do opinions still vary by region? For most domains, no. People's views pool nationally and you can skip geography. International affairs is the exception. And young respondents need geography almost everywhere.

The numbers below are Jensen-Shannon distance after demographic conditioning. Lower means opinions transfer cleanly across regions. Higher means a Texan and a Vermonter still disagree even after accounting for demographics.
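For readers who want to see the metric, here is a minimal sketch of Jensen-Shannon distance between two answer distributions. It assumes base-2 logarithms (so values fall between 0 and 1) and inputs that already sum to 1. The regional shares are invented, and how scores are aggregated across items into one number per domain is not shown here.

```python
from math import log2, sqrt

def js_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 JS divergence."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    return sqrt(max(divergence, 0.0))  # guard against tiny negative rounding

# Hypothetical answer shares (agree / neutral / disagree) for one item
# among demographically matched respondents in two regions.
region_a = [0.50, 0.30, 0.20]
region_b = [0.40, 0.30, 0.30]
print(js_distance(region_a, region_b))
```

Identical distributions score 0 and fully disjoint ones score 1, so the domain values above sit toward the low end of the scale.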

not needed

Pool nationally after demographic conditioning.

  • Technology 0.119
  • Environment / Climate 0.119

optional

Include for sensitive analyses; required for young agents.

  • Health 0.129
  • Family & Society 0.134
  • Economy 0.135
  • Religion 0.141
  • Politics & Government 0.142
  • Race & Inequality 0.145
  • Immigration 0.150

required

Geographic conditioning is mandatory. Views covary with local immigrant community composition.

  • International 0.174
+70%
The age modifier. For 18-29-year-olds, geographic variation is up to 70% higher than for older cohorts, in every income tier and every domain. If you're simulating young Americans, tier geography up by one regardless of what the domain table says.
04 Built For Policymakers

Built for policymakers.

Understand public opinion before you propose, test messaging before you communicate, and identify support gaps before they become problems.

Pre-Proposal Testing

Test policy ideas before committing resources. Understand which demographics support or oppose your proposal.

Example: Universal healthcare, minimum wage increase, housing reform

Message Testing

Compare different framings of the same policy to find messaging that resonates across demographic groups.

Example: Climate policy as "jobs program" vs. "environmental protection"

Coalition Building

Identify which demographic groups are most supportive and where you need to address concerns or build awareness.

Example: Finding unexpected allies or opposition segments

TRUSTED BY

Municipal, State, and Federal Agencies

CivicSim's demographically grounded approach means you're not just getting an AI's opinion. You're getting predictions based on empirical census data and validated opinion research.

94.2%
Demographic Coverage
±1.4%
Margin of Error
38k+
Validated Respondents
How CivicSim differs
  • Census-grounded sampling vs. survey-only or generic prompting
  • Empirically selected variables vs. conventional demographics only
  • Domain-specific conditioning vs. one-size-fits-all approach
Try the Simulator
05 Framework

Three corrective steps.

CivicSim operationalizes a single principle: the decision of who to simulate, and along which demographic axes, is an empirical question, not a design preference.

STEP 01

Draw agents from census microdata

Sample synthetic agents from ACS PUMS (~2.5M adult records per year) rather than survey sample data. Population representativeness becomes a property of the data, not a research question.
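One way to picture this step: draw agent profiles from microdata rows in proportion to each row's person weight (the PWGTP column in ACS PUMS). The sketch below is an assumption about how such a draw could look, not CivicSim's actual sampler. The three records, the field names, and `draw_agents` are all hypothetical.

```python
import random

# Hypothetical miniature microdata: each record carries a person weight,
# i.e. how many real people it stands for.
records = [
    {"age": 24, "race": "Black", "division": "South Atlantic", "weight": 120},
    {"age": 67, "race": "White", "division": "Mountain", "weight": 80},
    {"age": 41, "race": "Hispanic", "division": "Pacific", "weight": 200},
]

def draw_agents(records, n, seed=0):
    """Sample n agent profiles with probability proportional to record weight."""
    rng = random.Random(seed)  # seeded so a simulation run can be repeated
    weights = [r["weight"] for r in records]
    picks = rng.choices(records, weights=weights, k=n)
    # Copy without the weight so each agent is an independent profile.
    return [{k: v for k, v in r.items() if k != "weight"} for r in picks]

agents = draw_agents(records, n=1000)
print(agents[0])
```

Because whole records are drawn, each agent keeps a real combination of age, race, and geography instead of a mix assembled from separate marginals.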

→ Validated by Study 01
STEP 02

Select conditioning variables empirically

Run a leave-one-out or greedy information-gain (IG) ablation over the survey corpus for the target domains. Always include race and Census division: interaction-dominated signal cannot be recovered from marginal rankings.

→ Validated by Study 02
STEP 03

Apply tiered geographic conditioning

Use the domain classification from Study 03 to determine whether geography is required, optional, or unnecessary. For young agents (18-29), tier up by one level regardless of domain.
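The tiering rule can be written as a small lookup. The domain tiers are copied from Study 03. The `geography_tier` function name and the choice to cap the young-agent bump at "required" are assumptions made for this sketch.

```python
TIERS = ["not needed", "optional", "required"]

# Domain tiers as listed in Study 03.
DOMAIN_TIER = {
    "Technology": "not needed",
    "Environment / Climate": "not needed",
    "Health": "optional",
    "Family & Society": "optional",
    "Economy": "optional",
    "Religion": "optional",
    "Politics & Government": "optional",
    "Race & Inequality": "optional",
    "Immigration": "optional",
    "International": "required",
}

def geography_tier(domain, age):
    """Return the geographic-conditioning tier for one agent and domain."""
    level = TIERS.index(DOMAIN_TIER[domain])
    if 18 <= age <= 29:
        # Young agents tier up by one level (assumed to cap at "required").
        level = min(level + 1, len(TIERS) - 1)
    return TIERS[level]

print(geography_tier("Technology", 45))
print(geography_tier("Technology", 23))
```

Under this rule a 23-year-old agent answering a technology item moves from "not needed" to "optional", while an older agent on the same item stays pooled nationally.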

→ Validated by Study 03
06 Paper

Read the full work.

UC BERKELEY · CAPSTONE 2026

Ground It Before You Simulate It: The Case for Demographically Grounded LLM Simulations

We argue that current LLM-based public opinion simulations are not approximations of a representative population but consistent, predictable distortions at the input level, and that fixing this is methodologically prior to all other concerns about LLM agent quality.

CivicSim Team · UC Berkeley