It seems like a simple question: How many African-American boys scored proficient on California’s math test?

Go ahead, try to answer it. I’ll wait while you Google.

You’ll probably end up at the California Department of Education’s snazzy (by state government standards) website, which provides a litany of bar graphs, charts and filters to help inform the public how California’s students fare on standardized tests. If you’re a persistent nerd, you might even end up on the page where the state posts some of the underlying testing data in large downloadable datasets for researchers and advocacy organizations to dig through.

But although you’ll be able to find how all black students are performing (a dismal 18 percent passed the test), or how boys in general are performing (37 percent passed), you won’t be able to find the combination of the two. Nor can you look up the schools where African-American boys performed best or worst.

This isn’t just an exercise. Imagine you’re a mother trying to enroll your African-American sons in the best possible school district in the region. Or an academic researching the interplay of race and gender in education outcomes. You might be able to make some simplifying assumptions and guess. But, as a mother or a researcher, you would rather just have the data.
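To see why a guess is a poor substitute, here is a minimal Python sketch of one such simplifying assumption: that the effects of race and gender on pass rates multiply independently. The function name and the overall pass rates it loops over are hypothetical, chosen only for illustration. The 18 percent and 37 percent figures are the ones cited above. The state's real overall pass rate is not used here.

```python
def independence_estimate(rate_a, rate_b, rate_overall):
    """Guess the pass rate for students who belong to both groups.

    Assumes (and this is only an assumption) that each group's pass rate
    differs from the overall rate by an independent multiplicative factor:
        guess = overall * (rate_a / overall) * (rate_b / overall)
              = rate_a * rate_b / overall
    """
    if not 0 < rate_overall <= 1:
        raise ValueError("rate_overall must be in (0, 1]")
    # Cap at 1.0 because a pass rate cannot exceed 100 percent.
    return min(rate_a * rate_b / rate_overall, 1.0)


black_rate = 0.18  # all black students, from the article
boys_rate = 0.37   # all boys, from the article

# Hypothetical overall pass rates, to show how much the guess depends on them.
for assumed_overall in (0.35, 0.40, 0.45):
    guess = independence_estimate(black_rate, boys_rate, assumed_overall)
    print(f"assumed overall rate {assumed_overall:.0%} -> guess {guess:.1%}")
```

The guess shifts with every assumption fed into it, and nothing in the published figures can confirm whether the independence assumption holds at all. That is the gap the missing cross-tabulation would close.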

That’s just one example of several data points you may be surprised to learn that California doesn’t provide at your fingertips—even if you stretch. Below you’ll find five more, gleaned from interviews with advocacy organizations, researchers, and my own experience as a data journalist trying to fetch numbers from the state.

Sometimes the state has legitimate reasons for not collecting or publishing such data: confidentiality concerns, resource constraints, the inherent limitations of what technology can and cannot track. Sometimes the explanations are less compelling. Whatever the case, the data points below represent a diverse set of policy questions for which Californians are flying data-blind.

1. What percentage of California high school graduates enroll in college?

UC Davis students, photo by Katie Hetrick via Flickr

The Missing Data: California has lots of data on its high school students—how many graduate, how many meet University of California coursework requirements, how many drop out. It’s not always beautifully displayed, but it’s there. But longitudinal data tracking what happens to students after they graduate—the holy grail for education researchers—is sorely lacking. While admissions data from the UC and California State University systems provide some valuable information, California does not know the total number or percentage of students it sends to post-secondary institutions or the demographic makeup of those students, much less how many ultimately earn degrees.

Why It’s Important: The percentage of students a school sends to college is an intuitive measure of how well a school is performing and, at the state level, of how well the public K-12 system is performing overall and compared to other states.

“If you ask most parents about what they want to know about schools, they want to know whether their children are going to a school that’s going to send them to college,” says Edgar Cabral, an education policy expert at the state Legislative Analyst’s Office. “But right now we can’t tell them that.”

Perhaps more importantly, tracking what happens to high school students once they enter college can provide insights about why many students—and especially low-income students and students of color—fail to earn advanced degrees.

Why We Don’t Have That Data: Mostly because it involves a major degree of cooperation and infrastructure-building between separate education institutions—K-12 public schools, UC, CSU, community colleges, and associations of private universities.

Creating a reliable longitudinal dataset for all California students is no doubt an ambitious endeavor. But it’s not impossible. According to the Education Commission of the States, 37 states have at least some form of longitudinal data on their students—and 16 states track their students from pre-K through the workforce. That’s an amazing dataset for researchers to explore what works and what doesn’t in the classroom.

2. How much does California spend on prescription drugs?

The Missing Data: According to the Legislative Analyst’s Office, California spent at least $3.8 billion on prescription drugs across a wide variety of agencies, including state prisons and the University of California system. Why “at least?” Because for nearly 80 percent of Californians on Medi-Cal—the state’s health insurance program for low-income residents—the state does not publish a precise estimate for how much we’re paying for prescription drugs.

Why It’s Important: Rising drug prices are taking a toll on government coffers across the country. Quantifying just how much the state spends on prescription drugs—and how it has changed over time—would seem to be a helpful figure in the debate over how best to rein in costs, and how well California is controlling costs compared to other states.

Why We Don’t Have That Data: Well, we actually kind of do. But, like seemingly everything else in the world of health care policy, it’s buried beneath escalating layers of opacity and complexity. As someone recently said, who knew health care could be so complicated?

About 10 million Californians are enrolled in Medi-Cal managed care plans—commercial, nonprofit or county-run health plans that the state contracts with to cover low-income patients. The state gives these plans a monthly “capitation” payment for each patient they serve, and adjusts that payment to reflect rising health care costs.

How much of those costs can be attributed to prescription drugs? The Department of Health Care Services can help disentangle that, but it’ll involve some work on the department’s end and cost you about $1,500 to get enough data for a five-year trend.

In fairness, this isn’t as bad as it sounds. The state will often bypass managed care plans and pay for extremely expensive drugs directly, and you can get decent data on that. You can also kind of get at managed care drug expenditures by combining separate data from the Centers for Medicare and Medicaid Services and the state. But that doesn’t really tell you exactly what you want to know, it complicates trying to make a calculation on a per-capita basis, and you have to do a decent amount of data-crunching yourself.

3. How many men of color are currently in county jails?

The Missing Data: California has more than 70,000 people locked up in county jails, a population that has fluctuated dramatically in recent years after a series of major criminal justice reforms. The state does not collect demographic data on those county inmates, including race/ethnicity and age.

Why It’s Important: It’s difficult to study disparities in California’s criminal justice system without reliable demographic data on who is behind bars. But perhaps more importantly, tracking inmate characteristics such as race and age can help researchers and law enforcement gauge the effectiveness of various probation programs used on different types of offenders. A job training program may work well for one type of offender, and not so well for another. Reducing recidivism with smart, evidence-based programs starts with knowing the population those programs serve.

Why We Don’t Have That Data: Like many gaps in California data, the rub lies in accurately capturing information at the local level. Counties voluntarily complete a monthly jail profile survey that gives the state an idea of its average daily inmate population, but requiring them to provide any more information without additional funding could be considered an unfunded state mandate on local government. The Public Policy Institute of California has teamed with the state and 13 counties on an ambitious project to improve county jail data systems, and has produced some demographic data on the jail populations of those counties.

4. How many new housing developments are blocked by NIMBYs?

San Jose Housing Development, photo by Sean O'Flaherty via Wikimedia Commons

The Missing Data: What happens after a developer submits a proposal for a new housing project to a local planning commission? While the state compiles some data from local jurisdictions on housing projects that are ultimately approved, there’s not much known about projects that are ultimately rejected or die on the permitting process vine.

Why It’s Important: The California Department of Housing and Community Development estimates that the state needs 180,000 new units of housing annually just to keep pace with rising demand. For the last decade, the state has averaged 80,000 new units per year. That mismatch has resulted in some of the highest rent burdens and lowest homeownership rates in the country.
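A quick back-of-the-envelope calculation puts that mismatch in perspective. The Python sketch below uses only the two figures cited above. The cumulative total assumes, purely for illustration, that the 180,000-unit annual need applied in each of the last ten years, which the state's estimate does not claim.

```python
UNITS_NEEDED_PER_YEAR = 180_000  # state estimate of annual need
UNITS_BUILT_PER_YEAR = 80_000    # average over the last decade
YEARS = 10

# Gap between what is needed and what gets built in a typical year.
annual_shortfall = UNITS_NEEDED_PER_YEAR - UNITS_BUILT_PER_YEAR

# Fraction of the estimated need that actual construction covers.
share_of_need_met = UNITS_BUILT_PER_YEAR / UNITS_NEEDED_PER_YEAR

# Illustrative only: assumes the same annual need held for all ten years.
cumulative_shortfall = annual_shortfall * YEARS

print(f"Annual shortfall: {annual_shortfall:,} units")
print(f"Share of need met: {share_of_need_met:.0%}")
print(f"Shortfall over {YEARS} years (assumed constant need): {cumulative_shortfall:,} units")
```

In other words, the state has been building less than half of what it says it needs each year.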

Exactly what obstacles are preventing more housing from being built is a constant source of debate and frustration among policymakers, especially as the state tries to incentivize cities to add more housing stock.

“We have some permit data, but without the full pipeline we can’t answer some basic questions,” says Sarah Mawhorter, a postdoctoral researcher at UC Berkeley’s Terner Center for Housing Innovation. “We don’t know what types of projects are being rejected, which could be enormously helpful.”

Questions like how long it takes on average for an application to be approved, whether multi-family housing gets rejected at a higher rate than single-family housing, or which cities have the most difficult permitting processes can’t be answered conclusively without more detailed data collection.

Why We Don’t Have That Data: This would be an ambitious undertaking and, as Mawhorter points out, it is less a deficiency in what the state currently collects than a dream dataset of tremendous potential. Again, the problem lies in getting localities to publish and report their own data in a standardized, accessible, transparent format. Cities have to document this data internally, but there’s no requirement to report it to a statewide database in any way.

5. Who gets thrown off welfare?

The Missing Data: There are about 430,000 California families on CalWORKs, the joint federal-state program that provides cash payments and work training for needy families. While the state keeps tabs on the ethnic mix of families receiving benefits (about 54 percent are Latino), we don’t know much about those families that are kicked off the program for fraud.

Why It’s Important: Advocates for low-income families suspect there could be bias in how county welfare administrators investigate program abuses in CalWORKs. Caseworkers have considerable discretion in investigating fraud, and advocates fear explicit or implicit prejudices could factor into which families must re-pay benefits or are kicked off the program entirely.

“We know implicit bias can have a strong impact on case by case decisions,” says Jessica Bartholow, policy advocate at the Western Center on Law and Poverty. “You may have different races and cultures colliding, and you can have a misunderstanding of what people are saying and why they’re saying it.”

Why We Don’t Have That Data: The Department of Social Services, which oversees CalWORKs at the state level, does not require counties to report demographic information on their fraud cases.

These are just a handful of data points Californians can’t access; many more are out there. If you have your own data disappointment with the state, share it by emailing matt@calmatters.org or tweeting at @mlevinreports.