The Piñata Theory of School Reform 
The misapplication of NAEP testing results to beat on the public schools in expectation of goodies tumbling forth
BY JAMES J. HARVEY/School Administrator, May 2019

James Harvey believes public school detractors distort the use of international assessments.
Carla Santorno, the superintendent in Tacoma, Wash., was bemused by the question. During a 2018 interview with staff from the National Superintendents Roundtable, she’d been asked how her school district uses results from the National Assessment of Educational Progress, or NAEP.

Santorno had an answer ready, saying that Washington ranks among the top dozen states in math achievement, according to NAEP, but she added the tests “don’t provide enough evidence to improve our instruction as a system, at the school level or for specific students.”

Santorno acknowledged the increasing pressure to link NAEP benchmarks to state assessments such as the Common Core tests, including the Partnership for Assessment of Readiness for College and Careers and the Smarter Balanced Assessment Consortium (often referenced as PARCC and SBAC). This pressure requires educators to understand the implications of NAEP’s definition of what it means to be “proficient.”

Variations Afoot
The Tacoma superintendent was correct on both counts. With the exception of 27 large cities that participate in NAEP’s Trial Urban District Assessment, NAEP results are not available for school districts. NAEP’s major reports provide information on how students perform at grades 4 and 8 (known as Main NAEP) or at ages 9, 13 and 17 (known as Long-Term Trend NAEP). The results are provided nationally and broken down by state and by significant demographic variables. Outside the trial urban districts, school leaders, while perhaps troubled by the national reports, have no NAEP findings that apply to their school districts.

But that is about to change in subtle and little-understood ways. NAEP’s proficiency benchmark has pretty much been accepted as the bar students will be expected to clear under PARCC, SBAC and statewide assessments under the Every Student Succeeds Act. What this means is that individual districts now are beginning to hear variations on the theme of school failure based on state benchmarks closely linked to NAEP’s proficiency standard.

A particular concern in these failure narratives is that typically only about one-third of students meet the NAEP benchmark of “proficient” on tests of reading and mathematics. According to reform advocacy groups like Achieve, Education Trust and The 74, that means two-thirds of American students can’t perform at grade level. Or does it?

Paying Heed
The very term assessment is enough to make the eyes glaze over. For most school leaders, the internal workings of student assessment are as mysterious as string theory or quantum physics.

But whether the assessments are state, national or international, it’s time to pay attention. And school leaders don’t have to get into the weeds to get on top of things. They can draw on resources such as The Global Education Race: Taking the Measure of PISA, a 2017 book by three distinguished statisticians, and “How High the Bar?” and “School Performance in Context,” two detailed reports co-published by the National Superintendents Roundtable and the Horace Mann League.

The picture painted by these sources illustrates assessment methodologies that have improved by leaps and bounds in recent decades, producing results at the state, national and international levels that politicians and school critics have abused, misused and misconstrued.

The National Level
Statistical analyses dating back decades have asked: How would students abroad perform if held to NAEP’s proficiency benchmark?

The answer: The vast majority of students in most countries can’t clear that bar. Not a single nation can demonstrate that even 40 percent of its 4th graders are proficient in reading by NAEP’s definition. That’s why a succession of scholars and organizations, including the National Academy of Sciences, have described NAEP’s proficiency benchmark as “wishful thinking” of “doubtful validity,” producing results that “defy reason” and “refute common sense.”

In truth, while critics equate NAEP’s proficiency benchmark with grade-level performance, the U.S. Department of Education has long noted that if observers are interested in grade-level performance, NAEP’s “basic” standard is the appropriate benchmark. By this metric, the critics’ claim of failure is stood on its head. Depending on the grade and subject, up to 82 percent of U.S. students are performing at grade level. While this is not perfect, it is also not a sign of shameful failure.

NAEP’s highly debatable proficiency benchmark is employed as a weapon in what Roland Chevalier, former superintendent in St. Martin Parish, La., liked to call the “Piñata Theory” of school reform: Keep beating the schools until good things fall out of them.

State Assessments
State benchmarks for career and college readiness standards closely track NAEP’s proficiency benchmark, according to “How High the Bar?” That’s why Michael Hynes, superintendent of Long Island’s Patchogue-Medford School District, insists that educators need to understand the benchmark connections between NAEP and the Common Core assessments.

He believes superintendents and other school leaders need to stand up and object loudly when NAEP and state assessment data are misused to support claims of school failure. Proficiency on NAEP (and equivalent benchmarks in PARCC, SBAC and state assessments) represents pie-in-the-sky thinking. If Singapore and Finland, normally thought to be world beaters on international assessments, cannot haul 40 percent of their 4th graders over the “proficiency” bar in reading, why would we establish such a standard as an appropriate benchmark for all our students?

PARCC assessments in reading/language arts and mathematics at grades 4 and 8 are especially problematic. PARCC’s standards are equivalent to NAEP’s proficiency benchmark or approach it closely. SBAC’s, on the other hand, are more akin to NAEP’s basic benchmark in grade 4 reading and mathematics and approach the proficient benchmark by grade 8.

At the state level, Florida’s and New York’s statewide assessments are in lock step with NAEP’s proficiency benchmark in both reading/language arts and mathematics in grade 4 and grade 8. This helps explain why hundreds of thousands of parents in New York have had their children opt out of the state’s assessments in recent years.

International Assessments
Misunderstandings about international assessments are rife. The testing experts who developed these assessments decades ago were clear they should not be used to rank nations against each other. That idealistic sentiment vanished the minute politicians and public school critics got their hands on the results.

We also should understand that despite proclamations about the comparative performance of high school graduates, no such thing as a comparative assessment of students in the last year of secondary school exists. The closest to a final-year comparative assessment is PISA, but it assesses 15-year-olds, not high school seniors. Moreover, its sample does not represent all 15-year-olds in each society, just those 15-year-olds enrolled in each nation’s schools. That means that what appear to be the wealthiest and most able one-third of 15-year-olds in China, according to estimates by The Wall Street Journal, are being compared with essentially 100 percent of 15-year-olds in the United States and the rest of the developed world. No wonder the performance of Shanghai’s students on PISA “stunned” observers, as The New York Times credulously reported in 2010.

Equally troubling, Chinese urban public schools are able to refuse admission to the children of rural laborers who migrate to cities in search of work. Millions of migrant laborers and millions of children, all Chinese nationals, are thus ignored in PISA testing. In the United States, by contrast, as every superintendent knows, the U.S. Supreme Court has ruled that every child, with or without immigration papers, is entitled to a free public education.

Destructive Memes
It is time school leaders called out false and destructive myths about school failure in the U.S. You won’t be thanked for pointing out this assessment malpractice. You’ll be greeted with smooth evasions, bland denials, half-truths and misstatements of your position.

But education leaders cannot blow an uncertain trumpet. Other data make several things clear: American schools are better now than they have ever been. They are the best public schools in the world. The myths are designed to evade public discussion of a troubling reality. Millions of students in the U.S. are living in Third World conditions. Shedding crocodile tears about test results is a cheap way for critics and public officials to pretend they care about children in these communities while ignoring their basic needs.

None of this is intended to sugarcoat the very real problems the U.S. faces in its schools. Achievement gaps are real. Our schools are segregated. Savage opportunity gaps exist. And for the first time in our history, the majority of students in our schools are children of color and from low-income families. We need to address these difficult and complicated issues. Wildly exaggerated claims of school failure based on highly questionable interpretations of test data are destructive, not helpful, as educators take up this challenging work.

JAMES HARVEY is executive director of the National Superintendents Roundtable in Seattle, Wash. Twitter: @natsupers