By Heather Vogell, John Perry, Alan Judd and M.B. Pell
The Atlanta Journal-Constitution
Suspicious test scores in roughly 200 school districts resemble those that entangled Atlanta in the biggest cheating scandal in American history, an investigation by The Atlanta Journal-Constitution shows.
The newspaper analyzed test results for 69,000 public schools and found high concentrations of suspect math or reading scores in school systems from coast to coast. The findings represent an unprecedented examination of the integrity of school testing.
The analysis doesn’t prove cheating. But it reveals that test scores in hundreds of cities followed a pattern that, in Atlanta, indicated cheating in multiple schools.
A tainted and largely unpoliced universe of untrustworthy test results underlies bold changes in education policy, the findings show. The tougher teacher evaluations many states are rolling out, for instance, place more weight than ever on tests.
Perhaps more important, the analysis suggests a broad betrayal of schoolchildren across the nation. As Atlanta learned after cheating was uncovered in half its elementary and middle schools last year, falsified test results deny struggling students access to extra help to which they are entitled, and erode confidence in a vital public institution.
“These findings are concerning,” U.S. Secretary of Education Arne Duncan said in an emailed statement after being briefed on the AJC’s analysis.
He added: “States, districts, schools and testing companies should have sensible safeguards in place to ensure tests accurately reflect student learning.”
In nine districts, scores careened so unpredictably that the odds of such dramatic shifts occurring without an intervention such as tampering were worse than one in 1 billion.
In Houston, for instance, test results for entire grades of students jumped two, three or more times the amount expected in one year, the analysis shows. When children moved to a new grade the next year, their scores plummeted — a finding that suggests the gains were not due to learning.
Overall, 196 of the nation’s 3,125 largest school districts had enough suspect tests that the odds of the results occurring by chance alone were worse than one in 1,000.
For 33 of those districts, the odds were worse than one in a million.
A few of the districts already face accusations of cheating. But in most, no one has challenged the scores in a broad, public way.
The newspaper’s analysis suggests that tens of thousands of children may have been harmed by inflated scores that could have precluded tutoring or more drastic administrative actions.
The analysis shows that in 2010 alone, the grade-wide reading scores of 24,618 children nationwide — enough to populate a midsized school district — swung so improbably that the odds of it happening by chance were less than one in 10,000.
Cheating is one of few plausible explanations for why scores would change so dramatically for so many students in a district, said James Wollack, a University of Wisconsin-Madison expert in testing and cheating who reviewed the newspaper’s analysis.
“I can say with some confidence,” he said, “cheating is something you should be looking at.”
Statistical checks for extreme changes in scores are like medical tests, said Gary Phillips, a vice president and chief scientist for the large nonprofit American Institutes for Research, who advised the AJC on its methodology.
“This is a broad screening,” he said. “If you find something, you’re supposed to go to the doctor and follow up with a more detailed diagnostic process.”
The findings come as government officials, reeling from recent scandals, are beginning to acknowledge that a troubling amount of score manipulation occurs. Though the federal government requires the tests, it has not mandated screening scores for anomalies or investigating those that turn up.
Daria Hall, director of K-12 policy with the nonprofit The Education Trust, said education officials should take steps to ensure the validity of test results because of the critical role they play in policy and practice.
“If we are going to make important decisions based on test results — and we ought to be doing that — we have to make important decisions about how we are going to ensure their trustworthiness,” she said. “That means districts and states taking ownership of the test security issue in a way that they haven’t to date.”
‘Way too much pressure’
Both critics and supporters of testing said the newspaper’s findings are further evidence that in the frenzy to raise scores, the nation failed to pay enough attention to what was driving the gains.
“We are putting way too much pressure on people to raise scores at a very large clip without holding them accountable for how they are doing it,” said Daniel Koretz, a Harvard Graduate School of Education testing expert.
Test-score pressure is palpable in schools grappling with urban blight and poverty.
These are the schools that the 2001 No Child Left Behind Act was supposed to fix.
But at Patrick Henry Downtown Academy in St. Louis, airy red brick towers rising above the school belie a grimmer reality on the ground. Children leaving one recent afternoon passed piles of trash and a .45 caliber bullet tucked into the curb. Inside, their classrooms are beset by mold, rats, discipline problems and scandal.
Last year, the former principal — once hailed as among the district’s strongest — was accused by Missouri officials of falsifying attendance rolls to get more state money.
State investigators didn’t publicly question Henry’s test scores.
But the AJC’s analysis found suspicious scores in the school dating back to 2007. In 2010, for instance, about 42 percent of fourth-graders passed the state math test. When the class took the tests as fifth-graders the next year — with state investigators looking into cheating and other fraud allegations — just 4 percent passed math.
Experts say student learning doesn’t typically jump backwards.
Henry’s scores were consistently among the lowest in the state — except for the occasional sudden leap.
After school one recent afternoon, Deborah Dodson, who sends two children to the school, said she saw a teacher provide inappropriate one-on-one assistance during a state test. And she’s heard from other parents that teachers will give students answers.
Some students who aren’t likely to test well don’t receive tests at all, she said. “They don’t do anything by the book,” Dodson said. “That school and how they do things is not right.”
Rural, city schools flagged
The AJC used freedom of information laws to collect test scores from 50 states to look for the sort of patterns that signaled cheating in Atlanta. A Georgia investigation last year found at least 178 Atlanta educators — principals, teachers and other staff — took part in widespread test-tampering.
In each state, the newspaper used statistics to identify unusual score jumps and drops on state math and reading tests by grade and school. Declines can signal cheating the previous year. The calculations also sought to rule out other factors that can lead to big score shifts, such as small classes and dramatic changes in class size.
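The article does not spell out the calculation behind this screening step, so the following is only a minimal sketch of one plausible approach, not the AJC's actual method. It compares each cohort's year-over-year change in average score with the statewide spread of such changes and flags the outliers in either direction. The function name, the field names and every threshold (z_cutoff, min_students, max_size_shift) are hypothetical choices made for illustration.

```python
from statistics import mean, stdev


def flag_unusual_changes(cohorts, z_cutoff=3.0, min_students=25, max_size_shift=0.25):
    """Return (school, grade, z) for cohorts whose score change is a statewide outlier.

    Each cohort is a dict with keys: school, grade, prev_mean, curr_mean,
    n_prev, n_curr. All cutoffs are illustrative assumptions.
    """
    # Drop small classes and classes whose size changed sharply, since both
    # can produce big score shifts that have nothing to do with tampering.
    eligible = [
        c for c in cohorts
        if min(c["n_prev"], c["n_curr"]) >= min_students
        and abs(c["n_curr"] - c["n_prev"]) / c["n_prev"] <= max_size_shift
    ]
    changes = [c["curr_mean"] - c["prev_mean"] for c in eligible]
    if len(changes) < 2:
        return []  # not enough data to estimate a spread
    mu, sigma = mean(changes), stdev(changes)
    if sigma == 0:
        return []  # every cohort changed by the same amount
    flagged = []
    for cohort, change in zip(eligible, changes):
        z = (change - mu) / sigma
        # Jumps and drops are both flagged: a drop can signal
        # cheating in the previous year.
        if abs(z) >= z_cutoff:
            flagged.append((cohort["school"], cohort["grade"], round(z, 2)))
    return flagged
```

A flag from a screen like this is a prompt for a closer look, not proof of wrongdoing, which matches the way the experts quoted in this story describe such checks.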
Some school leaders accused of cheating have attributed steep gains to exemplary teaching. But experts said instruction isn’t likely to move scores to the degree seen in the AJC’s analysis.
Through teaching alone, Wollack said, “it’s going to be pretty tough to have that sort of an impact.”
The AJC developed a statistical method to identify school systems with far more unusual tests than expected, which could signal endemic cheating such as that which occurred in Atlanta. The newspaper’s score analysis used conservative measures that highlighted extremes and were likely to miss many instances of cheating.
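To illustrate how a district can have "far more unusual tests than expected," here is a hedged sketch that treats each class as having the same independent chance of being flagged and asks how likely it is to see at least the observed number of flags by chance. The binomial model, the function name and the example numbers are assumptions for illustration; the article does not say which distribution the newspaper used.

```python
from math import exp, lgamma, log


def binomial_tail(n_classes, n_flagged, flag_rate):
    """Probability of at least n_flagged flags among n_classes classes.

    Assumes each class is flagged independently with probability
    flag_rate, where 0 < flag_rate < 1. This is a simplifying assumption.
    """
    if n_flagged <= 0:
        return 1.0
    if n_flagged > n_classes:
        return 0.0
    total = 0.0
    for k in range(n_flagged, n_classes + 1):
        # Work in log space so large binomial coefficients do not overflow.
        log_term = (
            lgamma(n_classes + 1) - lgamma(k + 1) - lgamma(n_classes - k + 1)
            + k * log(flag_rate) + (n_classes - k) * log(1 - flag_rate)
        )
        total += exp(log_term)
    return min(1.0, total)


# Hypothetical district: 200 classes, 40 flagged, where 5 percent
# (10 classes) would be expected to be flagged by chance.
odds = binomial_tail(200, 40, 0.05)
is_suspect = odds < 1 / 1000  # the one-in-1,000 cutoff cited in this story
```

Under these assumptions, a district lands on the list only when its count of flagged classes is so high that chance alone is an unlikely explanation.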
Big-to-medium-sized cities and rural districts harbored the highest concentrations of suspect tests. No Child Left Behind may help explain why. The law forced districts to contend with the scores of poor and minority students in an unprecedented way, judging schools by the performance of such “subgroups” as well as by overall achievement.
Hence, high-poverty schools faced some of the most relentless pressure of the kind critics say increases cheating.
Improbable scores were twice as likely to appear in charter schools as regular schools. Charters, which receive public money, can face intense pressure as supposed laboratories of innovation that, in theory, live or die by their academic performance.
Common problems unite the big-city districts with the most prevalent suspicious scores: Many faced state takeovers if scores didn’t improve quickly. Teachers’ pay or even their continued employment sometimes depended on test performance. And their students — mostly poor, mostly minority — were among those needing the most help.
The analysis, for instance, flagged more than one in six tests in St. Louis some years. In Detroit, it was one in seven.
Dozens of school systems in mid-sized cities — such as Gary, Ind.; East St. Louis, Ill., and Mobile, Ala. — exhibited high concentrations of suspicious tests, too.
Though high-poverty city schools were more likely to have suspicious tests, improbable scores also showed up in an exclusive public school for the gifted on the Upper West Side of Manhattan. And they appeared in a rural district roughly 70 miles south of Chicago with one school, dirt roads and a women’s prison.
The findings call into question the approach that dominated federal education policy over the past decade: Set a continuously rising bar and leave schools and districts essentially alone to figure out how to surmount it — or face penalties.
“If you want to keep your job, keep your school out of the news, keep winning awards and advance in your career, you need to make your school look better,” said Joseph Hawkins, a former testing official with the Montgomery County, Md., school system.
Koretz, the Harvard expert, said cheating is one extreme on a continuum that, at its other end, includes gaming the test in legal ways — such as through test-prep drills — that don’t significantly increase students’ overall knowledge or skills.
Even as state test scores have soared, students’ performance on national and international exams has been more mediocre. Cheating and gaming may help explain why.
“The big picture is: Are we seeing apparent gains in student achievement that are bogus?” Koretz asked.
Decade of tumult
Test scores show that instead of progressing steadily in their academics, districts have endured a decade of tumult.
In some of the nation’s biggest cities, dynamic district leaders preached “data-driven” decision-making and even linked test scores to bonuses or principal hiring and firing decisions. Many boasted of taking a corporate approach to education, focusing on student test achievement as the single most important measure of success.
Some of the most persistently suspicious test scores nationwide, however, occurred in districts renowned for cutting-edge reforms.
In Atlanta, for instance, former Superintendent Beverly Hall won national recognition as Superintendent of the Year in 2009. State investigators later confirmed scores that year were widely manipulated by educators who assisted students improperly and outright changed tens of thousands of their answers on state tests.
In some Atlanta schools, cheating was an open secret for years. After students turned in their tests, teachers and administrators erased and corrected their mistakes — even holding a “changing party” at a teacher’s home. In another school, staff cut plastic wrap securing test booklets with a razor, then melted the wrap shut again after making forbidden copies.
State investigators accused a total of 38 principals of participating in test-tampering. One allegedly wore gloves while erasing to avoid leaving fingerprints.
Ultimately, the cheating supported a massive effort to bolster the Atlanta superintendent’s image as a tough reformer who had turned around a struggling system.
In 2002, Houston was the first winner of the Broad Prize, which has become the most coveted award in urban education. The Eli and Edythe Broad Foundation praised Houston’s intense focus on test results. More recently, Houston has been among the leaders in tying teacher pay to student test scores.
But twice in the past seven years, the AJC found, Houston exhibited fluctuations with virtually no chance of occurring except through tampering.
In 2005, scores fell precipitously in five dozen classes in 38 schools after a statistical analysis by the Dallas Morning News suggested test-tampering in Houston. The district fired teachers and principals and improved test security.
In 2011, however, as three-fourths of Houston teachers earned performance-based bonuses, scores rose improbably in a similar number of classes in the same number of schools. In the same year, Houston confirmed nine cheating allegations and fired or took other action against 21 employees.
Through Jason Spencer, a spokesman for the district, Houston officials questioned whether cheating caused all of the unusual score changes the AJC found. He said the district doesn’t think its pay-for-performance plan has made cheating more likely.
“We feel like we put a lot of safeguards in place,” he said, but added: “We know it happens. We would never pretend it’s not an issue.”
Teachers and other school staff in Atlanta were eligible for mostly small bonuses if scores hit district targets. Perhaps more worrisome for principals were the penalties: Former Superintendent Hall boasted of replacing about 90 percent of principals and told new hires they had three years to deliver high scores. Her mantra: “no exceptions, no excuses.”
Three studies of merit-pay programs did not show they consistently produce higher test scores, either legitimately or through cheating, said Matthew Springer, director of the National Center on Performance Incentives at Vanderbilt University.
Yet he added that “it’s incredibly important that we systematically monitor these programs for opportunistic gaming of the system.”
Pushback from officials
Some school districts and states have taken an apathetic, if not defiant, stance in the face of cheating accusations in recent years.
The AJC sent detailed findings to districts with some of the most suspicious clusters of scores. For those not already publicly looking at cheating, the responses were similar: Officials said they were unaware of most anomalies, but protested characterizing the score changes as cheating.
Several local and state school officials objected to conducting the analysis at all, saying it doesn’t consider enough variables.
Some districts simply denied any problems exist. Detroit, for instance, claimed its scores were not “unusual or out of line in any way” and that Michigan officials had not identified irregularities “with respect to an erasure analysis, suspected cheating, or any other issue.”
In fact, Michigan’s education agency identified six Detroit schools as having statistically unlikely gains on a state test in 2009. At one school, the state determined, sixth-graders averaged 7.4 wrong-to-right erasures. Their peers statewide averaged fewer than one such change.
Analyzing Detroit’s scores from 2008 and 2009, the AJC found suspicious swings in 14 percent of classes. The statistical probability: zero.
Regardless, Detroit officials offered an explanation that experts have said is among the least likely: better teaching.
Steven Wasko, an assistant superintendent in Detroit, said the district has offered before- and after-school programs, expanded summer school, and added extra reading and math instruction. “Increases in student performance,” Wasko said in an email, “could be attributed in part to these factors.”
In a statement, St. Louis school district officials acknowledged the strangeness of score changes, but disagreed that cheating was to blame. They said neither the district nor state education officials have any “credible evidence that testing improprieties have occurred at the schools in question.”
Officials acknowledged, however, that the district has a cheating investigation open at one school. The state said that since 2010 it has received allegations of cheating at two other St. Louis schools identified as suspicious by the AJC analysis. Accusations of cheating persist.
State officials say they do not screen test scores for possible cheating and do not consider unusually high gains to be a sign of test-tampering — if schools provide an explanation.
“We hope to see great gains in our proficiency levels,” said Michele Clark, a spokeswoman.
Dallas officials said that when irregularities surfaced several years ago, they instituted new test security measures and started screening for anomalies.
Few big-city districts have attacked cheating as aggressively as Baltimore.
After he became the district’s chief executive in 2007, Andrés Alonso heard a whistle-blower complain at a PTA meeting about the district’s lax investigation into cheating allegations at her school.
With accused educators sitting nearby, Alonso recalled recently, the room became “a deafening vacuum.”
Alonso ordered a new investigation, which spread into 15 other schools. The district posted independent monitors in each school during tests. In the suspected schools, scores fell dramatically. In other schools, scores continued to rise.
Alonso asked state officials to check test papers for illicit erasures and changes. Their analysis confirmed his suspicions.
At Fort Worthington Elementary, for instance, as many as 20 mistakes were corrected on some students’ tests, often in a lighter shade of pencil.
All of Fort Worthington’s classes posted improbable gains in 2008, the AJC’s analysis shows. The performance level held for two more years, when the school faced the threat of state takeover. After the cheating was detected, statistically unlikely score drops multiplied, occurring in three-quarters of the school’s classes. Similar patterns show up across the district.
Sitting outside the school in her aging station wagon one late winter day, Vernetta Jones-Marshall said Fort Worthington is doing the best it can.
“I don’t even know if it was really a true statement,” Jones-Marshall, 57, said of the cheating allegations as she waited to pick up her son, a fifth-grader. “We didn’t make a big deal about it.”
Cheating is a big deal to Alonso, however.
Most educators act with integrity, he said, but others “feel a sense of impunity” because school officials haven’t always held cheaters accountable.
“I was doing this before the Atlanta story broke,” he said. “This was me feeling that nothing mattered more than the integrity of the school system.”
Call for vigilance
Leaders need to maintain that tough stance even after cheating disappears from the headlines, experts say.
In Dallas, for instance, the score analysis shows the number of suspicious gains dropped after cheating allegations surfaced in late 2004 — but then began inching up again a few years later.
For years, Los Angeles’ scores were among the least suspicious for big-city districts. But when California stopped conducting routine erasure analysis in 2008 for budget reasons, the number of improbable score changes in L.A. climbed steeply.
States and districts find little advice when they do decide to conduct erasure or statistical screenings of test scores.
Federal education officials and testing experts have begun working on new recommendations for detecting and investigating test-score anomalies.
Wollack, the Wisconsin testing expert, said there is room to improve. “Some of the investigations that have taken place in the past have been less than thorough, have been less exhaustive than they should have been,” he said. “Cheating went undetected as a result.”
Districts don’t have a big incentive to unearth ugly truths about their own testing programs. What’s more, most screening methods miss instances of cheating by setting high thresholds in an effort not to falsely identify innocent schools.
“It’s clear there are schools, there are districts, that are under that threshold that are still engaged in some level of misconduct,” Wollack said.
Critics of testing have complained for years that increased pressure brought on by accountability measures leads to more testing abuses.
Education historian and New York University Professor Diane Ravitch said the incessant focus on testing has eroded the quality of instruction.
“All of this is predictable,” said Ravitch, a former top U.S. Department of Education official who in recent years reversed her support for testing and tough accountability measures. “We’re warping the education system in order to meet artificial targets.”
Through programs such as Race to the Top, federal education officials have pushed states to adopt more aggressive teacher evaluation systems that, typically, consider test scores.
“Whatever the stakes were under No Child Left Behind,” Ravitch said, “they are going to be much higher, now that teachers are being told your scores are going to be public and you’re going to be fired if they don’t go up X number of years in a row.”
But Daria Hall, of the Education Trust, said most educators don’t cheat, and testing data is essential for determining if students have basic skills — such as the ability to read.
“What parent doesn’t want to know how their child is doing in reading and in math? What teacher doesn’t want to know how their student is doing?” she said. “You can’t take away the source of the information. We have to make the information better.”
Crisis of confidence
For parents, questions of academic integrity can lead to a crisis of confidence.
The chronically low-performing Nashville district illustrates the conundrum. Test scores in some of the district’s schools have alternately soared and swooped to improbable degrees. Sixth-graders at Two Rivers Middle School ranked among the 10 worst in reading scores in the state in 2010, for instance. One year later, as seventh-graders, they skyrocketed to among the top 25 percent.
Nashville school officials said the data raises concerns about their effectiveness as educators, but not about cheating. They echoed other districts’ objections to the analysis, citing their relatively high percentage of students learning English and the number of students changing schools from one year to the next.
In Hermitage, a working-class section east of downtown Nashville, Megan McGowan said she was torn about whether to send her son to Dupont Tyler Middle School.
Tests carry too much weight, she said, and teachers face tremendous pressure to produce results. Still, she said, cheating is inexcusable. If it happened at Dupont Tyler, she said, she’d think twice about sending her son there.
“I expect teachers to be ethical,” she said.