Editor’s Note: Below you will find SchoolBook’s approach to organizing school data, as designed by Rob Gebeloff, a New York Times data journalist who has extensive experience covering education. He wrote the following explanation in 2011. It will be updated on a regular basis. Make sure to check out the FAQ too.
When the forecast calls for temperatures in the 90s with 85 percent humidity, you can almost feel the sweat beads forming on your skin.
When your favorite football team signs that 240-pound linebacker who runs 40 yards in 4.5 seconds, you can easily envision the opposing team’s quarterback ducking for cover.
But what to make of an elementary school with an average class size of 22.5 students? Is that a little or a lot? Or a high school where 65.9 percent of the students graduate with a Regents diploma; is that high or low?
You can try to apply logic (22.5 seems like a reasonably sized classroom, two out of three sounds pretty good), but because you do not have as much personal experience with school-based statistics as you have with the weather or with football, it can be hard to make an instant connection between the number and its true meaning.
This is where SchoolBook can help. We have pulled together thousands of public records about schools in New York City from a variety of sources, and developed a system to translate them into a format that makes the raw data points easier to understand, as well as more meaningful.
In mathematical terms, we’ve “standardized” the data: we’ve summarized it and simplified it, with a 1-to-9 scale and labels of “below average,” “average” and “above average.” As often as possible, we have used graphics to help visualize a value and place it in context.
What we have not done, quite purposely, is grade or rate schools.
The student population in New York is so vast and varied — as is, increasingly, the array of school choices — that to do so objectively seemed impossible, and certainly inappropriate for us. Instead, we will tell you how schools do on tests, where classrooms are most crowded, who has the highest-paid teachers and what we know about the demographics of the student body. But we will leave it to you to decide which of these or other factors are most important to you; our “Search and Compare” tools will help you focus in on those priorities and see how the schools stack up.
The 1-to-9 numbers, then, serve as indexes, or summaries, to the troves of underlying data, all of which are also available on SchoolBook for those who want to dig deeper — and deeper.
I culled most of the data on this site from reports posted on the Internet by the city and state education departments, but I also made some specific requests to these agencies for more information.
To develop SchoolBook’s system of summarizing and scaling the data, we studied various systems for quantifying complex information, and consulted with experts both inside and outside The New York Times and WNYC, including Nate Silver of our FiveThirtyEight blog, and Aaron Pallas, an education professor at Teachers College with expertise in both statistics and the city schools.
While SchoolBook includes detailed data on dozens of topics for all of its almost 1,800 public schools across the city, we have highlighted summary 1-to-9 scores in three key categories: Performance, Satisfaction and Diversity.
“Performance” is a compilation of scores on state standardized tests (in grades 3 to 8), Regents scores, SAT scores and the number of students graduating with Regents diplomas (for high schools).
“Satisfaction” is based on annual city surveys of teachers, parents and students that are packed with information but rarely reviewed.
“Diversity” deals with the racial and ethnic makeup of the student body and essentially tells you how likely it is, if two students at a given school were selected at random, that they would be of different backgrounds.
Because demographics are so closely related to academic performance, each school also has a “Needs Index,” which is shown as a dot on a line ranging from low to high. This is something we take directly from the city’s Education Department that factors in how many students at a school are poor, disabled or learning English as a second language, and, for high schools, how they performed on tests in earlier grades.
The city uses the Needs Index to create demographic peer groups of schools and bases its progress reports largely on how an individual school performs in comparison with this group — an effort to equalize the playing field.
SchoolBook takes a different approach. Our 1-to-9 scores do not take demographics into account, but we offer the Needs Index as a useful bit of context through which to consider the numbers.
Below, you will find a more detailed explanation of how we have crunched the numbers; a glossary of terms; a discussion of the limitations of our system; and a preliminary list of frequently asked questions (and answers).
In education, even seemingly straightforward statistics, like test scores, need context. A score of 650 on the state’s eighth-grade English test, for example, is about average in New York City, but the same 650 on the fourth-grade English exam is below average.
This contextualization is the first pillar of our approach. Every figure for a school is placed in context of the average value for all schools that have data for a particular category.
Then we quantify how far away each value is from the average. The challenge with interpreting raw data is that every variable has its own scale — 10 points on the SAT test is far less important than 10 points on a graduation rate. This is why mathematicians often measure differences in “standard deviations,” which allows for comparisons regardless of any data point’s original scale.
Scores that measure distance from the mean in standard deviations, including the ones behind our 1-to-9 scale, are known as Z-scores. Our Z-scores represent how many standard deviations a given value is from the city average, enabling comparison between, say, SATs and graduation rates.
But standard deviations from the mean are not easy to digest. Fortunately, Z-scores are statistically linked to the familiar Bell Curve probability distribution. So, to make things simpler to understand, statisticians often convert Z-scores to a percentile ranking, which is on a 100-point scale.
For example, a Z-score of 0.25 is about a 60, a Z-score of -0.13 is about a 45, etc. Most people know, instinctively, what to make of 45 or 60 out of 100.
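The conversion from a Z-score to a percentile can be sketched with the cumulative distribution function of the standard normal curve. This is a minimal illustration that assumes perfectly bell-shaped data; the function name `z_to_percentile` is ours and is not taken from SchoolBook.

```python
import math


def z_to_percentile(z: float) -> float:
    """Return the percentile (0 to 100) of a Z-score under a standard normal curve."""
    # Standard normal CDF expressed with the error function.
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))


print(round(z_to_percentile(0.25)))   # 60
print(round(z_to_percentile(-0.13)))  # 45
```

Both results match the figures quoted above: a Z-score of 0.25 lands near the 60th percentile and a Z-score of -0.13 near the 45th.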
We went further, collapsing those percentiles into index scores from 1 to 9. To be perfectly clear, we label 1, 2 and 3 “below average”; 4, 5 and 6 “average”; and 7, 8 and 9 “above average.”
For example: In 2010, students at Tottenville High School posted an average SAT score of 1408. How does that compare with other city high schools? Well, the average school score for all the city’s high schools was 1219, meaning Tottenville’s score was 189 points above the average. As it turns out, the standard deviation for city high schools was 176 points, so Tottenville was just over one standard deviation above the mean. In a perfectly distributed set of data, one standard deviation above the mean equates to roughly the 85th percentile: above average, and on our index scale, a 7.
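The Tottenville arithmetic can be reproduced in a few lines. The sketch below uses only the three figures quoted in the text; the helper name `z_score` is an illustrative choice, and the percentile line assumes a perfectly normal distribution.

```python
import math


def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations that `value` sits from `mean`."""
    return (value - mean) / sd


# Figures quoted in the text for 2010 average SAT scores.
tottenville_sat = 1408
city_school_mean = 1219
city_school_sd = 176

z = z_score(tottenville_sat, city_school_mean, city_school_sd)
# Percentile under a standard normal curve (an idealized assumption).
percentile = 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(f"z = {z:.2f}, percentile = {percentile:.1f}")
```

The Z-score comes out at about 1.07, and the percentile lands in the mid-80s, which is consistent with the index score of 7.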
There are two main reasons we decided to use the 1-to-9 scale. One is that it is easier to comprehend, particularly when looking at a lot of numbers quickly, than the actual values or a percentile ranking.
Secondly, percentile rankings give a sense of precision that we do not mean to imply: in most cases, the 68th percentile is not meaningfully different from the 65th. They would get the same value in our 1-to-9 scale, leaving users with the more accurate impression that they are about the same.
Of course, any such system has to have cutoff points, and they are necessarily subjective. We chose ours to ensure a normal and fair distribution. The bar chart shows the expected distribution. For any variable, roughly 60 percent of schools should end up with a summary score of 4, 5 or 6; 20 percent with 7, 8 or 9; and 20 percent with 1, 2 or 3, so “below average” and “above average” are truly far from the average. We chose a 1-to-9 scale, instead of 1-to-10, so there would be a middle, 5, and four parallel gradations above and below.
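The three labels can be sketched as a simple banding of percentiles. The code below assumes the bands are cut at exactly the 20th and 80th percentiles, which is our reading of the “roughly” 20/60/20 split described above. The cut points between individual scores within each band are not stated, so they are not modeled here. The name `band_label` is hypothetical.

```python
import math

# Assumption: bands are cut at exactly the 20th and 80th percentiles.
LOW_CUT = 20.0
HIGH_CUT = 80.0


def band_label(z: float) -> str:
    """Map a Z-score to one of the three labels used alongside the 1-to-9 scale."""
    # Percentile under a standard normal curve.
    percentile = 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))
    if percentile < LOW_CUT:
        return "below average"
    if percentile > HIGH_CUT:
        return "above average"
    return "average"


# Z-scores quoted elsewhere in the text for Tottenville High School.
for z in (1.072, 0.74, -1.25):
    print(z, band_label(z))
```

Under this assumption, the three Tottenville Z-scores fall into “above average,” “average” and “below average” respectively, matching the index scores of 7, 6 and 3 reported in the text.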
As you peruse the site, you’ll frequently see three indexes — Performance, Satisfaction, Diversity. Here’s what is behind those “At-a-Glance” 1-to-9 summaries.
Performance is the combined scores of a school’s students on standardized tests. For elementary and middle schools, this means the state’s tests, which are given each spring in English and math to grades 3 through 8. For high schools, we combined the scores on 11 Regents exams in four subjects — math, English, science and history — along with SATs and the percentage of seniors who graduate with Regents diplomas.
This doesn’t tell you what school is best for your child; nor does it say, as the city’s report cards do, which schools showed the most progress, or how students did compared with their demographic peers. It just says who got the highest scores on standardized tests.
Sticking with Tottenville, it has a Performance index of 6. This is how we got there:
|Measure|Value|Z-Score|Index|
|---|---|---|---|
|Weighted Diploma Rate|186|0.644|6 (Average)|
|Regents English|85%|0.748|6 (Average)|
|Regents Mathematics|63%|0.541|6 (Average)|
|Regents Science|53%|(0.011)|5 (Average)|
|Regents History|72%|0.546|6 (Average)|
|Average SAT Score|1408|1.072|7 (Above Average)|
|Average Z-Score|0.59|0.74|6 (Average)|
The summary score for “performance,” then, is a straight average of the component Z-scores. On average, Tottenville students performed 0.59 standard deviations above the city school average on these measures.
We then compare that with the average Z-scores of all the city’s high schools, which produces a final Z-score of 0.74 — a 6 on our distribution. We compare schools with similar grade structures — and therefore, similar required tests — when tabulating the summary score.
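The two steps described above, averaging the component Z-scores and then standardizing that average against other high schools, might look like the sketch below. The component values come from the Tottenville table, with the parenthesized Regents Science figure read as negative (an assumption based on accounting notation). The list of other schools’ composites is invented purely for illustration, and the use of a population standard deviation is also an assumption.

```python
from statistics import mean, pstdev

# Component Z-scores from the Tottenville table.
# Assumption: "(0.011)" for Regents Science means -0.011.
components = [0.644, 0.748, 0.541, -0.011, 0.546, 1.072]

# Step 1: a straight average of the component Z-scores.
composite = mean(components)
print(round(composite, 2))  # 0.59

# Step 2: standardize the composite against all high schools' composites.
# Hypothetical values, invented for illustration only.
all_composites = [composite, -0.9, -0.4, 0.1, 0.3, -0.2, 1.1]
final_z = (composite - mean(all_composites)) / pstdev(all_composites)
print(round(final_z, 2))
```

The first step reproduces the 0.59 in the table. The second step will not reproduce the published 0.74, because the real citywide distribution of composites is not given in the text.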
The Satisfaction score is based on the city’s annual survey of parents, students and teachers, which generally yields some one million responses.
We realize there is some controversy with this survey — some have raised concerns that groups motivated to either boost or damage a school’s reputation could easily manipulate the results. But we have decided to highlight it, both because it is an official measure the city uses to judge schools and because the surveys contain interesting information that is not measurable with other data.
The Education Department breaks the survey findings into four categories: academic expectations, safety, engagement and communications. The city assigns each category a score from 0 to 10, then adds in the school’s attendance rate and compares the results with peer schools to create a letter grade for what the progress report calls a “learning environment.”
At SchoolBook, we combined the scores for the four categories to create our index, which we offer as a measure of how pleased some parents, students and teachers — those who chose to complete the survey, whatever their reasons — are with their schools. You can also click on links to the city’s more detailed accounting of the survey results, including how respondents answered specific questions.
To continue with our Tottenville example, the school earned 29.4 out of a possible 40 points in the four categories. That’s about 0.5 standard deviations above the mean, giving Tottenville a Satisfaction index of 6.
Diversity, our third index category, is a big buzzword in education and beyond. It can be used to refer to many things: religion, gender, economic class, educational background, or what sports or ice cream flavors people prefer. Our index deals with none of these things; it reflects students’ race and ethnicity.
In school data, students are classified as white, black, Hispanic, Asian, Native American or multiracial. Our diversity index is based on the statistical probability that two students in a school, chosen at random, would be in different categories.
Schools that have a relatively balanced mix of students from these racial groups end up with high diversity scores, since the odds that two students would be from different groups are high; schools that are overwhelmingly African-American, say, or Asian, score low.
At Tottenville, 82.5 percent of the students are white, 10 percent are Hispanic, 5.5 percent are Asian, and 2 percent are black. The chances that two randomly selected Tottenville students would be members of different racial groups are about 30 percent, which is 1.25 standard deviations below the average for the city’s high schools, for a summary score of 3.
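One common way to compute such a probability is to subtract the sum of the squared group shares from 1. The sketch below applies that formula to the Tottenville percentages. It assumes this is the calculation behind the index, and it treats the two draws as independent, which is a close approximation for a large school. The function name `diversity_index` is ours.

```python
def diversity_index(shares):
    """Probability that two randomly chosen students belong to different groups."""
    shares = list(shares)
    total = sum(shares)
    # Normalize so the shares sum to 1 even if given as percentages.
    proportions = [s / total for s in shares]
    # Chance of a match is the sum of squared proportions; subtract from 1.
    return 1 - sum(p * p for p in proportions)


# Percentages quoted in the text for Tottenville High School.
tottenville = {"white": 82.5, "Hispanic": 10.0, "Asian": 5.5, "black": 2.0}
d = diversity_index(tottenville.values())
print(f"{d:.3f}")  # 0.306
```

The result, about 0.31, is in line with the roughly 30 percent cited for Tottenville. A school split evenly among four groups would score 0.75 under the same formula.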
We would never argue that these three index scores are definitively the most important categories, or the only things you need to know about a school. But we find each to be very interesting, and they are distinct from one another.
We hope you will dive deeper into the data to examine, also, statistics on class size and teacher experience and new variables released by the city that show how students perform after leaving school. We hope this will give you much fodder for thought and discussion.
It is important to say, first of all, that data is only one window onto a school. Most educators will tell you that the best way to judge a school is to walk its hallways, meet its principal, and listen to its teachers and students; the numbers are completely blind to the intangibles.
Within the world of data, reasonable people can come up with different useful ways to organize and compare it. We want you to know from the start that there are some particular, inevitable limitations to our analysis.
The education universe is vast, complex and ever-changing. Data are released at various times of the year, and schools open and close; we’re never going to have every category for every school for a given year. So you might see 2010 SAT scores and 2011 elementary-school tests.
And even the most current data always lag at least a year behind reality. What happened in 2010 may not reflect what’s going on in your school today, especially if, say, there has been a change in the principal’s office. But if 2010 is the most recent year for which data are available, that’s the best we can offer.
We also have no data for new schools, and limited data in schools that serve the early grades, where students might not take standardized tests. So, sometimes, our summary Performance scores are based on different numbers of variables for different schools, making comparison imperfect.
In high schools, there are generally six variables, and in some cases, eight; we decided to only present summary scores for schools with at least three data points.
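The three-data-point rule can be expressed as a small guard around the averaging step. This is a sketch under the assumption that missing measures are simply skipped before averaging; the names `summary_z` and `MIN_DATA_POINTS` are hypothetical, and the sample inputs are invented.

```python
from statistics import mean
from typing import Optional, Sequence

MIN_DATA_POINTS = 3  # threshold stated in the text


def summary_z(component_z: Sequence[Optional[float]]) -> Optional[float]:
    """Average the available component Z-scores, or return None if too few exist."""
    available = [z for z in component_z if z is not None]
    if len(available) < MIN_DATA_POINTS:
        return None  # not enough data to present a summary score
    return mean(available)


# Invented examples: one school with three measures, one with a single measure.
print(summary_z([0.4, None, -0.2, 0.1, None, None]))
print(summary_z([0.9, None, None, None, None, None]))  # None
```

A school with only one or two reported measures gets no summary score, which is why some schools show a blank where the Performance index would be.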
(We have even less, and less consistent, data for private schools, because there is not much they are required to report to the state.)
Some users may not like the fact that, in order to plot everything on our 1-to-9 scale, we had to pick cutoff points, which can seem arbitrary. (A longer discussion is below in the FAQ.) And others will quibble with whether Performance, Satisfaction and Diversity are the best three categories to highlight.
We are eager to hear what you think. You can e-mail comments or questions to firstname.lastname@example.org.
Q. You say you don’t want to “rate” schools, yet those three numbers on the top of every school page look an awful lot like ratings. What’s the distinction?
A. In our mind, a rating system involves combining variables across categories into a single, summary grade. Building such a system involves a series of subjective choices about what variables to include and how to weight them.
Our indexes combine variables, but within narrow, specific categories. For example, our performance score combines the results on tests in many grades into a single result — we feel this is a service to readers by boiling down a long list of numbers into a more digestible figure. It also follows the logic of representing how the school performed overall on tests.
Q. The Education Department has its progress report system that assigns every school a letter grade. How does that system differ from what you’ve done?
A. The progress report system is an entirely different animal, reflecting, in part, the distinct missions of the Education Department (to assess performance of its employees and their “products”) and SchoolBook (to provide the public information).
The city created its progress reports to drive accountability (they are used in decisions on staffing, bonuses, financing and closings), and their emphasis is not only on how students are doing, but on how much they are improving. The grades are based on a complex formula that draws from scores of variables across multiple categories and compares a school’s performance primarily with that of its demographic peers.
By creating a different system, we are not disputing the city’s methodology. We’re making different measurements — we want our users to be able to focus on specific variables that are important to them, and to consider demographics — the diversity score as well as the Needs Index — as they see fit.
Q. Everybody knows that test scores and other education statistics are largely driven by demographics. Why did you decide not to make any adjustments that account for schools that serve difficult-to-educate populations?
A. This goes back to the question of whether we’re rating schools or summarizing the data. If we were to rate the schools, then it would only be fair to “regress” the test scores or use some other method that accounted for poverty, English language acquisition, and other sociological factors that affect learning.
But we’re not trying to rate schools — we’re mainly trying to help parents learn about them. And when parents look for schools, they’re not looking for one that is good only when its data is adjusted for poverty. They’re looking for how many kids are proficient and how high their SAT scores are, regardless of their backgrounds.
Our Needs Index is right there next to the summary scores, to provide that context for those who want to take it into consideration.
Q. What is the Needs Index based on, and how should we use it to interpret the data?
A. For this, we’ve adapted the peer index score in the city’s progress reports. We feel the method the Education Department uses to calculate those scores is difficult to improve upon — for younger students, the score is based on poverty and other demographic factors; for older students, the score is based on actual performance of students prior to entering that school.
The city uses these figures to identify 20 “peers” for each school. We took a different approach, standardizing the number and translating it to our 1-to-9 scale. In relative terms, schools with large Needs Index differences would be expected to show large differences in performance; the closer the Needs Index scores, the more relevant and meaningful the comparison.
Q. You say you’re comparing each school with the average school. Is that the same as the city average or total that gets reported whenever new data is released?
A. No, the city average in most reports refers to the performance of the average student in the city. SchoolBook’s calculations are based entirely on school performance — we’re comparing schools with one another, and the average we use is the average of all schools that have data for a given category. So if you’re curious about how your own child compares with other children in the city, you would want to refer to the city average, whereas SchoolBook’s figures will tell you how your child’s school compares with other schools.
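The difference between the two averages is easiest to see with numbers. The sketch below uses three invented schools of different sizes. The average of school averages treats every school equally, while the student-level average weights each school by enrollment. All figures are hypothetical.

```python
from statistics import mean

# Hypothetical schools: (enrollment, average score). Invented for illustration.
schools = [(2000, 70.0), (500, 80.0), (250, 90.0)]

# Average of schools: each school counts once, regardless of size.
school_average = mean(score for _, score in schools)

# Average of students: each school is weighted by its enrollment.
total_students = sum(n for n, _ in schools)
student_average = sum(n * score for n, score in schools) / total_students

print(school_average)             # 80.0
print(round(student_average, 2))  # 73.64
```

Because the largest school in this made-up example has the lowest score, the student-level average falls well below the school-level average. The gap could just as easily run the other way.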
Q. I think your scores misrepresent my school. How can I register a protest or give your users what I consider a more accurate assessment of my school?
A. We’re eager for constructive feedback, as well as alternate viewpoints. To share your thoughts directly with SchoolBook editors, e-mail email@example.com, comment on the website or tweet us @schoolbook.
Q. You’re inviting me to compare schools on your site. But you say that the scores are based on how a school compares only to certain subcategories of schools. How do I know that your numbers provide an apples-to-apples comparison for the schools I’m interested in?
A. Sometimes they don’t. Some schools, especially new schools and private schools, do not have enough data. When you compare them, you may see a blank space or an N/A in certain spots. Also, for high schools, where there are generally six but as many as eight variables, we only provided summary Performance scores for those with at least three variables.
To come up with our 1-to-9 scores, we compared an individual school with the average of all schools in the city serving similar grades. So, elementary schools are compared with elementary schools. The 1-to-9 summary score they get, though, is then comparable to summary scores for other types of schools, because standardizing puts every score on the same relative scale: distance from the average for that school’s own type.