STAR Help

Comparing California Standards Test Scores

Any user of the California Standards Test results should compare scores only within the same subject and grade. That is, grade 2 English-language arts results should be compared to grade 2 English-language arts results, and grade 6 mathematics to grade 6 mathematics. No direct comparisons should be made between grades or between content areas.

Two types of comparisons are possible: 1) comparing the average scaled score or 2) comparing the percent of students scoring at each performance level. The reviewer may compare scores across years within a school, between schools, or between a school and its district, county, or the state. When making comparisons, the reviewer should consider comparing the percentage of students scoring proficient and advanced, since the state target is for all students to score at or above proficient.

Comparing CAT/6 Scores

Same Year, Within School Comparisons
A reviewer may want to compare the performance of students at different grade levels within a school. Similarities and differences in student performance in the same subject may be seen by comparing the percent of students scoring at or above the 50th percentile for each grade. When making this comparison, it is important to remember that the number of students in each group affects the confidence in the inferences that can be made. The smaller the group, the more cautious one should be in making comparisons. It is also important to note that the national norm groups to which California's students' scores are compared were unique for each grade level.

Same Year, Between School or Between School, District and State Comparisons
When comparing the performance of students between schools or between a school, its district, and the state, the reviewer may compare the average scaled scores within a content area such as reading, language or mathematics. The most defensible comparison is the Percent of Students Scoring At or Above the 50th NPR within each grade and content area.

Comparing 2003 CAT/6 Scores with Stanford 9 Scores

Users should make no direct comparisons between the 2003 CAT/6 scores and the Stanford 9 scores from previous years posted on the Internet. Different publishers developed the two test series, and the CAT/6 tests were developed and normed more than five years after the Stanford 9. Making direct comparisons between the scores for the two test series is inappropriate, because they have different formats and difficulty levels. Score tables that can be used to compare results between the two tests will be available at www.startest.org during fall 2003.

Comparing Stanford 9 Scores

Comparing Group Test Results
Since the Stanford 9 was unchanged from year to year and was administered in all California public schools, users may make comparisons from year to year, as well as between schools and districts, for the tests administered between spring 1998 and spring 2002. The most defensible comparison is the Percent of Students Scoring At or Above the 50th NPR. This is the percentage of students in the group purported to have demonstrated achievement at or above grade level on this particular test. A number of comparisons are possible, each with its own set of cautions.
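The statistic described above can be computed directly from students' national percentile ranks. The following is a minimal sketch with hypothetical data; the function name and the sample scores are illustrative, not part of the STAR reporting system.

```python
# Sketch (hypothetical data): computing the Percent of Students Scoring
# At or Above the 50th NPR for one group of test takers.

def percent_at_or_above_50th_npr(percentile_ranks):
    """Return the percent of national percentile ranks at or above 50."""
    if not percentile_ranks:
        raise ValueError("no scores to summarize")
    at_or_above = sum(1 for npr in percentile_ranks if npr >= 50)
    return 100.0 * at_or_above / len(percentile_ranks)

# Example: 6 of these 10 hypothetical students are at or above the
# 50th NPR, so the result is 60.0.
scores = [72, 49, 55, 50, 31, 88, 46, 61, 50, 23]
print(percent_at_or_above_50th_npr(scores))
```

Note that a student exactly at the 50th NPR counts toward the statistic, since the measure is "at or above."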

Same Year, Within School Comparisons
A reviewer may want to compare the performance of students at different grade levels within a school. Similarities and differences in student performance in the same subject may be seen by comparing the percent of students scoring at or above the 50th percentile for each grade. When making this comparison, it is important to remember that the number of students in each group affects the confidence in the inferences that can be made. The smaller the group, the more cautious one should be in making comparisons. It is also important to note that the national norm groups to which California's students' scores are compared were unique for each grade level.

Same School, Different Years Comparisons
There are two ways to compare two or more years of data for the same school. One can look at a cohort comparison over time by following a group of students from grade to grade. For example, if 48% of a school's third graders scored at or above the 50th NPR in 1999, 51% of the fourth graders scored at or above the 50th NPR in 2000, and 54% of the fifth graders scored at or above the 50th NPR in 2001, the school appears effective in improving the achievement of this group of students. When making this comparison, it is important to understand that even if the number of students is the same from year to year, the group's composition may be quite different if student mobility (transiency) is high.
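The cohort comparison above amounts to tracking one group's percent at or above the 50th NPR as it moves up a grade each year. The sketch below uses the figures from the example; the data layout is hypothetical.

```python
# Sketch: a cohort's percent at or above the 50th NPR, tracked from
# grade to grade, using the hypothetical figures from the example above.

cohort = [
    (1999, 3, 48.0),  # (year, grade, percent at or above 50th NPR)
    (2000, 4, 51.0),
    (2001, 5, 54.0),
]

# Year-over-year changes; a consistent upward trend suggests improved
# achievement for this group, subject to the caution about student
# mobility noted above.
changes = [later[2] - earlier[2] for earlier, later in zip(cohort, cohort[1:])]
print(changes)  # [3.0, 3.0]
```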

In a cross-sectional comparison, third-grade results are compared from year to year. Since the results for two separate groups of students are being compared, differences that may exist between the two groups should be considered.

Scaled Score Comparisons (Cohort)
While scaled scores cannot be compared between different tests or subject areas, scaled scores are useful for comparing performance over time on the same test for the Stanford 9. For example, if 52% of a school's second graders scored at or above the 50th percentile one year, and 52% of those students also scored at or above the 50th percentile in third grade the next year, a comparison of the average scaled scores may be used to determine whether the students actually demonstrated growth during the year. If a group maintains the same position relative to the norm group, the average scaled score will increase, because the average scaled scores for the norm groups increase from year to year.
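The growth check described above can be illustrated with a simple average-scaled-score comparison. The scaled scores below are invented for illustration and are not actual Stanford 9 values.

```python
# Sketch (hypothetical scaled scores): even when the percent at or above
# the 50th percentile is unchanged from one grade to the next, a rising
# average scaled score can indicate growth, because the norm group's
# average scaled score also rises from grade to grade.

grade2_scaled = [595, 610, 602, 588, 615]  # same students, grade 2
grade3_scaled = [618, 633, 629, 607, 636]  # same students, grade 3

avg2 = sum(grade2_scaled) / len(grade2_scaled)
avg3 = sum(grade3_scaled) / len(grade3_scaled)

# A positive difference suggests the group demonstrated growth over
# the year, even though its percentile standing may be unchanged.
print(avg3 - avg2)
```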

California Department of Education