Evidence Based Education
Are SOR Laws Working?
Article Summary:
This article continues our series on what percentage of students can reach grade level, most typically defined as either the 30th percentile on a norm-referenced assessment or the basic benchmark on the NAEP. In previous articles we showed that 97.5% of students could achieve grade-level standards if three conditions were met:
- 82% of students were reading on grade level, based on core instruction alone
- Struggling readers received 80 hours or more of tier 3 instruction
- Tier 3 instruction included systematic phonics instruction
(https://www.pedagogynongrata.com/tier-3-instruction)
The biggest barrier here would seem to be that in most US schools only 63% of students are currently reading on grade level. This means that fewer than 95% of students would likely reach the basic benchmark, even if struggling readers were provided sufficient tier 3 instruction. This leads to the question: can we get 82% of students reading on grade level from core instruction alone? In our last article (https://www.pedagogynongrata.com/core-instruction), we showed that with phonics instruction starting in kindergarten, this was at least possible, if not necessarily probable.
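As a back-of-envelope illustration (my own assumption, not a model from the original articles): if tier 3 instruction remediates the same share of struggling readers implied by the earlier 82% to 97.5% result, then a 63% core baseline tops out just under 95%.

```python
# Back-of-envelope sketch; assumes tier 3 remediates the same share of
# strugglers implied by the earlier 82% -> 97.5% finding.
remediated_share = (97.5 - 82) / (100 - 82)   # ~0.86 of struggling readers
ceiling = 63 + remediated_share * (100 - 63)  # ~94.9%, i.e. just under 95%
print(f"~{ceiling:.1f}% of students at grade level")
```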
For this article, we wanted to test whether state policies based on the science of reading are effectively improving the number of students reading at grade level. We matched 12 states that adopted science of reading laws prior to 2020 with states that did not, and then compared their 2022 NAEP results. On average, states that passed SOR laws showed a smaller COVID slump and outperformed states that did not. However, these differences were very small and not statistically significant. That said, some of the SOR laws appeared to be more effective than others.
Introduction:
There is substantial experimental evidence that a systematic phonics approach is superior to an unsystematic phonics approach or a balanced literacy approach. In 2001 the NRP conducted the first meta-analysis of this topic and found a mean effect size of .44 for systematic phonics. Camilli (2003) and Stuebing (2008) both re-analyzed the NRP studies to measure the fixed effect of systematic phonics vs unsystematic phonics. In both re-analyses, the systematic phonics approach still outperformed, albeit with a smaller effect size. Since the NRP, 14 more meta-analyses have been conducted on phonics instruction (to the best of my knowledge), showing an average effect size of .43 if we exclude outlier studies. If we include outliers, that effect size jumps to .54 (Hansford, 2022).
In 2022, I conducted a meta-analysis for this blog with Joshua King comparing phonics and balanced literacy programs; it showed that phonics-based programs had roughly double the learning impact of balanced literacy programs. That meta-analysis has since been updated and submitted for peer review. Similarly, earlier this month, I re-analyzed the 2009 Torgesen and 2002 Mathes literature reviews of the impact of tier 3 instruction. In this re-analysis, I consistently found that a systematic phonics approach outperformed approaches focused on cueing/constructivist pedagogies. To date there are no peer-reviewed meta-analyses on the topic of balanced literacy. To the best of my knowledge, the only meta-analysis conducted specifically on balanced literacy was mine, and it showed a low impact.
Over the last several years, there has been growing recognition of the importance of systematic phonics instruction, and consequently many states have adopted so-called “science of reading laws”. These state policies typically mandate a series of pedagogical shifts, mainly replacing balanced literacy with structured literacy. However, critics of these laws often claim there is a lack of evidence for mandating systematic phonics instruction at the state level, typically pointing to the absence of research showing that systematic phonics improves the number of students at grade level statewide.
I am very critical of the idea that a lack of state-level evidence is problematic. On paper, wanting to see state-level outcomes seems meaningful, as we want to know that public policy works. However, I am skeptical of the experimental value of such evidence. While a small-scale study might have far fewer students, we also have a much better understanding of what happened. In a state-level analysis, we have no meaningful idea of what actually happened in the classroom. After all, curriculum books often just sit dusty on school shelves. It also needs to be pointed out that the number of students on grade level is not a particularly scientific metric, as it uses an arbitrary cut-off mark that is inconsistent across populations, tests, and contexts. Counting students at benchmark does not account for students who are far above grade level, nor for students who are only slightly above it. Conversely, an effect size measures the magnitude of benefit a treatment provides to one group of students over another. For example, if every student in one class scored just above the cut-off and every student in another scored just below it, 100% of the first class would be on grade level and 0% of the second, yet the mean difference between the classes could be only a point or two. The outcome would look dramatic; in reality, the difference could be entirely statistically insignificant.
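A minimal simulation of this point, with made-up scores and a hypothetical cut score of 200:

```python
# Hypothetical scores showing how a benchmark cut-off can exaggerate
# a tiny mean difference between two classes.
import numpy as np

cut_score = 200
class_a = np.full(25, 201.0)  # every student just above the cut-off
class_b = np.full(25, 199.0)  # every student just below the cut-off

pct_a = np.mean(class_a >= cut_score) * 100  # 100% "on grade level"
pct_b = np.mean(class_b >= cut_score) * 100  # 0% "on grade level"
gap = class_a.mean() - class_b.mean()        # only 2 points

print(f"At benchmark: {pct_a:.0f}% vs {pct_b:.0f}%; mean gap: {gap:.1f} points")
```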
That said, it is still necessary to examine the impact of state SOR laws on the percentage of students at grade level, for three reasons. Firstly, governments are spending millions of dollars implementing these laws, and people want to know if they are working. Secondly, there is value in identifying how effectively scientifically validated pedagogy can be implemented on a state level. Lastly, while effect sizes and meta-analyses are better tools for identifying the scientifically demonstrable efficacy of pedagogies, they are far less well understood by the public. There is therefore value in identifying the impact of state “SOR” policies on the percentage of students at grade level.
With these reasons in mind, we decided to analyze changes in NAEP scores for states that had passed “SOR” laws. We decided to focus on the following research questions:
- Are SOR laws working?
- What is the impact of SOR laws on the number of students at grade level?
- What types of policy factors have the greatest impact?
Methods:
In July of 2022, Sarah Schwartz published a list of states that had passed SOR policy laws. She also recorded how comprehensive these laws were, on a scale of 1-6. To receive the highest score of 6, a state's legislation had to target professional development, teacher preparation, teacher licensing, assessment, materials, and instruction. I decided to use this data, combined with the NAEP database of testing scores, to measure whether these laws were actually improving student achievement. Twelve of these states had passed SOR laws prior to 2020; I used them to form a treatment group. I then selected 11 states with equivalent 2019 grade 4 NAEP reading scores; one control state was used twice, as there were no other states available with equivalent test scores. These 2019 scores were used to represent pre-intervention reading scores. I then compared the 24 states on the 2022 testing results to measure the magnitude of effect for these SOR laws. Cohen's d effect sizes were calculated by taking the difference in raw scores and dividing it by the standard deviation between states (as opposed to between students).
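For readers who want to see the computation, here is a minimal sketch of that effect size calculation; the state means below are placeholders, not the actual NAEP data used in this analysis:

```python
# Sketch of a Cohen's d computed across state means (not individual students).
import numpy as np

sor_states = np.array([219.0, 215.0, 217.0, 212.0])      # hypothetical treatment means
control_states = np.array([217.0, 214.0, 216.0, 211.0])  # hypothetical control means

def cohens_d(treatment, control):
    """Raw mean difference divided by the pooled standard deviation of state means."""
    pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treatment.mean() - control.mean()) / pooled_sd

print(f"d = {cohens_d(sor_states, control_states):.2f}")
```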
Table 1: 2019 Pre-Test Results
As can be seen from the above results, the matched comparison states started out slightly ahead, but this difference was not statistically significant.
Before I conducted the experiment, I had three predictions:
- SOR states would outperform, by a small but significant margin.
- The most important factor in outperformance would be mandating the 5 pillars of literacy and increased tier 3 instruction.
- States with shorter bills would outperform states with longer bills.
I thought states with shorter bills would outperform because I believed implementation would be easier; I hypothesized that longer bills would create too many priorities for any single initiative to be implemented with fidelity. All three of these hypotheses were proven at least partially wrong.
I also ran a sub-analysis, with Pearson correlation tests, to see whether more comprehensive policy documents produced greater differences, whether longer bills produced worse results, and whether having an SOR law in place for longer produced a greater difference. The results can be seen in the below chart.
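A minimal sketch of one of these correlation tests, with made-up values standing in for each state's years of implementation and its NAEP change:

```python
# Hypothetical inputs; the real analysis used each SOR state's policy
# characteristics and its 2019-to-2022 NAEP results.
from scipy.stats import pearsonr

years_in_place = [2, 3, 4, 4, 5, 2, 3, 6, 4, 3, 5, 2]
naep_change = [-1.0, 0.5, 2.0, 1.5, 2.5, -2.0, 0.0, 3.0, 1.0, -0.5, 2.0, -1.5]

r, p = pearsonr(years_in_place, naep_change)
print(f"r = {r:.2f}, p = {p:.2f}")
```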
There was a small correlation for years of implementation, suggesting that the impacts of implementing an SOR state policy may take many years to fully develop. However, the p-value was high at .15, suggesting that this result might be statistically insignificant. The correlation for degree of comprehensiveness was smaller than its own p-value, suggesting that there is no meaningful relationship between how comprehensive a state law is and the level of impact found. There was a statistically insignificant but positive correlation between bill length and NAEP scores, suggesting longer bills on average had higher outcomes.
I am sure that these results are very disappointing to many. These laws reflect years of lobbying efforts and very large amounts of public spending. Moreover, they also reflect popular ideological shifts in regard to reading instruction. I worry that many will read this article and see it as an attack on the “science of reading” movement, or as evidence that science does not work. However, I think the problem actually comes back to the legislation and the implementation. Firstly, just because these policies are written down in a state legislature does not mean they are actually being implemented. Indeed, I am sure that much of education policy gets written into a binder and left on a district shelf. More problematic, however, are the policies themselves. They are often quite long, unfocused, and contain both well-evidenced recommendations and purely theoretical ones. I reached out to Dr. Timothy Shanahan for his thoughts on the matter, and he responded: “Typically, these laws specify little or no funding or detail as to implementation or enforcement. Busy state departments of education send out an announcement of the requirement and perhaps ask districts to carry out some minor reporting (what percentage of your schools are implementing, what tests you are using) and that’s the end of it. The assumption is that if schools have information about children’s reading status this will lead to appropriate and meaningful instructional responses by well-trained and well-supervised teachers using sound materials, and so on. Those assumptions are often not fulfilled. Over time, even this meager administrative interest wanes and the legislation turns out to be no more than a set of worthless reporting mechanisms.”
While the overall execution might be less than ideal in some cases, I still wondered if some types of laws were more effective than others. To better analyze each state's results, I coded each state's laws according to two sets of moderator variables. The first set of variables was pedagogical and included four factors: the 5 pillars of literacy, systematic phonics, tier 3 intervention, and 3-cueing. The second set of variables was resource-based and included: coaching, materials, and assessment. The results of these analyses can be seen below.
Table 4: State Pedagogical Variables
If we look at the state that did the best in comparison to its control state, it would be Arkansas. This is interesting, as Arkansas was (to the best of my knowledge) the only state included that fully banned 3-cueing. The second-best performing state was Michigan, which included systematic phonics, the 5 pillars of literacy instruction, and increased tier 3 instruction in its legislation. Both of these states had passed their legislation four or more years earlier, which might be relevant, as it may take time to see the effects of such legislation. The two worst-performing states were Nevada and New Mexico; neither included the 5 pillars of literacy instruction in its bill. To expand upon this analysis, I also conducted both a moderator analysis and a regression analysis on both sets of variables.
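As a rough illustration of how such a regression can be set up, here is a minimal sketch with hypothetical moderator coding and hypothetical NAEP gains; it is not the actual model or data used here:

```python
# Each row codes one hypothetical SOR state's law:
# [5 pillars, systematic phonics, tier 3 increase, 3-cueing ban]
import numpy as np
import statsmodels.api as sm

X = np.array([
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 0],
])
y = np.array([1.5, 3.0, 0.5, 1.0, -1.0, 2.0, 0.0, 2.5])  # hypothetical gains vs controls

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.params)  # intercept plus one coefficient per policy component
```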
Table 6: Moderator Analysis 2
Table 8: Regression Analysis 2
The pedagogical results seem to indicate that including the 5 pillars of literacy instruction was the most important factor in improving literacy results on the state level, and that mandating systematic phonics was the second most important. However, the impact of increased tier 3 instruction and of banning 3-cueing could not be properly analyzed, as only one state included in this study banned 3-cueing and only one state did not increase the amount of tier 3 instruction. To better test the hypothesis that the 5 pillars of literacy and systematic phonics were the most important components, I re-calculated the post-test effect size, excluding all states that did not mandate both systematic phonics and the 5 pillars of literacy instruction. I also conducted a second effect size calculation to measure the magnitude of effect for coaching; for this analysis, I took the states that funded coaching but not materials or assessments and compared them to the 12 control states. The results of both calculations can be seen below.
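A minimal sketch of this subgroup re-calculation, with placeholder states and scores, reusing the same pooled-SD effect size formula as the earlier sketch:

```python
# Filter hypothetical SOR states to those mandating both components,
# then compute the effect size against hypothetical control means.
import numpy as np
import pandas as pd

states = pd.DataFrame({
    "score":   [219.0, 216.0, 212.0, 214.0, 218.0, 211.0],  # hypothetical 2022 means
    "phonics": [True, True, False, True, True, False],
    "pillars": [True, False, False, True, True, False],
})
controls = np.array([214.0, 213.0, 215.0, 212.0, 213.0, 214.0])  # hypothetical

sub = states.loc[states["phonics"] & states["pillars"], "score"].to_numpy()

pooled_sd = np.sqrt((sub.var(ddof=1) + controls.var(ddof=1)) / 2)
d = (sub.mean() - controls.mean()) / pooled_sd
print(f"subgroup d = {d:.2f}")
```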
Table 9: The Impact of Systematic Phonics and The 5 Pillars of Literacy
Table 10: The Impact of Literacy Coaches
As can be seen in the above results, states that mandated both systematic phonics instruction and the 5 pillars of literacy instruction far outperformed the states that did not.
Conversely, the results for spending and resources strongly suggested that the most important resource factor was coaching, not materials or assessment. Perhaps this reflects the low barrier for companies to call their products research-based.
Discussion:
There is strong experimental evidence that it is possible for 95% or more of students to achieve grade-level expectations, as defined by the NAEP basic benchmark, provided that students receive early phonics instruction and that struggling readers receive 80 hours or more of systematic phonics instruction prior to grade 3. That said, just because something is possible within the confines of a scientific experiment does not mean that it can be accomplished on a state level. Our investigation into the impact of state SOR laws showed mixed results. However, when states mandated the 5 pillars of literacy instruction, literacy coaches, and systematic phonics, results were far more substantial. We believe that SOR policies should focus on:
- Early phonics instruction, starting in kindergarten
- The inclusion of the 5 pillars of literacy instruction within core settings
- A minimum of 80 hours of tier 3 systematic phonics instruction for struggling readers
- Ongoing coaching to train teachers
Limitations:
This study was not peer-reviewed. Standard deviation calculations were based on state results, not student results. A state-level analysis like this cannot account for fidelity; instruction in schools might not match policy mandates. One state in the control group (Arizona) did not have an SOR law but did maintain a list of approved evidence-based curricula; however, no other state could be easily matched to its treatment state (South Carolina). This series of articles will be combined, refined, and submitted for peer review. The current analysis only examined what percentage of students could reach grade level, provided they were given systematic phonics instruction, not what percentage of students would likely reach grade level. For the full version of this article, we intend to also answer the question of what percentage of students will likely reach grade level, provided they receive evidence-based reading instruction.
Written by Nathaniel Hansford
Contributed to by Dr. Rachel Schechter of LXD Research
Last Edited 2023-05-22
References:
-Camilli, G., Vargas, S., & Yurecko, M. (2003). Teaching children to read: The fragile link between science and federal education policy. Education Policy Analysis Archives, 11(15). Retrieved March 20, 2007, from http://epaa.asu.edu/epaa/v11n15/
-Stuebing, K. K., Barth, A. E., Cirino, P. T., Francis, D. J., & Fletcher, J. M. (2008). A response to recent reanalyses of the National Reading Panel report: Effects of systematic phonics instruction are practically significant. Journal of Educational Psychology, 100(1), 123–134. https://doi.org/10.1037/0022-0663.100.1.123
-Hattie, J. (2022). Phonics. Visible Learning Metax. Retrieved from https://www.visiblelearningmetax.com/influences/view/phonics_instruction
-NRP. (2001). Teaching Children to Read: An Evidence Based Assessment of the Scientific Literature on Reading Instruction. United States Government. https://www.nichd.nih.gov/sites/default/files/publications/pubs/nrp/Documents/report.pdf
-Hansford, N., & King, J. (2022). A Meta-Analysis and Literature Review of Language Programs. Teaching by Science. https://www.teachingbyscience.com/a-meta-analysis-of-language-programs
-Hansford, N., & Schechter, R. (2023). How Long Does it Take to Get 95% of Students Reading at Grade Level? Teaching by Science. https://www.pedagogynongrata.com/tier-3-instruction
-Hansford, N., & Schechter, R. (2023). What Percentage of Students Can Succeed With Just Core Instruction? Teaching by Science. https://www.pedagogynongrata.com/core-instruction
-Mathes, P. G., & Denton, C. A. (2002). The prevention and identification of reading disability. Seminars in Pediatric Neurology, 9(3), 185–191. https://doi.org/10.1053/spen.2002.35498
-Torgesen, J. (2009). Preventing Early Reading Failure and its Devastating Downward Spiral. National Centre for Learning Disabilities. http://www.bharathiyartamilpalli.org/training/images/downwardspiral.pdf