Should Phonics be Systematic?

Article Summary: There is strong scientific evidence that phonics instruction should be systematic and that a good way to accomplish this is by using a scope and sequence. However, there is minimal evidence as to what that sequence should be or at what rate it should be taught. The value of the sequence might not be the sequence itself, but rather that a sequence ensures all letter-sound correspondences are systematically taught. 

One of the more debated conclusions of the NRP meta-analysis was that systematic phonics programs were better than non-systematic ones, based on an effect size of .44. The report described systematic phonics as follows: “The distinctions between systematic phonics approaches are not absolute, however, and some phonics programs combine two or more of these types of instruction. In addition, these approaches differ with respect to the extent that controlled vocabulary (decodable text) is used for practicing reading connected text. Although differences exist, the hallmark of systematic phonics programs is that they delineate a planned, sequential set of phonic elements and they teach these elements explicitly and systematically. The goal in all phonics programs is to enable learners to acquire sufficient knowledge and use of the alphabetic code so that they can make normal progress in learning to read and comprehend written language.” (NRP, 105). 


Part of why this finding was controversial might be that some seem to define the difference between structured literacy and balanced literacy as whether phonics instruction is systematic or incidental. Indeed, the NRP included the following statement, which seems to hint at this interpretation: “In whole-language programs, the emphasis is upon meaning-based reading and writing activities. Phonics instruction is integrated into these activities but taught incidentally as teachers decide it is needed. Basal programs consist of a teacher’s manual and a complete set of books and materials that guide the teaching of beginning reading. Some basal programs focus on whole-word or meaning-based activities with limited attention to letter-sound constituents of words and little or no instruction in how to blend letters to pronounce words.” (NRP, 106). 


One of the main criticisms of this .44 effect size is that it was not based on directly comparing systematic phonics studies against unsystematic phonics studies. Instead, it was based on 38 studies that the NRP coded as systematic phonics, which had a wide range of different control groups. In other words, it was a “random effect” and not a “fixed effect.” This does not mean the .44 effect size is invalid; it just means that it may not be as precise as possible. More modern meta-analyses often use multilevel regression models to better account for differences in individual study designs and their impact on the overall effect size, especially when looking at random-effect results. 
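For readers unfamiliar with the distinction, a random-effects pooled estimate treats each study's true effect as drawn from a distribution, so the pooled estimate widens its uncertainty when studies disagree with each other. Below is a minimal sketch of the standard DerSimonian-Laird random-effects calculation in Python; the function name and the three hypothetical studies are my own illustration, not the NRP's actual model or data.

```python
import math

def random_effects_pool(effects, variances):
    """Pool study effect sizes with DerSimonian-Laird random effects.

    effects: per-study effect sizes (e.g., standardized mean differences)
    variances: per-study sampling variances
    Returns (pooled effect, standard error of the pooled effect).
    """
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se

# Three hypothetical studies with effect sizes of .2, .5, and .9
pooled, se = random_effects_pool([0.2, 0.5, 0.9], [0.04, 0.04, 0.04])
print(pooled, se)
```

Because the three hypothetical effects disagree, the between-study variance term inflates the standard error relative to a fixed-effect analysis, which is the sense in which a random-effect estimate is "less precise."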


In 2003, Camilli et al. conducted a multilevel regression on the studies identified by the NRP meta-analysis. They recoded treatment groups as receiving either systematic phonics or some phonics, and recoded control groups as receiving either no phonics or some phonics. Camilli et al. found an estimated effect size of .51 for systematic phonics compared to no phonics, and an estimated effect size of .24 for some phonics compared to no phonics. They pointed out that the difference between these two effects is .27, and that according to Cohen’s guidelines an effect size of .27 is small. They therefore suggested that the claim that phonics instruction should be systematic rests on weak evidence. 


In 2008, Stuebing et al. conducted a re-analysis of the Camilli study. As part of their study, they used statistical modeling based on the Camilli data to estimate an effect size of .31 for systematic phonics vs some phonics. They also pointed out that it would be a misinterpretation of Cohen’s research to conclude that an effect size of .31 or .27 means the difference between systematic and unsystematic phonics is not meaningful. Personally, I agree with this interpretation. As this is a moderator effect and not an overall effect, it needs to be interpreted differently. It should also be pointed out that the effect size for systematic phonics was more than double the effect size for unsystematic phonics. I would argue this strongly supports the idea that phonics instruction should be systematic. 

In their 2008 study, Stuebing et al. also conducted a multilevel regression of the studies identified by the NRP and Camilli et al., to further test how different levels of phonics and language instruction impacted the resulting effect size. They also coded for one-on-one vs classroom instruction, as well as for the number of language activities that accompanied the other instruction. Stuebing et al. found the highest effect sizes when the treatment group received one-on-one instruction, systematic phonics, and multiple forms of language instruction, while the control group received unsystematic language instruction (d = 1.33). Comparatively, they found the lowest effect sizes when the treatment group received unsystematic phonics instruction, in a classroom setting, with limited other language activities, while the control group received systematic language instruction (d = -.12). 


While there are scholars who would reject the claim that systematic phonics instruction is superior to incidental phonics instruction, I interpret the NRP, Stuebing, and Camilli meta-analyses as demonstrating strong evidence for systematic over incidental phonics instruction. Systematic phonics instruction is often interpreted to mean having a scope and sequence, as stated in the NRP meta-analysis. However, I wonder if the benefits come from the sequence itself or from making sure all grapheme-phoneme correspondences (GPCs) are covered. Personally, I would hypothesize the latter. That is not to say that a logical sequence cannot be helpful to student learning; however, I wonder if the biggest value of that sequence is simply ensuring nothing is missed. I think this is especially relevant to constructivist programs that do not include a phonics scope and sequence or that use more incidental phonics instruction. These programs were inspired by the whole language movement, and I think many would be concerned that they might end up neglecting some GPCs. 


While the previously discussed papers help to show the need for a systematic approach, they do not provide any evidence for a particular scope and sequence, or for a particular rate of instruction. As Timothy Shanahan has previously pointed out, we have minimal research establishing any particular scope and sequence as best practice. One popular sequence for teachers has been to teach one GPC per week. This works out nicely for teachers, as there are 44 commonly recognized sounds and most school years are roughly 44 weeks long. Personally, I really dislike this approach, as it seems incredibly slow. Other programs, like Jolly Phonics, teach 1 GPC per lesson, and others still take a middle approach, teaching 2-3 GPCs per week. 


In my own personal practice, I prefer a much faster scope and sequence that involves teaching multiple GPCs per lesson. I typically introduce 6 GPCs per lesson and then have students practice those GPCs across multiple lessons until they reach mastery. I like teaching phonics this way because I think teaching multiple GPCs in a lesson is a more engaging methodology and because I hope it can accelerate the process. In general, I think a fast scope and sequence is preferable because it should, in theory, allow the decoding of more words more quickly. That said, this is my personal opinion and not based on any experimental research. To the best of my knowledge, there is no meta-analysis or systematic review of this topic that points to a particular rate of instruction. However, this week a friend of mine sent me a fascinating 2020 paper by Patricia Vadasy and Elizabeth Sanders that included two very high-quality RCT studies on the rate of instruction for teaching phonics and on whether it is best to teach one or multiple GPCs at a time. 


Study 1 Description: 

In the first study, 65 students were randomly assigned to one of two equivalent treatment conditions comparing a fast vs slow scope and sequence. In the fast group, students were taught 15 letter-sound combinations over 5 weeks. In the slow group, students were taught 10 letter-sound combinations over 5 weeks. Eleven tutors delivered the program one-on-one, and fidelity was strictly monitored. Each lesson was scripted, and the teaching procedure was the same in both groups for everything except the scope and sequence. This is an extremely high-quality study, because not only is it an RCT, but the authors were very careful to control all factors but one, so as to measure a fixed effect.

That said, I do think this study has a few limitations. Firstly, the study did not state what assessment was used for post-test results, which leads me to believe the assessment was researcher-designed. This is problematic because researcher-designed assessments usually inflate results. My second concern is that the treatment group received instruction on more grapheme-phoneme correspondences (GPCs, or letter-sound relationships) than the control group. This isn’t necessarily a problem, as the stated research question was to test whether teaching students GPCs faster lowered retention, but it makes me wonder whether the difference in results is because of the pace or the number of GPCs taught. Thirdly, the study’s sample size was not particularly large. Lastly, the difference between 10 and 15 GPCs does not seem particularly large to me, so I wonder whether it was big enough in practice to produce measurable differences in applied results. 


The original authors concluded that these results were strong evidence for a faster scope and sequence. We used a Hedges’ g effect size calculation to compare the post-test results of the treatment group and the control group. Each author independently calculated the below effect sizes to ensure reliability.
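For transparency, Hedges' g is a standardized mean difference (Cohen's d) with a correction for small-sample bias, computed from each group's mean, standard deviation, and size. Here is a minimal sketch in Python; the numbers in the example are made up for illustration and are not the study's actual data.

```python
import math

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Hedges' g: standardized mean difference with small-sample correction."""
    # Pooled standard deviation across the two groups
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp               # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample bias-correction factor
    return d * j

# Hypothetical groups: treatment (M=12, SD=4, n=33) vs control (M=10, SD=4, n=32)
print(round(hedges_g(12.0, 4.0, 33, 10.0, 4.0, 32), 3))  # → 0.494
```

With samples this small, the correction factor shrinks the raw d of 0.5 slightly, which is why Hedges' g is preferred over Cohen's d for studies of this size.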

The mean effect size was significant. However, the majority of the difference was driven by the letter-ID outcome. Most reading and writing outcomes were actually negative (albeit statistically insignificant). That said, the authors also conducted a second, timed post-test to measure automaticity. We also calculated effect sizes for these results, as can be seen below. 

Again, the mean result was statistically significant, albeit small. However, if we look at the individual results, this was driven completely by the letter-sound writing outcomes. That said, the treatment group outperformed the control group on all measures of the timed test. In general, we think this study provides weak but positive evidence that a faster scope and sequence is better than a slower one. However, much more research is needed, both to replicate these results and to determine the ideal scope and sequence.  


Study 2 Description: 

In the second study, 61 students were randomly assigned to two equivalent groups. Both groups received instruction on 15 GPCs over 5 weeks; however, different correspondences were used in each group. One group received instruction on one GPC per lesson; the other group received instruction on multiple GPCs per lesson. All lessons were scripted and taught one-on-one by tutors, and lessons were otherwise the same in both groups. Fourteen different tutors were used, and fidelity was closely monitored. Again, this was a very high-quality study, as it was an RCT measuring a fixed effect. 


However, I think there are a few limitations. Firstly, the study used different GPCs in each group, which I think is an unnecessary confounding factor (although I am sure the authors had their reasons). Additionally, the researchers used their own assessments, and the study had a relatively small sample. 

The original authors interpreted the study results as strong evidence that teaching multiple GPCs per lesson is better than teaching one GPC per lesson. Again, we calculated Hedges’ g effect sizes based on the post-test scores. The original authors used three different tests in this study: one containing all of the GPCs taught across both groups, and one for each group containing only the GPCs taught to that group. We took a mean of these results rather than separating out the results for all three tests. Each author independently conducted the calculations to ensure reliability. Our results can be seen below. 

These results suggested a small but statistically significant benefit overall. However, it should be noted that the actual reading scores were lower in the group that received instruction on multiple GPCs per lesson. In general, we would conclude that this study showed weak but positive evidence for teaching multiple GPCs per lesson instead of one. 



I really wanted the above two studies to demonstrate strong evidence for my hypotheses. However, while the results were mostly positive, some of the findings were actually negative, and the overall magnitude of the effects observed was low. In general, I think that for results to be truly meaningful they need to be systematically replicated. I therefore do not think the above studies prove my hypotheses to any meaningful extent. Given the results of these studies, and the meta-analyses previously discussed, I think Shanahan was correct in 2009 when he summarized the research as showing strong evidence for the inclusion of a scope and sequence, but not for the use of any particular scope and sequence. That said, I would personally hypothesize that the main value of the sequence is not the sequence itself, but the fact that it systematically ensures all GPCs are covered. 


Written by Nathaniel Hansford and Sky McGlynn

Last Edited 12/4/2022



-Camilli, G., Vargas, S., & Yurecko, M. (2003). Teaching children to read: The fragile link between science and federal education policy. Education Policy Analysis Archives, 11(15). Retrieved March 20, 2007, from http://


-Stuebing, K. K., Barth, A. E., Cirino, P. T., Francis, D. J., & Fletcher, J. M. (2008). A response to recent reanalyses of the National Reading Panel report: Effects of systematic phonics instruction are practically significant. Journal of Educational Psychology, 100(1), 123–134.


-Shanahan, T. (2009). On Sequence of Instruction. Shanahan on Literacy. Retrieved from <>. 


-NRP. (2001). Teaching Children to Read: An Evidence-Based Assessment of the Scientific Literature on Reading Instruction. United States Government. Retrieved from <>.
