Evidence Based Education
Is Structured Literacy More Effective Than Balanced Literacy?
In 2020, I co-authored a meta-analysis with my co-founder Joshua King comparing balanced literacy studies to structured literacy studies. It was our first attempt at any such project, and it was very rudimentary: we did not weight effect sizes or take steps to deal with outliers, and none of the variables was dual-coded. Over the years, with the help of other scholars, I have redone this meta-analysis multiple times. Indeed, I have essentially redone the paper from scratch every time an important or relevant new study has come out or I have learned a new meta-analysis technique. In total, I believe I have now completely redone the meta-analysis seven times (although not all seven drafts have been publicized, as on more than one occasion, I redid the analysis before publicizing the current draft). At this point, I am unsure whether this research endeavor has become a labor of love or a never-ending treadmill of academic punishment.
In 2022, we submitted the paper to peer review; it was my first attempt at taking a meta-analysis through that process. Since then, the paper has gone through two full rounds of peer review, the first with The Review of Educational Research and the second with Reading Research Quarterly. Both journals gave similar feedback and criticisms. The primary criticisms were:
- Too little attention in the literature review to defining the terms "Balanced Literacy" and "Structured Literacy."
- Insufficient methodological reporting.
- APA formatting errors.
- A non-systematic approach to picking which programs were included in the meta-analysis.
During this peer-review process, the manuscript and meta-analysis have been completely redone four times. However, I must admit that the second-to-last revision was never submitted for peer review, as I got busy with non-academic life and time got away from me. Since then, we have made several attempts to improve the analysis and manuscript in order to resubmit to peer review, including:
- Weighting effect sizes with both a fixed- and a random-effects model (a sketch of this weighting appears after this list).
- Adopting more robust outlier protocols.
- Dual coding 24% of study variables.
- Better defining what is meant by "Structured Literacy" and "Balanced Literacy."
- Increasing the methodological reporting detail.
- Removing APA formatting errors.
- Addressing the criticisms of past phonics meta-analyses.
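For readers curious what the weighting step looks like in practice, below is a minimal Python sketch of the two pooling models named above. The variance formula for Cohen's d and the DerSimonian-Laird estimator of between-study variance are standard choices; the function names and example data are my own illustration, not the code used in the paper.

```python
# Minimal sketch of fixed- and random-effects pooling for studies
# reporting Cohen's d with known group sizes. Illustrative only.

def d_variance(d, n1, n2):
    """Approximate sampling variance of Cohen's d for two groups."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))

def fixed_effect_mean(ds, vs):
    """Fixed-effect model: weight each study by 1 / variance."""
    weights = [1.0 / v for v in vs]
    return sum(w * d for w, d in zip(weights, ds)) / sum(weights)

def random_effects_mean(ds, vs):
    """Random-effects model (DerSimonian-Laird): estimate the
    between-study variance tau^2 and add it to each study's
    sampling variance before weighting."""
    w = [1.0 / v for v in vs]
    fe = sum(wi * di for wi, di in zip(w, ds)) / sum(w)
    q = sum(wi * (di - fe) ** 2 for wi, di in zip(w, ds))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ds) - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in vs]
    return sum(wi * di for wi, di in zip(w_star, ds)) / sum(w_star)

# Hypothetical studies: (d, treatment n, control n).
studies = [(0.52, 40, 38), (0.31, 120, 115), (0.44, 65, 70)]
ds = [d for d, _, _ in studies]
vs = [d_variance(d, n1, n2) for d, n1, n2 in studies]
print(fixed_effect_mean(ds, vs), random_effects_mean(ds, vs))
```

The same machinery supports simple outlier screens (for example, flagging studies whose effect sizes sit far from the pooled mean), although the outlier protocols mentioned above are more involved than that.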
That said, there are two limitations to the paper that leave me unsure whether we will ever pass peer review. First, the method by which I selected programs was not systematic: I chose programs based on my perception of their popularity. There was likely a better way to select programs; however, at the time I started this research, I was not considering how to select programs systematically. Still, there is an enormous number of literacy programs in existence, and systematically evaluating all of them would be incredibly difficult. To put this into perspective, there are 78 studies included within this meta-analysis, compared to 36 studies within the NRP (2000) phonics meta-analysis.
Second, we chose to include non-peer-reviewed studies. This decision was made for two reasons. For one, many program studies are never peer-reviewed. For another, the peer-review process has a positivity bias, as pointed out by Nair (2019): studies that are successfully published in peer-reviewed journals are more likely to report significant effects. While seemingly counterintuitive, this may happen both because authors are more likely to submit studies with positive results for review and because journals find significant results more compelling to publish.
While including non-peer-reviewed studies may be justifiable, it comes with an additional challenge. Many program studies that were not peer-reviewed included effect sizes calculated by the authors but not the underlying statistics used to make those calculations, which makes it impossible to include these effect sizes in a weighted mean. To deal with this challenge, I calculated both an unweighted mean that included all studies and a weighted mean that included only the studies with full statistical reporting. By doing this, I believe we were able to represent the findings of all studies while also evaluating the weighted mean of the more rigorously reported studies. Interestingly, there was no statistically significant difference between the unweighted mean (d = .47) and the weighted fixed-effect mean (d = .44), suggesting that the peer-review status of the included papers had little impact on the results of this meta-analysis.
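To make the dual-mean approach concrete, here is a small sketch reusing `d_variance()` and `fixed_effect_mean()` from the earlier block: every study contributes to the unweighted mean, but only studies that report the sample sizes needed to compute a variance enter the weighted mean. The records shown are hypothetical.

```python
# Hypothetical records: all report d; only some report group sizes.
records = [
    {"d": 0.52, "n1": 40, "n2": 38},      # full statistical reporting
    {"d": 0.31, "n1": 120, "n2": 115},    # full statistical reporting
    {"d": 0.60, "n1": None, "n2": None},  # d only: variance unknowable
]

# Unweighted mean: every study counts equally.
unweighted = sum(r["d"] for r in records) / len(records)

# Weighted mean: restricted to studies with computable variances.
full = [r for r in records if r["n1"] is not None]
vs = [d_variance(r["d"], r["n1"], r["n2"]) for r in full]
weighted = fixed_effect_mean([r["d"] for r in full], vs)
print(round(unweighted, 2), round(weighted, 2))
```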
Ultimately, this paper found the same thing the NRP (2000) did 24 years ago: systematic phonics programs outperform whole language programs. Indeed, we even found an identical fixed-effect size for systematic phonics instruction (d = .44), a fact that gives me some solace in our work. That said, while most of our findings simply replicate past research, I do believe we have three novel findings:
- Structured Literacy and Balanced Literacy, while relatively new terms, really represent rebranding efforts for older terms. In attempting to better define these terms for the manuscript, I came to realize that there was no practical difference between the terms "whole language" and "balanced literacy," or between the terms "structured literacy" and "systematic phonics." Indeed, many of the same programs described as "systematic phonics" and "whole language" in the NRP (2000) meta-analysis were the very programs I was analyzing under the terms "structured literacy" and "balanced literacy." I cannot help but feel sheepish about this finding, as I had previously interpreted the research to suggest that Balanced Literacy programs were better than whole language programs but worse than systematic phonics programs. However, I now believe that this was a misunderstanding of the terms, and thus of the research.
- We found strong evidence that phonics instruction was helpful for older readers if they were struggling. The NRP (2000) findings suggested that phonics was only helpful for students in Grade 1 or younger.
- We found preliminary evidence that systematic phonics instruction was even more effective than balanced literacy/whole language instruction over the long term. Indeed, there is even some evidence that balanced literacy instruction could result in negative learning outcomes, compared to a control group, over the long term.
Previously, whenever I updated this research, I updated the blog post summarizing our findings on this website. However, given the slow pace at which I have moved in updating this research for peer review, I feel compelled to share a more complete version of it. With this in mind, I have published a preprint of our paper, along with an open-access database of our findings, on the OSF preprint server, which can be found here: LINK
I also want to thank all of the people who have helped contribute to this research and improve it over the years, especially Dr. Scott Dueker, Dr. Kathryn Garforth, and my co-founder Joshua King.
References
Nair, A. S. (2019). Publication bias: Importance of studies with negative results! Indian Journal of Anaesthesia, 63(6), 505–507. https://doi.org/10.4103/ija.IJA_142_19
National Reading Panel. (2000). Teaching children to read: An evidence-based assessment of the scientific literature on reading instruction. United States Government. https://www.nichd.nih.gov/sites/default/files/publications/pubs/nrp/Documents/report.pdf