Evidence Based Education
A Review of Reading Recovery
Reading Recovery is an intensive one-on-one reading intervention for students who need Tier 2 support. It has been controversial because of its Balanced Literacy philosophy of instruction. In 2004, D'Agostino conducted a meta-analysis of Reading Recovery; its results are summarized below.
In 2004, D'Agostino et al. conducted a meta-analysis of 36 Reading Recovery studies. Its purpose was to estimate the effect of Reading Recovery in studies with control groups and to summarize the extent of the research evidence for the program. The meta-analysis found a mean effect size of .59. For a reading meta-analysis, this is quite high; by comparison, the NRP meta-analysis found a mean overall effect of only .44 for systematic phonics instruction. However, this effect size raises some concerns, as the NRP (2001) used far stricter inclusion criteria than D'Agostino did.
One has to be cautious when looking at the effect sizes reported in D'Agostino et al. (2004) because of the studies included in the sample. While it is important to include as many studies as possible in a meta-analysis, not all research is conducted to the same standards. Many of the included studies were not peer-reviewed, and some of those that were peer-reviewed were published in journals specifically aimed at promoting the efficacy of "balanced literacy" programs. Indeed, to the best of our knowledge, only 3 of the 36 studies were published in independent peer-reviewed journals. D'Agostino et al. (2004) also included effect sizes that were not based on the between-group difference on the post-test, which can substantially inflate the effect size and reduce its accuracy. Moreover, many of the studies used non-standardized tests, which also tend to inflate results. For example, many of the Reading Recovery studies are based on Marie Clay's Observation Survey, which includes scoring for things like concepts about print. Observational assessments are qualitative by design and leave considerable room for personal judgment, which makes the resulting data more subjective and harder to replicate.
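To make the concern about effect-size calculations concrete, here is a minimal Python sketch contrasting a between-group post-test effect size (Cohen's d with a pooled standard deviation) with a within-group pre-post gain. The scores are hypothetical, not taken from any study discussed here, and the function names are our own. Because a gain score also captures the growth students would make without any intervention, it can easily come out several times larger than the between-group comparison.

```python
import math


def pooled_sd(sd_t: float, sd_c: float, n_t: int, n_c: int) -> float:
    """Pooled standard deviation of two independent groups."""
    numerator = (n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2
    return math.sqrt(numerator / (n_t + n_c - 2))


def between_group_d(mean_t: float, mean_c: float,
                    sd_t: float, sd_c: float, n_t: int, n_c: int) -> float:
    """Cohen's d from the post-test difference between treatment and control."""
    return (mean_t - mean_c) / pooled_sd(sd_t, sd_c, n_t, n_c)


def pre_post_gain_d(pre_mean: float, post_mean: float, sd: float) -> float:
    """Within-group gain divided by an SD; it ignores the control group entirely."""
    return (post_mean - pre_mean) / sd


# Hypothetical scores for illustration only (not from any Reading Recovery study).
# Both groups grow over the year; the treatment group ends 5 points ahead.
d_between = between_group_d(mean_t=105, mean_c=100, sd_t=10, sd_c=10, n_t=30, n_c=30)
d_gain = pre_post_gain_d(pre_mean=90, post_mean=105, sd=10)

print(f"between-group post-test d = {d_between:.2f}")  # 0.50
print(f"treatment pre-post gain d = {d_gain:.2f}")     # 1.50
```

The same treatment group yields d = 0.50 against its control but 1.50 against its own pre-test, which is why mixing the two kinds of effect size in one average can inflate the result.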
When only studies with a proper experimental design and standardized tests are considered, the mean effect size is .32 (25 studies total). Even this effect size may be inflated, as most of these studies used standardized tests that have since been discontinued. When only experimental studies using standardized tests that are still in use are examined, the mean effect size is -.34. By comparison, the 6 studies that based their effect sizes on observational assessments showed a weighted mean effect size of 1.01. With these factors in mind, we suggest that the most objective effect size to reference from this paper is not the mean effect of .59, but rather the .32 found in experimental studies with standardized assessments. This distinction matters because an effect size of .32 is considered evidence of a small impact.
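Calling .32 a small impact is consistent with the widely used rules of thumb from Jacob Cohen (roughly 0.2 small, 0.5 medium, 0.8 large). The sketch below assumes those cut-offs, which the paragraph above does not state explicitly, and applies them to the effect sizes quoted from D'Agostino et al. (2004). The cut-offs are conventions rather than hard boundaries, and the function name is our own.

```python
def cohen_label(d: float) -> str:
    """Label the magnitude of a standardized mean difference using Cohen's
    conventional cut-offs (0.2 small, 0.5 medium, 0.8 large).
    The sign is ignored here; it only indicates direction."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Effect sizes quoted from D'Agostino et al. (2004) in the paragraphs above.
quoted = [
    ("all studies", 0.59),
    ("experimental, standardized tests", 0.32),
    ("experimental, tests still in use", -0.34),
    ("observational assessments", 1.01),
]
for name, d in quoted:
    print(f"{name}: d = {d:+.2f} -> {cohen_label(d)}")
```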
The What Works Clearinghouse (WWC) conducted its own analysis of 79 studies submitted by Reading Recovery. It excluded 76 of the 79 studies because of quality concerns, leaving three: Pinnell 1989, Pinnell 1994, and Schwartz 2005. The WWC calculated a mean effect size of .36 (small but statistically significant).
To better examine the impact of Reading Recovery, including its longitudinal outcomes, Teaching by Science conducted its own meta-analysis of the topic.
Methodology:
To assess the efficacy of the Reading Recovery program, the authors of this paper analyzed the available Reading Recovery studies, focusing specifically on contrasting the findings of longitudinal and non-longitudinal studies. Searches were conducted on the Education Source database, the ERIC database, the Reading Recovery website, Google Scholar, and the WWC website. In total, 112 studies were located and 99 were excluded. All studies with control groups and sufficient reporting to calculate effect sizes were accepted; studies were excluded only for lacking a control group or sufficient reporting, and never for reasons related to quality. Four studies were located on the Education Source database, two additional studies on the WWC website, and six additional studies on the Reading Recovery website.
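The inclusion rule described above can be sketched as a simple filter. This is a hypothetical illustration: the `StudyRecord` type, its field names, and the example records are our own and do not come from the review. What it shows is that a quality concern is recorded but never used to exclude a study.

```python
from dataclasses import dataclass


@dataclass
class StudyRecord:
    # Hypothetical record type; the fields mirror the stated inclusion criteria.
    name: str
    has_control_group: bool
    reports_enough_for_effect_size: bool
    quality_concerns: bool = False  # recorded, but deliberately NOT used to exclude


def is_included(study: StudyRecord) -> bool:
    """Include any study with a control group and enough reporting to compute
    an effect size; study quality is intentionally ignored."""
    return study.has_control_group and study.reports_enough_for_effect_size


# Hypothetical search results, for illustration only.
located = [
    StudyRecord("study A", True, True, quality_concerns=True),
    StudyRecord("study B", False, True),
    StudyRecord("study C", True, False),
]
included = [s.name for s in located if is_included(s)]
print(included)  # ['study A']
```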
Non-Longitudinal Study Summaries:
Study: DeFord 1988
Design: RCT
Sample Size: 79
Grade: 1
Control Group: Alternative Treatment
Fidelity Tracked: Yes
Standardized Assessment: Yes
Duration: 20 Weeks
Mean Effect Size Found: .59
Effect Size Calculated by: WWC
Study: Iversen 1993
Design: RCT
Sample Size: 64
Grade: 1
Control Group: Alternative Treatment
Fidelity Tracked: No
Standardized Assessment: No
Duration: 20 Weeks
Mean Effect Size Found: 2.59 (2 effect sizes, 4.03 and 11.42, were excluded from this study as obvious outliers).
Effect Size Calculated by: Authors
Study: Iversen 1993 (Second Control Group)
Design: RCT
Sample Size: 64
Grade: 1
Control Group: Additional Phonics
Fidelity Tracked: No
Standardized Assessment: No
Duration: 20 Weeks
Mean Effect Size Found: .12
Effect Size Calculated by: Authors
Study: Pinnell 1994
Design: Quasi
Sample Size: 71
Grade: 1
Control Group: No Treatment
Fidelity Tracked: No
Standardized Assessment: Yes
Duration: 75 Hours
Mean Effect Size Found: .57
Effect Size Calculated by: Authors
Study: Schwartz 2005
Design: RCT
Sample Size: 94
Grade: 1
Control Group: No treatment
Fidelity Tracked: Yes
Standardized Assessment: Yes
Duration: 20 Weeks
Mean Effect Size Found: 1.00
Effect Size Calculated by: WWC
Study: Burroughs 2008
Design: RCT
Sample Size: 292
Grade: 1
Control Group: No treatment
Fidelity Tracked: No
Standardized Assessment: No
Duration: 40 hours
Mean Effect Size Found: .34
Effect Size Calculated by: Authors
Study: D'Agostino 2017
Design: RCT
Sample Size: 592
Grade: 1
Control Group: No treatment
Fidelity Tracked: No
Standardized Assessment: No
Duration: 1 year
Mean Effect Size Found: .51
Effect Size Calculated by: D'Agostino
Longitudinal Study Summaries:
Study: Askew 1994
Design: RCT
Sample Size: 100
Grade: 1
Control Group: No treatment
Fidelity Tracked: No
Standardized Assessment: No
Duration: 4 years
Mean Effect Size Found: -.31
Effect Size Calculated by: Authors
Study: Askew 2002
Design: RCT
Sample Size: 107
Grade: 1
Control Group: No treatment
Fidelity Tracked: No
Standardized Assessment: No
Duration: 4 years
Mean Effect Size Found: -.42
Effect Size Calculated by: Authors
Study: Schmitt 2004
Design: RCT
Sample Size: 548
Grade: 1
Control Group: No treatment
Fidelity Tracked: No
Standardized Assessment: Yes
Duration: 4 years
Mean Effect Size Found: -.50
Effect Size Calculated by: Authors
Study: Holliman 2013
Design: RCT
Sample Size: 241
Grade: 1
Control Group: No treatment
Fidelity Tracked: Yes
Standardized Assessment: Yes
Duration: 4 years
Mean Effect Size Found: .48
Effect Size Calculated by: Authors
Study: D’Agostino
Design: RCT
Sample Size: 592
Grade: 1
Control Group: No treatment
Fidelity Tracked: No
Standardized Assessment: No
Duration: 4 years
Mean Effect Size Found: .25
Effect Size Calculated by: D’Agostino
Study: Sirinides 2018
Design: RCT
Sample Size: 6888
Grade: 1
Control Group: No treatment
Fidelity Tracked: Yes
Standardized Assessment: Yes
Duration: 4 years
Mean Effect Size Found: .50
Effect Size Calculated by: Authors
Study: Center for Research in Education and Social Policy 2022
Design: RCT
Sample Size: 70,000
Grade: 1
Control Group: No treatment
Fidelity Tracked: Yes
Standardized Assessment: Yes
Duration: 4 years
Mean Effect Size Found: -.19
Effect Size Calculated by: Center for Research in Education and Social Policy
Meta-Analysis Procedure:
We synthesized the effect sizes above into the tables below. In the first table, a raw mean was taken. In the second, a weighted mean was taken by multiplying each effect size by the ratio of its study's sample size to the average sample size across the meta-analysis. This gives studies with larger samples proportionally more weight than studies with smaller samples. All effect sizes calculated by the first author were independently cross-verified by the third author to ensure reliability.
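The weighting described above can be sketched as follows, assuming each study contributes one effect size and one total sample size. The function names and example numbers are hypothetical. Multiplying each effect by n_i divided by the average n and then averaging is algebraically the same as the sample-size-weighted mean sum(n_i * d_i) / sum(n_i). This is simple sample-size weighting; many meta-analyses instead use inverse-variance weights, which this sketch does not implement.

```python
from statistics import fmean


def raw_mean(effects: list[float]) -> float:
    """Unweighted mean: every study counts equally."""
    return fmean(effects)


def sample_weighted_mean(effects: list[float], sizes: list[int]) -> float:
    """Weight each effect by n_i / mean(n), then average the weighted effects.

    Algebraically this equals sum(n_i * d_i) / sum(n_i).
    """
    if len(effects) != len(sizes) or not effects:
        raise ValueError("need one sample size per effect size")
    avg_n = fmean(sizes)
    weighted = [d * (n / avg_n) for d, n in zip(effects, sizes)]
    return fmean(weighted)


# Hypothetical studies for illustration only: two small positive studies
# and one large negative study.
effects = [0.6, 0.4, -0.2]
sizes = [50, 100, 1000]

print(f"raw mean      = {raw_mean(effects):.3f}")                     # 0.267
print(f"weighted mean = {sample_weighted_mean(effects, sizes):.3f}")  # -0.113
```

In this toy example, a single large study with a negative effect flips the sign of the weighted mean even though two of the three studies are positive, which is the same dynamic raised in the study limitations below.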
Results Discussion:
With over 100 studies, Reading Recovery may be the most-studied reading program in the world. However, that research still has many limitations. In every study, the initial treatment was delivered in Grade 1 only. There are no studies on dyslexic students. Only 2 studies gave the control group an alternative treatment, which is needed to properly isolate the impact of the Reading Recovery treatment. None of the studies simultaneously tracked fidelity, used a standardized assessment, and provided the control group with equivalent instructional time. With these caveats in mind, there is strong evidence that Reading Recovery produces large short-term effects for participants. However, there is also substantive evidence that students who receive Reading Recovery instruction do worse over the long term than students who do not. The mean non-weighted effect found (.22) was similar to the most rigorous mean effect size found by D'Agostino (.20). However, the weighted mean effect was negative, albeit not statistically significant. We interpret these results as suggesting that, over the long term, Reading Recovery instruction is harmful to students. That said, Reading Recovery is a complex program, and it is difficult to say which factor, or combination of factors, is causing this harm. It seems plausible, though, that the practice of three-cueing instruction contributes to it.
Study Limitations: Not all of the papers listed in the D'Agostino meta-analysis could be located. The CRESP 2022 and Sirinides 2018 papers were the only large-scale RCTs analyzed in this paper, and both had serious limitations. The 2022 paper has not been peer-reviewed and had an attrition rate of over 70%. In the 2018 experiment, 10% of students in the control group received Reading Recovery instruction. This means that if Reading Recovery does have a negative longitudinal effect, the effect size in the 2018 paper could be inflated. Because these two papers had far larger samples than the others, they also carry disproportionate weight in the weighted mean effect size. One study's effect size was identified as an outlier using an IQR formula; removing it from the non-weighted average reduced the mean by more than 50%.
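The outlier check can be sketched with Tukey's fences, which flag values more than 1.5 times the interquartile range below the first quartile or above the third. The paragraph does not specify the exact IQR formula, so the 1.5 multiplier and the `inclusive` quartile method of Python's `statistics.quantiles` are assumptions, and the effect sizes below are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical effect sizes for illustration only.
effects = [0.1, 0.2, 0.3, 0.4, 0.5, 3.0]
outliers = iqr_outliers(effects)
kept = [v for v in effects if v not in outliers]

print(outliers)  # [3.0]
print(f"mean with outlier = {sum(effects) / len(effects):.2f}")  # 0.75
print(f"mean without      = {sum(kept) / len(kept):.2f}")        # 0.30
```

Different quartile conventions move the fences slightly, so a borderline value may be flagged under one method and not another; a single extreme value, as here, is flagged under any common convention.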
The studies with the highest results tended to report the most assessment data, whereas the studies with negative results tended to report the least. This means the moderator variable results were heavily inflated and are therefore unlikely to be reliable. The authors did conclude that Reading Recovery would be better if more explicit phonics instruction were added. However, the only study that directly tested this compared Reading Recovery with Reading Recovery plus additional phonics instruction, and in that study the group that received added phonics performed worse. That study had other limitations as well: it had a small sample, did not track fidelity, and did not use a standardized assessment. Moreover, a single study is never conclusive on its own.
Final Grade: C. The program's principles are not well supported by research, and we found a mean effect size between .20 and .29.
Qualitative Grade: 4/10
The program includes the following evidence-based principles: fluency, spelling, vocabulary, and comprehension instruction.
Written by Nathaniel Hansford, Kathryn Garforth, Sky McGlynn
Last Edited 2022-12-02
References:
Askew, B. J., & Frasier, D. F. (1994). Sustained Effects of Reading Recovery Intervention on the Cognitive Behaviors of Second-Grade Children and the Perceptions of Their Teachers. Literacy, Teaching and Learning: An International Journal of Early Literacy, 4(1), 43–66. Reprinted (2002) in S. Forbes & C. Briggs (Eds.) Research in Reading Recovery, volume two (pp.1–24). Portsmouth, NH: Heinemann.
Burroughs-Lange, S. (2008). Comparison of literacy progress of young children in London Schools: A Reading Recovery Follow-Up Study.
Center for Research in Education and Social Policy. (2022). Reading Recovery: Long-Term Effects and Cost-Effectiveness. Under the Investing in Innovation (i3) Scale-Up. University of Delaware.
D’Agostino, J. V., Lose, M. K., & Kelly, R. H. (2017). Examining the Sustained Effects of Reading Recovery. Journal of Education for Students Placed at Risk (JESPAR), 22(2), 116–127. https://doi.org/10.1080/10824669.2017.1286591
Iversen, S., & Tunmer, W. E. (1993). Phonological processing skills and the Reading Recovery Program. Journal of Educational Psychology, 85(1), 112–126. https://doi.org/10.1037/0022-0663.85.1.112
D'Agostino, J. V., & Harmey, S. J. (2016). An International Meta-Analysis of Reading Recovery. Journal of Education for Students Placed at Risk (JESPAR), 21(1), 29–46. https://doi.org/10.1080/10824669.2015.1112746
Pinnell, G. S., DeFord, D. E., & Lyons, C. A. (1988). Reading Recovery: Early intervention for at-risk first graders (Educational Research Service Monograph). Arlington, VA: Educational Research Service.
Pinnell, G. S., Lyons, C. A., & DeFord, D. E. (1994). Comparing instructional models for the literacy education of high-risk first graders. Reading Research Quarterly, 29, 9–39. https://doi.org/10.2307/747736
National Reading Panel. (2001). Teaching Children to Read: An Evidence Based Assessment of the Scientific Literature on Reading Instruction. United States Government. Retrieved from https://www.nichd.nih.gov/sites/default/files/publications/pubs/nrp/Documents/report.pdf
Schmitt, M. C., & Gregory, A. E. (2005). The Impact of an Early Literacy Intervention: Where Are the Children Now? Literacy Teaching and Learning: An International Journal of Early Literacy, 10(1), 1-20.
Schwartz, R. (2005). Literacy Learning of At-Risk First-Grade Students in the Reading Recovery Early Intervention. Journal of Educational Psychology, 97(2), 257–267. https://doi.org/10.1037/0022-0663.97.2.257
Sirinides, P., Gray, A., & May, H. (2018). The Impacts of Reading Recovery at Scale: Results From the 4-Year i3 External Evaluation. Educational Evaluation and Policy Analysis, 40(3), 316–335. https://doi.org/10.3102/0162373718764828
Suggate, S. (2016). A Meta-Analysis of the Long-Term Effects of Phonemic Awareness, Phonics, Fluency, and Reading Comprehension Interventions. Journal of Learning Disabilities, 49(1), 77–96.