American Reading Company
ARC is a K-12 English Language Arts program that integrates literacy instruction with social studies and science. Like CKLA and Wit & Wisdom, the ARC program attempts to systematically build background knowledge for students, in the hopes of improving reading comprehension. However, unlike CKLA, which also includes a structured literacy component, ARC was developed as a balanced literacy program. I previously gave ARC a low grade for two reasons: it did not meet the definition of a systematic phonics program, and its experimental studies did not report on all study findings.
In June of 2023, a past member of the ARC team reached out to Pedagogy Non Grata to request that we revisit our review of the ARC program. I spent hours interviewing members of the ARC leadership team, reviewed their scope and sequence and materials, spoke to many users of the program, and re-read all of their research. One member of their team noted to me that there were some previous problems with the program that led many people to equate the program with a balanced literacy approach, but that the company has since taken steps to better align their program with the “Science of Reading.” Other members of the team, however, argued against there ever having been a connection between their program and a balanced literacy approach.
The company has not yet completed any new research, so such an update would have to answer the question: is ARC still a balanced literacy program, or does it now meet the definition of a structured literacy/systematic phonics program? This is important to ascertain, as there is strong evidence that systematic phonics/structured literacy programs are better for helping to teach students to read than whole language/balanced literacy programs (NRP, 2000; Camilli, 2003; Steubing, 2008; Hansford, 2023). As a caveat, I will note that it is very important to me to get this review right. I think there is a lot of anger in the education space directed at the creators of balanced literacy programs, because parents and teachers feel misled about their efficacy. However, I think it is important that teachers remain open to programs being reformed. Otherwise, there is no incentive for curriculum producers to improve their products; instead they become incentivized to justify bad pedagogical practices. With this in mind, I have tried to go the extra mile in executing due diligence to determine whether the qualitative changes to the program go far enough.
In order to evaluate whether or not these changes are sufficient, I think it is important to use clear definitions. The body of scientific literature shows strong support for systematic phonics programs (structured literacy) over unsystematic phonics programs (balanced literacy/whole language). Both types of approaches teach the five pillars of literacy; however, balanced literacy programs teach the phonics pillar unsystematically. Within a systematic phonics approach, programs teach phonics explicitly, based on a scope and sequence, and usually with controlled texts (decodables) (NRP, 2000).
There are debates as to how balanced literacy should be defined. It is sometimes defined as an approach halfway between systematic phonics and whole language instruction. Others refer to a foundational paper on the subject (Pressley, 2001), which defined balanced literacy as an approach that taught phonemic awareness, word recognition, vocabulary, comprehension strategies, extensive writing, and process writing, and that focused on motivation. A more detailed summary can be found in the below table (taken from a submitted manuscript I co-wrote).
It is important to note that in the above balanced literacy guidelines, there is no mention of teaching phonics with a scope and sequence, or of decodables. I have seen many interpret this paper as suggesting that phonics should be taught within a balanced literacy context, as needed by the individual student. Therefore, the definition of balanced literacy provided by Pressley (2001) does not meet the NRP (2000) definition of a systematic phonics program, but rather that of a whole language program. In my opinion, the theoretical difference between balanced literacy and whole language is practically meaningless.
In my personal experience examining balanced literacy programs, these approaches typically emphasize cueing instruction over decoding instruction, rely on leveled texts over decodable texts, and teach phonics embedded within fluency instruction, as opposed to explicitly in isolation. For example, a systematic phonics approach might teach the <ph> grapheme and its associated /f/ sound, followed up with related fluency practice, whereas a balanced literacy or whole language approach might wait for a student to struggle with a word using the <ph> grapheme and then teach it.
Therefore, in order for me to classify the ARC program as a systematic phonics program and improve their grade, I would want to see the program teach phonics based on an explicit scope and sequence, as well as include decodable texts. In an attempt to accurately answer this question, I interviewed members of the ARC team on multiple occasions, for hours at a time; reviewed their updated materials; and interviewed teachers who had used the program.
I interviewed several anonymous teachers who had used the ARC program. Truthfully, these teachers were very angry. They felt the program lacked sufficient decoding instruction, encouraged “three cueing,” did not use decodable texts, and encouraged word guessing. I spoke about these concerns with the ARC leadership team. They acknowledged that in the past the ARC program encouraged students to look at the picture to identify unknown words; however, they also claimed adamantly that this was no longer the case. The teachers I interviewed had used the program recently; however, they were not able to confirm for me whether or not their resources were the updated ones. ARC shared with me testimonials from educators who had used their program. Some of these educators shared that they liked that the program had a strong focus on writing, knowledge building, and authentic texts, and included systematic phonics instruction.
Scope and Sequence
Within their updated program there are over 60 individual graphemes explicitly taught, as well as additional blends and analytic word families. Similar to the UFLI scope and sequence, the sequence spirals so that concepts are taught multiple times. The phonics scope and sequence also comes with a phonemic awareness sequence that explicitly teaches multiple phonemic awareness concepts. I was also pleased to see that their scope and sequence for phonemic awareness focuses mostly on segmenting, blending, and isolating. This factor is important to note, as many other programs focus on onset-rime, manipulation, and deletion, which have less research to support their efficacy.
With regard to text selection, the ARC program now makes use of decodable texts, leveled texts, and predictive texts. While using a combination of texts might be an unpopular choice, Shanahan (2019), a member of the National Reading Panel, points out that the research on this topic is limited. He and many other scholars have suggested that using a combination of text types may present the most benefits for early readers. However, he also notes that highly predictable texts encourage guessing and may be less beneficial for students. In my interview with members of the ARC leadership team, they indicated that some predictive texts were used in the ARC program, not with the goal of increasing word recognition, but of building background knowledge and vocabulary.
That said, their decodable texts are likely not what many mean when they use the term decodable. Typically, the term decodable text refers to a book that is targeted towards a specific lesson or weekly unit within a phonics scope and sequence. For example, in the Pedagogy Non Grata reading program, the first week's lessons target the letters “s, a, t, p, i, n,” and the corresponding decodables only use these letters. That said, while this is colloquially what is meant by the term decodable, I am not yet aware of any scientific research showing this is necessary for teaching students how to read. Moreover, as I have seen Dr. Nell Duke point out on social media, any word is decodable if a student can decode it. With the ARC decodables, the words are not scaffolded by each lesson or week, but rather by larger unit structures. For example, the first unit of the ARC kindergarten phonics program includes 25 letters, graphemes, or blends. While this is an unusual choice that might lead some to believe the ARC decodables are not decodable, I am not aware of any substantial research that can be pointed to that discredits the practice.
Whole-class phonics instruction was less explicit than I would have liked. Rather than explicitly explaining phoneme-grapheme correspondences, phonics lessons featured short texts with a number of examples of the grapheme being explored. Teachers were then encouraged to explore the relationship between the grapheme and the sound. However, small-group phonics lessons, designed to support students with specific decoding needs, were far more explicit and, in my opinion, much better. The program does include some instruction on analytic phonics, blends, and high-frequency words. All of these choices are less common in modern structured literacy programs. However, to the best of my knowledge, there is no research that suggests the ARC creators are “wrong” for including these elements. That said, these design choices may lead to some resistance from structured literacy advocates. For the purposes of transparency, I have attached examples of ARC phonics lessons in the references section, with permission from the American Reading Company.
What Does the ARC Research Show?
Evidence for ESSA:
Johns Hopkins University did a review of the Evidence for ESSA research. They evaluated one study and examined a sample of 792 students. They rated the study rigor as strong, and found a mean effect size of .14, which according to Cohen's guide is negligible. That said, there is debate on how low an effect size has to be to be truly negligible; Kraft (2020) suggested that, for rigorous RCTs, effect sizes above .05 should be considered moderate. However, we would disagree with that finding at Pedagogy Non Grata. Our review of studies reviewed by Evidence for ESSA, rated strong, and with large sample sizes found a mean effect size of .13 (k = 33) [.13, .21]. None of these studies were negative, and only one of these studies showed an effect size below .05.
To evaluate the efficacy of ARC, we conducted a review of ARC studies. We searched their company website, Education Source, and the ERIC database. On the company website we found 15 studies. However, only 2 of these studies used an experimental model to examine the efficacy of the program. The other 13 studies were single-group designs and thus excluded. On the Education Source database, we located 3 articles, none of which were experimental. On the ERIC database, we found 22 studies with the search terms “American Reading Company.” However, none of these studies were specific to the American Reading Company program.
In order to examine the efficacy of the ARC program, we examined the two experimental studies conducted. The first study was conducted by Abigail Gray, Philip Sirinides, Ryan Fink, and Brooks Bowden. Their study used an RCT design to evaluate the effects of ARC for kindergarten students. “Data were collected from 71 classrooms (treatment and control) in 21 schools, encompassing 1,589 students in two kindergarten cohorts.” Each cohort received the program for one school year. Treatment group instruction was compared with business-as-usual instruction and examined across multiple standardized tests, including the WRMT, KRMS, AIMSweb, and KTEA. The study authors calculated their own effect sizes using the Glass's delta formula: “Glass's delta, which uses only the standard deviation of the control group, is an alternative measure if each group has a different standard deviation” (Social Science Statistics, n.d.). The authors calculated effect sizes both for the students intended to be treated (ITT) and the students actually treated (TOT). We tabulated the TOT effect sizes in the below chart to model the findings of the Gray (2021) study.
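For readers who want to see the arithmetic, Glass's delta simply divides the raw mean difference by the control group's standard deviation alone. A minimal Python sketch, using made-up illustrative numbers rather than figures from the study:

```python
def glass_delta(treatment_mean: float, control_mean: float, control_sd: float) -> float:
    """Glass's delta: mean difference divided by the control group's SD only."""
    return (treatment_mean - control_mean) / control_sd

# Hypothetical example values, not taken from the Gray study:
# treatment mean = 105, control mean = 100, control SD = 15
print(round(glass_delta(105, 100, 15), 2))  # 0.33
```

Because only the control group's spread is used, the result is unaffected by any variance the intervention itself introduces in the treatment group.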
The second study was conducted by Dr. Joseph DuCette for Temple University. This study used a quasi-experimental design to examine the efficacy of using the ARC program in conjunction with the 100 Book Challenge. In the 100 Book Challenge, students are given a library of ARC books and reading logs, and are then encouraged to read at least 100 books each. The study included 3,317 grades 1-3 students from 12 schools. Results were measured with the Stanford-9 standardized test. We calculated effect sizes for this study using Cohen's d: the mean difference between the treatment group and the control group, divided by the pooled standard deviation, where SDpooled = √((SD₁² + SD₂²) / 2). The effect size was calculated by the first review author and then replicated by the second review author to ensure validity. The study showed an effect size of .54 for reading comprehension and .80 for general reading achievement. Students in the treatment group were also 16.59% more likely to be reading at grade level than in the control group (p = .024).
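The pooled-SD version of Cohen's d described above can be sketched the same way; again, the numbers below are hypothetical, chosen only to illustrate the arithmetic:

```python
import math

def cohens_d(treatment_mean: float, control_mean: float,
             treatment_sd: float, control_sd: float) -> float:
    """Cohen's d using the simple pooled SD: sqrt((SD1^2 + SD2^2) / 2)."""
    sd_pooled = math.sqrt((treatment_sd ** 2 + control_sd ** 2) / 2)
    return (treatment_mean - control_mean) / sd_pooled

# Hypothetical example values, not taken from the DuCette study:
print(round(cohens_d(55, 50, 9, 11), 2))  # 0.5
```

Unlike Glass's delta, this formula averages the variability of both groups, so it is the more common choice when the two groups' standard deviations are similar.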
What Do These Results Mean?
To the best of my knowledge, there have been two experimental or quasi-experimental studies on ARC. However, they show rather opposite results. The DuCette (1999) study showed a mean effect size of .67, suggesting a large result, and the Gray (2021) study showed a much lower effect size of .10. There are three possible ways to interpret this difference:
The studies show different results because they are looking at different grades: ARC is not effective for kindergarten, but it is effective for grades 1-3.
The Gray (2021) study is an RCT, more recent, and thus more rigorous. Therefore, the Gray (2021) results are more reliable.
The DuCette study was examining the impact of the 100 Book Challenge in combination with ARC; it is therefore impossible to tell whether the DuCette (1999) outcomes should be attributed to ARC or to the 100 Book Challenge. Therefore, the Gray (2021) study is more reliable.
There is no way to know which study is more reflective of ARC without more research. Therefore, the least problematic interpretation should be based on a mean result between the two studies. Likely surprising to none of my readers, I lean towards the third option.
In order to evaluate the efficacy of the ARC program, we took an unweighted mean effect size for each assessment outcome and charted it in the below graph. For context, effect sizes below .20 are considered negligible, between .20 and .39 small, between .40 and .79 moderate, and .80 or above large. However, the specific interpretations of effect sizes can be subjective; see Kraft (2020) and Hansford (2023). The following results suggest that the ARC program has a moderate impact on reading outcomes.
The mean results of these studies are moderate. That said, the results of the Gray (2021) study, the more rigorous of the two studies evaluated, are negligible (according to Cohen's guide). However, the Gray (2021) study was a large-scale RCT with a standardized assessment. Research on the Johns Hopkins study database by LXD Research and Pedagogy Non Grata shows that education studies with this design do typically show smaller effect sizes, on average .17 (Hansford & Schechter, 2023). If we base our interpretations on the Hansford and Schechter (2023) findings, these effect sizes could alternatively be read as small but significant for the Gray (2021) study, high moderate for the DuCette (1999) study, and well above average overall. That said, with only two experimental studies showing very different results, it is difficult to find conclusive trends in this research. However, given that ARC has made substantial changes to their programming overall, it is, in our opinion, better to evaluate the program based on the qualitative aspects of the curriculum and not their research findings.
While critics of the ARC program do suggest that the program still has room to improve, specifically around its phonics programming and its decodable texts, overall the changes made have been far more substantial than those made by other (previously balanced literacy) companies. Moreover, the changes made do qualify the program for the systematic phonics classification, and the program is now inarguably research based. I think it is important to acknowledge these changes, to better incentivize companies to continue to positively develop their products.
Final Grade: B+
Two studies showed a mean effect size of .40 on standardized tests, and the program's updated principles are evidence-based. These are the criteria of Pedagogy Non Grata for an A- grade. However, given the controversy surrounding the program, and the fact that the high effect size results came from a study that looked at the impact of both ARC and the 100 Book Challenge at the same time, I am uncomfortable giving the program an A grade. That said, I look forward to seeing more research on this topic and hope that future studies show a large magnitude of effect for the program.
Qualitative Grade: 9/10
The ARC program includes the following evidence-based types of instruction: explicit, phonemic awareness, systematic phonics, morphology, vocabulary, spelling, fluency, and comprehension.
Want to know more about our grading system? Click here: Grading System
This review was conducted for the Pedagogy Non Grata blog and does not constitute peer-reviewed research. People using this review to make purchasing decisions should consult multiple sources of information, including What Works Clearinghouse and Evidence for ESSA.
Written by Nathaniel Hansford
Reviewed by Elizabeth Reenstra
Last edited 2024/02/21
Gray, A., Sirinides, P., Fink, R., & Bowden, B. (2020). Zoology One Efficacy Evaluation. Consortium for Policy Research in Education.
DuCette, J. (1999). An Evaluation of the “100 Book Challenge Program”. Temple University.
Kraft, M. A. (2020). Interpreting Effect Sizes of Education Interventions. Educational Researcher, 49(4), 241-253. https://doi.org/10.3102/0013189X20912798
Shanahan, T. (2019). How Decodable Do Decodable Texts Need to Be?: What We Teach When We Teach Phonics. Shanahan on Literacy. https://www.shanahanonliteracy.com/blog/how-decodable-do-decodable-texts-need-to-be-what-we-teach-when-we-teach-phonics#:~:text=We%20want%20beginning%20reading%20texts,words%20or%20that%20are%20so
Social Science Statistics. (n.d.). Effect Size Calculator. https://www.socscistatistics.com/effectsize/default3.aspx
Hansford, N., Schechter, R., Reenstra, E., & Aitchison, P. (2023). What is the Best Language Program? Teaching by Science. https://www.teachingbyscience.com/what-is-the-best-language-program