CHAPTER IV: SMALL-SCALE EVALUATION: MATHEMATICS CURRICULUM

Mathematics skills are critical for success in today’s world. Yet, despite the importance of these skills, studies suggest that low-income preschoolers’ mathematics skills lag behind those of their more-advantaged peers by as much as seven months (Starkey et al. 2004). Similarly, mathematics achievement scores of children in persistently poor households continue to lag behind those of children from higher-income households into the early grades of school (Smith et al. 1997). The gap in mathematics achievement is partly a function of differences in maternal education and family income. In a study of preschool-age children in 65 Los Angeles neighborhoods, children’s reading and mathematics achievement scores related most closely to the mother’s educational attainment and to neighborhood poverty (Lara-Cinisomo et al. 2004). The differences in children’s scores related to maternal education and neighborhood poverty could thus stem in part from differences in support for early reading and mathematics skills in the children’s home environments. Early intervention programs, such as Head Start, have the potential to provide a grounding in early mathematics concepts that children may not be acquiring at home.

To date, researchers and policymakers have focused more on preschools’ support for early literacy skills development than on the ways in which the preschools can support the development of mathematics skills. Ideas about which mathematics skills should be developed—and how that can be accomplished in the preschool classroom, how preschool children should acquire those skills, the development of curricula and measurement tools to chart the children’s progress, and the evaluation of alternative approaches—have lagged advancements in the early literacy area. However, interest in supporting children’s early mathematics skills is growing, as is knowledge about approaches to developing early mathematics skills among preschoolers. For example, researchers have investigated which mathematics concepts children are developmentally ready to learn (Clements et al. 2004; Cordes and Gelman 2004). They also have developed mathematics-oriented preschool activities and curricula (Casey et al. 2004; Greenes et al. 2004; Sarama and Clements 2004; Sophian 2004; Starkey et al. 2004). Findings from these research efforts suggest that preschool children have some intuitive knowledge about many mathematical concepts that they have obtained through their own experiences; as a result, most adults underestimate the ability of preschool children to increase knowledge by exploring materials and by interacting with adults who can amplify what the children are doing to inform about numbers, size, shape, time, and measurement (Tudge and Doucet 2004). Even so, many mathematics concepts are beyond the comprehension of preschool children, suggesting that mathematics activities and curricula must be designed carefully (Ginsburg and Golbeck 2004). 
Several researchers implementing mathematics curricula in preschools have noted that the teachers reported learning a great deal about the mathematics activities that children are capable of performing, as well as about methods of presenting a much broader range of mathematical concepts in interesting ways (Sophian 2004; Griffin 2004; Starkey et al. 2004). After implementation, the teachers reported an increase in the frequency and breadth of mathematics activities in their classrooms, with participation by all children, rather than by just a few.

As another example of an increasing focus on mathematics instruction in preschool, the National Research Council recently published a summary of a workshop on mathematical and scientific development in early childhood (National Research Council 2005). The workshop was intended as an initial step in exploring the research base on children’s cognitive capacity in mastering mathematical and scientific ideas and skills. The workshop focused on availability of curricular and resource materials for early childhood settings in an effort to map the links between research and practice. Workshop panelists concluded that there is a marked lack of connection between research and practice in this field. The research base was found to be weaker than that in literacy with less well-developed basic and applied research, as well as a dearth of longitudinal studies. The NRC concluded that more research in all these strands would be valuable, as well as a synthesis study that pulls together the existing strands of research knowledge.

Mathematical skills are likely to be emphasized more strongly in Head Start classrooms as programs consider the results of the first year of Head Start National Reporting System (NRS) assessments, which included both the literacy and mathematics domains. In particular, Head Start programs are considering ways to strengthen their mathematics content to match recent efforts to improve language and literacy activities in the classroom. Yet, Head Start programs have little information about which mathematics activities or curricula might be most useful. To date, the research on available mathematics curricula has involved small numbers of classrooms and, in most cases, designs that are not rigorous (for example, comparison groups or pre-post designs). In the Preschool Curriculum Evaluation Research (PCER) project, one group of researchers is evaluating two mathematics curricula using a rigorous design.1 Site-level sample sizes in the PCER projects tend to be sufficient to detect effect sizes of .20 to .40 for language, early literacy, and early mathematics outcomes and .35 to .45 for behavioral outcomes.2

This chapter describes approaches to evaluating alternative mathematics curricula in which Stage 2 evaluation designs are used. These designs are called “small-scale” in contrast to the nationally representative Stage 3 designs, but in general, the sample sizes for these evaluations will be larger than those used in PCER or the Quality Research Center (QRC) Consortium projects so that smaller impacts can be detected. We focus on two mathematics curricula developed for preschools that include written manuals, and that have been implemented multiple times. Several other preschool mathematics curricula, described briefly here, are in various stages of early development. Although we focus on mathematics curricula in this chapter, the approach that we describe can be generalized to any curriculum evaluation.

  1. CURRICULUM MODELS

     Several approaches to supporting preschool children’s mathematics ability have been developed. Two fairly well developed ones are being evaluated in preschool settings as part of the PCER project:

    • Building Blocks (Sarama and Clements 2004). Building Blocks provides materials (building blocks, puzzles, and art projects) and ideas for teacher-led activities (songs and stories) to support mathematical activities through which children can enhance their mathematics skills. Topics include numbers, operations, and geometric and spatial reasoning, all including subthemes of patterns, data, and classification/sequencing. The curriculum includes both computer-based activities and concrete activities for each lesson.

    • Prekindergarten Mathematics Curriculum (Starkey et al. 2004). This curriculum includes 27 small-group activities with concrete materials that teachers conduct with preschoolers. The activities cover enumeration and number sense, arithmetic reasoning, spatial sense, geometric reasoning, pattern sense and unit, nonstandard measurement, and logical relations. In addition to teacher activities, the curriculum includes computer-based mathematics games and instructions for setting up a mathematics learning center in the classroom. A set of parent activities that coordinate with the classroom curriculum is available.

    Several other preschool mathematics curricula are in earlier stages of development or are narrower in scope. They have been implemented in a few Head Start and pre-kindergarten classrooms, but they would need further replication and documentation to be ready for small-scale evaluation. With additional development, they could offer helpful approaches to providing mathematics education for preschool children:

    • Weekly Mathematics Activities for Head Start Teachers (Sophian 2004). The curriculum focuses on the concept of unit as it applies to enumeration, measurement, and the identification of relations among geometric shapes. It includes a weekly core classroom project, supplementary classroom activities, and a weekly home activity.

    • Big Math for Little Kids (Greenes et al. 2004). This set of activities and stories covers number, shape, pattern, logical reasoning, measurement, operations on numbers, and space and navigation. The approach includes a focus on mathematics vocabulary.

    • ‘Round the Rug Math (Casey et al. 2004). This approach provides six books for preschoolers containing problem-solving adventure stories. Each book focuses on a different mathematics content area, including spatial concepts, shapes and geometry, pattern, measurement, and graphing.

    • Number Worlds (Griffin 2004). This program focuses on developing children’s number sense, including concepts of quantities, relative size, counting, and simple operations. It offers a sequenced set of lessons/activities that identify learning goals, suggest discussion sequences, and provide concrete materials for children to work with.

  2. STUDY DESIGN

     The fundamental issues to be addressed by an evaluation of a new mathematics curriculum are whether the curriculum improves children’s early mathematics skills, and whether pursuing that curriculum displaces other classroom activities, such as language, literacy, and social-emotional development, that would otherwise be the dominant activities. This section discusses key elements of a research design to address those questions. We begin by discussing the quality enhancement and its counterfactual, the research questions, sampling strategy, random assignment plan, and sample sizes.

    1. The Quality Enhancement and Its Contrast

      Head Start programs traditionally have strongly emphasized learning through play and children’s social development. During the past decade, language and literacy development received increasing levels of attention in preschool settings. We can be fairly certain that, after implementation of the Strategic Teacher Education Program (the national Head Start initiative to improve teachers’ knowledge about activities to promote language development and literacy in the classroom), all Head Start centers added book reading and other language and literacy activities to their class planning. Early mathematics has not received as much attention, so a targeted mathematics curriculum would likely introduce new topics to most Head Start programs’ daily schedules.3

      Accordingly, an evaluation that contrasts Building Blocks or the Prekindergarten Mathematics Curriculum (the two well-developed preschool mathematics curricula) with normal practice in Head Start classrooms would likely reveal clear differences between the two. Relative to the control classrooms and children, the classrooms given the mathematics curriculum would likely introduce more mathematics concepts to children, and the children would spend more time expanding their understanding of mathematical topics. Control classrooms would spend relatively more time in language and literacy activities and in play than in more targeted mathematics instruction.

      A much larger evaluation, perhaps in the context of a broader field test (Stage 3), could support a comparison of one curriculum relative to another by randomly assigning specific mathematics curricula to specific centers. At the small-scale evaluation stage (Stage 2), it would be more useful to focus the evaluation on a single curriculum, and to obtain solid evidence about whether that curriculum works in the set of programs participating in the evaluation. In either design, the activities of the counterfactual would be important to document.

      A less-defensible alternative evaluation design might measure the impact of implementing a broader set of mathematics curricula relative to current Head Start practice. If Head Start teachers were asked to devote more time to mathematics, they might choose a curriculum on their own. Thus, the evaluation would call for teachers in the intervention group to select a mathematics curriculum to implement. The research question in this case would be, “Does implementing a preschool mathematics curriculum selected by the teacher improve children’s mathematics ability relative to current Head Start practice?” This design would not support a comparison of one curriculum relative to another, as curricula would be selected by teachers and programs, rather than be randomly assigned to them. Because the intervention in this design cannot be clearly described or defined, we do not recommend this approach.

      Thus, we recommend evaluating a specific mathematics curriculum for two reasons. First, the alternative mathematics curricula vary in their intensity and focus, so it would be more useful to learn from the evaluation that a particular curriculum can or cannot work, rather than to potentially confound curriculum efficacy with program characteristics that influence curriculum choices. For example, programs that are not well prepared to implement a new mathematics curriculum might choose a less comprehensive, less demanding one, thus confounding the independent effects of weaker programs and weaker curricula. Second, implementing the various curricula well could be difficult if different centers choose different curricula. In addition, implementation would be more cost-effective if the centers implementing a particular curriculum were geographically close, providing easier access for the curriculum developer to provide assistance.

    2. Research Questions

      The first set of research questions focuses on direct impacts of the mathematics curriculum on classroom activities and child outcomes:

      • Does the use of an early mathematics curriculum increase the amount of time that the teacher spends discussing mathematical concepts in class?

      • Does the use of an early mathematics curriculum increase children’s number sense, knowledge of geometry, and spatial sense?

      The second set of research questions focuses on whether the mathematics curriculum has displaced other activities, such that mathematics knowledge increases at the expense of language and literacy activities, and on whether the shift in focus has affected children’s progress in those areas:

      • Does the use of an early mathematics curriculum reduce the amount of time that the teacher spends on language and literacy activities or class time spent in play and prosocial activities?

      • Do children in classes with an early mathematics curriculum make less progress in language and literacy or in social-emotional development (relative to children in classes without that curriculum)?

      By Stage 2, we expect that curriculum developers have honed their techniques for implementing the curriculum in preschool (particularly Head Start) settings so that implementation is likely to be successful. Nevertheless, understanding the impacts of the curriculum on children’s development and on classroom activities hinges on whether or not the training and technical assistance process led teachers to implement the curriculum with high fidelity. The larger number and diversity of centers and classrooms involved in the Stage 2 study might lead to new challenges, either for the curriculum’s “fit” with the program or for the training and technical assistance procedures. The evaluation should, therefore, examine whether the curriculum was implemented to high fidelity, what steps were taken to get to that point, and what was learned from the process:

      • Was the curriculum implemented fully and with a high degree of fidelity?

      • What strategies were used to implement the early mathematics curriculum? How much training was provided, and over what time period? What amount and types of technical assistance were provided?

      • What challenges were encountered, either for the curriculum’s “fit” with the program or with training and technical assistance procedures, and how were they resolved?

      Finally, although the evaluation will examine whether the mathematics curriculum is effective overall, the Head Start community also will be interested in whether the curriculum is effective across different subgroups of children and families, and across different program designs:

      • What were the impacts of the early mathematics curriculum on key subgroups of children? What were the impacts for Head Start programs with different characteristics?

      Some caution should be exercised in conducting subgroup analyses in Stage 2 because if many subgroups are examined, some impacts will emerge simply by chance. Moreover, since the sample of grantees and centers is not representative of the broader Head Start population, the subgroups are similarly not representative. Drawing conclusions about how well the curriculum works in specific subgroups would require that the sample frame is designed to include a representative sample from those subgroups. However, this approach would not be consistent with the Stage 2 design, which is to conduct an evaluation that yields internally consistent impact estimates for a group of program grantees and centers that volunteer to participate in the study. Impact estimates for subgroups in a Stage 2 design are valid for the subgroups attending the centers included in the sample. Obtaining a representative sample of specified subgroups would increase the sample size requirements of the study. Thus, serious investigation of subgroup impacts must wait until a Stage 3 evaluation, which will examine a sample that is representative of Head Start programs at the regional or national level.
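
      The caution about chance findings can be made concrete with a short calculation. The sketch below is illustrative only: it assumes that the subgroup tests are independent and that each uses a 5 percent significance level, neither of which is specified in the design, and the function names are hypothetical.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one spurious 'significant' impact when the
    curriculum truly has no effect in any subgroup.

    Assumes the tests are independent (a simplifying assumption).
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_alpha(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the overall rate near alpha."""
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return alpha / num_tests


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(k, round(familywise_error_rate(k), 3), round(bonferroni_alpha(k), 4))
```

      Under these assumptions, examining 10 subgroups gives roughly a 40 percent chance of at least one spurious finding, which is one reason to limit the number of subgroups examined in Stage 2.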

    3. Major Activities and Timetable for the Evaluation

      Several activities must occur to carry out the evaluation:

      • Draft study design and protocol for recruiting program grantees and centers and submit for Office of Management and Budget (OMB) review

      • Identify programs and centers to participate

      • Develop data collection instruments and prepare and submit review packages for the Institutional Review Board (IRB) and OMB

      • Conduct and monitor random assignment

      • Train teachers to implement the curriculum in randomly selected centers or classrooms; provide technical assistance and additional services and supplies needed for implementation; monitor fidelity to implementation

      • Identify a sample of children who have consent to participate

      • Collect data from enhancement and control groups

      • Analyze the data and report findings.

      We recommend that the first four tasks occur during the first year of the evaluation, with implementation, sampling of children, and collection of data during the second year, and further data collection, analysis, and reporting during the third year (see Figure IV.1). Because the Head Start year typically runs from August or September through May or June, the timing of activities will proceed most smoothly if the evaluation activities begin in January or February. We discuss the steps in more detail in the rest of this chapter.

      The first year of evaluation activities will be dominated by OMB review and recruitment of grantees and centers. OMB review of the study design and protocols for recruiting grantees and centers must be completed before researchers and curriculum developers can obtain information about prospective centers or negotiate agreements to participate. We have estimated two months to draft and submit a package to OMB that includes the study design and recruiting protocols, and six months for OMB review. During the OMB review period, researchers can conduct activities that do not involve information-gathering from prospective centers. For example, data collection protocols can be developed and submitted for OMB review and the study design and data collection plans can be submitted for IRB review. Data systems for tracking children and managing evaluation data can be developed. ACF can inform the Head Start community about the pending evaluation. Following OMB clearance, the curriculum developer and researchers will contact interested grantees and centers and begin the recruiting process. We estimate that recruitment, executing agreements, and randomly assigning centers or classrooms will require approximately four months.
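
      The first-year estimates above (two months to prepare the OMB package, six months for review, and four months for recruitment and random assignment) can be laid end to end as a simple check. This is a sketch only: the start year is a hypothetical placeholder, the phase labels are paraphrased, and the phases are assumed to run strictly in sequence.

```python
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the first day of the month that falls `months` after `start`."""
    index = start.year * 12 + (start.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


# Durations in months, taken from the estimates in the text.
FIRST_YEAR_PHASES = [
    ("Draft and submit OMB package", 2),
    ("OMB review", 6),
    ("Recruitment, agreements, and random assignment", 4),
]


def schedule(start: date, phases):
    """Lay the phases end to end; return (name, begins, ends) for each."""
    rows = []
    cursor = start
    for name, months in phases:
        ends = add_months(cursor, months)
        rows.append((name, cursor, ends))
        cursor = ends
    return rows


if __name__ == "__main__":
    # The year 2007 is a hypothetical placeholder; only the January start matters.
    for name, begins, ends in schedule(date(2007, 1, 1), FIRST_YEAR_PHASES):
        print(f"{name}: {begins:%b %Y} to {ends:%b %Y}")
```

      With a January start, the three phases fill exactly 12 months, so random assignment would finish at the end of the first year, ahead of training and implementation in the second.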

      The main evaluation activities in the second year will be implementing the curriculum to high fidelity, conducting the implementation study, obtaining consent for children to participate in the study, sampling children, and collecting fall baseline data. Ideally, implementation will occur in the spring so that the Head Start classes can benefit from a fully implemented curriculum during the entire year. Third-year evaluation activities will include collecting follow-up data in the spring on classrooms and children, analyzing data, and reporting the results.

    4. Sampling, Random Assignment, and Sample Sizes

      Stage 2 designs generally include grantees and centers that have agreed to participate in the study and were selected because of their location or interest rather than their ability to represent Head Start centers and grantees regionally or nationally. At this stage, the evaluation focuses mainly on whether the quality enhancement could be effective in a set of those grantees and centers. As we discuss in more detail in Appendix B,4 because the emphasis is on internal validity, or a rigorous test of the curriculum within the set of grantees and centers in the evaluation, the sample size requirements are lower than if grantees and centers were selected to represent all Head Start grantees and centers. For Stage 2 designs, a set of programs and centers interested in participating must be identified, random assignment must be completed, and classrooms and children must be selected to participate in data collection.

      Figure IV.1. Schedule of Activities for Small-Scale Evaluation of a Mathematics Curriculum: Three-Year Study Beginning in January

      Identification of Grantees and Centers. Various methods could be used to recruit grantees. Methods of recruiting a broad set of programs include working through the Head Start Bureau to communicate with all regional offices, sending a fax to all programs, or sending an email to program directors. However, Stage 2 evaluations ideally should be geographically focused to support careful implementation of the enhancement, so more-targeted methods probably would be more effective than broadly disseminated flyers or similar methods. With support from the Head Start Bureau and selected regional offices, the curriculum developer and research team also could approach directors of Head Start programs within a geographic area and ask colleagues in other areas who might help with implementation to contact program directors in their own areas. If a broader call for participation is made first, interested programs could be asked to help the curriculum developer approach other programs in their areas. The initial contact materials should briefly explain the study’s focus (in this case, a new mathematics curriculum) and should briefly summarize the benefits of participation. The contact materials also might indicate that all centers (or classrooms) in the program will have an equal chance of participating in the study, but that only some of the ones in each program will be chosen to implement the curriculum. Those not chosen to implement the curriculum in the first year will be given priority to implement it, if desired, after the follow-up data have been collected. The materials also might note that programs will receive information about the study’s findings and that they will be partners in the research study. We expect that several programs will want to learn more about the evaluation as a result of this recruiting effort.

      Ideally, program grantees and centers chosen for the study would have the following characteristics:

      • Centers are clustered in a small number of geographic locations to simplify implementation.
      • Centers are not already implementing a targeted mathematics curriculum but are interested in doing so.
      • Grantees and centers are willing to cooperate with the study’s random assignment and data collection requirements.
      • The centers currently follow a similar curriculum—either one of the major curricula followed by Head Start programs or a locally designed curriculum—to provide a more stable counterfactual across evaluation sites.

      Recruitment of grantees and centers will be easier if their staff believe they will gain more from participation than they might lose. Accordingly, after the initial contact has been made, the benefits to sites must be explained in detail, and concerns about study burden must be discussed. Curriculum developers and research staff should visit program directors and other relevant administrative staff to discuss the curriculum, its expected benefits to children, what will be involved in implementing it, and the research aspects of the evaluation. Researchers will explain the program’s role in ensuring that the study yields useful information about the curriculum’s effectiveness. For example, researchers will have to work with program staff to obtain information required to implement random assignment (for example, the number of classes in each center, teachers’ names, class sizes, and children’s ages). Program staff will have to maintain random assignment statuses (for example, by ensuring that teachers in the intervention group do not share information about the curriculum with control-group teachers).5 Researchers will have to monitor the integrity of random assignment over time, a key piece of information to demonstrate the reliability of the study. They also will have to work with program staff to schedule child assessments and classroom observations. These evaluation-related requirements are balanced by the chance to implement a new curriculum that could benefit children and the chance to work with the curriculum developer to ensure that the curriculum fits the needs and interests of Head Start children and can be incorporated easily into Head Start program activities. Benefits to the control group are more challenging to identify, but an important benefit that could be offered for control-group participants in a Stage 2 evaluation is first priority to implement the curriculum once the children participating in the evaluation finish the Head Start year.

      A program’s agreement to participate should include a memorandum of understanding that describes the benefits to the program, and that specifies the respective responsibilities of the curriculum developer, researchers, and program in this joint research undertaking. A similar agreement should be developed with each participating center. This level of detail ensures that misunderstandings that have the potential to threaten the success of the research study do not arise after it is too late to resolve them. A memorandum of understanding also offers a useful way of informing new staff about the study.

      Other Program, Center, and Child Sample Selection Criteria. Because group-randomized designs require relatively large sample sizes to detect impacts of moderate size (20 percent of the standard deviation of the outcome measure), it can be costly to expand the sample to provide sufficient power to detect similar levels of impact in subgroups. For this reason, the programs, centers, and children to be included in the sample should be defined carefully, with the most important research questions in mind when doing so.
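
      To illustrate why group-randomized designs need relatively large samples, the sketch below computes a minimum detectable effect size (MDES) using a commonly used approximation for a two-level design with no covariates. It is not the calculation from Appendix B. The intraclass correlation, the numbers of clusters and children, and the normal approximation to the test statistic are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def mdes(num_clusters: int, children_per_cluster: int, icc: float,
         prop_treated: float = 0.5, alpha: float = 0.05,
         power: float = 0.80) -> float:
    """Approximate minimum detectable effect size, in standard deviation
    units, when whole clusters (centers or classrooms) are randomized.

    Uses a normal approximation and ignores covariates and blocking,
    so it is a rough planning figure rather than a definitive estimate.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be between 0 and 1")
    z = NormalDist()
    # About 2.80 for a two-tailed 5 percent test with 80 percent power.
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    allocation = prop_treated * (1 - prop_treated)
    between = icc / (allocation * num_clusters)
    within = (1 - icc) / (allocation * num_clusters * children_per_cluster)
    return multiplier * sqrt(between + within)


if __name__ == "__main__":
    # Hypothetical planning values: 10 children per cluster, ICC of 0.15.
    for clusters in (20, 40, 80, 160):
        print(clusters, round(mdes(clusters, 10, icc=0.15), 3))
```

      Under these assumed values, the detectable effect shrinks only with the square root of the number of clusters, so halving the MDES requires roughly four times as many clusters. The same logic explains why powering each subgroup separately is costly.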

      One sample-definition strategy is to include a broad set of programs with different characteristics. This approach would address the question of the curriculum’s effectiveness across the range of Head Start programs and families. In Stage 2 evaluations, however, the programs will not have been sampled randomly from all Head Start programs, so the sample will not truly reflect the diversity of programs. If the evaluation finds that the curriculum is not effective, it will be difficult to determine whether it could be effective in some program subgroups.

      A more useful strategy is to target programs with a narrower set of characteristics in common, so that the evaluation measures the effects of the curriculum in a more defined set of circumstances. One possibility is to focus the evaluation on centers with full-day (six hours or more per day) programs, which would provide the greatest likelihood of finding improvements in children’s mathematics ability without displacement of gains in other areas. Alternatively, given that about half the children served by Head Start attend half-day programs, the evaluation should investigate whether a mathematics curriculum can be effective in both full-day and half-day programs without displacing gains in other areas. Therefore, we recommend that the group of participating centers include an approximately even split between full-day and half-day programs.

      Children’s age is another characteristic that might define the evaluation sample. Including three-year-old children in the sample could be useful if the curriculum includes activities for this age; in addition, starting at a younger age could be helpful to children. Nevertheless, for a small-scale evaluation of a mathematics curriculum, we recommend limiting the sample to four-year-olds. If both three- and four-year-old children are in the sample, most analyses would have to focus separately on each age group, as mathematics performance is likely to be different in the two age groups. Focusing on age groups in this way effectively reduces the sample available for analyses or would require incurring the cost of doubling the sample size. Moreover, many three-year-old children from low-income families cannot achieve a basal score on the measures of mathematics or language typically used in these evaluations, so it might not be possible to gauge their abilities at the end of the first intervention year.

      An important research question is whether the curriculum is effective across the major ethnic groups participating in Head Start (African Americans, Latinos, and whites). During the curriculum development stage (Stage 1), the curriculum developer should adapt the curriculum for each ethnic group, so that a small-scale evaluation will appropriately include all three of them. However, Stage 2 evaluation designs are unlikely to include sufficient sample to support subgroup analysis of the impacts of the curriculum in all three groups. The evaluation will reveal whether impacts are very large in any of the subgroups, but the subgroup samples will probably be too small to allow for detection of impacts of modest size. Attaining large enough sample sizes of different ethnic groups to detect modest-sized impacts might not be feasible until a Stage 3 evaluation of a curriculum.

      Random Assignment. Either centers or classrooms could be randomly assigned without requiring changes in the fundamental research questions. The choice of center- or classroom-level random assignment will influence sample size requirements (and, thus, evaluation costs), as well as the potential for spillover between the intervention and control groups. Sample sizes do not have to be as large under classroom-level random assignment as under center-level random assignment; randomly assigning smaller units generally increases the power of the design (see Appendix B). In addition, classrooms within one center are more likely to be similar in population served and comparison curriculum used than are two centers. Random assignment at the classroom level might also offer a slight advantage over center-level random assignment at the recruiting stage. Center directors might prefer the 100 percent chance that at least some classrooms in the center can implement the enhancement under classroom-level random assignment over the 50 percent chance that no classrooms will implement the enhancement under center-level random assignment. Moreover, having both enhancement and control classes within their center enables directors to gain a sense (although not rigorous evidence) of how the curriculum seems to be working.

      However, those are the only advantages to randomly assigning classrooms. In fact, for several practical reasons, randomly assigning classrooms can present more difficulties for an evaluation than randomly assigning centers. First, if the curriculum is implemented in randomly selected classrooms, care must be taken to prevent spillover. Spillover occurs if control-group teachers implement the curriculum or use techniques based on the curriculum. Thus, teachers implementing the curriculum will have to understand that the distinction between the enhancement (curriculum) and control classrooms will be eroded if they discuss curriculum-related activities and approaches with teachers in the control group. If control-group teachers implement even part of the curriculum, treatment-control differences in outcomes cannot be considered valid estimates of the curriculum’s impacts. Notably, the requirement not to discuss the curriculum and methods with teachers in the control group tends to inhibit the discussions that would take place in a full-center implementation and that can enhance the overall success of implementation. Second, classroom configurations and teachers change from year to year (and even from May to August) as enrollments are finalized, available space in the preschool is negotiated, and teachers’ plans become settled. These changes would present fewer difficulties for the evaluation if an entire center were to either follow the enhancement or continue its usual practices than if individual teachers or classrooms were randomized to one group or the other. Finally, if, as expected, teachers receive training on the curriculum before children are assigned to classes, and if classes are the unit of random assignment, then researchers must be especially vigilant to ensure that children are placed randomly, rather than strategically, into enhancement or control classrooms. Child assignments would not be a problem if centers were randomly assigned, as decisions about children’s placement in centers are based on such factors as center locations and schedules relative to the family’s residence and schedule.

      Random assignment should take place during the year before outcome data are collected, so that curriculum developers can fully implement the curriculum and ensure fidelity.6 Under this schedule, if initial implementation begins in January or February of the program year, teachers will have time to learn and practice the new techniques before the next school year begins.

      If centers are to be randomly assigned, participating programs will have to give the researchers a list of participating centers, so that random assignment can begin. Within a single grantee, centers would be assigned in pairs to the enhancement group and to the control group. For example, if a grantee has eight centers, four centers would be assigned to each group. If classrooms are to be randomly assigned, participating programs and centers must provide a list of teachers’ names so that teachers can be randomly assigned to enhancement and control groups. If the center has two classrooms, one classroom would be assigned to the enhancement group, and one to the control group. Similarly, if the center has four classrooms, two would be assigned to each group. If the center has an odd number of classrooms, more classrooms should be assigned to the enhancement group than to the control group. If a teacher later leaves the program, the replacement teacher can be randomly assigned to either the enhancement or the control group.7 If the assignment is to the enhancement group, the teacher should receive training and technical assistance so as to be able to implement the enhancement as soon as possible.
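The assignment rules in this paragraph can be sketched in Python. This is a minimal illustration, not the evaluation's actual procedure: the function name `assign_units`, the unit labels, and the fixed seed are assumptions made for the example. Units (centers within a grantee, or classrooms within a center) are shuffled and split as evenly as possible; with an odd count, the extra unit goes to the enhancement group. The sketch does not attempt to match centers into pairs on baseline characteristics.

```python
import random

def assign_units(units, seed=None):
    """Randomly split units into enhancement and control groups.

    Hypothetical helper for illustration. With an odd number of units,
    the extra unit goes to the enhancement group.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    n_enhancement = (len(shuffled) + 1) // 2  # rounds up for odd counts
    return {
        "enhancement": sorted(shuffled[:n_enhancement]),
        "control": sorted(shuffled[n_enhancement:]),
    }

# Eight centers in one grantee: four per group.
centers = [f"center_{i}" for i in range(1, 9)]
center_groups = assign_units(centers, seed=2024)

# Three classrooms in one center: two enhancement, one control.
classrooms = ["room_A", "room_B", "room_C"]
classroom_groups = assign_units(classrooms, seed=2024)
```

Recording the seed alongside the submitted list of centers or teachers would allow the assignment to be reproduced and audited later.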

      Selection of Children into the Research Sample After Random Assignment. Although the evaluation design may call for random assignment of centers or classrooms, it will not be necessary to collect data from all of the children in a classroom or from every classroom and child in the center. Including 10 children per classroom (or 15 per center, if centers are randomly assigned) will provide a sample large enough to estimate average outcomes under random assignment of either classrooms or centers. Beyond that number, additional children add data collection costs at a constant rate but add very little to the power of the design to detect impacts. Therefore, Stage 2 designs typically will involve sampling of children within classrooms (if classrooms are randomly assigned) or sampling of children within centers without regard to classroom (if centers are randomly assigned).8
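The diminishing return from sampling more children per classroom or center can be illustrated with the standard variance factor for a cluster mean, ICC + (1 - ICC)/m, where m is the number of children sampled per cluster and ICC is the intraclass correlation. The ICC of 0.10 used below is a purely hypothetical value chosen for illustration; it is not taken from this report or from Appendix B.

```python
def cluster_variance_factor(children_per_cluster, icc):
    """Variance of a cluster mean relative to the total outcome variance.

    icc is the intraclass correlation (share of variance between clusters).
    """
    return icc + (1.0 - icc) / children_per_cluster

ASSUMED_ICC = 0.10  # hypothetical value, for illustration only

factors = {m: cluster_variance_factor(m, ASSUMED_ICC) for m in (5, 10, 15, 20, 30)}
for m, factor in factors.items():
    print(f"{m:>2} children per cluster -> variance factor {factor:.3f}")
```

Under this assumption the factor can never fall below the ICC itself, however many children are added, which is why adding classrooms or centers generally buys more power than adding children within them.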

      Sample Sizes. Table IV.1 shows the number of centers, classrooms, and children required for the evaluation of a mathematics curriculum under alternative designs. Randomly assigning classes is more efficient than is randomly assigning centers, as the former designs require fewer centers and fewer children than do the latter ones for each level of minimum detectable effect size and assumptions about the extent of baseline data. However, as detailed above, such designs also allow more opportunity for spillover effects that can cause an evaluation to fail.

      If centers were randomly assigned, and the desired level of precision in detecting impacts was .20 (one-fifth of a standard deviation on the outcome measures), we would have to randomize 74 centers (37 to each group) and would have to randomly select 1,233 children from those centers, assuming the availability of only minimal baseline data.9

      Table IV.1: Sample Sizes Required for a Stage 2 Evaluation of a Mathematics Curriculum Under Alternative Designs

      | Design | MDE | Baseline Data | Total Number of Centers | Total Number of Classes | Initial Sample of Children |
      |---|---|---|---|---|---|
      | Randomly Assign Centers | 0.15 | No baseline | 166 | Any | 2,767 |
      | | | Minimal baseline | 132 | Any | 2,200 |
      | | 0.20 | No baseline | 94 | Any | 1,567 |
      | | | Minimal baseline | 74 | Any | 1,233 |
      | Randomly Assign Classes | 0.15 | No baseline | 115 | 230 | 2,530 |
      | | | Minimal baseline | 92 | 184 | 2,024 |
      | | 0.20 | No baseline | 65 | 130 | 1,430 |
      | | | Minimal baseline | 52 | 104 | 1,144 |

      Note: Sample size calculations assume that the evaluation includes 10 children per class (under classroom random assignment) or 15 children per center (under center random assignment), and that 90 percent of the initial sample is available at followup. “Minimal baseline” means that demographic information is collected as part of the consent form, and that NRS fall outcome data are available.

      Sample size calculations also assume a two-tailed test of statistical significance at 80 percent power and a 5 percent significance level and that centers or classrooms are divided equally among treatment and control groups. The sample size calculations do not include an adjustment for the design effect of weighting for sample nonresponse.

      MDE = minimum detectable effect; NRS = National Reporting System.

      Baseline demographic data allow us to control for demographic characteristics statistically when we estimate impacts, which reduces the variance of the impact estimate and thereby increases the power of a particular sample size to detect impacts. (Section E describes the statistical models that could be used in the analysis of impacts.) Minimal baseline data would include demographic information about the family collected as part of the consent form and children’s scores from the fall NRS assessments.10

      If classrooms were randomly assigned and the desired level of precision in detecting impacts was .20, the initial sample would have to contain 104 classrooms (52 per group) in 52 centers, and a total of 1,144 children (11 per classroom), again assuming that minimal baseline data were available.
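The child counts in Table IV.1 can be checked against the assumptions in the table note. The sketch below is only an arithmetic check, with function and constant names invented for the example. For center designs, the table's counts match 15 children per center divided by the 90 percent follow-up rate, with the total rounded; for classroom designs, they match 11 children per classroom, as stated in the preceding paragraph (10 divided by 0.90, rounded to a whole child before multiplying).

```python
FOLLOWUP_RATE = 0.90       # share of the initial sample available at followup
CHILDREN_PER_CENTER = 15   # analysis sample per center (center random assignment)
CHILDREN_PER_CLASS = 10    # analysis sample per class (classroom random assignment)

def initial_children_center_design(n_centers):
    """Initial child sample when centers are randomly assigned."""
    return round(n_centers * CHILDREN_PER_CENTER / FOLLOWUP_RATE)

def initial_children_class_design(n_classes):
    """Initial child sample when classes are randomly assigned.

    The per-class count is rounded to a whole child (11) before multiplying.
    """
    per_class = round(CHILDREN_PER_CLASS / FOLLOWUP_RATE)
    return n_classes * per_class

center_counts = {n: initial_children_center_design(n) for n in (166, 132, 94, 74)}
class_counts = {n: initial_children_class_design(n) for n in (230, 184, 130, 104)}
```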

  5. IMPLEMENTING THE MATHEMATICS CURRICULUM

    The goals of implementation in Stage 2 are twofold. First, the curriculum should be implemented to high fidelity in the centers (or classrooms within centers) that were assigned to the enhancement group. High fidelity ensures that the evaluation measures the impact of the curriculum as designed, rather than the impact of a watered-down version of the curriculum that was implemented incorrectly. Second, procedures and manuals should be clear enough and should cover a broad enough range of Head Start program situations so that, in the future, it would be feasible to implement the curriculum on a broader scale, using Head Start training and technical assistance staff, with only minimal assistance from the curriculum developer. To achieve this goal, the procedures and manuals will have to be revised as necessary after implementation on the basis of information obtained while implementing the curriculum in the 40 to 80 centers. Issues that arise during initial training or during the technical assistance period will have to be addressed and documented, so that the training materials subsequently can be revised in a way that reflects what has occurred in the classroom during implementation of the curriculum.

    At the beginning of Stage 2, the plans for implementing the mathematics curriculum and ensuring that teachers are using it properly will be available to the curriculum developer’s staff, having been established during the curriculum’s development period (Stage 1). This will ensure that at the start of implementation, a clear plan exists to implement the curriculum to high fidelity. Manuals will be available that describe the curriculum in detail, give teachers examples of activities, and offer trainers a plan to work with teachers during an intensive initial period and over time, while the teachers are trying the new methods in their classes. The manuals should indicate the intensity of initial training and ongoing technical assistance, the duration of the training and technical assistance, and the training and technical assistance staff’s qualifications. These manuals do not guarantee that the implementation will proceed smoothly in all centers, but they do ensure that a basic plan exists that has been honed in several Head Start centers. Challenges encountered and resolved during implementation for a Stage 2 evaluation will form the basis for modifying the curriculum or training and technical assistance.

    For a small-scale evaluation (consisting of 40 to 80 centers, depending on whether random assignment is at the center or classroom level, and whether minimal baseline data have been collected), we expect that the curriculum developer will coordinate training for the centers and classrooms assigned to the treatment group. Depending on the design, the training and technical assistance staff will work with either half the centers or half the teachers in each center. An evaluation of this size could operate in approximately five geographic locations, each with 8 to 15 centers. The curriculum developer should have a site coordinator in each location and a staff of trainers to offer initial training and on-site technical assistance during the implementation period (January through May of the second year) and the evaluation year (September through December of the second year and January through June of the third year). The curriculum developer should visit the sites periodically to conduct some of the large-group training, address any issues that arose during training and technical assistance, and monitor implementation.

    Prior to implementation of the curriculum, the curriculum developers should communicate with the leadership and stakeholders of the participating programs at all the levels involved, including the program director, education coordinators, center directors, parent Policy Council members, and others as appropriate. Care must be taken to ascertain the most efficient selection of staff to train. For example, previous curriculum implementation experience has shown that there is often value in training all members of a teaching team (teaching assistants and aides as well as the lead teacher) to ensure optimal implementation and continuation of the curriculum approach in the event of teacher turnover. All relevant staff should at least be informed about the goals and approaches of the curriculum.

    Implementing a mathematics curriculum is likely to require three or four days of initial training in a specialized setting (such as a school or university that has facilities available for large and small groups). Training should include an overview of the curriculum and video footage of teachers engaged in activities associated with the curriculum in classrooms of preschool children, so that teachers in the enhancement group can visualize what the curriculum looks like in practice. Training should then cover a series of modules that will describe each element of the curriculum, and that will present activities and materials associated with each element. Trainers could cover the latter training component with small groups that rotate through the modules over a two-day period. Practical exercises involving role playing and hands-on experience with each module should be offered so that the teachers can practice what they have learned. Techniques for direct and observational assessments of children and links between curriculum assessments and required elements of the Head Start Child Outcomes framework can also be covered in training sessions. The training should wrap up with summaries, questions and answers, and discussion of the technical assistance plan.

    After training has been completed, technical assistance staff will visit classrooms periodically to observe how teachers are implementing the curriculum, and to discuss any questions or issues. The staff should visit once per week at first, for about one to two hours, and could then taper the visit schedule to every other week and, eventually, to once per month. During the visits, the staff should discuss with the teachers the material that has been covered in previous sessions and the teachers’ comfort level with the material. They should periodically complete measures of fidelity to the curriculum and should discuss with the teachers what they have observed, what aspects of curriculum implementation are going well, and what steps the teachers can take to improve their techniques.

    1. Measuring Implementation

      Researchers will have to monitor random assignment throughout the implementation period to ensure that the enhancement and control groups remain separate. During this time, they also will measure the process of implementation and fidelity of curriculum implementation. During the evaluation year, they will continue to monitor implementation and fidelity and, as discussed in the next section, will collect measures of teachers, classrooms, and children’s outcomes for the impact analysis.

      Measures of Implementation and Fidelity. Researchers will visit a subset of the centers in the evaluation to measure implementation processes and fidelity to the curriculum. The number of centers included in the implementation study is generally determined as a compromise between the evaluation budget and the need to measure implementation experiences in several of the centers. Approximately 20 to 30 percent of the enhancement centers should be selected randomly from among those participating. Since the implementation study is intended to provide insights into which implementation strategies worked well and which did not, the implementation staff should help classify programs by their implementation experience (for example, how well and how easily implementation occurred); centers can then be randomly selected from each of these groups to participate in the implementation study. Visits should take place during the spring of the implementation year (year 2), after teachers have received training and technical assistance. Researchers will conduct classroom observations to measure how closely the classroom practices match the curriculum and the children’s level of engagement with the mathematics topics (a list of topics by data collection mode is shown in Table IV.2). Researchers will conduct semistructured interviews with training and technical assistance staff and with key program staff, including directors and education coordinators, as well as focus groups with teachers to explore the content and quality of training, the fit between the curriculum and classroom, and the content and quality of technical assistance. Questions will focus on what aspects of training and technical assistance worked well, what challenges were encountered, how well the curriculum fit with the program, and what aspects of the curriculum, training, and technical assistance might have to be changed.
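The stratified selection of centers for the implementation study might look like the following sketch. The strata labels, center names, function name, and the 25 percent sampling fraction (chosen from within the 20 to 30 percent range mentioned above) are all hypothetical.

```python
import math
import random

def select_implementation_sample(centers_by_stratum, fraction=0.25, seed=None):
    """Randomly select a fraction of centers within each stratum.

    The draw is rounded up, so at least one center is taken from
    every non-empty stratum.
    """
    rng = random.Random(seed)
    selected = {}
    for stratum, centers in centers_by_stratum.items():
        if not centers:
            selected[stratum] = []
            continue
        n_draw = max(1, math.ceil(fraction * len(centers)))
        selected[stratum] = sorted(rng.sample(centers, n_draw))
    return selected

# Hypothetical classification of 20 enhancement centers by implementation experience.
strata = {
    "smooth": [f"center_{i}" for i in range(1, 9)],        # 8 centers
    "mixed": [f"center_{i}" for i in range(9, 17)],        # 8 centers
    "difficult": [f"center_{i}" for i in range(17, 21)],   # 4 centers
}
sample = select_implementation_sample(strata, fraction=0.25, seed=7)
```

Because the draw is rounded up within each stratum, small strata can push the overall share slightly above the nominal fraction.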

      While enhancements that are evaluated at Stage 2 are expected to have well-developed implementation plans and thus a high degree of success in implementing to high fidelity in the participating centers, it is not possible to anticipate every challenge to implementation. Accordingly, measures of implementation and fidelity are required both to improve the next generation of implementation materials, as described above, and to ensure that the evaluation can measure the effects of a well-implemented enhancement, as we discuss in Section E. Measures of fidelity to the curriculum and the quality of implementation obtained from the implementation study will be used to define subgroups that enable researchers to estimate impacts of the curriculum among classrooms or centers that implemented the curriculum with high fidelity. Since fidelity and the quality of implementation are not experimentally determined (relative to sites that did not implement to high fidelity), these estimates must be interpreted cautiously but they provide a measure of the effectiveness of the curriculum in “high-fidelity” sites relative to the overall impact.

      Table IV.2: Potential Implementation Study Topics for an Evaluation of a Mathematics Curriculum

      Data collection methods are shown in the last five columns.

      | Topic Area | Implementation Study Topics | Direct Observation | Program Records and Documents | Semistructured Interviews with Key Program Staff | Semistructured Interviews with T/TA Providers | Focus Groups with Teachers |
      |---|---|---|---|---|---|---|
      | Initial Staff Training | What was the content of training? | | X | | X | X |
      | | What were the qualifications of trainers? | | | | X | |
      | | What was the intensity and duration of training? | | | | X | |
      | | How well did training prepare staff to use the curriculum? | X | | X | X | X |
      | | How well did the curriculum blend with other classroom activities? | X | | X | X | X |
      | | How could training be improved? | | | X | X | X |
      | Curriculum | How many times each day and each week are mathematics concepts discussed in class? | | | | X | X |
      | | How many times each day and each week do children do hands-on mathematics activities? | | | | X | X |
      | | Are staff presenting mathematics concepts as frequently and for as long as intended? | | | X | X | X |
      | | Are staff presenting all of the curriculum’s mathematics concepts? | X | | X | X | X |
      | | If not, why not? | | | X | X | X |
      | | Are children engaged by the mathematics discussions? | X | | X | X | X |
      | | If not, why not? | X | | X | X | X |
      | | Do all children participate in the activities? What proportion are engaged? | X | | | | |
      | Technical Assistance and Support | Who provides technical assistance and support? Curriculum development staff? Education coordinators? Others? | | | X | X | X |
      | | What types of support do staff receive in providing enhanced services? Observation and feedback? Modeling? Conferences between teacher and technical assistance staff? Written reports? | | | X | X | X |
      | | How often is this assistance provided? | | | X | X | X |
      | | What topics are covered? | | | X | X | X |
      | | What types of questions are raised? What support is needed? | | | X | X | X |
      | | How helpful is the technical assistance? | | | X | X | X |
      | | What other types of support do staff need? | | | X | X | X |
      | Lessons Learned | Which aspects of training went well? | | | X | X | X |
      | | What factors associated with programs or teachers affected implementation positively? | | | X | X | X |
      | | What challenges were encountered in training? In technical assistance? | | | X | X | X |
      | | What factors associated with programs or teachers increased the difficulty of the training or technical assistance? | | | X | X | X |
      | | What strategies were used to overcome the challenges? | | | X | X | X |
      | | How well has the curriculum blended into other program activities? | | | X | X | X |
      | | How well has the curriculum seemed to meet the children's needs and abilities? | | | X | X | X |
      | | How did teachers resolve any issues regarding the fit of the curriculum with the Head Start program? | | | X | X | X |
      | | What do Head Start families think of the curriculum? | | | X | X | X |
      | | How can the curriculum be improved? | | | X | X | X |
      | | How can training or technical assistance be improved? | | | X | X | X |

      T/TA = Training and technical assistance.
  7. OUTCOMES MEASUREMENT AND DATA COLLECTION PLANS

    A comprehensive assessment of the impact of a mathematics curriculum on Head Start children would examine the impact of the intervention on children, teachers, and the overall structure of classroom activities. If the curriculum is implemented well, the teacher will be engaged in more mathematics activities with the children; children will show interest in the activities and their mathematics skills will increase over the Head Start year.

    Outcome measures selected for the evaluation should have the following properties:

    • Relevance to School Readiness Goals and the Head Start Child Outcomes Framework. The Head Start Child Outcomes Framework (COF) outlines eight domains of early learning and development (see Appendix A). These cover children’s interest and early skills in the various domains of mathematics, including number and operations, geometry and spatial sense, and patterns and measurement.

    • Sensitivity to Intervention. Measures should reflect outcomes malleable to intervention, as opposed to more trait-like qualities (such as temperament).

    • Appropriateness for a Culturally Diverse, Low-Income Population. The selected measures should adequately assess outcomes of low-income 3- to 5-year-olds from diverse cultural populations, including those who do not speak English in the home.

    • Adequate Psychometric Properties. Measures selected should be reliable and valid. Reliability means that they measure the same construct across various settings (for example, the classroom or the home), on repeated occasions (test-retest reliability), when administered or rated by different interviewers/observers (interrater reliability), and when subsets of items are administered to identical samples (split-half reliability). Validity refers to the degree to which the measure taps the underlying construct it purports to measure, keeping in mind linguistic and cultural considerations.11

    • Valid and Reliable for Intended Mode of Administration. A measure might not be valid or reliable as reported if it is used with a group for whom it was not designed or with a mode of administration for which its reliability and validity have not been tested.

    • Prior Use in Large-Scale Surveys and Intervention Evaluations. Prior use is helpful because it suggests that the measure is practical and feasible to use in large-scale research and because it provides a benchmark for the scores of children participating in the evaluation.

    • Cost and Burden. The cost or burden of data collection strategies, including training requirements, respondent burden, and program administration burden, should be minimized.

    Table IV.3 describes the classroom and child outcomes to be measured and the recommended measures of each. Measures of teachers’ knowledge and attitudes about teaching mathematics at the preschool level can be taken from the PCER evaluation and from other recent studies of mathematics curricula. Measures of classroom activities will include the fidelity measure for the selected mathematics curriculum and questions to teachers about how classroom time is allocated among mathematics, language and literacy, and play. Measures of children’s development will include mathematics skills (a standardized assessment and a more detailed measure of children’s mathematics abilities used in the PCER evaluation), language (the Peabody Picture Vocabulary Test, Third Edition, which could rely on the NRS assessment), early literacy (for example, subtests of the Woodcock-Johnson, such as Letter-Word Identification and Spelling), and measures of behavioral problems and approaches to learning completed by teachers.

    To obtain data for the impact analysis, a baseline assessment of children’s mathematics skills will be conducted during the fall, with a follow-up assessment conducted during the spring. A teacher interview, also conducted during the fall and spring, will examine fidelity to the curriculum (in both the enhancement and control classes), time spent in mathematics-related activities, knowledge and beliefs about child learning, and the background of the teachers. The classroom observation, which need be conducted only during the spring, will measure fidelity to the intervention (in both the enhancement and control classes) and time spent in mathematics-related activities. Additional observation measures of classroom quality would tap into potential improvements in early childhood practice generally; typically, these focus more on adult-child interactions and resources than on content-specific activities. Even so, taking stock of these interim measures is an important step in helping to understand the pathways by which the curriculum may be changing practice and potentially affecting child learning. The child’s demographic characteristics will be obtained from an information sheet completed by parents as part of the consent process.

    Before any data are collected, the research team must draft instruments, obtain IRB and OMB approval, obtain parents’ consent for their children’s participation, and randomly select children to participate based on their eligibility status and their consent status. We discuss these steps in the remainder of this section.

    Table IV.3: Measures of Intermediate and Child Outcomes Associated with a Mathematics Curriculum

    | Outcome Area | Outcome | Recommended Measure | Type of Measure |
    |---|---|---|---|
    | Teachers’ Knowledge | Expectations for children's mathematics ability | Attitudes and beliefs about children's mathematics ability | Teacher survey |
    | | Knowledge about how to present mathematics concepts | Types of mathematics activities conducted in class | Teacher survey |
    | | | Frequency of teaching mathematics activities | |
    | Classroom Processes | Proportion of class time spent presenting mathematics concepts | Time spent discussing mathematics concepts | Observation |
    | | | Total time in class | Teacher survey |
    | | Teachers’ engagement in presenting mathematics content | Fidelity measure developed for the mathematics curriculum | Observation |
    | | Children's engagement during discussion of mathematics | Number of children listening during presentation | Observation |
    | | | Level of engagement of children | |
    | | Proportion of class time spent in language and literacy activities; proportion of time spent in play | Time spent in language and literacy activities | Observation |
    | | | Time spent in free play activities | Teacher survey |
    | Children’s Development | Mathematics ability (number sense, knowledge of geometry, and spatial sense) | Woodcock-Johnson Applied Problems (Woodcock, McGrew, and Mather 2001) | Direct assessment |
    | | | Child Math Assessment, Abbreviated | Direct assessment |
    | | Language ability | Peabody Picture Vocabulary Test III (Dunn and Dunn 1997) | Direct assessment |
    | | Early literacy | Test of Language Development-Primary-Third Edition, Phonemic Analysis (Newcomer and Hammill 1997) | Direct assessment |
    | | Behavioral problems | Behavior Problems Scale (FACES Research Team 2001) | Teacher survey |
    | | Approaches toward learning | Preschool Learning Behaviors Scale (McDermott et al. 2000) | Teacher survey |

    Develop Data Collection Instruments. During the first year of the study, data collection instruments will be written. These include the consent form with a form requesting demographic information; the teacher survey; the child assessments; and the classroom observation protocol. Many of these will include standardized assessments that need to be formatted to simplify administration by a trained assessor. Others (such as the teacher questionnaire) will rely on questions that have been used in prior studies. Once the data collection instruments are developed, they will be pretested to ensure that respondents understand the questions, that the flow proceeds logically and smoothly, and that the time required to complete them is reasonable.

    IRB and OMB Research Review. Research on human subjects must be reviewed by an IRB, which considers the benefits of the research to society, the programs, and the participating families and weighs them against the cost of the research to the families and program staff. The IRB also reviews protection of research participants from harm by ensuring that confidentiality is maintained. In addition, if the evaluation is federally funded, data collection instruments and the research plan must be approved by OMB. The data collection instruments are reviewed to ensure that they do not overlap with other ongoing federal data collection efforts, and that burden is not excessive. These reviews will be conducted during the nine months preceding the start of the evaluation year, while the intervention is being implemented.

    Consent. Consent for children to participate in the research must be obtained from parents or guardians. The parent consent form will clearly inform parents (guardians) about the duration of the study, the types of assessments that will be administered, and the voluntary nature of participation. An information sheet will be included in the consent package to collect basic demographic information about the family, such as age, race and ethnicity, and family structure.

    Because many Head Start classrooms serve mixed-age groups of children, some children in a particular classroom will not meet the study’s eligibility requirements. Rather than ask teachers to sort children by eligibility status and to distribute consent forms accordingly, it seems more efficient, more accurate, and less burdensome for teachers to request consent from all parents of children in the classroom. The consent form could include a few questions designed to elicit the child’s eligibility for the study (for example, birth date), as well as other demographic variables that will have to be collected for the study. Researchers could then review the consent forms to identify eligible children.

    The consent process could proceed more smoothly if it is incorporated into the home visits that many programs make to families. During these visits, which typically occur just before children attend class for the first time, teachers bring forms that parents must complete before the start of the school year. The teacher is available to explain the forms, and to ensure that they are completed correctly. If the study’s consent is part of this process, the teacher would be able to explain the nature of the study, what will happen if the child participates in the research, and the voluntary nature of the child’s participation.

    Child Eligibility Criteria for the Study. Children entering the research sample for the mathematics curriculum study should be four years old (or eligible to enter kindergarten in the following year). Teachers’ training on the curriculum will have begun during the year before the child assessment year, and some classes and centers include children of mixed ages. Therefore, children entering the research sample ideally should be in their first year of Head Start, so they will all have the same duration of exposure to the curriculum.

    Selection of Children for the Study. Consent forms will be sent by program staff to the researchers for processing. The researchers will enter data from the forms to obtain information on the demographic characteristics of the children from each center whose parents have given consent for them to participate. They will then identify which children are eligible for the study (using age, whether new to Head Start, and any other criteria set for the evaluation). A sample of children from the pool of eligible children in the center or classroom will be randomly selected for the data collection. Thus, if the design is to randomly assign centers, children in each center who are eligible and have consent will be listed, and 16 to 17 will be randomly selected for the research sample. If the design is to randomly assign classrooms, children in each classroom who are eligible and have consent will be listed, and 11 will be randomly selected for the research sample.
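The listing and selection steps described above can be sketched as follows. The record fields (`child_id`, `classroom`, `consent`, `eligible`) and the function name are hypothetical stand-ins for whatever the consent-form data entry actually produces.

```python
import random

def select_children(records, per_unit, unit_key, seed=None):
    """Randomly select up to per_unit eligible, consented children per unit."""
    rng = random.Random(seed)
    pools = {}
    for child in records:
        # Only children with parental consent who meet the eligibility criteria are listed.
        if child["consent"] and child["eligible"]:
            pools.setdefault(child[unit_key], []).append(child["child_id"])
    return {
        unit: sorted(rng.sample(ids, min(per_unit, len(ids))))
        for unit, ids in pools.items()
    }

# Hypothetical consent-form data: 18 children in one classroom.
records = [
    {"child_id": i, "classroom": "room_A",
     "consent": i % 6 != 0,    # children 0, 6, and 12 lack consent
     "eligible": i % 5 != 0}   # children 0, 5, 10, and 15 are not eligible
    for i in range(18)
]
selected = select_children(records, per_unit=11, unit_key="classroom", seed=11)
```

When a unit has fewer eligible, consented children than the target, the sketch simply takes all of them; how an evaluation would handle such shortfalls is a design decision this example does not address.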

    Parent consent rates typically vary across classrooms. The rates at which parents return the forms and give consent can depend on how organized the parents are, their degree of connection with the classroom teacher, whether parents talk to each other about participating in the study, and how diligently the teacher follows up with parents to return the forms. The study typically will not have information about children without parental consent; in that case, results based on the children with consent will have to be generalized to the entire class. However, if some aggregate, class-level data are available (such as demographic composition or scores from the NRS assessments), this information can be used to adjust classroom-based estimates for nonresponse by weighting the responders to reflect the true aggregate classroom-level composition.
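
    One simple way to carry out such an adjustment is to weight each responder by the ratio of a group's share in the full class to its share among responders. The sketch below assumes a single two-category characteristic and made-up scores; the function names and data are hypothetical, and an actual study might weight on several characteristics at once.

```python
from collections import Counter

def nonresponse_weights(responder_groups, class_shares):
    """Weight each responder so the weighted group mix matches the known class mix.

    responder_groups: one group label per child with consent.
    class_shares: dict mapping group label -> share of the full class (sums to 1).
    """
    counts = Counter(responder_groups)
    n = len(responder_groups)
    # weight = (share in the full class) / (share among responders)
    return [class_shares[g] / (counts[g] / n) for g in responder_groups]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical classroom: half of the enrolled children are English learners ("EL"),
# but only 2 of the 8 children with consent are.
groups = ["EL", "EL", "non-EL", "non-EL", "non-EL", "non-EL", "non-EL", "non-EL"]
scores = [10, 12, 16, 18, 17, 15, 16, 14]
weights = nonresponse_weights(groups, {"EL": 0.5, "non-EL": 0.5})
print(sum(scores) / len(scores), weighted_mean(scores, weights))
```

    In this made-up example the unweighted classroom mean is 14.75 and the weighted mean is about 13.5, because the underrepresented group is weighted up.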

    Teacher Self-Administered Questionnaire. Teachers’ attitudes about teaching mathematics to young children and their level of understanding of four-year-old children’s learning capabilities can influence how well they implement the curriculum. We recommend giving teachers a short (no more than 15-minute) self-administered questionnaire during the fall, while the child assessments are in progress. This questionnaire could be adapted from the PCER teacher interview and could include questions about the teacher’s background, attitudes and beliefs about child learning, mathematics activities conducted in the classroom, and the frequency of teaching specific mathematics activities. During the spring data collection, the teacher questionnaire will omit the teacher background questions (unless the teacher is new) but will include a short behavior problems scale and a short approaches toward learning scale for each child in the research sample. The behavior problems scale will address whether the increasing academic demands in preschool have increased children’s behavioral problems. The approaches toward learning scale will help identify positive factors such as curiosity and attentiveness that are associated with academic success.

    Classroom Observation. Evaluating the time that teachers spend on different mathematics skills, their approaches to these learning experiences, and the proportion of time spent in individual and group activities will help researchers to understand the pattern of child outcomes. Observational instruments that are available to measure mathematics activities in the classroom have been developed primarily as fidelity measures for specific curricula. We therefore recommend using the fidelity measure developed for the mathematics curriculum to measure the content, duration, and intensity of the curriculum’s mathematics activities. In addition, more general measures of classroom quality can be used at this data collection point, in order to assess interim practice quality. We recommend that the classroom observations be conducted during the two weeks before the spring follow-up assessments begin.

    Child Assessment. To allow for a more in-depth evaluation of the mathematics domains stressed in an intervention curriculum, we recommend two assessments, the Woodcock-Johnson Applied Problems subtest and the Child Math Assessment—Abbreviated for PCER (Starkey et al. 2002). The Applied Problems subtest includes simple counting, addition, and subtraction operations and requires the child to decide not only the appropriate mathematical operations to use, but which data to include in the count or calculations. It takes five to eight minutes to administer. The Child Math Assessment—Abbreviated differs from the NRS and the Applied Problems Subtest in that it uses manipulatives (three-dimensional materials that children can touch and move around) to assess the child’s understanding of object counting, construction of equivalent sets, shape recognition, and pattern duplication. It takes 20 minutes to administer. To address the research question on displacement of language and literacy development, we recommend using the PPVT (from the NRS administration, if available) and the Woodcock-Johnson subtests such as Letter-Word Identification and Spelling (an early writing task which is related to small motor skills).

    The importance of obtaining a true baseline child assessment, ideally before the child has any experience in the Head Start classroom, must be balanced against both the need to control data collection costs by assessing children in the Head Start center and the two- to three-week period required to stabilize enrollment in Head Start classes. We recommend a field period that starts approximately two to three weeks after classes begin at the Head Start center, with a six-week window for data collection in the sites. Similarly, to assess what children have learned during the Head Start year, the follow-up assessment should be conducted as close to the end of the year as possible. We recommend that the follow-up data be collected during a six-week window that ends two weeks before the end of the year, and that data collection be matched to the timing of the fall assessment so that classes assessed early in the fall field period also are assessed early in the spring field period.

    Costs of the Enhancement. Program administrators care not only about the effectiveness of the quality enhancements, but also about the costs of these enhancements over and above current expenditures. To support analyses of the cost of the enhancement relative to its impacts or benefits, researchers will collect information on the costs of implementing and carrying out the mathematics curriculum relative to the cost of the program without that curriculum. Researchers should measure two types of costs: (1) the upfront, one-time costs of beginning implementation of the curriculum, and (2) the ongoing additional costs resulting from the enhancement. Among the first set of costs are the costs associated with initial teacher training, including the curriculum developer’s and training staff’s time, the cost of any substitute teachers hired to cover classrooms while the teachers attend training, and the cost of additional staff days for teachers paid for the days they attend training. The first set of costs also includes costs associated with the curriculum training staff’s technical assistance visits and any costs associated with teachers attending extra training sessions outside their normal classroom duties (for example, periodic group discussions, if any). Time spent by teachers and trainers would be valued at their hourly wage, including fringe benefits. If this training actually supplanted the usual teacher training sessions conducted during the year, those routine training costs should be subtracted. Usually, however, new curriculum training will be an add-on expense rather than a substitute. If space for training is rented, the cost of the rental will be included as well. Materials, including documents, videos, and other training materials, should be valued at their cost.

    For the mathematics curriculum, the ongoing additional costs resulting from the enhancement would include the costs of materials, any computer-related costs, the cost of ongoing technical assistance, and costs to train new teachers. Teachers will have to maintain a ready supply of materials used to demonstrate mathematics concepts in the classroom, as well as materials for children to play with in a specially-designated area of the classroom at other times. Because broken or lost materials must be replaced, the cost of maintaining the supply of mathematics materials during the program year is a cost of the curriculum. Similarly, if a computer-based software program is part of the mathematics curriculum, the cost of setting up and maintaining the personal computers in the classroom with appropriate software is an additional cost. The cost of refresher training and ongoing technical assistance to teachers during a normal program year to ensure that the curriculum continues to be implemented to high fidelity is an ongoing cost as well. For example, technical assistance staff might make two or three visits to each classroom during the program year to observe, measure fidelity, and meet with the teachers and Education Coordinator to discuss classroom practices and respond to questions. Finally, overall teacher turnover in Head Start is approximately 15 percent per year (National Institute for Early Education Research 2003). Accordingly, an ongoing cost of the mathematics curriculum is the cost of training and technical assistance to 15 percent of the teachers in the enhancement group each year. Assuming that the evaluation includes 50 to 100 classrooms in the enhancement group, 7 to 15 teachers would have to be fully trained each year.
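
    The turnover arithmetic can be checked directly. The sketch below assumes one teacher to be trained per classroom, which the text does not state; the function name is hypothetical.

```python
TURNOVER_RATE = 0.15  # annual teacher turnover cited in the text

def teachers_to_train(n_classrooms, teachers_per_classroom=1):
    """Expected number of newly hired teachers needing full training each year."""
    return n_classrooms * teachers_per_classroom * TURNOVER_RATE

for n in (50, 100):
    print(n, "classrooms:", teachers_to_train(n), "teachers per year")
```

    For 50 and 100 classrooms this gives 7.5 and 15 teachers per year, in line with the range of 7 to 15 quoted above.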

    Three different approaches to estimating these costs of implementing the early mathematics curriculum could be used if centers are randomly assigned to either the enhancement or control group:

    • Identify the costs associated with the new curriculum (described above) and ask center directors for this information.

    • Obtain the full budget for the enhancement center for the year prior to curriculum implementation and for the current year and estimate the cost of the curriculum as the difference in budgets (adjusting for normal cost inflation from one year to the next).

    • Obtain the full budget for the enhancement centers and the control-group centers and estimate the cost of the curriculum as the difference between enhancement and control-group center costs.

    The first approach is the least burdensome for the enhancement centers (and eliminates burden for the control centers) but could miss costs associated with the curriculum. The second approach is more burdensome for the enhancement centers than the first, and does not require a response from the control centers, but if anything other than the implementation of the curriculum changes from one year to the next, the estimate of the cost of the enhancement will be inaccurate. The third approach is the most burdensome, involving extensive collection of cost information from both enhancement and control centers, but would provide the most accurate estimate of the costs of implementing the enhancement.
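
    The second and third approaches reduce to simple budget differences, sketched below. The function names, the 3 percent inflation rate, and all dollar figures are hypothetical and serve only to show the calculation.

```python
def cost_by_budget_change(budget_prior, budget_current, inflation_rate):
    """Second approach: current budget minus the prior-year budget in current dollars."""
    return budget_current - budget_prior * (1 + inflation_rate)

def cost_by_group_difference(enhancement_budgets, control_budgets):
    """Third approach: mean enhancement-center budget minus mean control-center budget."""
    def mean(xs):
        return sum(xs) / len(xs)
    return mean(enhancement_budgets) - mean(control_budgets)

# Hypothetical figures for one enhancement center (second approach).
approach_2 = cost_by_budget_change(500_000, 530_000, inflation_rate=0.03)

# Hypothetical figures for two enhancement and two control centers (third approach).
approach_3 = cost_by_group_difference([530_000, 510_000], [500_000, 510_000])
print(approach_2, approach_3)
```

    The second calculation attributes every year-to-year change beyond inflation to the curriculum, which is why the text warns that it is inaccurate when anything else changes.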

    If classrooms within centers are randomly assigned to the enhancement or control groups, the first or the second approach to estimating the costs of the curriculum would have to be used. These approaches would provide estimates of the cost of implementing the curriculum in just a few of the classrooms, and would have to be adjusted using reasonable assumptions regarding fixed costs (costs that would remain the same if one or more classrooms in the center implemented the curriculum) and variable costs (costs that would increase if one more classroom implemented the curriculum).

    Information about these costs can be obtained from semistructured interviews with program and center directors, from the curriculum developer and training/technical assistance supervisory staff, and from program records. All cost information must be obtained in dollars; however, to ensure that the dollar values collected at various time points represent the same value, the dollar values collected from informants and records should be either inflated or deflated, using the Consumer Price Index, to represent dollar values in a single target year (for example, the analysis and reporting year).
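
    Converting costs to a single target year amounts to scaling each amount by the ratio of the price index in the target year to the index in the year the cost was incurred. In the sketch below the index values are placeholders, not actual Consumer Price Index figures, and the function name is hypothetical.

```python
def to_target_year_dollars(amount, cpi_source_year, cpi_target_year):
    """Express a dollar amount from the source year in target-year dollars."""
    return amount * cpi_target_year / cpi_source_year

# Placeholder index values (not real CPI data): prices rise 3 percent between years.
cpi = {"implementation year": 100.0, "reporting year": 103.0}

training_cost = 1_000.0  # hypothetical cost recorded in the implementation year
adjusted = to_target_year_dollars(
    training_cost, cpi["implementation year"], cpi["reporting year"]
)
print(adjusted)
```

    The same function deflates later costs to an earlier target year, because the ratio is then less than one.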

    Finally, cost information is sometimes obtained using two perspectives. First, the actual dollar costs of the curriculum must be obtained. Second, a measure of cost to society would place a value on any volunteer labor or donated space and materials and would add those costs to the actual expended costs. Both cost perspectives would be used in the analysis of benefits and costs or cost-effectiveness of the curriculum.

  9. ANALYSIS AND REPORTS

    Reports based on the evaluation should present the estimated impacts of the mathematics curriculum on teachers’ practices, classroom activities, and children’s outcomes. They also should describe the implementation experience and should discuss how the curriculum could be implemented on a broad scale, using the Head Start training and technical assistance system for support. Reports should be written for a broad audience, with stand-alone summaries that can be understood by program staff and policymakers, and more-detailed reports that summarize the research design, sample characteristics, analytic approaches, and findings in a clear and accessible way. In this section, we discuss the approach to estimating impacts and conducting the cost-effectiveness analysis.

    1. Estimating Impacts of the Mathematics Curriculum

      Using a random assignment evaluation design means that fairly simple estimation methods can be used to determine the impacts of the quality enhancements at a point in time. Under random assignment of classrooms, center-level impacts are calculated as the simple differences in the mean value of outcomes for children in the enhancement classrooms and the control classrooms, and then the center-level impacts are averaged for the overall impact estimate.12 This provides an unbiased estimate of the impact of the enhancement compared to the status quo. Under random assignment of centers, the center-level mean outcomes are estimated, then, separately for the enhancement and control group, the center mean outcomes are averaged over all centers in that group. The simple difference in the means between enhancement and control centers is the impact estimate.
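
      The two difference-in-means calculations can be sketched as follows. The function names and the scores are hypothetical; the sketch computes point estimates only and says nothing about standard errors.

```python
from statistics import mean

def impact_classroom_assignment(centers):
    """Classroom-level random assignment.

    centers: list of (enhancement_scores, control_scores) pairs, one per center.
    Center-level impacts are computed first, then averaged across centers.
    """
    center_impacts = [mean(enh) - mean(ctl) for enh, ctl in centers]
    return mean(center_impacts)

def impact_center_assignment(enhancement_centers, control_centers):
    """Center-level random assignment.

    Each argument is a list of per-center score lists. Center means are
    averaged within each group, then differenced.
    """
    enh = mean(mean(scores) for scores in enhancement_centers)
    ctl = mean(mean(scores) for scores in control_centers)
    return enh - ctl

# Hypothetical child scores.
classroom_design = [([12, 14], [10, 12]), ([20, 22, 24], [21])]
print(impact_classroom_assignment(classroom_design))

print(impact_center_assignment([[10, 12], [14, 16, 18]], [[9, 11], [13]]))
```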

      More-precise estimates can be obtained by estimating regression models. Regression procedures can improve the precision of the estimates and adjust for any residual differences in the observable characteristics of program and control group members due to random sampling and interview nonresponse. Regression models take the following form:

      (1) Y = α + Xβ + γT + ε,

      where Y is an outcome variable; X is a vector of explanatory variables; T is an indicator that equals 1 for members of the enhancement group and 0 for members of the control group; α, β, and γ are parameters to be estimated; and ε is a random-error term. The estimate of the parameter γ is the estimated impact of the quality enhancement compared with regular Head Start services.
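
      As an illustration of equation (1), the sketch below simulates data with a known impact and recovers it by ordinary least squares. Everything in it is an assumption made for the example: the single covariate, the simulated values, and the small dependency-free solver (in practice a statistical package would be used, with standard errors that account for clustering, which this sketch ignores).

```python
import random

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(0)
rows, outcomes = [], []
for _ in range(400):
    x = rng.gauss(0, 1)      # one hypothetical explanatory variable X
    t = rng.randint(0, 1)    # T: 1 = enhancement group, 0 = control group
    rows.append([1.0, x, float(t)])
    # Simulated outcome with a true impact (gamma) of 0.3.
    outcomes.append(1.0 + 0.5 * x + 0.3 * t + rng.gauss(0, 0.5))

alpha_hat, beta_hat, gamma_hat = ols(rows, outcomes)
print(f"estimated impact (gamma): {gamma_hat:.2f}")
```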

      Because random assignment will have been conducted at the classroom or center level (depending on the design), the regression adjustment takes the form of a hierarchical linear model (HLM) of child development consisting of two nested levels. By specifying the model at each level, we can conduct analyses for the appropriate units of analysis and can conduct statistical hypothesis tests that correctly account for the clustering of children within classrooms. Although regression adjustment is not strictly necessary for estimating impacts, because the evaluation is based on a random assignment design, adjusting the impacts with an HLM can help increase the precision of the estimates.

      For the design involving random assignment of centers, the analytic model is the following, where the variables are indexed by child (i) and centers (j):

      Child-level model:

      (2) Y_ij(t) = αY_ij(0) + βX_ij + C_j + ε_1ij.

      Center-level model:

      (3) C_j = ηT_j + δZ_j + ε_2j.

      For the design involving random assignment of classrooms, the analytic model is the following, where the variables are indexed by child (i) and classrooms (k):

      Child-level model:

      (4) Y_ik(t) = αY_ik(0) + βX_ik + l_k + ε_3ik.

      Classroom-level model:

      (5) l_k = λT_k + ΨW_k + ε_4k.

      where Y(t) is the outcome at follow-up period t; Y(0) is the corresponding baseline measure; X is a set of child characteristics, such as gender; T is a variable indicating whether the child was in an enhancement classroom or center; Z is a set of center-level variables, such as whether classes are full-day; W is a set of classroom-level variables, such as the teacher’s education level; and ε1, ε2, ε3, and ε4 are disturbance terms assumed to have a mean of zero and to be uncorrelated with each other. Parameters to be estimated include α and β, the coefficients on the child’s baseline measure and baseline characteristics; C, the center effect, or l, the classroom effect; η or λ, the effect of the mathematics curriculum in the two models; δ, the coefficients on the center variables; and Ψ, the coefficients on the classroom variables.
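
      Substituting the group-level model into the child-level model gives a single equation for each design, which makes the source of the clustering explicit. The derivation below is a sketch that takes the child-level disturbance in equation (2) to be the single term ε1, as in the list of disturbance terms above.

```latex
% Center design: substitute (3) into (2)
Y_{ij}(t) = \alpha Y_{ij}(0) + \beta X_{ij} + \eta T_j + \delta Z_j
            + \varepsilon_{2j} + \varepsilon_{1ij}

% Classroom design: substitute (5) into (4)
Y_{ik}(t) = \alpha Y_{ik}(0) + \beta X_{ik} + \lambda T_k + \Psi W_k
            + \varepsilon_{4k} + \varepsilon_{3ik}
```

      In each case the composite error contains a component shared by all children in the same center or classroom (ε2 or ε4), which is why hypothesis tests must account for clustering.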

      The statistical techniques used to estimate regression-adjusted impacts in equations (2) and (4) will depend on the form of the outcome, Y. If the dependent variable is continuous (such as the score on the Child Math Assessment), ordinary least squares methods produce unbiased estimates of the parameter η or λ. However, if the dependent variable is binary (such as whether the child’s score on the Woodcock-Johnson Applied Problems subtest was one or more standard deviations below the norm), logit or probit maximum-likelihood methods will be used to obtain consistent parameter estimates.

      The estimation models presented here assume that all centers or classrooms are weighted equally. Thus, in the case of center-based random assignment, the average outcome measure for each treatment center is averaged with those of all other treatment centers to obtain the mean outcome for all treatment centers. Larger centers thus do not receive greater weight in influencing the overall mean score for treatment (or control) centers. If the enhancement were implemented in a few centers with six or eight classrooms and several others with two or three classrooms, we would not want the results in the larger centers to overwhelm the results for all treatment centers. Averaging results across all centers regardless of center size addresses the question, “Does the enhancement work in the average center?” This approach is appropriate because the purpose of the evaluation at Stage 2 is to measure how well the enhancement works in a collection of centers overall. The same is true if random assignment takes place at the classroom level. The Stage 2 evaluation will address the question of whether the enhancement works in the average classroom, without allowing larger classrooms to have a greater influence on the results than smaller classrooms.
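
      The difference between weighting centers equally and letting larger centers count more can be seen in a small made-up example. The function names and scores below are hypothetical.

```python
from statistics import mean

def center_weighted_mean(centers):
    """Mean of center means: every center counts equally."""
    return mean(mean(scores) for scores in centers)

def child_weighted_mean(centers):
    """Pooled mean over all children: larger centers count more."""
    return mean(score for scores in centers for score in scores)

# Hypothetical scores: one large center with high scores, two small centers with lower scores.
centers = [[20] * 8, [10, 12], [11, 13]]
print(center_weighted_mean(centers), child_weighted_mean(centers))
```

      Here the pooled mean is pulled toward the large center, while the mean of center means answers the "average center" question described above.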

    2. Subgroup Analyses

      Because the effectiveness of the mathematics curriculum may differ by program setting or by characteristics of the children served, it would be useful to determine the groups for which the curriculum is most effective. This type of subgroup analysis would then enable individual programs to decide which enhancements might be useful to them. For example, analysis may demonstrate that the mathematics curriculum is effective only in programs that offer full-day services or that its impacts are stronger when teachers have higher educational credentials at the outset of training.

      The subgroups of interest will depend on the quality enhancement to be tested. Examples of center- or classroom-level characteristics that can define subgroups include:

      • Full-day or part-day program
      • High-fidelity implementation or incomplete implementation
      • Teachers’ qualifications
      • Center or program size

      Examples of categories of child and family characteristics (measured prior to the experience of the quality enhancement) for subgroup analysis include:

      • Child’s gender
      • Child’s English proficiency
      • Mother’s education level
      • Level of family income
      • Parents’ employment status

      Subgroup estimates can be obtained using the same procedures described above for calculating overall impacts,13 but these calculations are made for particular subgroups. Regression-adjusted subgroup estimates can be obtained by introducing an interaction term that is the product of the treatment indicator and an indicator of membership in the subgroup of interest. This term is entered into the appropriate model shown above for the level of random assignment and for whether the subgroup is defined at the child level or at the classroom or center level.
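
      When the model contains only the treatment indicator, the subgroup indicator, and their product, the coefficient on the interaction term equals the difference between the two subgroup impacts, so it can be illustrated with cell means. The sketch below uses hypothetical records and a child-level subgroup; it omits other covariates, center-level averaging, and clustering.

```python
from statistics import mean

def subgroup_impacts(records):
    """records: iterable of (outcome, treated, in_subgroup) with 0/1 indicators."""
    cell = {(t, s): [] for t in (0, 1) for s in (0, 1)}
    for y, t, s in records:
        cell[(t, s)].append(y)
    impact_outside = mean(cell[(1, 0)]) - mean(cell[(0, 0)])
    impact_inside = mean(cell[(1, 1)]) - mean(cell[(0, 1)])
    return {
        "outside": impact_outside,
        "inside": impact_inside,
        # Equals the interaction coefficient in the treatment-by-subgroup model.
        "interaction": impact_inside - impact_outside,
    }

# Hypothetical (score, treated, in_subgroup) records.
records = [
    (12, 1, 1), (14, 1, 1), (10, 0, 1), (10, 0, 1),
    (11, 1, 0), (13, 1, 0), (11, 0, 0), (12, 0, 0),
]
result = subgroup_impacts(records)
print(result)
```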

      Unless the overall sample is very large, however, it will be possible to detect impacts only for large subgroups of the population, as subgroup estimates, which are based on only part of the full sample, are less precise than are full-sample estimates. For example, in the center random assignment design with minimal baseline data, our sample of 1,233 children (in 74 centers) is sufficient for detecting impacts with effect sizes of .20 (one-fifth of a standard deviation on the outcome measures) or more. However, for a subgroup that includes 50 percent of the children across all the centers (for example, children whose mothers have less than a high school diploma), impacts with effect sizes of .25 or more can be detected. For a subgroup that includes all of the children in half the centers (for example, full-day programs), impacts with effect sizes of .29 or more can be detected.

    3. Cost-Effectiveness Analysis

      A cost-effectiveness framework should be used to evaluate the costs and benefits of the mathematics curriculum. This type of analysis does not attempt to place a dollar value on impacts. Instead, impacts are measured in a common unit, such as an effect size (the impact divided by the standard deviation of the outcome measure). The impact in effect-size units is compared with costs measured in dollars. For each quality enhancement, an effect size per dollar spent on the enhancement can be calculated. For example, if the mathematics curriculum were to produce an impact on mathematical skills of 0.3 in effect-size units, and if the cost was estimated to be $10 per child, then the cost-effectiveness of the curriculum would be 0.03 per dollar. Measuring cost-effectiveness in this way enables program administrators to compare the cost of producing impacts they consider important using the same metrics, so that enhancements can be assessed in terms of their ability to provide the most “bang for the buck.”14
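
      The worked example above can be reproduced with two small functions. The sketch takes the effect size to be the impact divided by the standard deviation of the outcome, consistent with the subgroup discussion; the function names and the raw-score figures (a 4.5-point impact on a scale with a standard deviation of 15) are hypothetical.

```python
def effect_size(impact, outcome_sd):
    """Impact expressed in standard-deviation units of the outcome."""
    return impact / outcome_sd

def cost_effectiveness(effect_size_units, cost_per_child):
    """Effect-size units gained per dollar spent per child."""
    return effect_size_units / cost_per_child

es = effect_size(impact=4.5, outcome_sd=15.0)        # hypothetical raw impact and SD
ratio = cost_effectiveness(es, cost_per_child=10.0)  # $10 per child, as in the text
print(round(es, 2), round(ratio, 3))
```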

      The enhancement is likely to have impacts that vary in size across the outcomes measured. Therefore, the estimate of cost-effectiveness will depend on the outcome used to measure it. Researchers can report the range of cost-effectiveness estimates using the impacts on outcome measures considered to be most important. For example, the outcomes most important for a mathematics curriculum are children’s developing mathematical skills. The cost-effectiveness of several enhancements designed to influence children’s early mathematics skills could be compared using a common outcome measure of those skills.




1 Six other PCER sites are evaluating general curricula that researchers expect could influence mathematics skills in addition to language, literacy, and social-emotional skills. We do not discuss them in this chapter because they do not focus primarily on mathematics.

2 The effect sizes shown here represent the minimum impact as a percentage of the standard deviation of the outcome measure that can be detected at the site level.

3 The two integrated curricula most commonly used in Head Start, High/Scope and Creative Curriculum, have been revised to include greater concentration on early mathematics topics and approaches. However, if Head Start programs have not implemented more recent versions of the curricula or if they did not receive recent technical assistance and training on the curriculum, the mathematics components of their programs likely will be weak.

4 If impact estimates were expected to be externally valid (representative of all Head Start centers), then sample sizes would have to be larger; see Appendix B for details.

5 If teachers in the enhancement group discuss the enhancement strategies with control-group teachers, the control group could implement a version of the enhancement, and the evaluation would not provide a valid test of the impacts of the enhancement. Impact estimates would be biased downward, making it more difficult to conclude that the enhancement was effective. To avoid such contamination of the control group, we recommend implementing enhancements at the classroom level only if the potential for spillover from enhancement to control-group classrooms is small.

6 We discuss implementation further in Section C.

7 Randomly assigning new teachers avoids the chance that new teachers will be recruited with the specific needs of the enhancement in mind, a practice that would bias the results of the study.

8 The plan for selecting children after random assignment of centers or classrooms is discussed in Section D.

9 We assume that the evaluation will include 16 or 17 children per center in the initial sample, with a 90 percent response rate to the spring follow-up data collection.

10 Currently, NRS assessment scores are available only to individual programs aggregated to the program level. Making the scores available for evaluation in the way that we describe would require approval from the Administration for Children and Families and consent from the families.

11 Content validity indicates that the set of items comprising a measure is a good representation of the construct being measured. Concurrent validity refers to sufficiently large correlations between the measure and another measure of the same construct (measured contemporaneously in the same sample). Predictive validity refers to a sufficiently large correlation between the child outcome measure and a subsequently measured construct that is theoretically associated with the child outcome.

12 Center-level impacts are averaged so that larger centers do not dominate the impact estimates.

13 For classroom-level random assignment, calculate mean impacts first at the center level, and then average across centers. For center-level random assignment, calculate center-level mean outcomes and average across centers in the enhancement and control groups.

14 In contrast, a cost-benefit analysis requires researchers to convert impacts into “benefits” that are valued in dollars. This task is complicated by the fact that the evidence linking differences in child assessment scores with future employment and earnings, involvement with the criminal justice system, and use of public assistance programs is quite tenuous. Accordingly, we do not recommend conducting a cost-benefit analysis based on outcome data collected from one year in Head Start.

 
