Administration for Children and Families  
Office of Planning, Research & Evaluation (OPRE)

CHAPTER V: SMALL-SCALE EVALUATION: APPROACHES TO SUPPORTING CHILDREN'S SOCIAL-EMOTIONAL WELL-BEING

Children’s development proceeds along multiple dimensions simultaneously, with gains in one domain supporting gains in another. For example, children’s ability to persist in a task and to manage frustration helps them to master new cognitive tasks, such as learning the sounds of letters and sounding out new words. Similarly, children’s growing language skills enable them to dispel or avoid frustration by articulating their needs clearly and negotiating conflict, rather than by throwing objects or hitting others. Accordingly, young children’s social-emotional development is important in supporting their cognitive development, with the cognitive gains, in turn, supporting prosocial behavior (Raver 2002; Raver and Knitzer 2002; Shonkoff and Phillips 2001).

Key aspects of children’s social-emotional development (for example, enthusiasm and curiosity about new activities) are recognized by kindergarten teachers as critical elements of children’s readiness for school (Heaviside and Farris 1993). Head Start teachers also have noted that even one or two disruptive children make it difficult to create a learning environment for the class as a whole. Research also supports the notion that children’s emotional health is positively correlated with their early school success (Raver 2002). Despite the current emphasis on preparing Head Start children for kindergarten through early language and literacy activities, many teachers are concerned that significant behavioral issues must be addressed first.

The Head Start community, consistent with the program’s comprehensive approach to enhancing children’s development, has taken important steps toward addressing these issues in the classroom. As one of the national consultants that comprise part of the Head Start Training and Technical Assistance System, the Center for the Social and Emotional Foundations of Children’s Learning has developed web-accessible guides for creating classroom environments that promote positive behavior, minimize disruptive behavior, and address other behavioral issues. All Head Start programs are required to arrange for the provision of screening and treatment services from mental health professionals in their communities, but some have contracted for more-intensive, on-site, ongoing services. One of the Quality Research Center (QRC) Consortium research teams is evaluating a curriculum that promotes positive social-emotional behavior as well as language and literacy through the reading and discussion of children’s books describing situations calling for impulse control, empathy, or anger management (Administration for Children and Families 2004; Kupersmidt and Bryant 2001). A comprehensive intervention targeting both teachers’ and parents’ skills in managing children’s behavior and children’s social and behavioral skills has been implemented in many preschool and Head Start classrooms; evaluations of the intervention have found improvements in children’s behavior and in the classroom climate (Webster-Stratton 1998; Webster-Stratton et al. 2001).

This chapter describes approaches to evaluating enhancements designed to support preschool children’s social-emotional development, and to minimize disruptive behavior in the classroom, using Stage 2 evaluation designs. As discussed in the previous chapter, in contrast to nationally representative Stage 3 designs, these designs are called “small-scale” designs; in general, however, the sample sizes for these evaluations will be larger than those used in either the Preschool Curriculum Evaluation Research (PCER) project or the QRC Consortium projects. We focus on an approach to supporting children’s social-emotional development that includes two elements: (1) teacher training to encourage positive child behavior, and to manage the classroom to minimize disruptions; and (2) an intervention with individual children (and their parents) who exhibit conduct problems (including high levels of aggression, defiance, and oppositional and impulsive behaviors). This chapter goes beyond the discussion in Chapter IV because the evaluation design must include intermediate measures of the intervention’s teacher/classroom and parent components, and must measure changes in the well-being of the specific children receiving intensive services and of children who are members of the class more generally.

The approaches to Stage 2 evaluations that we describe in this chapter and the previous one can be easily extended to evaluate other quality enhancements that can be implemented at the center or classroom level. For example, initiatives to improve children’s health and fitness could be evaluated using these approaches. The steps in planning the evaluation are laid out in these chapters; the decisions about how to implement the enhancement, how to ensure fidelity, and how to measure outcomes will be specific to each enhancement. Those decisions, in turn, should grow out of prior work to develop the enhancement idea and pilot test implementation procedures.

  1. ALTERNATIVE APPROACHES

    A diverse set of strategies for supporting children’s social-emotional well-being has been developed. The strategies can be classified in part by the breadth of their focus on children: Universal approaches involve all children in the classroom, while individually focused interventions target particular children exhibiting disruptive behavior or behavior indicative of withdrawal. The strategies also can be classified by the adults involved: Some interventions are implemented at the level of the teacher and classroom and might not involve parents to a significant degree; others involve both parents and teachers to improve the consistency and supportiveness of the home and classroom environments; still other interventions involve a mental health specialist working intensively with a child, the parent, and the teacher. The following examples illustrate the diversity of approaches that have been developed:

    • First Step to Success (Walker et al. 1998). First Step to Success is both a universal and an individual-child-focused intervention, and while mainly emphasizing the classroom, includes the parents of individual children who are targeted. The enhancement consists of training and technical assistance for teachers provided by a skilled clinician. Teacher training includes classroom management, the teaching of social skills to children, and positive and proactive discipline. Students requiring more-targeted intervention are provided a two-month collaborative home and school intervention program delivered by the clinician.

    • Preschool Behavior Project (Kupersmidt and Bryant 2001). The Preschool Behavior Project is a universal intervention that consists of training teachers to establish proactive teaching and behavior management practices in the classroom; to build strong relationships with children; and to teach children to solve social problems constructively through dialogic reading of selected books with themes tied to the Second Step program, which addresses empathy, anger management, and problem solving (Grossman et al. 1997; Whitehurst et al. 1994). Researchers have implemented two versions of the program: one involving highly trained, supervised clinical consultants working with teachers and parents, and a second involving teacher training by less-specialized technical assistance staff from within the program.

    • The Incredible Years: Parents, Teachers, and Children Training Series (Webster-Stratton 1998; Webster-Stratton et al. 2001). This program targets parents, teachers, and children. The program is directed toward individual children with conduct problems (by training parents and working with children) as well as more universally, by training teachers who implement a child training curriculum universally in the classroom. Most evaluations have been based on parent- or family-level random assignment and have focused on the parent and child components. An evaluation of the teacher-training component within Head Start classrooms was combined with the parent-training component, so that the independent contribution of the teacher-training component could not be measured. The evaluations found favorable impacts on the behavior of parents, teachers, and children.

    • Partnership-Directed, School-Based Approach to Child Physical Abuse and Neglect (Fantuzzo et al. 1996). The “Play Buddy” intervention focuses on socially withdrawn victims of child physical abuse or neglect. “Play buddies,” who are other children in the classroom with exceptionally positive peer interaction skills, are coached by the teacher and parents to initiate and engage in positive interactions with socially withdrawn peers.

    • Social-Skills Curricula for Head Start (Conduct Problems Prevention Research Group 1999; Domitrovich and Greenberg 2001; Frey et al. 2000; Serna et al. 2000). While still in the planning and development stages, several classroom-wide curricula are being developed to promote positive social behavior, encourage impulse control, and improve children’s ability to communicate about their feelings.

    • Starting Early, Starting Smart (Springer et al. 2003; Karoly et al. 2001). This intervention integrates behavioral health services in either primary health care settings or early childhood settings to promote access to family/parenting education, mental health services, and substance abuse services; improve parenting skills and family well-being; and strengthen child development. In early childhood settings, the initiative typically involves placing a mental health consultant in the settings, although the primary strategy involves intensive work with families. The initiative is being evaluated in 12 sites based on random assignment of families to an intervention and a control group.

    For a few of the strategies described here, rigorous designs have been used to conduct the evaluations; for most of them, however, evaluations using rigorous designs and sufficient sample to detect moderate levels of impact have not been completed. Some of the enhancements have only recently been adapted for and implemented in Head Start programs. Others have rigorous evaluations in progress, but the evaluation results and critical features of the sampling plan and research design are not yet available. Most of the interventions described above could benefit from further Stage 1 development activities.

    Nevertheless, we discuss evaluation designs for two examples—the First Step to Success and the Incredible Years—because they introduce a design issue not present in the curriculum example discussed in Chapter IV. This design issue is the dual focus of these enhancements on classroom-management strategies (which can be evaluated using an approach similar to that discussed in Chapter IV) and on intervention with individual children and their parents (which requires a design that identifies and follows individual children and their parents). The other enhancements that are directed toward all children in the classroom can be evaluated using an approach similar to that discussed in Chapter IV, while those with an individual-child focus can include an evaluation component that examines a subset of children likely to receive those services, as we discuss in this chapter.

  2. STUDY DESIGN

    The fundamental issues to be addressed by an evaluation of a social-emotional behavioral intervention are whether the classroom environment is managed so that behavioral issues are minimized; whether children’s behavior is less disruptive and more socially competent; and whether children in the classroom make more progress in language development, early literacy skills, and early mathematics relative to children in classrooms that are not implementing the interventions. For the intervention focusing on individual children receiving intensive services and their families, the research questions include whether parenting practices and the home environment are more positive and minimize negative behaviors, and whether the behavior and academic progress of the children identified for the intervention are improved. This section discusses key elements of research designs to address those questions. We begin by discussing the quality enhancement and its counterfactual, the research questions, sampling strategies, random assignment plans, and sample sizes.

    1. The Quality Enhancement and Its Contrast

      Implementing positive classroom management strategies and providing special services for children from chaotic homes who are acting out in class are not new ideas for Head Start. Teachers with an early childhood education background are likely to have heard about many of the strategies for positive management of children’s behavior. In addition, the requirement that Head Start programs forge links with community-based health services may lead to cooperative agreements that provide staff with access to mental health professionals who can assist children needing special services. In the 2002-2003 program year, eight percent of Head Start families accessed mental health services, six percent received child abuse and neglect services, four percent received domestic violence services, and four percent received services to prevent or treat substance abuse (Hart and Schumacher 2004). Two percent of Head Start children were referred for mental health services in the same program year. Many programs have access to mental health professionals on site; these professionals can provide consultation to Head Start staff about individual children and can meet with the staff, children, and parents about specific issues.

      However, despite the access to mental health professionals for consultation, and services to address mental health and family crisis needs provided to many Head Start families and children, many Head Start teachers still cite children’s behavior as a significant issue that takes time away from language, early literacy, and other activities that are expected to occur in the classroom (Paulsell et al. 2004). Several reasons may explain why staff continue to perceive a strong need for behavior management strategies. First, although most teachers may be aware of some of the principles of positive management of children’s behavior, they may also have difficulty implementing these principles in the classroom without “hands-on” assistance. Moreover, a new classroom of children can present one or more new behavioral issues, ranging from hyperactivity, to oppositional defiant disorder, to aggression, and to withdrawal, which cannot be resolved quickly. Teachers must have strategies for carrying on classroom activities over time while individual behavioral issues are being addressed. Finally, some programs do not have access to mental health professionals or services, and the ones that do have access might not receive as much help as they feel they need to address behavioral issues in their classrooms. Thus, the counterfactual for an evaluation of an enhancement addressing children’s social-emotional well-being would probably be highly variable classroom management practices and a smaller proportion of individual children receiving more-intensive services to address more-serious behavioral issues.

      The enhancements proposed for Stage 2 evaluation have two components that are basic to both enhancements: (1) management of the classroom to minimize opportunities for disruptive behavior, and (2) strategies for addressing children’s disruptive behavior when it occurs. To implement the enhancements, teachers must be trained and given technical assistance so that the principles are translated into the teachers’ arrangement of the classroom, scheduling of activities, and behavior in the classroom. Because the extension of these enhancements to also provide specialized, intensive services to individual children who exhibit more-serious problems adds complexity to the evaluation design, we will discuss this extension as another optional design. The enhancements and counterfactuals to be discussed include the following:

      • The first design option will compare the basic enhancement (teacher training and classroom management) with regular Head Start services. As we discuss below, centers participating in the evaluation will be randomly assigned to implement the enhancement or not, and the analysis will compare outcomes based on a sample of children in the centers.

      • The second design option will compare the basic classroom enhancement plus individual services for children needing specialized social-emotional support (1) with the basic classroom enhancement, and (2) with regular Head Start services. Centers participating in the evaluation are randomly assigned to implement the basic classroom enhancement, the classroom enhancement plus individual services, or basic Head Start services. Children are then randomly assigned to the different types of centers, so that those with the greatest behavioral needs are not placed predominantly in the enhancement centers. The research will focus on a random sample of children in the classroom. This design will provide an estimate of the impact of the full package of services compared with regular Head Start services, the impact of the basic classroom enhancement compared with regular Head Start services, and the “value added” by the individual-child services piece.

      • As a third design option, children likely to be referred for intensive services could serve as the specific focus of additional sampling. Using a definition of children with very high levels of behavior problems (based on maternal report using a behavior problems scale at baseline), approximately 10 to 20 percent of the children in each center could be included in the sample. This group would overlap somewhat with a sample of children chosen at random from the center as a whole. By measuring outcomes for this subgroup of children, researchers would be able to measure the impact of the two enhancement options compared with regular Head Start services on children with higher levels of behavioral problems at enrollment, a group that is likely to include most of the children referred for intensive services.
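The center-level random assignment described in the design options above can be sketched in a few lines. This is a minimal illustration, not the study's actual procedure: the function name `assign_centers`, the arm labels, and the choice to block the assignment within each program (so that every program contributes centers to each arm) are assumptions made for the example.

```python
import random

# Arm labels are illustrative; they mirror the three conditions in design Option 2.
ARMS = [
    "regular Head Start services",
    "classroom enhancement",
    "classroom enhancement plus individual services",
]


def assign_centers(centers_by_program, arms=ARMS, seed=0):
    """Randomly assign each program's centers to study arms.

    centers_by_program maps a program name to a list of center names.
    Assumption: assignment is blocked by program. Within a program the
    centers are shuffled and dealt to the arms in rotation, so arm sizes
    within a program differ by at most one.
    """
    rng = random.Random(seed)  # fixed seed so the assignment can be audited
    assignment = {}
    for program in sorted(centers_by_program):
        centers = sorted(centers_by_program[program])
        rng.shuffle(centers)
        # Random starting arm, so leftover centers do not always go to the first arm.
        offset = rng.randrange(len(arms))
        for i, center in enumerate(centers):
            assignment[center] = arms[(i + offset) % len(arms)]
    return assignment


if __name__ == "__main__":
    example = {"Program A": ["A1", "A2", "A3"], "Program B": ["B1", "B2", "B3"]}
    for center, arm in sorted(assign_centers(example, seed=42).items()):
        print(center, "->", arm)
```

For design Option 1, the same function could be called with a two-element list of arms (enhancement and regular services). Recording the seed makes it possible to reproduce and monitor the assignment later.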

      To clarify subsequent discussion of the three plans in this chapter, key features of the three designs are summarized in Table V.1. We discuss the third design as a stand-alone option in order to focus on its unique details; however, because policymakers interested in these questions also likely would be interested in the more general questions of the impacts on all children in the classroom, the samples described in Options 2 and 3 would be combined. The remainder of this chapter discusses each design feature in more detail and provides the rationales for the decisions summarized in the table.

      Table V.1 Essential Features of the Three Optional Research Designs for Evaluating Enhancements to Support Children's Social-Emotional Well-Being

      Intervention
        Option 1: Teacher training to improve classroom management and teacher-child interactions to promote positive child behavior and minimize disruptive behavior
        Option 2: Teacher training to improve classroom management and teacher-child interactions to promote positive child behavior and minimize disruptive behavior; child-level intensive intervention to address significant behavioral issues
        Option 3: Teacher training to improve classroom management and teacher-child interactions to promote positive child behavior and minimize disruptive behavior; child-level intensive intervention to address significant behavioral issues

      Random Assignment
        Option 1: Centers (one enhancement, one control)
        Option 2: Centers (one enhancement, one control)
        Option 3: Centers (one enhancement, one control)

      Sampling
        Option 1: 15 children per center
        Option 2: 15 children per center
        Option 3: 5 or 10 children per center identified by baseline parent report

      Data Collection
        Option 1: Consent form with demographic information; direct child assessment; teacher self-administered questionnaire (SAQ); classroom observation
        Option 2: Consent form with demographic information; direct child assessment; teacher self-administered questionnaire (SAQ); classroom observation
        Option 3: Consent form with demographic information; direct child assessment; teacher self-administered questionnaire (SAQ); classroom observation; parent report at baseline; parent interviews; child observations

      SAQ = self-administered questionnaire.

    2. Research Questions

      The first set of research questions about the basic classroom-level enhancement (without the intensive intervention for specific children) focuses on direct impacts of the enhancements on classroom activities and child outcomes. The first evaluation option would address the following questions:

      • Do the classroom management and teacher-child interaction strategies reduce classroom disruptions and reduce time spent correcting children’s behavior?

      • Do the classroom management and teacher-child interaction strategies increase the amount of classroom time spent on language development, early literacy, and early mathematics activities?

      • Do the classroom management and teacher-child interaction strategies improve children’s cooperative and prosocial behaviors and reduce behavioral problems?

      • Do the classroom management and teacher-child interaction strategies lead to improved language, early literacy, and early mathematics achievement for children in the classroom?

      An evaluation of the classroom-level enhancement with the intensive intervention for specific children (the second evaluation option) would address the following questions:

      • Do the classroom strategies with individual-child intervention reduce classroom disruptions and reduce time spent correcting children’s behavior relative to regular Head Start services? Compared with the classroom strategy alone, does this more intensive option reduce classroom disruptions and reduce time spent correcting children’s behavior?

      • Do the classroom strategies with individual-child intervention increase the amount of classroom time spent on language development, early literacy, and early mathematics activities relative to regular Head Start services? What is the value added of the more intensive approach compared with the classroom-level approach by itself?

      • Do the classroom strategies with individual-child intervention improve children’s cooperative and prosocial behaviors and reduce behavioral problems relative to regular Head Start services? What is the value added of the more intensive approach compared with the classroom-level approach by itself?

      • Do the classroom strategies with individual-child intervention lead to improved language, early literacy, and early mathematics achievement for children in the classroom relative to regular Head Start services? What is the value added of the more intensive approach compared with the classroom-level approach by itself?

      By Stage 2, we expect that implementation is likely to be successful because enhancement developers will have honed their techniques for implementing the intervention in Head Start settings. Nevertheless, understanding the impacts of the enhancement on children’s development and on classroom activities hinges on whether or not the training and technical assistance process led teachers to implement the enhancement with high fidelity. The larger number and diversity of centers and classrooms involved in the Stage 2 study might lead to new challenges, either for the “fit” between the enhancement and the program or for the training and technical assistance procedures. The evaluation should therefore address the following questions to examine whether the enhancement was implemented to high fidelity, what steps were taken to reach that point, and what was learned from the process:

      • Were teachers able to implement the classroom management techniques and teacher-child interaction strategies fully and with a high degree of fidelity? For the second, more intensive child-focused option, were mental health professionals able to respond to teachers’ requests for assistance with particular children, and did these children receive a level of services to meet their social-emotional needs?

      • What strategies were used to implement the enhancement? How much training was provided, and over what time period? What amount and types of technical assistance were provided? How much time did the mental health professional spend on site, and how many children and families could be served?

      • What challenges were encountered, either for the “fit” between the classroom management techniques and teacher-child interaction strategies and the program or for training and technical assistance procedures, and how were they resolved? What challenges were encountered in providing individual-child services?

      Finally, although the evaluation will examine whether the behavioral enhancement is effective overall, the Head Start community also will be interested in whether it is effective across different subgroups of children and families, and across different program designs. To examine effectiveness in this way, the following questions would be addressed:

      • What were the impacts of the classroom behavior enhancement on key subgroups of children; for example, children with lower verbal skills at baseline? What were the impacts for Head Start programs with differing characteristics; for example, programs in which teachers have higher levels of education?

      Some caution should be exercised in conducting subgroup analyses in Stage 2; if many subgroups are examined, some impacts will emerge simply by chance. Moreover, because the samples of grantees and centers are not representative of the broader Head Start population, the subgroups are similarly not representative. Drawing conclusions about how well the enhancement works in specific subgroups would require that the sample frame be designed to include a representative sample from those subgroups. However, this approach would not be consistent with the Stage 2 design, which is to conduct an evaluation that yields internally valid impact estimates for a group of program grantees and centers that volunteer to participate in the study. Impact estimates for subgroups in a Stage 2 design are valid only for the subgroups attending the centers included in the sample. Obtaining a representative sample of specified subgroups would increase the sample size requirements of the study. Thus, serious investigation of subgroup impacts must wait until a Stage 3 evaluation, which will examine a sample that is representative of Head Start programs at the regional or national level.
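The warning that some subgroup impacts will emerge simply by chance can be made concrete with a short calculation. The sketch below assumes that each subgroup test is independent and uses a 5 percent significance level; both are illustrative assumptions, not figures from this chapter. The Bonferroni adjustment shown is one common, conservative correction and is offered only as an example of how an analyst might respond.

```python
def chance_of_spurious_impact(n_subgroups, alpha=0.05):
    """Probability that at least one of n independent subgroup tests is
    statistically significant at level alpha when no true impact exists."""
    return 1 - (1 - alpha) ** n_subgroups


def bonferroni_threshold(n_subgroups, alpha=0.05):
    """Per-test significance level that keeps the overall chance of any
    false positive at or below alpha (Bonferroni adjustment)."""
    return alpha / n_subgroups


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(k, round(chance_of_spurious_impact(k), 3), bonferroni_threshold(k))
```

Under these assumptions, examining 10 subgroups gives roughly a 40 percent chance of at least one spurious "significant" impact, which is why the set of subgroups should be small and specified in advance.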

      One subgroup of obvious interest is the children who are identified as needing intensive behavioral services. To address research questions about the impact of the intervention on children who needed intensive services during the year, we have included the third design option that focuses sampling on children identified at baseline as having high levels of externalizing behavioral problems. In the absence of such a strategy, measuring impacts for the subgroup of children in the sample who received intensive behavioral services during the Head Start year would be challenging; to do so, the researcher would have to identify the children in the control group who would have received intensive services had they been in the intensive enhancement group. Because the sample of children for Options 1 and 2 is randomly drawn from all children in the center, it will not necessarily include the children receiving intensive services, as those children are expected to comprise between 10 percent and 20 percent of the population of children in the center. Option 3 is designed to address questions about that subgroup.
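The Option 3 sampling step can be sketched as follows: within a center, rank children by a baseline behavior-problem score, treat the top 10 to 20 percent as the high-need group, and draw 5 or 10 of them at random. The function name, the score scale (higher means more problems), and the use of a simple percentage cutoff are assumptions for illustration; the chapter does not specify a particular scale or cutoff rule.

```python
import random


def sample_high_need_children(scores_by_child, top_percent=20, n_sample=10, seed=0):
    """Sample children with the highest baseline behavior-problem scores in one center.

    scores_by_child maps a child identifier to a parent-reported baseline
    score (assumed here: higher score = more behavior problems).
    top_percent is the share of the center treated as high need (10 to 20
    in the text); n_sample is the number to draw (5 or 10 in Table V.1).
    """
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be between 0 and 100")
    # Highest scores first; ties are broken by child identifier for reproducibility.
    ranked = sorted(scores_by_child, key=lambda c: (-scores_by_child[c], c))
    n_eligible = max(1, len(ranked) * top_percent // 100)
    eligible = ranked[:n_eligible]
    rng = random.Random(seed)
    return sorted(rng.sample(eligible, min(n_sample, len(eligible))))


if __name__ == "__main__":
    # Hypothetical center with 60 children and made-up scores.
    scores = {f"child{i:02d}": (i * 7) % 31 for i in range(60)}
    print(sample_high_need_children(scores, top_percent=20, n_sample=10, seed=1))
```

Because the same rule is applied in enhancement and control centers before services begin, the resulting subgroup is defined identically in every arm, which is what allows impacts on high-need children to be estimated.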

    3. Major Activities and Timetable for the Evaluation

      Several activities must be performed to carry out the evaluation:

      • Draft study design and protocol for recruiting program grantees and centers and submit for Office of Management and Budget (OMB) review
      • Identify programs and centers to participate
      • Develop data collection instruments and prepare and submit review packages for the Institutional Review Board (IRB) and OMB
      • Conduct and monitor random assignment
      • Train teachers to implement the enhancement in randomly selected centers; provide technical assistance and additional services and supplies needed for implementation; monitor implementation fidelity
      • Identify a sample of children for whom consent to participate has been obtained
      • Collect data from the enhancement and control groups
      • Analyze the data and report findings

      We recommend that the first four tasks occur during the first year of the evaluation, with implementation, sampling of children, and collection of data during the second year, and additional data collection, analysis, and reporting during the third year (see Figure V.1). Because the Head Start year typically runs from August or September through May or June, the timing of activities will proceed most smoothly if the evaluation activities begin in January or February. We discuss the steps in more detail in the rest of this chapter.

      The first year of evaluation activities will be dominated by OMB review and by recruitment of grantees and centers. OMB review of the study design and protocols for recruiting grantees and centers must be completed before researchers and enhancement developers can obtain information about prospective centers or negotiate agreements to participate. We have estimated two months to draft and submit a package to OMB that includes the study design and recruiting protocols, and six months for OMB review. During the OMB review period, researchers would be able to conduct activities that do not involve information-gathering from prospective centers. For example, data collection protocols could be developed and submitted for OMB review, and the study design and data collection plans could be submitted for IRB review. Data systems for tracking children and managing evaluation data could be developed. The Administration for Children and Families (ACF) could inform the Head Start community about the pending evaluation and encourage participation. After receiving OMB clearance, the enhancement developer and researchers would be able to contact interested grantees and centers, and to begin the recruiting process. We estimate that recruitment, executing agreements, and randomly assigning centers will require approximately four months.
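The first-year estimates above can be checked with simple arithmetic: two months of drafting, six months of OMB review, and four months of recruitment and random assignment fill exactly twelve months when run back to back. The sketch below lays this out, assuming a January start and strictly sequential tasks; the task labels and function names are illustrative.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Durations in months, taken from the estimates in the text.
FIRST_YEAR_TASKS = [
    ("Draft and submit OMB package", 2),
    ("OMB review", 6),
    ("Recruit, execute agreements, and randomly assign centers", 4),
]


def schedule(tasks):
    """Lay tasks end to end; return (name, first_study_month, last_study_month).

    Study months are counted from 1. Assumption: tasks do not overlap.
    """
    timeline, month = [], 1
    for name, duration in tasks:
        timeline.append((name, month, month + duration - 1))
        month += duration
    return timeline


def month_label(study_month, start_month=1):
    """Convert a study month to a calendar label, given the starting calendar month."""
    index = start_month - 1 + study_month - 1
    return f"{MONTHS[index % 12]} of year {index // 12 + 1}"


if __name__ == "__main__":
    for name, first, last in schedule(FIRST_YEAR_TASKS):
        print(f"{name}: {month_label(first)} to {month_label(last)}")
```

With a February start (`start_month=2`), the same sequence would run into January of the second year, which leaves less time before teacher training in the spring.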

      Figure V.1 Schedule of Activities for Small-Scale Evaluation of a Classroom Behavioral Enhancement: Three-Year Study Beginning in January

      The main activities in the second evaluation year (beginning in January or February) will consist of training teachers to implement the classroom management techniques and teacher-child interaction strategies to high fidelity, and conducting the implementation study. In the fall, the main tasks will include ensuring children are placed in centers without regard to behavioral issues, obtaining consent for children to participate in the study, sampling children, and collecting baseline data. Ideally, teacher training and implementation will occur in the spring so that, when the new Head Start year begins, children enter a classroom that is managed and organized according to the techniques for maximizing positive behavior and minimizing disruptive behavior. Third-year evaluation activities will include collecting follow-up data in the spring on classrooms and children, analyzing the data, and reporting the results.

    7. Sampling, Random Assignment, and Sample Sizes

      Stage 2 designs generally include grantees and centers that have agreed to participate in the study and were selected because of their location or interest rather than their ability to represent Head Start centers and grantees regionally or nationally. At this stage, the evaluation focuses mainly on whether the quality enhancement could be effective in a set of those grantees and centers. As we discuss in more detail in Appendix B,1 because the emphasis is on internal validity, or a rigorous test of the curriculum within the set of grantees and centers in the evaluation, the sample size requirements are lower than if grantees and centers were selected to represent all Head Start grantees and centers. For Stage 2 designs, a set of program grantees and centers interested in participating must be identified, random assignment completed, and classrooms and children selected to participate in data collection.

      Identification of Grantees and Centers. Various methods could be used to recruit grantees. Methods of recruiting a broad set of programs include working through the Head Start Bureau to communicate with all regional offices, sending faxes to all programs, and sending emails to program directors. However, Stage 2 evaluations ideally should be geographically focused to support careful implementation of the enhancement, so more-targeted methods probably would be more effective than would broadly disseminated flyers or similar methods. With support from the Head Start Bureau and selected regional offices, the enhancement developer and research team also could approach directors of Head Start programs within a geographic area and could ask colleagues in other areas who might help with implementation to contact program directors in their own areas. If a broader call for participation is made first, interested programs could be asked to help the enhancement developer to approach other programs in their areas. The initial contact materials should briefly explain the study’s focus (in this case, an approach to classroom management and addressing children’s behavioral problems) and should briefly summarize the benefits of participation. The contact materials also might indicate that all centers in the program will have an equal chance of participating in the study, but that only some of the ones in each program will be chosen to implement the enhancement. Those not chosen to implement the enhancement in the first year will be given priority to implement it, if desired, after the follow-up data have been collected. Programs will receive information about the study’s findings, and will be partners in the research study. We expect that several programs will want to learn more about the evaluation as a result of this recruiting effort.

      Ideally, program grantees and centers chosen for the study would have the following characteristics:

      • Centers are clustered in a small number of geographic locations to simplify implementation.

      • Centers are not already implementing one of the social-emotional enhancements described earlier in the chapter but are interested in doing so. For the intervention that includes intensive child-focused services, participating centers do not currently have access to intensive, on-site mental health services for children.

      • Grantees and centers are willing to cooperate with the study’s random assignment and data collection requirements.

      Recruitment of grantees and centers will be easier if their staff believe they will gain more from participation than they might lose. Accordingly, after the initial contact has been made, the benefits to sites must be explained in detail, and concerns about study burden must be discussed. Enhancement developers and research staff should visit program directors and other relevant administrative staff to discuss the enhancement, its expected benefits to children, what will be involved in implementing it, and the research aspects of the evaluation.

      Researchers will explain the program’s role in ensuring that the study yields useful information about the effectiveness of the classroom-wide intervention and the intensive child-focused intervention. For example, researchers will have to work with program staff to obtain information required to implement random assignment (for example, the number of centers and classes in each center, teachers’ names, class sizes, and children’s ages, as well as procedures followed for application to the program and placing children in centers). Program staff will have to maintain random assignment statuses (for example, by ensuring that teachers in the intervention group do not share information about the curriculum with control-group teachers). Researchers will have to monitor the integrity of random assignment over time, a key piece of information to demonstrate the reliability of the study.2 One important aspect of monitoring random assignment for this enhancement is to ensure that children with behavioral problems are not disproportionately assigned to centers offering the enhancement. Researchers also will have to work with program staff to schedule child assessments and classroom observations. These evaluation-related requirements are balanced by the chance to implement a classroom management and behavioral approach that could benefit children, and by the chance to work with the enhancement developer to ensure that the approach fits the needs and interests of Head Start classrooms, families, and children, and that it can be incorporated easily into Head Start program activities. Benefits to the control group are more challenging to identify, but an important benefit that could be offered for control-group participants in a Stage 2 evaluation is first priority to implement the enhancement after the children participating in the evaluation have finished their Head Start year.

      A program’s agreement to participate should include a memorandum of understanding that describes the benefits to the program, and that specifies the respective responsibilities of the enhancement developer, researchers, and program in this joint research undertaking. A similar agreement should be developed with each participating center. This level of detail ensures that misunderstandings that have the potential to threaten the success of the research study do not arise after it is too late to resolve them. A memorandum of understanding also offers a useful way of informing new staff about the study.

      Other Program, Center, and Child Sample Selection Criteria. Because group-randomized designs require relatively large sample sizes to detect impacts of moderate size (20 percent of the standard deviation of the outcome measure), it can be costly to expand the sample to provide sufficient power to detect similar levels of impact in subgroups. For this reason, the programs, centers, and children to be included in the sample should be defined carefully, with the most important research questions in mind when doing so.

      One sample-definition strategy is to include a broad set of programs with different characteristics. This approach would address the question of the enhancement’s effectiveness across the range of Head Start programs and families. In Stage 2 evaluations, however, the programs will not have been sampled randomly from all Head Start programs, so the sample will not truly reflect the diversity of programs. If the evaluation finds that the enhancement is not effective, it will be difficult to determine whether it could be effective in some program subgroups.

      A more useful strategy is to target programs with a narrower set of characteristics in common, so that the evaluation measures the effects of the enhancement in a more defined set of circumstances. Nevertheless, for the classroom management and child behavioral enhancement, diversity in some key areas could be useful. About half the children served by Head Start attend half-day programs; therefore, we recommend that the group of participating centers include an approximately even split between full-day and half-day programs. Children’s age is another characteristic that varies across (and within) Head Start classrooms. We recommend including both three- and four-year-old children in the evaluation, as the enhancement is designed for both age groups, and most behavioral measures can be used and compared across both groups.

      The first and second design options can be implemented using a random sample of three classrooms per center and eight children per classroom.3 This sample would address classroom-level research questions (for example, whether the average level of behavioral problems is lower; positive social behavior increases; and children are progressing further in language, early literacy, and early mathematics than in the absence of the enhancement). Although not all of the classrooms and children in the center would be part of the research sample, the enhancement would be implemented in every classroom in a center assigned to the enhancement group.

      For the third design option, which focuses on the well-being of the children who receive intensive behavioral services, researchers will have to identify (in both the enhancement and control groups) children who are likely to need intensive services to address behavioral issues (as this group is likely to be only about 10 percent of all children in the center). To do so, we recommend that researchers obtain parent reports about children’s behavior using a psychometrically strong scale measuring children’s behavioral problems. The parent report would be completed early in the Head Start year (as part of the consent process or at program application). The subgroup of children for the research sample in the third option will then be identified based on higher-than-average levels of parent-reported externalizing behavioral problems (including aggression and hyperactivity).

      We recommend using parent reports so that the group of children can be identified at baseline, before they have experienced different classroom environments. Using teacher reports instead could delay identification of children, as it might take teachers as much as two months to distinguish children who are acting out because they are unfamiliar with the Head Start classroom from children who need intensive services. Moreover, the enhancement classrooms and regular Head Start classrooms might lead the same child to behave differently, thus influencing the selection of this subgroup. Thus, although parent reports will not enable researchers to perfectly identify children who would be referred by teachers during the year for intensive behavioral services, the overlap should be substantial, and this strategy sidesteps the issues that might arise when asking teachers to select this group.

      Because teacher reports suggest that approximately 10 percent of Head Start children exhibit disruptive behavior, two or three children per class, or four to six children per center, would likely be identified for services (Kupersmidt et al. 2000). If referral services are available, teachers might approach a professional about more children than they would identify as having significant behavioral issues in the absence of services. Thus, our strategy for including children in this subsample should be somewhat more inclusive to ensure that nearly all children receiving intensive services are included. We recommend including in the sample five children per classroom (approximately 25 percent of all children in the class) who have the highest incidence of parent-reported externalizing behavioral problems. With a 90 percent response rate, we assume that four children per classroom will be available for follow-up assessment in the spring.
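The selection rule described above (the five children per classroom with the highest parent-reported externalizing scores) can be sketched as follows. This is a minimal illustration, not the study's procedure: the field names `child_id`, `classroom`, and `externalizing_score`, the tie-breaking rule, and the roster of made-up scores are all assumptions introduced here.

```python
from collections import defaultdict


def select_high_risk_sample(children, per_classroom=5):
    """Pick, within each classroom, the children with the highest
    parent-reported externalizing scores.

    `children` is a list of dicts with hypothetical keys
    'child_id', 'classroom', and 'externalizing_score'.
    """
    by_classroom = defaultdict(list)
    for child in children:
        by_classroom[child["classroom"]].append(child)

    selected = {}
    for classroom, roster in by_classroom.items():
        # Highest score first; ties broken by child_id (an assumed rule)
        # so that the selection is reproducible.
        ranked = sorted(
            roster, key=lambda c: (-c["externalizing_score"], c["child_id"])
        )
        selected[classroom] = [c["child_id"] for c in ranked[:per_classroom]]
    return selected


# Illustrative roster: one classroom of 20 children with made-up scores.
roster = [
    {"child_id": i, "classroom": "A", "externalizing_score": (i * 7) % 20}
    for i in range(1, 21)
]
sample = select_high_risk_sample(roster)
share_of_class = len(sample["A"]) / len(roster)  # 5 of 20 children
```

With a class of 20, five selected children correspond to the roughly 25 percent of the class mentioned above. A real parent-report scale would also need rules for missing reports, which this sketch does not address.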

      The evaluation could combine the sampling strategies of Options 2 and 3 to include this sample of children at high risk for needing intensive behavioral services, as well as a random sample of all children in the center. The two samples are likely to have little overlap. Combining the sampling approaches would enable the evaluation to address both research questions focusing on children likely to need the intensive services and the broader questions of whether the behavioral interventions lead to a greater focus on language, literacy, and mathematics in the classroom, and whether corresponding gains for children occurred in those areas.

      Random Assignment. We recommend center-level random assignment for the classroom management and child behavioral interventions discussed in this chapter because, if classrooms are randomly assigned, the likelihood of spillover or of nonrandom allocation of children to classrooms is too great. Children’s behavioral problems are among the greatest challenges for preschool teachers, so it is highly likely that teachers assigned to the enhancement group would give good advice to teachers in the control group who are struggling with difficult behavioral issues. Moreover, within a center, directors allocate children based on several factors, including gender balance, classroom schedule, and the fit between a child’s temperament and the teacher’s strengths. Directors who are aware of a child’s behavioral problems would be strongly tempted to place the child in one of the classrooms receiving enhanced behavioral services, which would threaten the validity of the study. We expect that strategic assignment of children is less likely to be an issue if centers are randomly assigned, as decisions about children’s placement in centers are generally based on such factors as center locations and schedules relative to the family’s residence and schedule.

      If centers are randomly assigned, the sample size requirements (and thus, evaluation costs) will be higher than if classrooms were randomly assigned, as randomly assigning larger units generally decreases the power of the design (see Appendix B). Moreover, we give up two other advantages of classroom-level random assignment if centers are randomly assigned instead. First, classrooms within one center are more likely to be similar in population served and classroom management strategies than are two centers. Second, random assignment at the classroom level might offer a slight advantage over center-level random assignment at the recruiting stage. Center directors might prefer the 100 percent chance that at least some classrooms in their centers could implement the enhancement under classroom-level random assignment over the 50 percent chance that no classrooms would implement the enhancement under center-level random assignment. Furthermore, having both enhancement classes and control classes within their centers enables directors to gain a sense (although not rigorous evidence) of how the new approach seems to be working.

      Nevertheless, the potential for spillover and strategic assignment of children to classrooms outweighs the potential benefits of classroom-level random assignment. In addition, center-level random assignment has two other potential advantages. First, teachers in the enhancement group within a particular center would be able to discuss the new techniques without fear of revealing ideas to control-group teachers. These discussions could enhance the overall success of implementation. Moreover, program developers often view full-center implementation as more realistic than an implementation in which enhancement and control classrooms are in the same center because they have the ability to work with everyone in the center and staff can talk freely about their implementation experiences. Second, classroom configurations and teachers change from year to year—and even from May to August—as enrollments are finalized, available space in the preschool is negotiated, and teachers’ plans become settled. These changes would present fewer difficulties for the evaluation if an entire center were to either follow the enhancement or continue its usual practices than if individual teachers or classrooms were randomized to one group or the other.

      Random assignment should take place during the year before outcome data are collected, so that enhancement developers can fully implement the classroom management and teacher-child interaction strategies and can ensure fidelity.4 Under this schedule, if initial implementation begins in March or April of the program year, teachers will have time to learn and practice the new techniques before the next school year begins.

      If centers are to be randomly assigned, participating programs will have to give the researchers a list of participating centers, so that random assignment can begin. Within a single grantee, centers would be randomly assigned to the control or treatment group(s). For Option 1, an even number of centers is needed (half for the enhancement group and half for the control group). For Options 2 and 3, the number of centers should accommodate assignment of one-third of the centers to the control group, and one-third to each of the two enhancement groups.
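The within-grantee assignment described above can be sketched as a simple blocked randomization. This is an illustrative sketch under stated assumptions: the function name `assign_centers`, the group labels, the center identifiers, and the choice to reject a grantee whose center count does not divide evenly among the groups are all hypothetical, not part of the study design.

```python
import random
from collections import Counter


def assign_centers(centers_by_grantee, groups, seed=0):
    """Randomly assign centers to study groups separately within each grantee.

    `centers_by_grantee` maps a grantee name to a list of center IDs.
    `groups` lists the study arms, for example ["control", "enhancement"]
    for Option 1, or three arms for Options 2 and 3.
    """
    rng = random.Random(seed)  # fixed seed so the assignment can be audited
    assignment = {}
    for grantee in sorted(centers_by_grantee):
        centers = sorted(centers_by_grantee[grantee])
        if len(centers) % len(groups) != 0:
            # Assumed rule: require equal group sizes within a grantee.
            raise ValueError(
                f"{grantee}: {len(centers)} centers cannot be split "
                f"evenly into {len(groups)} groups"
            )
        rng.shuffle(centers)
        # Deal the shuffled centers out to the groups in turn.
        for position, center in enumerate(centers):
            assignment[center] = groups[position % len(groups)]
    return assignment


# Option 1: half of the centers to each of two groups.
option1 = assign_centers(
    {"Grantee X": ["X1", "X2", "X3", "X4"]},
    ["control", "enhancement"],
)

# Options 2 and 3: one-third of the centers to each of three groups.
option2 = assign_centers(
    {"Grantee Y": ["Y1", "Y2", "Y3", "Y4", "Y5", "Y6"]},
    ["control", "classroom enhancement", "classroom plus child services"],
)
option1_counts = Counter(option1.values())
option2_counts = Counter(option2.values())
```

Recording the seed and the input list of centers makes it possible to reproduce the assignment later, which supports the monitoring of random assignment integrity discussed earlier.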

      Selection of Classrooms and Children into the Research Sample After Random Assignment. Although the evaluation design may call for random assignment of centers, it will not be necessary to collect data from every classroom and every child in the center. Including three classrooms per center and eight children per classroom will provide a sample large enough to estimate average outcomes under random assignment for centers. Beyond that number, additional children add data collection costs at a constant rate but add less to the power of the design to detect impacts. As mentioned, the design for Option 3 would focus sampling on approximately four children per classroom who are most likely to be referred for intensive, individual services to address behavioral issues.5

      An alternative sampling approach that increases the power of the design is to sample children from the center without regard to classroom (the approach described in Chapter IV). This approach provides a sufficient sample of children to measure child outcomes, but the sample of classrooms (used for teacher- and classroom-level analyses) would not strictly be representative of the center’s classrooms. Instead of randomly selecting classrooms for the classroom-level observations, researchers could observe the three classrooms per center that include most of the children selected for the child sample. The advantage of this approach is that the sample size of children needed to detect the same levels of impact at the same level of power is approximately 25 percent lower and the number of centers required is approximately 10 percent lower than under the previously discussed design, which would randomly select classrooms and then children within classrooms.

      Sample Sizes. Table V.2 shows the number of centers, classrooms, and children required for the evaluation of a child behavioral enhancement under alternative designs. Contrasting two different enhancements to regular Head Start services or to each other requires a larger sample than does a simple comparison of one enhancement with regular Head Start services. Focusing the sample on children with higher levels of parent-reported behavioral problems reduces the number of children per center, requiring a slight increase in the number of participating centers but an overall decrease in the number of children if assumed levels of precision (15 percent to 20 percent of a standard deviation of the outcome measures) are to be preserved.

      Under the first design option, which contrasts the classroom-level enhancement with regular Head Start services, if the desired level of precision in detecting impacts were .20 (one-fifth of a standard deviation on the outcome measures), researchers would randomize 92 centers (46 to each group) and would then randomly select 1,656 children from those centers, assuming the availability of only minimal baseline data.6 Minimal baseline data would include demographic information about the family collected as part of the consent form and children’s scores from the fall Head Start National Reporting System (NRS) assessments.7 Baseline demographic data allow us to control for these characteristics statistically when we estimate impacts, which reduces the variance of the impact estimate and thereby increases the power of a particular sample size to detect impacts. (Section E describes the statistical models that could be used in the analysis of impacts.) Sample sizes would have to be substantially larger if the desired precision level for impact estimates were .15, as opposed to .20.

      Table V.2. Sample Sizes Required for a Stage 2 Evaluation of a Classroom Management and Child Behavioral Intervention Under Alternative Designs and Random Assignment of Centers

      MDE     Baseline Data       Total Number    Total Number    Initial Sample
                                  of Centers      of Classes      of Children

      Option 1: Classroom-Level Intervention Only; Sample from All Children
      0.15    No baseline         204             408             3,672
      0.15    Minimal baseline    164             328             2,952
      0.20    No baseline         116             232             2,088
      0.20    Minimal baseline     92             184             1,656

      Option 2: Classroom-Level and Individual-Child Interventions; Sample from All Children
      0.15    No baseline         306             612             5,508
      0.15    Minimal baseline    246             492             4,428
      0.20    No baseline         174             348             3,132
      0.20    Minimal baseline    138             276             2,484

      Option 3: Classroom-Level and Individual-Child Interventions; Sample of Children with Behavioral Problems
      0.15    No baseline         420             840             4,200
      0.15    Minimal baseline    336             672             3,360
      0.20    No baseline         237             474             2,370
      0.20    Minimal baseline    189             378             1,890

      Note: Sample size calculations assume that two classrooms per center are randomly selected for the research sample. We also assume that the research sample for options 1 and 2 initially includes 9 children per classroom and 90 percent of the initial sample is available at follow-up. We assume that the research sample for option 3 initially includes 5 children per classroom and that 4 children per classroom are available at follow-up. “Minimal baseline” means that demographic information is collected as part of the consent form, and that NRS fall outcome data are available so that the R² for the regression adjustment of the impact estimates is .20.

      Sample size calculations also assume a two-tailed test of statistical significance with 80 percent power and a 95 percent confidence level. The sample size calculations do not include an adjustment for the design effect of weighting for sample nonresponse. Appendix B provides details about the calculations.

      MDE = minimum detectable effect; NRS = National Reporting System.
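The class and child counts in Table V.2 follow directly from the number of centers and the assumptions in the table note: two classrooms per center, and an initial sample of 9 children per classroom for Options 1 and 2 or 5 children per classroom for Option 3. The sketch below reproduces that arithmetic. The function and variable names are hypothetical, and the required number of centers is taken from the table as given, since the power calculations themselves are described in Appendix B.

```python
CLASSROOMS_PER_CENTER = 2  # from the table note
INITIAL_CHILDREN_PER_CLASSROOM = {"Option 1": 9, "Option 2": 9, "Option 3": 5}


def sample_counts(option, centers):
    """Return (total classes, initial sample of children) for a design option."""
    classes = centers * CLASSROOMS_PER_CENTER
    children = classes * INITIAL_CHILDREN_PER_CLASSROOM[option]
    return classes, children


# Rows of Table V.2 as (option, centers, classes, children).
TABLE_V2 = [
    ("Option 1", 204, 408, 3672), ("Option 1", 164, 328, 2952),
    ("Option 1", 116, 232, 2088), ("Option 1", 92, 184, 1656),
    ("Option 2", 306, 612, 5508), ("Option 2", 246, 492, 4428),
    ("Option 2", 174, 348, 3132), ("Option 2", 138, 276, 2484),
    ("Option 3", 420, 840, 4200), ("Option 3", 336, 672, 3360),
    ("Option 3", 237, 474, 2370), ("Option 3", 189, 378, 1890),
]

# Rows whose class or child counts do not follow from the center count.
mismatches = [
    row for row in TABLE_V2
    if sample_counts(row[0], row[1]) != (row[2], row[3])
]
```

Every row of the table is consistent with these two multipliers, so `mismatches` is empty.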

      Under the second design option, which includes a second enhancement group that adds to the basic classroom enhancement the offer of intensive services to children with more-serious social-emotional issues, the sample of centers and children expands by one-half. If the desired level of precision in detecting impacts were .20, researchers would randomize 138 centers (46 to each of three groups) and would randomly select 2,484 children from those centers if minimal baseline data were available.

      Under the third design option, which examines the same two enhancements and control group as under the second design option but focuses on a sample of children identified by parent reports at baseline as having higher-than-average behavioral problems, the number of centers is higher, but the number of children required for the sample is lower. We expect that only five children per classroom will be identified as having high levels of behavioral problems, so we have assumed four children per classroom in the final analysis sample (after nonresponse) under this design, rather than eight. The clustering of the sample means that including more centers adds more to the power of the design than is lost by reducing the number of children per classroom and per center. Thus, if the desired level of precision in detecting impacts were .20, researchers would randomize 189 centers (63 to each of three groups) and would randomly select 1,890 children from those centers if minimal baseline data were available.

      Alternatively, researchers could combine Options 2 and 3 while keeping the number of centers at Option 2 levels and accepting the loss of power to detect impacts in the subsample of children with high levels of behavioral problems. Thus, if an Option 2 study were conducted using 138 centers, two classrooms per center, and eight children per classroom, which can detect effect sizes of at least .20 if minimal baseline information is available, the subsample of children with behavioral problems would be sufficient to detect effects greater than .20 but less than .25, which seems adequate.

  5. IMPLEMENTING THE CLASSROOM AND CHILD-FOCUSED INTERVENTIONS

    The goals of implementation in Stage 2 are twofold. First, the enhancement should be implemented to high fidelity in the classrooms within the centers that were assigned to the enhancement group. High fidelity ensures that the evaluation measures the impact of the enhancement as designed, rather than measures a watered-down version that was implemented incorrectly. Second, procedures and manuals should be clear enough and should cover a broad enough range of Head Start program situations so that, in the future, it would be feasible to implement the enhancement on a broader scale, using Head Start training and technical assistance staff, with only minimal assistance from the enhancement developer. To achieve this goal, the procedures and manuals will have to be revised as necessary after implementation on the basis of information obtained while implementing the enhancement in the centers. Issues that arise during initial training or during the technical assistance period will have to be addressed and documented so that the training materials subsequently can be revised in a way that reflects what has occurred in the classroom during implementation.

    At the beginning of Stage 2, the plans are available for training teachers on principles of classroom arrangement, scheduling activities, and interacting with children to promote positive behavior and minimize opportunities for disruptive behavior, as these plans will have been established during the enhancement development period (Stage 1). At the start of implementation, a clear plan to implement the classroom-based aspects of the enhancement to high fidelity will therefore already be in place. Manuals will be available that describe room arrangements in detail; give teachers examples of activities; provide numerous examples of how to deflect or defuse conflict situations, and how to identify and acknowledge positive behavior; and offer trainers a plan to work with teachers during an intensive initial period and over time, while the teachers are trying the new methods in their classrooms. The manuals should indicate the intensity of initial training and ongoing technical assistance, the duration of the training and technical assistance, and the training and technical assistance staff’s qualifications. These manuals do not guarantee that the implementation will proceed smoothly in all centers, but they do provide a basic plan that will have been refined in several Head Start centers. Challenges encountered and resolved during implementation of a Stage 2 evaluation will form the basis for modifying the enhancement or the training and technical assistance.

    For a small-scale evaluation (consisting of 50 to 150 centers, depending on whether one or two versions of the enhancement are tested, whether the sample is drawn from all children in a center or focuses on children with high levels of behavioral problems, and whether minimal baseline data have been collected), we expect that the enhancement developer will coordinate training for the teachers in centers assigned to the treatment group. Depending on the design, the training and technical assistance staff will work with teachers in either one-half or two-thirds of the centers. An evaluation of this size could operate in approximately five to ten geographic locations, each with 5 to 15 enhancement centers and an appropriate number of control-group centers. The enhancement developer should have a site coordinator in each location and a staff of trainers to offer initial training and on-site technical assistance during both the implementation period (March through May of the second year) and the evaluation year (September through December of the second year and January through June of the third year). The enhancement developer should visit the sites periodically to conduct some of the large-group training, address any issues that arose during training and technical assistance, and monitor implementation.

    The intensive child-focused intervention will require that a mental health professional be available to each center assigned to the treatment group that combines the classroom-level and individual-child services. The mental health professional will consult with teachers about individual children whose behavior raises concerns and, as appropriate, will observe the children in the classroom, provide strategies to the teachers for addressing behavioral challenges, and meet with parents to discuss issues in the home that may be contributing to the observed behavior. The mental health professional also will provide intensive, one-on-one services for approximately two months to children who can benefit from learning and practicing skills in communicating feelings, empathy, anger management, conflict resolution, and developing friendships.

    Prior to implementation of the classroom-based social-emotional enhancement, the enhancement developers should communicate with the leadership and stakeholders of the participating programs at all the levels involved, including the program director, education coordinators, center directors, parent Policy Council members, and others, as appropriate. Care must be taken to ascertain the most efficient selection of staff to train. For example, training all members of a teaching team, including teaching assistants and aides, as well as the lead teacher, should help to ensure optimal implementation of the classroom management ideas and principles for working with children, as well as the continuation of the approach in the event of teacher turnover. All relevant staff should at least be informed about the goals and approaches of the behavioral enhancement.

    Implementing a classroom management enhancement is likely to require two or three days of initial training in a specialized setting (such as a school or university that has facilities for both large and small groups). Training should include an overview of the classroom management approach and video footage of teachers practicing the skills involved in this approach to classroom management in classrooms of preschool children, so that teachers in the enhancement group can visualize what the approach looks like in practice. Training should then cover a series of modules that describe each strategy. Practical exercises involving role playing and hands-on experience with the major strategies should be offered so that the teachers can practice what they have learned. Techniques for direct and observational assessments of children and links between behaviors that these techniques are promoting and required elements of the Head Start Child Outcomes framework also can be covered in training sessions. The training should wrap up with summaries, questions and answers, and discussion of the technical assistance plan.

    After training has been completed, technical assistance staff will visit classrooms periodically to observe how teachers are practicing the techniques, and to discuss any questions or difficult behavioral issues that teachers are facing. The staff should visit once per week at first, for about one to two hours, and could then taper the visit schedule to every other week and, eventually, to once per month. During the visits, the staff should discuss with the teachers any questions or issues faced while practicing the techniques. They should observe the classrooms; complete measures of fidelity; and discuss with the teachers what they have observed, what aspects of implementation are going well, and what steps the teachers can take to improve their techniques. In the fall, refresher training just before classes begin would likely be helpful to teachers. Trainers also should provide intensive training to new teachers who have joined enhancement centers, measure fidelity in all enhancement classrooms, and provide targeted technical assistance to teachers who need additional help with implementation.

    1. Measuring Implementation

    Researchers will have to monitor random assignment throughout the implementation period to ensure that the enhancement and control groups remain separate. During this time, they also will measure the process of implementation and fidelity of the social-emotional enhancement. During the evaluation year, they will continue to monitor implementation and fidelity and, as discussed in the next section, will collect measures of teachers’, classrooms’, and children’s outcomes for the impact analysis.

      Measures of Implementation and Fidelity. Researchers will visit a subset of approximately 30 percent of the enhancement centers to measure implementation processes and fidelity to the enhancement.8 The number of centers included in the implementation study is generally determined as a compromise between the evaluation budget and the need to measure implementation experiences in several of the centers. (A list of topics, by data collection mode, is shown in Table V.3.) Since the implementation study is intended to provide insights into what implementation strategies worked well and what did not, the implementation staff should help classify programs by their implementation experience (for example, how well and how easily implementation occurred) and centers can then be randomly selected from each of these groups to participate in the implementation study. Visits should take place during the spring of the implementation year (year 2), after teachers have received training and technical assistance, and again in the fall, approximately one month after classes begin. Researchers will conduct classroom observations to measure how closely the classroom practices match the enhancement’s design, and how well the services of the mental health professional meet teachers’ and children’s needs. Researchers will conduct semistructured interviews with training and technical assistance staff and with key program staff, including directors, education coordinators, and health coordinators, as well as focus groups with teachers to explore the content and quality of training, the fit between the enhancement and the teachers’ classroom experiences, and the content and quality of technical assistance. Questions will focus on what aspects of training and technical assistance worked well; what challenges were encountered; how well the approach fits with the program; and what aspects of the enhancement, training, and technical assistance might have to be changed.
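The selection of centers for the implementation study described above could be sketched as a stratified random draw. This is a minimal illustration only: the center IDs, the group labels, the function name, and the fixed seed are hypothetical, and rounding up within each group is an assumption made so that every group is represented. Only the 30 percent target comes from the text.

```python
import random
from collections import defaultdict


def select_centers(ratings, percent=30, seed=0):
    """Randomly pick about `percent` of the centers within each implementation group.

    `ratings` maps a center ID to the implementation-experience group assigned
    by implementation staff. The group labels are hypothetical.
    """
    rng = random.Random(seed)  # fixed seed so that the draw can be reproduced
    groups = defaultdict(list)
    for center, group in sorted(ratings.items()):
        groups[group].append(center)

    selected = {}
    for group, centers in sorted(groups.items()):
        # Ceiling division in integers; at least one center per group (assumption).
        n = max(1, (percent * len(centers) + 99) // 100)
        selected[group] = sorted(rng.sample(centers, n))
    return selected
```

Drawing within groups, rather than from the pooled list, ensures that the visited centers span the range of implementation experiences instead of clustering in the largest group.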

      Although enhancements that are evaluated at Stage 2 are expected to have well-developed implementation plans and thus a high degree of success in implementing to high fidelity in the participating centers, it is not possible to anticipate every challenge to implementation. Accordingly, measures of implementation and fidelity are required both to improve the next generation of implementation materials (as described above) and to ensure that the evaluation can measure the effects of a well-implemented enhancement (discussed in Section E). Measures of fidelity to the enhancement design and the quality of implementation obtained from classroom observations as part of the impact study will be used to define subgroups that enable researchers to estimate impacts of the classroom management approach among centers that implemented the enhancement with high fidelity. Because fidelity and the quality of implementation are not experimentally determined (relative to centers that do not implement to high fidelity), these estimates will have to be interpreted with caution, but they will provide a measure of the effectiveness of the enhancement approach in “high-fidelity” sites relative to the overall impact.

      Table V.3: Potential Implementation Study Topics for an Evaluation of Social-Emotional Behavioral Intervention

      Data Collection Methods: columns 2 through 6. An X marks each method used for a topic.

      Implementation Study Topics | Direct Observation | Program Records and Documents | Semi-Structured Interviews with Key Program Staff | Semi-Structured Interviews with T/TA Providers | Focus Groups with Teachers

      Initial Staff Training
      What was the content of training? |  | X |  | X | X
      What were the qualifications of trainers? |  |  |  | X |
      What was the intensity and duration? |  |  |  | X |
      How well did training prepare staff to implement the classroom management techniques? | X |  | X | X | X
      How well did the approach support other classroom activities? | X |  | X | X | X
      How could training be improved? |  |  | X | X | X

      Enhancement
      Were the principles of classroom arrangement followed in classrooms observed? |  |  |  | X | X
      Were classroom activities structured to minimize disruptive behavior? |  |  |  | X | X
      Do teachers have more class time available for language, literacy, and mathematics activities than before? |  |  | X | X | X
      Are teachers using positive behavioral strategies with children? | X |  | X | X | X
      Are teachers using appropriate strategies to minimize negative behavior? |  |  | X | X | X

      Technical Assistance and Support
      Who provided technical assistance and support? Enhancement development staff? Health coordinators? Others? |  |  | X | X | X
      What types of support did staff receive in providing enhanced services? Observation and feedback? Modeling? Conferences between teachers and technical assistance staff? Written reports? |  |  | X | X | X
      How often was this assistance provided? |  |  | X | X | X
      What topics were covered? |  |  | X | X | X
      What types of questions were raised? What support was needed? |  |  | X | X | X
      How helpful was the technical assistance? |  |  | X | X | X
      What other types of support did staff need? |  |  | X | X | X

      Lessons Learned
      Which aspects of training went well? |  |  | X | X | X
      What factors associated with programs or teachers made implementation go well? |  |  | X | X | X
      What challenges were encountered in training? In technical assistance? |  |  | X | X | X
      What factors associated with programs or teachers made training or technical assistance more challenging? |  |  | X | X | X
      What strategies were used to overcome the challenges? |  |  | X | X | X
      How well has the classroom management approach supported other program activities? |  |  | X | X | X
      How well has the approach seemed to support the needs of Head Start children and families? |  |  | X | X | X
      How did teachers address any issues regarding the fit of the approach with the Head Start program? |  |  | X | X | X
      What did Head Start families think of the classroom approaches? |  |  | X | X | X
      How well did the mental health professional work with teachers? What challenges, if any, were encountered in this partnership, and how were they resolved? |  |  |  |  |
      How well did the mental health professional work with parents? What challenges, if any, were encountered and how were they resolved? |  |  |  |  |
      How well did the mental health coordinator work with individual children? What challenges, if any, were encountered and how were they resolved? |  |  |  |  |
      How can the enhancement be improved? |  |  | X | X | X
      How can T/TA be improved? |  |  | X | X | X

      T/TA = training and technical assistance.
  7. OUTCOMES MEASUREMENT AND DATA COLLECTION PLANS

  A comprehensive assessment of the impact of enhancements to support children’s social-emotional development would examine the impact of the intervention on children, teachers, and the overall structure of classroom activities. If the enhancement is successfully implemented, it should enable the teacher to manage the classroom so that all children are actively engaged in language development, early literacy, early mathematics, and play, and so that the teacher does not have to spend disproportionate amounts of time with children who require additional support.

    Outcome measures selected for the evaluation should have the following properties:

    • Relevance to School Readiness Goals and the Head Start Child Outcomes Framework. The Head Start Child Outcomes Framework (COF) outlines eight domains of early learning and development (see Appendix A), including social and emotional development, approaches to learning, and the development of skills in language, early literacy, and early mathematics, all relevant to this enhancement.

    • Sensitivity to Intervention. Measures should reflect outcomes malleable to intervention, as opposed to more trait-like qualities (such as temperament).

    • Appropriateness for a Culturally Diverse, Low-Income Population. The selected measures should adequately assess school readiness of low-income 3- to 5-year-olds from diverse cultural populations, including those who do not speak English in the home.

    • Adequate Psychometric Properties. Measures selected should be reliable and valid. Reliability means that they measure the same construct across various settings (for example, the classroom or the home), on repeated occasions (test-retest reliability), when administered or rated by different interviewers/observers (interrater reliability), and when subsets of items are administered to identical samples (split-half reliability). Validity refers to the degree to which the measure taps the underlying construct it purports to measure, keeping in mind linguistic and cultural considerations.9

    • Valid and Reliable for Intended Mode of Administration. A measure might not be valid or reliable as reported if it is used with a group for whom it was not designed or with a mode of administration for which its reliability and validity have not been tested.

    • Prior Use in Large-Scale Surveys and Intervention Evaluations. Prior use is helpful because it suggests that the measure is practical and feasible to use in large-scale research and because it provides a benchmark for the scores of children participating in the evaluation.

    • Cost and Burden. The cost or burden of data collection strategies, including training requirements, respondent burden, and program administration burden, should be minimized.

    Table V.4 describes the classroom and child outcomes to be measured and the recommended measures of each one. Measures of teachers’ knowledge and attitudes about developmentally appropriate practice can be taken from the Family and Child Experiences Survey (FACES). Measures of the classroom environment will include practices that support social and cognitive well-being as well as overall classroom quality. Measures of children’s behavior will be based on teacher reports and direct observation. Measures of children’s language, early literacy, and mathematics ability could rely on the NRS assessments, as those outcomes are more distal from the behavioral outcomes targeted by the intervention. For Option 3, which focuses on outcomes for children likely to receive intensive services for serious behavioral issues, a parent interview or combined interview and home visit will be important for gauging the impacts of the services.

    To obtain data for the impact analysis, a baseline assessment of children’s behavior will be obtained in the fall from teacher reports and time-sample observations; a similar follow-up assessment will be conducted in the spring. Children’s progress in language and mathematics will be assessed by the fall and spring NRS. Because three-year-old children in the sample will not otherwise receive an NRS assessment, we recommend using the NRS instrument for three-year-olds in the spring follow-up only. A teacher interview, also conducted during the fall and spring, will obtain information on teachers’ attitudes about and knowledge of developmentally appropriate practice. The classroom observations (in the enhancement and control classes), which need to be conducted only during the spring, will measure overall classroom quality and the classroom management strategies that teachers learned as part of enhancement training. Taking stock of the overall quality of the classroom is an important step in helping to understand the pathways by which the classroom management approaches may be changing practice and potentially affecting child learning. Children’s demographic characteristics will be obtained from an information sheet completed by parents as part of the consent process; for Option 3 (which includes a sample of children with high levels of reported behavioral problems), this form also will include parents’ reports of their children’s behavior. A parent interview conducted in the spring, either by telephone or in person, will provide information on parenting practices, the home environment, and the parent’s perspective on the child’s behavior.

    Before any data are collected, the research team must draft instruments, obtain IRB and OMB approval, obtain parents’ consent for their children’s participation, and randomly select children to participate based on their eligibility status and their consent status. We discuss these steps in the remainder of this section.

    Table V.4: Measures of Intermediate and Child Outcomes Associated with a Social-Emotional Behavioral Intervention

    Outcome | Recommended Measure | Type of Measure

    Teachers’ Knowledge
    Attitudes and knowledge about developmentally appropriate practice | Teacher Beliefs Scale (Burts et al. 1990) | Teacher survey

    Classroom Processes
    Materials and teacher activities to promote learning | Early Childhood Environment Rating Scale-Revised (Harms et al. 1998) | Observation and teacher survey
    Classroom practices that promote positive behavior and minimize disruptive behavior | Adaptation of the Inventory of Practices for Promoting Children’s Social and Emotional Competence (Center on the Social and Emotional Foundations for Early Learning 2003) | Observation

    Children’s Development
    Behavioral problems | Child Behavior Checklist (Achenbach and Rescorla 2000) | Teacher report
    Social competence | Social Competence and Behavior Evaluation (LaFreniere and Dumas 1995) | Teacher report
    Approaches toward learning | Preschool Learning Behaviors Scale (McDermott et al. 2000) | Teacher report
    Child behavior toward peers and adults in the classroom (a) | Howes Peer Play Observation Scale adapted for FACES | Observation
    Mathematics ability | Woodcock-Johnson Applied Problems (Woodcock et al. 2001) | NRS assessment
    Language ability | Peabody Picture Vocabulary Test, 3rd Edition (Dunn and Dunn 1997) | NRS assessment
    Early literacy | Woodcock-Johnson Letter-Word Identification (Woodcock et al. 2001) | NRS assessment

    (a) Because this measure is very intensive and focuses on one child at a time, it is recommended for use only as part of an evaluation of Option 3, which focuses services on individual children most likely to be referred by teachers for intensive, individualized services to address behavioral issues.

    FACES = Family and Child Experiences Survey; NRS = National Reporting System.

    Develop Data Collection Instruments. Data collection instruments will be written during the first year of the study. The instruments will include the consent form, with a form requesting demographic and child behavior information; the teacher survey; the child assessment and observation protocol; and the classroom observation protocol. Many of these instruments will include standardized assessments that will have to be formatted to simplify administration by a trained assessor. Others (such as the teacher questionnaire) will rely on questions that have been used in other studies. After the data collection instruments have been developed, they will be pretested to ensure that respondents understand the questions, that the flow proceeds logically and smoothly, and that the time required to complete them is reasonable.

    IRB and OMB Research Review. Research on human subjects must be reviewed by an IRB, which considers the benefits of the research to society, the programs, and the participating families and weighs those benefits against the cost of the research to the families and program staff. The IRB also reviews protection of research participants from harm by ensuring that confidentiality is maintained. In addition, if the evaluation is federally funded, data collection instruments and the research plan must be approved by OMB. The data collection instruments are reviewed to ensure that they do not overlap with other ongoing federal data collection efforts, and that burden is not excessive. These reviews will be conducted during the nine months preceding the start of the evaluation year, while the intervention is being implemented.

    Consent. Active consent for children to participate in the research must be obtained from parents or guardians. The parent consent form will clearly inform parents (guardians) about the duration of the study, the types of assessments that will be administered, and the voluntary nature of participation. An information sheet will be included in the consent package to collect basic demographic information about the family, such as age, race and ethnicity, and family structure, as these variables will be needed to improve the precision of impact estimates, as well as to define subgroups. If the sample is to include children with high levels of reported behavioral problems, the consent form and information sheet will include a questionnaire for the children’s parents that asks about child behavioral problems as well as positive social behavior, for balance.

    The consent process could proceed more smoothly if it is incorporated into the home visits that many programs make to families. During these visits, which typically occur just before children attend class for the first time, teachers bring forms that parents must complete before the start of the school year. The teachers are available to explain the forms, and to ensure that they are completed correctly. If the study’s consent is part of this process, the teachers would be able to explain the nature of the study, what will happen if the children participate in the research, and the voluntary nature of the children’s participation.

    Child Eligibility Criteria for the Study. Because the sample of children for Option 1 or Option 2 is intended to represent all three- and four-year-old children in each center, all children participating in Head Start should be eligible for the study. Although teacher training will begin in the final months of the preceding Head Start year, we expect that children attending Head Start the next year will have very little exposure to a fully implemented enhancement. We therefore recommend including all Head Start children in the potential sample for the study, rather than limiting the sample to children new to Head Start during the year in which fall and spring data are collected.

    For Option 3, researchers need a sample of children with high levels of externalizing behavioral problems who are thus most likely to be referred for intensive, child-focused services during the year. These children will be selected based on parent reports at baseline of high levels of aggressive behavior, hyperactive behavior, oppositional defiant behavior, and emotional reactivity. Based on previous studies of Head Start classrooms, we expect approximately 10 to 20 percent of children to meet the threshold for this sample.

    Selection of Classrooms and Children for the Study. When classrooms and children’s enrollments are established in August, researchers will work with center directors to obtain a list of teachers (in three- and four-year-old classrooms) and the number of children enrolled in each classroom. Two classrooms from each center will be selected with probability of selection proportional to class size. Parent consent forms will be distributed to parents of children in those classrooms. Returned consent forms and associated information sheets will be sent by program staff to the researchers for processing. The researchers will enter data from the forms to classify children by center, by classroom, by consent status, and by demographic characteristics. A sample of nine children from the pool of eligible children in each research classroom will be randomly selected for the data collection. For Option 3, the information from parent reports of children’s behavioral problems will be entered and assessed. Approximately five children per classroom who meet a threshold cutoff for high levels of behavioral problems will be included in the sample.
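The two-stage selection described above could be sketched as follows. The source does not specify a sampling algorithm, so this sketch assumes successive weighted draws without replacement, which only approximates strict probability-proportional-to-size inclusion probabilities (systematic PPS is a common alternative). The function names and inputs are hypothetical.

```python
import random


def select_classrooms(class_sizes, n_classrooms=2, rng=None):
    """Draw classrooms one at a time, weighting each draw by enrollment."""
    rng = rng or random.Random()
    remaining = dict(class_sizes)
    chosen = []
    for _ in range(min(n_classrooms, len(remaining))):
        names = sorted(remaining)
        weights = [remaining[name] for name in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # a classroom cannot be selected twice
    return chosen


def select_children(eligible_children, n_children=9, rng=None):
    """Simple random sample of eligible, consented children in one classroom."""
    rng = rng or random.Random()
    pool = sorted(eligible_children)
    # If fewer than n_children are eligible, take everyone who is.
    return rng.sample(pool, min(n_children, len(pool)))
```

For Option 3, the same `select_children` call could be applied with `n_children=5` to the pool of children who meet the behavioral threshold.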

    Parent consent rates typically vary across classrooms, as rates at which parents return the forms and consent can depend on how organized the parents are, their degree of connection with the classroom teacher, whether parents talk to each other about participating in the study, and how diligently the teacher follows up with parents to return the forms. The study typically will not have information about children without parental consent, so it will have to generalize results from the children with consent to the entire class. However, if some aggregate, class-level data are available (such as demographic composition or scores from the NRS assessments), this information can be used to adjust classroom-based estimates for nonresponse by weighting the responders to reflect the true aggregate classroom-level composition.
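The nonresponse adjustment mentioned above could take the form of simple weighting to known class-level shares. This is a sketch under stated assumptions: the function names and the demographic categories are hypothetical, and the class-level shares are assumed to be known and to sum to 1.

```python
from collections import Counter


def nonresponse_weights(respondent_groups, class_shares):
    """Weight consenting children so the weighted sample matches the class composition.

    respondent_groups: one demographic category per consenting child.
    class_shares: category -> share of the whole class (assumed known).
    A category with no consenting children cannot be represented by weighting.
    """
    counts = Counter(respondent_groups)
    n = len(respondent_groups)
    weights = []
    for group in respondent_groups:
        sample_share = counts[group] / n
        weights.append(class_shares[group] / sample_share)
    return weights


def weighted_mean(values, weights):
    """Classroom-level estimate using the nonresponse weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

Children from categories that are underrepresented among responders receive weights above 1, so the classroom estimate is pulled toward the composition of the full class.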

    Teacher Self-Administered Questionnaire. Teachers’ attitudes about managing the classroom to promote positive behavior and to minimize opportunities for negative behavior can influence how well the teachers implement the enhancement approach. We recommend giving teachers a short (no more than 15-minute) self-administered questionnaire during the fall, while the child assessments are in progress. This questionnaire could be adapted from the FACES teacher interview and could include questions about the teacher’s background, attitudes, and beliefs about developmentally appropriate practice, as well as a short behavior problems scale for children in the research sample. During the spring data collection, the teacher questionnaire will omit the teacher background questions (unless the teacher is new) but will include a short behavior problems scale and a short scale measuring approaches toward learning for each child in the research sample. The behavior problems scale and the social skills rating scale will provide additional information on children’s behavior based on the teachers’ observations during the year. The measure of approaches toward learning will help to identify positive factors, such as curiosity and attentiveness, that are associated with academic success.

    Classroom Observation. Measuring the overall quality of the classroom with a commonly used observational protocol will enable researchers to understand how the classroom management techniques have contributed to the overall quality of the classroom environment. Thus, we recommend using the Early Childhood Environment Rating Scale-Revised (Harms et al. 1998) to measure classroom quality. In addition, a measure of the fidelity of teacher practices to the enhancement is important for gauging whether the measured impacts correspond to generally high-fidelity implementation, or whether the enhancement was not fully implemented. We recommend that the evaluation team work with the Center on the Social and Emotional Foundations for Early Learning to adapt its Inventory of Practices for Promoting Children’s Social and Emotional Competence into a briefer observation tool. This tool would examine the ways in which a teacher builds positive relationships with children, creates a supportive environment, and promotes social and emotional well-being through his or her teaching strategies. We recommend that the classroom observations be conducted during the two weeks before the spring follow-up assessments begin.

    Parent Interview and Home Visit. This portion of the data collection plan is an option if the sample includes children likely to be referred for intensive, individualized services. These services are likely to involve not only the child, but also the teachers and parents, who will be shown how to interact more positively with the child, and how to create an environment at home and at preschool in which the child can thrive. Because of the greater cost associated with home visits, parent information could be collected via telephone interview; however, the home observations have greater validity. We recommend using the Home Observation for Measurement of the Environment (HOME; Caldwell and Bradley 1984), preschool form. Numerous studies have linked scores on the HOME with children’s cognitive, social, and emotional well-being. A short version of the HOME has been developed for the National Longitudinal Study-Child Supplement, which includes items that can be asked during a telephone interview. We also recommend using the Parenting Stress Index-Short Form (Abidin 1995) to measure stress in the parent-child relationship; this parent-report measure could be administered either in person or by telephone. The parent interview also would include measures of the child’s behavior from the parent’s perspective, including the Child Behavior Checklist (Achenbach and Rescorla 2000) and the Social Competence and Behavior Evaluation (LaFreniere and Dumas 1995) or Social Skills Rating Scale (Gresham and Elliott 1990).

    Child Assessment. The primary hypotheses about the impacts of the improved classroom management techniques are that the enhancement (1) improves children’s social behavior, and (2) reduces behavioral problems. Accordingly, we recommend that the child assessments focus on children’s social behavior. The Howes Peer Play Scale has been adapted for the FACES project to enable observers to measure the activities and behaviors of as many as six children per classroom. This approach would accommodate assessment of the sample of children proposed for this evaluation. The measure could be adapted to yield measures of children’s prosocial behavior, conflicts with other children, and other important outcomes. As a secondary effect, the classroom management techniques in this evaluation are intended to enable teachers to spend more productive classroom time supporting children’s language, early literacy, and early mathematics skills. To address the research question about children’s progress in those areas, we recommend using the NRS assessments. NRS assessment data potentially could be available for all of the four-year-old children in the sample, and they also could be administered easily to the sample’s three-year-old children during the spring follow-up. Using the NRS assessments would save the expense and the burden on programs and children of collecting an additional 20 to 30 minutes of assessment data on children’s language, early literacy, and mathematics skills. The child assessment protocol could then focus on obtaining child behavioral measures more central to the evaluation.10

    An intervention to influence children’s behavior is highly unlikely to have an immediate—or even near-term—effect on the children. Moreover, some children may act out at the beginning of the Head Start year while they are adjusting to new groups of children, to their classrooms and school schedules, and to their new teachers. Therefore, a very early baseline is unlikely to provide a valid measure of a child’s behavior. Instead, we recommend a field period that starts at least one month after classes have begun at the Head Start center, with a six-week window for data collection at the sites. To assess how children’s behaviors have been influenced by their experiences in the Head Start classroom and by what they have learned during the Head Start year, the follow-up assessment should be conducted as close to the end of that year as possible. We recommend that the follow-up data be collected during a six-week window that ends two weeks before the end of the year, and that data collection be matched to the timing of the fall assessment so that classes assessed early in the fall field period also are assessed early in the spring field period.
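The recommended field periods can be worked out with simple date arithmetic. In the sketch below, the function names are hypothetical, and "at least one month" is approximated as 30 days, which is an assumption rather than a rule stated in the text.

```python
from datetime import date, timedelta


def field_windows(first_class_day, last_class_day):
    """Return (start, end) dates for the fall and spring data collection windows."""
    six_weeks = timedelta(weeks=6)
    # "At least one month" after classes begin, approximated as 30 days (assumption).
    fall_start = first_class_day + timedelta(days=30)
    fall_end = fall_start + six_weeks
    # The spring window ends two weeks before the last day of the program year.
    spring_end = last_class_day - timedelta(weeks=2)
    spring_start = spring_end - six_weeks
    return {"fall": (fall_start, fall_end), "spring": (spring_start, spring_end)}


def matched_spring_date(fall_visit, windows):
    """Place a class at the same offset in the spring window as in the fall window."""
    offset = fall_visit - windows["fall"][0]
    return windows["spring"][0] + offset
```

Matching on the offset keeps the interval between the fall and spring assessments roughly equal across classes, which is the purpose of aligning early fall classes with early spring visits.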

    Costs of the Enhancement. Program administrators care not only about the effectiveness of the quality enhancements, but also about the costs of these enhancements over and above current expenditures. To support analyses of the cost of the enhancement relative to its impacts (for cost-effectiveness analysis) or benefits (for cost-benefit analysis), researchers will collect information on the costs of implementing the classroom behavioral approach and information on the additional cost of the intensive child-focused services relative to the cost of the program without either enhancement. Researchers should measure two types of costs: (1) the upfront, one-time costs of beginning implementation of the enhancement; and (2) the ongoing additional costs of the enhancement. Among the first set of costs are the cost of initial teacher training, including the enhancement developer’s and training staff’s time; the cost of substitute teachers hired to cover classrooms while the regular teachers attend training; and the cost of additional staff days for teachers who are paid for their days of training. The first set of costs also includes costs associated with the training staff’s technical assistance visits and any costs associated with teachers attending extra training sessions outside their normal classroom duties (for example, periodic group discussions, if any). Time spent by teachers and trainers would be valued at these staff’s hourly wage, including fringe benefits. If the training were to supplant the usual teacher training sessions conducted during the year, those routine training costs would be subtracted. Usually, however, new enhancement training is an add-on expense, rather than a substitute one. If space for training is rented, the cost of the rental is included as well. Materials, including documents, videos, and other training materials, should be valued at their cost.

    The ongoing additional costs of the classroom-level behavioral enhancement would include the costs of ongoing technical assistance and costs to train new teachers. Teachers might require additional materials for the classroom, either to organize the space to prevent conflicts from occurring (for example, between children engaged in quiet activities and children engaged in more active, noisy activities), or to ensure sufficient supplies of popular items for all children who may want to use them. In addition, the cost of the mental health professional who consults with teachers about particular children and provides intensive services to children must be included in the cost of the enhancement; that individual’s hourly rate would be used, with any corresponding reduction in other child mental health services used by the program subtracted. The cost of refresher training and ongoing technical assistance to teachers during a normal program year to ensure that the classroom management techniques continue to be implemented to high fidelity is another ongoing cost. For example, technical assistance staff might make two or three visits to each classroom during the program year to observe, measure fidelity, and meet with the teachers and with the Education Coordinator to discuss classroom practices and to respond to questions. Finally, overall teacher turnover in Head Start is approximately 15 percent per year (National Institute for Early Education Research 2003). Accordingly, an ongoing cost of the enhancement is the cost of training and technical assistance to 15 percent of the teachers in the enhancement group each year. Assuming that the evaluation includes 50 to 150 centers in the enhancement group, 8 to 23 teachers would have to be fully trained each year.
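The turnover arithmetic above can be checked with a short calculation. The function name is hypothetical. Note that the range of 8 to 23 is reproduced when the 15 percent rate is applied to 50 and 150 teachers, that is, when the teacher count is taken to equal the center count; that reading is an inference from the figures, not something the text states.

```python
import math


def teachers_to_train(n_teachers, turnover_percent=15):
    """New teachers needing full training each year, rounded up to whole people."""
    return math.ceil(n_teachers * turnover_percent / 100)


low = teachers_to_train(50)    # 15 percent of 50 is 7.5, rounded up
high = teachers_to_train(150)  # 15 percent of 150 is 22.5, rounded up
print(low, high)
```

If each center contributes more than one classroom teacher to the enhancement group, the same function applies with the larger teacher count.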

    Three different approaches to estimating the costs of implementing the enhancement could be used:

    • Identify the costs associated with implementing the classroom management approach and the services of the mental health professional for individual children and ask center directors for this information

    • Obtain the full budget for the enhancement center for the year prior to enhancement implementation and the full budget for the current year and estimate the cost of the enhancement as the difference in the two budgets (adjusting for the normal cost inflation from one year to the next)

    • Obtain the full budget for the enhancement centers and the full budget for the control-group centers and estimate the cost of the enhancement as the difference between enhancement and control-group center costs

    The first approach is the least burdensome for the enhancement centers (and eliminates burden for the control centers) but could fail to include costs associated with the enhancement. The second approach is more burdensome for the enhancement centers than is the first one, and it does not require a response from the control centers. However, if anything other than the implementation of the enhancement changes from one year to the next, the estimate of the cost of the enhancement will be inaccurate. The third approach is the most burdensome, involving extensive collection of cost information from both enhancement centers and control centers, but it would provide the most accurate estimate of the costs of implementing the enhancement. If a formal benefit-cost analysis is to be performed, the more accurate cost data would be needed.

    Information about these costs can be obtained from semistructured interviews with program and center directors, from the enhancement developer and training/technical assistance supervisory staff, and from program records. All cost information must be obtained in dollars; however, to ensure that the dollar values collected at various time points represent the same value, the dollar values collected from informants and records should be either inflated or deflated, using the Consumer Price Index, to represent dollar values in a single target year (for example, the analysis and reporting year).
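
The price-index adjustment described above amounts to multiplying each amount by the ratio of the target-year index to the index for the year in which the amount was recorded. The following is a minimal sketch of that step; the index values and cost items are placeholders chosen for illustration, not actual Consumer Price Index figures or program costs.

```python
def to_target_year_dollars(amount, index_year_collected, index_target_year):
    """Restate a dollar amount in target-year dollars using a price index."""
    if index_year_collected <= 0:
        raise ValueError("index values must be positive")
    return amount * index_target_year / index_year_collected

# Placeholder index values, NOT actual Consumer Price Index figures.
price_index = {"year_1": 100.0, "year_2": 102.0, "year_3": 105.0}

# Hypothetical cost items: (label, dollars as recorded, year recorded).
costs = [("training", 12000.0, "year_1"), ("materials", 3000.0, "year_2")]

# Express every item in year_3 dollars (the assumed reporting year) and sum.
total = sum(
    to_target_year_dollars(amount, price_index[year], price_index["year_3"])
    for _, amount, year in costs
)
print(round(total, 2))
```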

    Finally, cost information sometimes is obtained using two perspectives. First, the actual dollar costs of the enhancement must be obtained. Second, a measure of cost to society would place a value on any volunteer labor or donated space and materials and would add those costs to the actual expended costs. Both cost perspectives would be used in the analysis of benefits and costs or cost-effectiveness of the enhancement.

  E. ANALYSIS AND REPORTS

    Reports based on the evaluation should present the estimated impacts of the enhancement on teachers’ practices, classroom activities, and children’s outcomes. They also should describe the implementation experience and should discuss how the classroom management approach and child-focused services could be implemented on a broad scale, using the Head Start training and technical assistance system for support. Reports should be written for a broad audience, with stand-alone summaries that can be understood by program staff and policymakers, and more-detailed reports that summarize the research design, sample characteristics, analytic approaches, and findings in a clear and accessible way. In this section, we discuss the approach to estimating impacts and conducting the cost-effectiveness analysis.

    1. Estimating Impacts of the Behavioral Enhancement

      Using a random assignment evaluation design means that fairly simple estimation methods can be used to determine the impacts of the quality enhancements at a point in time. Under random assignment of centers, the center-level mean outcomes are estimated, after which, separately for the enhancement group and the control group, the center mean outcomes are averaged over all centers in that group. The simple difference in the means between enhancement and control centers is the impact estimate.
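
The two-step calculation described above (center means first, then an unweighted average of center means within each group) can be sketched as follows. The record layout, group labels, and data values are hypothetical and are used only to show the order of operations.

```python
from statistics import mean

def impact_estimate(records):
    """records: iterable of (center_id, group, outcome) tuples,
    where group is "enhancement" or "control"."""
    # Step 1: collect child outcomes by center.
    by_center = {}
    for center_id, group, outcome in records:
        by_center.setdefault((center_id, group), []).append(outcome)
    # Step 2: compute each center's mean, then average the center means by group.
    center_means = {"enhancement": [], "control": []}
    for (_, group), outcomes in by_center.items():
        center_means[group].append(mean(outcomes))
    # Step 3: the impact estimate is the simple difference in group means.
    return mean(center_means["enhancement"]) - mean(center_means["control"])

# Hypothetical child-level outcomes in four centers.
records = [
    ("A", "enhancement", 10.0), ("A", "enhancement", 12.0),
    ("B", "enhancement", 14.0),
    ("C", "control", 9.0), ("C", "control", 11.0),
    ("D", "control", 8.0),
]
impact = impact_estimate(records)
print(impact)
```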

      More-precise estimates can be obtained by estimating regression models. Regression procedures can improve the precision of the estimates and can adjust for any residual differences in the observable characteristics of enhancement and control group members due to random sampling and interview nonresponse. Regression models take the following form:

      (1) $Y = \alpha + X\beta + \gamma T + \varepsilon$,

      where Y is an outcome variable; X is a vector of explanatory variables; T is an indicator that equals one for members of the enhancement group and zero for members of the control group; α, β, and γ are parameters to be estimated; and ε is a random-error term. The estimate of the parameter γ is the estimated impact of the quality enhancement compared with regular Head Start services.
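
To illustrate how γ in equation (1) is recovered, the sketch below fits the model by ordinary least squares using only the Python standard library. The data are made up and generated without an error term so that the known coefficients can be checked; the helper names `solve` and `ols` are illustrative. The sketch produces point estimates only and does not address the clustering of children within centers.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Made-up data generated exactly as Y = 1 + 2*X + 0.5*T (no error term).
x = [0.0, 1.0, 2.0, 3.0, 0.5, 1.5, 2.5, 3.5]
t = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1 + 2 * xi + 0.5 * ti for xi, ti in zip(x, t)]

# Design matrix columns: intercept, covariate X, treatment indicator T.
rows = [[1.0, xi, float(ti)] for xi, ti in zip(x, t)]
alpha, beta, gamma = ols(rows, y)
print(round(gamma, 6))  # estimated impact
```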

      Because random assignment will have been conducted at the center level, the regression adjustment takes the form of a hierarchical linear model (HLM) of child development consisting of two nested levels. By specifying the model at each level, we can conduct analyses for the appropriate units of analysis and can conduct statistical hypothesis tests that correctly account for the clustering of children within classrooms. Although not strictly necessary for estimating impacts (because the evaluation is based on a random assignment design), adjusting the impacts with an HLM can help to increase the precision of the estimates.

      For the design involving random assignment of centers, the analytic model is the following, where the variables are indexed by child (i) and center (j):

      Child-level model

      (2) $Y_{ij}(t) = \alpha Y_{ij}(0) + \beta X_{ij} + C_j + \varepsilon_{1ij}$,

      Center-level model

      (3) $C_j = \gamma T_j + \delta Z_j + \varepsilon_{2j}$,

      where Y(t) is the outcome at follow-up period t; Y(0) is the corresponding measure at baseline; X is a set of child characteristics, such as gender; T is a variable indicating whether the child was in an enhancement classroom or center; Z is a set of center-level variables, such as whether classes are full-day; and ε1 and ε2 are disturbance terms assumed to have a mean of zero and to be uncorrelated with each other. Parameters to be estimated include α and β, the coefficients on the child baseline characteristics; C, the center effect; γ, the effect of the enhancement; and δ, the coefficients on the center variables.
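
As a worked step, substituting the center-level model (3) into the child-level model (2) gives a single-equation form. The coefficient on T is written as γ here, following the parameter list above:

```latex
Y_{ij}(t) = \alpha Y_{ij}(0) + \beta X_{ij} + \gamma T_j + \delta Z_j
            + \varepsilon_{2j} + \varepsilon_{1ij}
```

The composite error term contains a center-level component, ε2j, that is shared by all children in center j. This shared component is what makes outcomes for children in the same center correlated, and it is why hypothesis tests need to account for clustering.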

      The statistical techniques used to estimate regression-adjusted impacts in equations (2) and (3) will depend on the form of the outcome, Y. If the dependent variable is continuous (such as the score on the NRS Math Assessment), ordinary least squares methods produce unbiased estimates of the parameter γ. However, if the dependent variable is binary (such as whether a child was rated as having behavioral problems in the clinical range), logit or probit maximum-likelihood methods will be used to obtain consistent parameter estimates.

      This estimation model assumes that all centers are weighted equally so that essentially, the average outcome measure for each treatment center is averaged with those of all other treatment centers to obtain the mean outcome for all treatment centers. Larger centers thus do not receive greater weight in influencing the mean score for treatment (or control) centers. If the enhancement were implemented in a few centers with six classrooms and several others with two or three classrooms, we would not want the results in the larger centers to overwhelm the results for all treatment centers. Averaging results across all centers regardless of center size addresses the question, “Does the enhancement work in the average center?” This approach is appropriate because the purpose of the evaluation at Stage 2 is to measure how well the enhancement works in a collection of centers overall. At Stage 3, when the results of the evaluation will be representative of all Head Start programs, it will be important to address the question of whether the enhancement worked for the average Head Start child, and as a consequence, the impacts measured in larger centers and program grantees should have greater weight than those measured in smaller centers and grantees.
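
The difference between the two weighting schemes discussed above can be seen in a small sketch. The center means and enrollment counts are hypothetical; the function names are illustrative.

```python
def center_weighted_mean(centers):
    """Each center counts once, regardless of how many children it serves."""
    return sum(m for m, _ in centers) / len(centers)

def child_weighted_mean(centers):
    """Each center counts in proportion to its number of children."""
    total_children = sum(n for _, n in centers)
    return sum(m * n for m, n in centers) / total_children

# Hypothetical (center mean outcome, number of children) pairs:
# one large center and two small ones.
centers = [(10.0, 60), (20.0, 20), (30.0, 20)]

print(center_weighted_mean(centers))  # answers "the average center"
print(child_weighted_mean(centers))   # answers "the average child"
```

With these made-up values the large center pulls the child-weighted mean toward its own result, while the center-weighted mean treats all three centers alike.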

    2. Subgroup Analyses

      Because the effectiveness of the enhancement targeting children’s classroom behaviors may differ by program setting or by characteristics of the children served, it would be useful to determine the groups for which the enhancement is most effective. This type of subgroup analysis would then enable individual programs to decide which enhancements might be useful to them. For example, analysis may demonstrate that the classroom management approach is effective only when teachers have less than a bachelor’s degree.

      The subgroups of interest will depend on the quality enhancement to be tested. Examples of center-level characteristics that can define subgroups include:

      1. Full-day or part-day program
      2. High-fidelity implementation or incomplete implementation
      3. Teachers’ qualifications
      4. Center or program size

      Examples of categories of child and family characteristics (measured prior to the experience of the quality enhancement) for subgroup analysis include:

      1. Child’s gender
      2. Child’s English proficiency
      3. Mother's education level
      4. Level of family income
      5. Parents' employment status

      The same procedures for calculating overall impacts can be used to obtain subgroup estimates, with the calculations made for particular subgroups.11 Regression-adjusted subgroup estimates are obtained by introducing an interaction term that is the product of the treatment indicator and an indicator of membership in the subgroup of interest. This term is entered into the appropriate model shown in Section E.1 for the level of random assignment, and for whether the subgroup is defined at the child level or at the center level.
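
A minimal sketch of the subgroup calculation for a center-level subgroup follows. It assumes a model with no covariates other than the treatment indicator, the subgroup indicator, and their product; in that special case the coefficient on the interaction term equals the impact inside the subgroup minus the impact outside it, so differences in means suffice. The data values and the full-day example are hypothetical.

```python
from statistics import mean

def subgroup_impacts(centers):
    """centers: list of (treated, in_subgroup, center_mean_outcome),
    with treated and in_subgroup coded 1 or 0."""
    cell = {}
    for treated, in_subgroup, outcome in centers:
        cell.setdefault((treated, in_subgroup), []).append(outcome)
    # Impact within each subgroup: enhancement mean minus control mean.
    impact_in = mean(cell[(1, 1)]) - mean(cell[(0, 1)])
    impact_out = mean(cell[(1, 0)]) - mean(cell[(0, 0)])
    # Without other covariates this difference is the interaction coefficient.
    return impact_in, impact_out, impact_in - impact_out

# Hypothetical center means; in_subgroup = 1 might mark full-day centers.
centers = [
    (1, 1, 14.0), (1, 1, 16.0), (0, 1, 10.0), (0, 1, 12.0),
    (1, 0, 11.0), (1, 0, 13.0), (0, 0, 10.0), (0, 0, 12.0),
]
impact_in, impact_out, interaction = subgroup_impacts(centers)
print(impact_in, impact_out, interaction)
```

When baseline covariates are included, the interaction coefficient would instead come from the full regression model rather than from raw cell means.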

      However, unless the overall sample is very large, it will be possible to detect impacts only for large subgroups of the population, as subgroup estimates, which are based on only part of the full sample, are less precise than are full-sample estimates. For example, our sample of 1,490 children (90 percent of the initial sample in 92 centers) is sufficient for detecting impacts with effect sizes of .20 (one-fifth of a standard deviation on the outcome measures) or more. However, for a subgroup that includes 50 percent of the children across all the centers (for example, children whose mothers have less than a high school diploma), impacts with effect sizes of .23 or more can be detected. For a subgroup that includes all of the children in half the centers (for example, full-day programs), impacts with effect sizes of .28 or more can be detected.

    3. Cost-Effectiveness Analysis

      A cost-effectiveness framework should be used to evaluate the costs and benefits of the enhancement. This type of analysis does not attempt to place a dollar value on impacts. Instead, impacts are measured in a common unit, such as an effect size (the impact divided by the standard deviation of the outcome measure). The impact in effect-size units is compared with costs measured in dollars. For each quality enhancement, an effect size per dollar spent on the enhancement can be calculated. For example, if the classroom management approach were to produce an impact on children’s cooperative behavior of 0.3 in effect-size units, and if the cost were estimated to be $10 per child, then the cost-effectiveness of the enhancement would be 0.03 per dollar. Measuring cost-effectiveness in this way enables program administrators to compare the cost of producing impacts they consider important using the same metrics, so that enhancements can be assessed in terms of their ability to provide the most “bang for the buck.”12
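
The ratio in the example above can be sketched as follows. The effect size is taken here as the impact divided by the standard deviation of the outcome, consistent with the subgroup discussion's description of an effect size of .20 as one-fifth of a standard deviation. The raw impact and standard deviation are hypothetical values chosen to give the 0.3 effect size used in the text; the function names are illustrative.

```python
def effect_size(impact, outcome_sd):
    """Impact expressed in standard-deviation units of the outcome."""
    if outcome_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return impact / outcome_sd

def effect_size_per_dollar(es, cost_per_child):
    """Cost-effectiveness: effect-size units gained per dollar spent per child."""
    if cost_per_child <= 0:
        raise ValueError("cost must be positive")
    return es / cost_per_child

# Hypothetical raw impact (1.5 scale points) and outcome SD (5.0 points).
es = effect_size(1.5, 5.0)
# Cost of $10 per child, as in the example in the text.
ce = effect_size_per_dollar(es, 10.0)
print(round(es, 2), round(ce, 3))
```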

      The enhancement is likely to have impacts that vary in size across the outcomes measured. Therefore, the estimate of cost-effectiveness will depend on the outcome used to measure it. Researchers can report the range of cost-effectiveness estimates using the impacts on outcome measures considered to be most important. For example, the outcomes most important for a social-emotional behavioral intervention are children’s behavioral problems, social competence, behavior toward peers and adults in the classroom, and approaches toward learning. Alternatively, the cost-effectiveness of several enhancements designed to influence children’s social-emotional development could be compared using a common outcome measure.




1 If impact estimates were expected to be externally valid (representative of all Head Start centers), then sample sizes would have to be larger; see Appendix B for details.

2 If teachers in the enhancement group discuss the enhancement strategies with control-group teachers, the control group could implement a version of the enhancement, and the evaluation would not provide a valid test of the impacts of the enhancement. Impact estimates would be biased downward, making it more difficult to conclude that the enhancement was effective. To avoid such contamination of the control group, we recommend implementing enhancements at the classroom level only if the potential for spillover from enhancement to control-group classrooms is small.

3 Assuming a 90 percent response rate to the spring follow-up child assessment, nine children per class would have to be sampled at the beginning of the year.

4 We discuss implementation further in Section C of this chapter.

5 The plan for selecting classrooms and children after random assignment of centers is discussed in Section D.

6 We assume that the evaluation will include two classrooms per center and nine children per classroom in the initial sample, with a 90 percent response rate to the spring follow-up data collection.

7 Currently, NRS assessment scores are available only to individual programs aggregated to the program level. Making the scores available for evaluation in the way that we describe would require approval from ACF and consent from the families.

8 Note that, as part of the impact evaluation, researchers will observe two classrooms per center to measure fidelity and classroom processes; these observations will be conducted in all enhancement and control centers. The implementation study will provide in-depth information about implementation experiences in a sample of enhancement centers.

9 Content validity indicates that the set of items comprising a measure is a good representation of the construct being measured. Concurrent validity refers to sufficiently large correlations between the measure and another measure of the same construct (measured contemporaneously in the same sample). Predictive validity refers to a sufficiently large correlation between the child outcome measure and a subsequently measured construct that is theoretically associated with the child outcome.

10 NRS data are not currently available below the program level, and the Head Start Bureau has not given permission for the programs to keep copies of children’s NRS assessments. The Head Start Bureau would have to approve the use of NRS assessment data at the individual level to support evaluation of Head Start quality enhancements.

11 For center-level random assignment, calculate center-level mean outcomes and average across centers in the enhancement and control groups.

12 In contrast, a cost-benefit analysis requires researchers to convert impacts into “benefits” that are valued in dollars. This task is complicated by the fact that the evidence linking differences in child assessment scores with future employment and earnings, involvement with the criminal justice system, and use of public assistance programs is quite tenuous. Accordingly, we do not recommend conducting a cost-benefit analysis based on outcome data collected from a single year in Head Start.

 
