CHAPTER III: STAGE 1: DEVELOPING QUALITY ENHANCEMENT IDEAS
Head Start programs are a natural laboratory for innovations in early childhood education. At any given time, numerous initiatives are under way to improve the quality of the Head Start experience for children. Quality enhancements to the Head Start program may be generated by programs themselves in order to address their own specific needs, or they may be initiated by the Head Start Bureau or regions in response to more general needs. The enhancement may consist of a “pre-packaged” curriculum or teacher training module that is purchased from a third-party vendor, or it may be a program management practice or community outreach program created by a Head Start program. Although there may be a wealth of ideas in practice, little information is available about which enhancements work, or, at a more basic level, how they work.
Given the plethora of ideas and current practices, it can be a challenge for the Head Start community to sift through these activities to identify program enhancements that are ready for and worthy of rigorous evaluation. An initial development stage, as described in this chapter, offers a systematic framework for an enhancement’s development, ensuring that good ideas have an opportunity to progress to promising practices while ideas that are less well developed or not easily replicable are filtered out. This stage is a period of enhancement definition, documentation refinement, and early experimentation with implementation and measurement. If an enhancement is unable to meet the goals of Stage 1 by achieving clarity and replicability, then it should not be considered for evaluation in Stage 2. As such, an initial development stage can contribute significantly to Head Start by ensuring that only the most promising quality enhancements undergo rigorous evaluation.
We begin this chapter by discussing the rationale for a development stage. We then turn to a discussion of the specific goals of this stage, including the tools to be developed and the conditions to be met before a quality enhancement is ready for evaluation. We also discuss the duration and activities of Stage 1, including the products that should result and the players who should contribute to the work of this development stage. We conclude the chapter by presenting three examples of specific quality enhancements to illustrate the process and goals of Stage 1 activities.
THE DEVELOPMENT STAGE: RATIONALE AND OVERVIEW
GOALS OF THE DEVELOPMENT STAGE
Defining the Enhancement
Who or what is the target of change? What is expected to change in order to bring about the improvement in program and service quality?
What aspect or domain of children’s development and learning is expected to improve as a result of the quality enhancement? Through what avenues?
Documenting Implementation
- the initial training and/or technical assistance to launch the enhancement
- the ongoing training and/or technical assistance to support implementation
- the ongoing activities to enhance services (that is, day-to-day implementation)
Training and Technical Assistance: Initial and Ongoing
Content. Details a training curriculum and provides training materials and activities that are specific to the quality enhancement to be implemented
Intensity. Specifies the amount of training to provide within a particular time period (such as a week), and the trainer-trainee ratio. For example, is training provided during a workshop with 50 participants or one-on-one in the classroom?
Duration. Specifies the length of training at a given level of intensity. For example, does an initial training workshop last three days, or is it provided during a series of daylong workshops over a longer period? How many months of follow-up support are provided?
Quality. Specifies the educational and skill level of the T/TA provider and the quality of the training curriculum and other resource materials. For example, how much experience in the practical implementation of the enhancement or how skilled at using adult learning strategies should the trainer have?
Day-to-Day Implementation Activities
Developing Measures
Measuring Quality of Implementation
Measuring Fidelity
A measure exists that can be used as it is. Some enhancements may have measures that are suitable for capturing fidelity. For example, a quality enhancement for programs that do not use a specific classroom curriculum might be to adopt such a curriculum. If the evaluation of this enhancement strategy involves comparing adoption of the High/Scope curriculum with adoption of the Creative Curriculum, then fidelity measures that have been developed by the respective curriculum publishers can be used at Stage 1. Evaluators can determine whether the existing cutoffs provided by the publishers are meaningful in distinguishing what is happening in classrooms; they also could develop additional cutoffs based on Stage 2 analyses of how the fidelity measures relate to intermediate and child outcomes. In addition, because these measures of curriculum fidelity are long, evaluators might determine at Stage 2 whether they can be streamlined for the Stage 3 field test.
A measure exists that has to be adapted slightly. In some cases, an existing fidelity measure may require only minor changes in the way that it is conducted or in the content of the items. For example, the PCER team studying Ready, Set, Leap! adapted an observational measure from a longitudinal study (Vellutino and Scanlon 2001) into a very detailed sampling procedure called Classroom Language Arts Systematic Sampling and Instructional Coding (CLASSIC; Scanlon et al. 2003; and Scanlon and Vellutino 1996 and 1997). When an existing measure is adapted, Stage 1 work could determine whether the measure taps the key dimensions of the activities that teachers should be doing; cutoffs for different levels of fidelity can be developed in Stage 2.
No suitable measure exists, but the structure or content of an existing measure can be adapted. The Early Language & Literacy Classroom Observation Toolkit (ELLCO; Smith and Dickinson 2002) is an observational measure of the classroom literacy environment with a structure that could easily be tailored to specific literacy enhancements. For example, the ELLCO, as it currently exists, would not be sufficiently tailored to an enhancement that focuses on dialogic reading, but the items in it could be altered to address the target behaviors while preserving its basic observation structure. Alterations would be made in Stage 1, and cutoffs would be determined in Stage 2 based on the relationship of the new fidelity measure to intermediate and child outcomes.
No suitable measure exists. During Stage 1, evaluators will work with program developers and Head Start staff to document exactly what they expect to happen in classrooms if a particular enhancement is implemented with a high degree of fidelity. One way to discuss this topic with program developers and front-line staff is to facilitate discussions about what would look different in a classroom if the classroom were to implement the enhancement with fidelity. What would an observer expect to see? What observation period would be necessary in order to determine whether what is happening is faithful to the model?
Intensity. How often do classroom staff perform the target behaviors with children? Are they done with the children in a group or in one-to-one interactions?
Duration. How long do target behaviors at the group or child level last? Are the behaviors sustained throughout the program year?
Quality. How well do classroom staff members implement the target behaviors? Is the quality of the implementation consistent across all staff?
- How much measurement development work is required?
- What are the training requirements for observers/coders?
- How long is each observation, and how often does each occur?
- How can the work conducted during the development stage inform reduction of the resources required to measure fidelity during Stages 2 and 3?
Inter-Rater Reliability. On each fidelity measure, can observers be trained to meet high standards of inter-rater reliability? (In other words, are all of them able to demonstrate that they are counting or rating what happens in the classroom in the same way?) Most researchers require that observers reach exact agreement, or agreement within one rating point, at the .90 level to meet reliability standards.
Internal Consistency Reliability. Do the fidelity measures “hang together” in meaningful scales that have good statistical properties? Most researchers require that the intercorrelation among scale items reach at least .70.
Concurrent and Predictive Validity. Do the fidelity measures correlate with other measures of classroom quality (for example, the ECERS-R or the Arnett Caregiver Interaction Scale)? To provide evidence that the fidelity measures capture something important about classroom quality, they should correlate with other measures. Do the fidelity measures tap dimensions important for children’s outcomes? If so, they should correlate with or be predictive of child outcomes.
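The inter-rater reliability standard described above can be illustrated with a short calculation. The following is a minimal sketch, not a prescribed procedure: it assumes two observers rate the same classroom sessions on an integer scale, and it computes the proportion of exact agreement, the proportion of agreement within one rating point, and an unweighted Cohen's kappa. The function names and the ratings are hypothetical and exist only for illustration.

```python
from collections import Counter


def agreement_rates(rater_a, rater_b):
    """Return (exact, within_one) agreement proportions for paired ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    exact = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    within_one = sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b)) / n
    return exact, within_one


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: exact agreement corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions per category.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of ten classroom sessions on a 1-5 fidelity scale.
observer_1 = [3, 4, 2, 5, 3, 4, 1, 2, 3, 5]
observer_2 = [3, 4, 3, 5, 3, 4, 1, 2, 4, 5]

exact, within_one = agreement_rates(observer_1, observer_2)
kappa = cohen_kappa(observer_1, observer_2)
print(f"exact agreement: {exact:.2f}")
print(f"agreement within one point: {within_one:.2f}")
print(f"Cohen's kappa: {kappa:.2f}")
```

In this made-up example the observers would meet a .90 standard on agreement within one point but not on exact agreement, which shows why an evaluation plan should state which form of agreement its threshold refers to.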
Intermediate Outcomes
Relevance and Sensitivity to Enhancement Goals and Potential Spillover. The measures should focus on aspects of the environment and adult behavior targeted by the specific enhancement, but they also should be broad enough to capture other changes that might occur. An enhancement that focuses resources and staff attention on building early mathematics skills may unintentionally reduce the frequency or quality of teacher-child interactions related to literacy activities. Including a global measure of classroom quality enables evaluators to determine whether implementing the enhancement results in any additional positive or negative effects on Head Start classrooms other than the ones directly targeted by the enhancement. The selected measures should have demonstrated sensitivity to changes in such inputs as staff training, education, and experience.
Adequate Psychometric Properties. All measures should have adequate reliability and validity for use in classrooms designed for children from low-income families and with program staff and parents who care for these children. In general, measures should have a demonstrated internal consistency reliability of .70 or higher. (This level is generally accepted as an adequate demonstration of reliability.) In some cases, however, reliabilities as low as .65 might be tolerable if better measures of the same construct are not available. In addition, measures collected through observation must demonstrate good inter-rater reliability. The general standard is exact agreement or agreement within one rating point, or a kappa coefficient between observers, of .90 or higher. In some cases, lower values (.85, for example) may be tolerable if no other measure exists. It is important to include measures with demonstrated predictive validity (that is, significant correlations with child outcomes or other outcomes targeted by the enhancement) because they provide another gauge of the likelihood that the enhancement will be effective.
Previous Use in Large-Scale Surveys and Intervention Evaluations. To increase the comparability with other national studies and intervention evaluations, measures used in other national studies of similar populations (for example, FACES, the Head Start Impact Study, and the PCER project) should be selected. If a measure taps an important intermediate outcome but has not been used in a large study, evaluators should determine whether it has ever been used in settings similar to Head Start.
Reasonable Cost and Burden. The measures must be able to be administered reliably by trained field staff, rather than by highly experienced graduate students or evaluators. Most of the measures will be observational, and therefore potentially costly. Consequently, efforts to streamline the observation protocols and to reduce the length and complexity of observer training are critical. In addition, the intermediate outcome measures should impose minimal burden on Head Start children, parents, and classroom staff. At most, a few clarifying questions can be asked of staff or parents as part of the observational protocols, but minimal disruption of the setting is the usual standard for observational measures.
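The psychometric thresholds listed under Adequate Psychometric Properties can be checked with a simple calculation. The sketch below assumes that internal consistency is summarized with Cronbach's alpha (the chapter does not name a specific statistic, so this choice is an assumption) and applies the stated standards: .70 with a tolerable floor of .65 for internal consistency, and .90 with a tolerable floor of .85 for inter-rater reliability. The function names and the item scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of rows (one per classroom) of item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two rows")
    if any(len(row) != k for row in rows):
        raise ValueError("every row must have the same number of items")
    item_var_sum = sum(variance(col) for col in zip(*rows))
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def rate_against_standard(value, standard, floor):
    """Classify a reliability value against a standard and a tolerable floor."""
    if value >= standard:
        return "adequate"
    if value >= floor:
        return "tolerable if no better measure exists"
    return "inadequate"


# Hypothetical scores: five classrooms rated on a three-item scale.
scores = [
    [4, 5, 4],
    [3, 3, 4],
    [2, 3, 2],
    [5, 4, 5],
    [1, 2, 2],
]

alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}:", rate_against_standard(alpha, 0.70, 0.65))
# The same helper can be applied to an inter-rater statistic such as kappa.
print("kappa = 0.86:", rate_against_standard(0.86, 0.90, 0.85))
```

Keeping the standard and the floor as explicit parameters makes it easy to record, measure by measure, which threshold was applied and why a lower value was accepted.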
Child Outcomes
Measuring Nontargeted and Targeted Child Outcomes. At the least, selected child outcome measures should reflect the particular aspect of school readiness that the Head Start enhancement is targeting. However, a particular Head Start enhancement may affect, for better or worse, outcomes that the enhancement is not targeting directly. On the one hand, school readiness is a multidimensional concept (see, for example, Love 2003; McWayne et al. 2004; Raver and Knitzer 2002), and children’s progress in one area of readiness has been shown to predict readiness in other areas (see, for example, Hampton 1999; Ladd and Coleman 1997; Tramontana et al. 1988). Progress in a few targeted areas may therefore produce positive spillover into other, nontargeted areas of development (Bronfenbrenner 1979 and 1986; Zaslow et al. 1995). On the other hand, as a direct consequence of focusing an enhancement on one area of children’s development, the enhancement may focus less on other areas of development (relative to the regular Head Start control group), leading to negative spillover (or “displacement effects”) into nontargeted areas of development. Thus, an evaluation strategy should consider including measures of school readiness that may not be targeted directly by a particular Head Start enhancement, but that theory or practice suggests may nevertheless be affected by the enhancement.
Measuring Positive and Negative Child Outcomes. Outcome measures that cover all aspects of a given child development construct must be included. Otherwise, it will not be possible to detect both favorable impacts (improvements in positive outcomes and decreases in problem outcomes) and unfavorable impacts (decreases in positive outcomes and increases in problem outcomes).
Measuring Narrow and Broad Constructs/Outcomes. Child outcomes targeted by Head Start enhancements can be narrow, broad, or very broad. The Head Start Child Outcomes Framework delineates narrow constructs ("indicators," such as confidence), broad constructs ("domain elements," such as self-concept), and very broad constructs ("domains," such as social and emotional development). Evaluators must clearly articulate not only which domain or domains of child development are likely to be distinctly affected by the enhancement over and above "regular" Head Start, but which constructs within a developmental domain (for example, at the domain element or indicator level) are likely to show impacts. The more narrow the construct expected to be affected by the enhancement, the more fine-grained or detailed the measure must be. For example, an enhancement that explicitly targets improving children's concentration will need a measure of "concentration" that is sensitive to the Head Start enhancement. A more global measure of "approaches to learning"—of which concentration is but one key ingredient—may or may not be sufficiently sensitive to, or distinctly affected by, the Head Start enhancement (relative to a regular Head Start control group). Of course, constraints relating to cost and respondent burden play a key role in selecting a measurement strategy. The fewer the resources, the more the evaluation may have to focus on measuring only those child outcomes—positive or negative, narrow or broad—most directly targeted by the Head Start enhancement.
ACTIVITIES AND DURATION OF STAGE 1
Level 1: Defining the Enhancement. Some enhancements may be in the very early development stages, with little more than a good idea based on a potentially strong intervention theory. These enhancements still are detailing key components, defining how the enhancement should function at full implementation, and articulating the theory of change.
Level 2: Documenting Implementation. These enhancement strategies will lack clarity in that they have not completely documented the theory of change and/or the implementation process. This category will include enhancements that are little more than good ideas that have yet to be implemented by any program, or that are practiced by one or a handful of programs managing mostly by intuition. It also will include enhancements that have some documentation, but not at a level necessary for effective replication.
Level 3: Developing Measures of Fidelity and Outcomes. Some enhancement strategies may have strong documentation and may have been adopted by multiple programs, but the measurement work to select or develop measures to assess the quality of implementation and the fidelity to enhancement has not occurred or is incomplete. Programs implementing these enhancements may be using available data to assess progress but have not developed standardized measurement for use over time or across programs to assess the success of implementation and/or replication.
Level 4: Replicating the Enhancement in a Variety of Settings. Documentation has been developed and measures for quality of implementation and fidelity to enhancement exist, but neither is being used to assess implementation of the enhancement in diverse sites. Enhancements implemented in fairly homogeneous settings may need expanded, measured replication that will provide critical feedback in refining documentation and measurement work to support future replication in a variety of settings.
Implementation Studies
Outcomes Studies
Products from Stage 1
Implementation Manual(s). The implementation manual should present the theory of the enhancement (by describing what it is and what it has been designed to accomplish), and a step-by-step guide to implementation from startup through ongoing activities. The implementation manual also should contain a measurement section that details the data sources and measures selected for assessing the quality of implementation, assessing the fidelity to enhancement, and examining intermediate outcomes. This section should include background information about each measure (how and why it was selected and/or developed), specifics about reliability and validity (psychometric properties, inter-rater reliability, and internal consistency), and the actual measurement instruments and data sources.
T/TA Guide. The T/TA guide may be either a separate document or part of the implementation manual. Regardless of the method of presentation, the guide must describe the details of the content, intensity, duration, and quality of both the initial and ongoing T/TA.
Implementation study findings. Findings from the implementation studies will summarize the implementation process and will discuss the challenges to implementation and lessons for replication. Implementation study findings should include a discussion of the overall quality of implementation and the fidelity to the enhancement model in both the original site and in new sites.
Outcome study findings. Findings from the outcomes study should report on intermediate outcomes, their association with the child outcomes of interest, and their correlation with the level of implementation quality and fidelity to the enhancement measured in the study sites.
Entities Involved in Stage 1 Activities
Enhancement developers. In partnership with Head Start programs, enhancement developers are responsible for defining the key elements of an enhancement, the details of each element, and the methods of integration into the Head Start program. Enhancement developers also should formulate a clear motivation and theory of change. Some developers, such as curriculum design experts, may contribute to the selection or development of measures to test fidelity to the enhancement model.
Head Start program staff. Program staff may function as enhancement developers and, as such, would perform their roles either on their own or with the assistance of researchers. Ideally, Head Start program staff will be involved in the feedback cycles that contribute to implementation and documentation refinement based on their experiences with early rounds of implementation.
Research partners. Research partners will play a substantial role in measurement development by explicitly defining the theory of change, and by selecting and/or developing measures for each measurement domain. Researchers will also develop and conduct implementation studies in Stage 1, the findings of which would contribute to enhancement refinement and measurement testing. In addition, research partners will formulate and conduct outcomes studies in Stage 1 designed to measure the magnitude of any changes in intermediate and child outcomes that could be suggestive of effects of the enhancement. All of these activities will be stronger with strategic input from and discussion with Head Start program staff.
EXAMPLES OF STAGE 1 ACTIVITIES FOR THREE ENHANCEMENTS
Family Foundation Project, University of Arkansas for Medical Sciences (Little Rock, AR)
Level 1: Defining the Enhancement
Level 2: Documenting Implementation
Community resource assessment. Conducting a community resource assessment may be the first step necessary for new Head Start programs to determine whether the Work Program can be implemented using existing community resources, or whether this component needs full development driven by the Head Start program itself. The Work Program calls for case managers and job developers who can help Head Start parents to connect with jobs, support services, and other social services. Hiring, training, and housing these staff is likely to be beyond the resource capacity of many individual centers and/or Head Start grantees. The implementation manual will have to address critical start-up and sustainability issues, such as identifying and securing funding from state and local government, foundation, or private sector sources, and identifying case management and job development resources within the local area that can be linked with and, possibly, adapted to meet the criteria of the Work Program.
Staffing and services of the Work Program. The manual should address such topics as the Work Program staff’s qualifications, critical staff training elements particular to serving Head Start parents, a suggested ratio of case managers/job developers to the number of Head Start parents served, guidance on identifying and linking with potential employers and service agencies, and details about the types of job development resources that the Work Program should provide (for example, job listings, job skills workshops, and assistance with resumes).
Staffing and services of the Brief Parenting Interventions. This section should provide detailed information about such items as the qualifications of staff who will facilitate Brief Parenting Interventions, methods to increase participation in Brief Parenting Interventions, specific lesson and facilitation plans for specific Brief Parenting Interventions sessions, decisions about the timing and frequency of the Brief Parenting Intervention sessions, the suggested numbers of participants in the sessions and/or ways to adapt sessions to various group sizes, and identification and linkage of parents with parenting education resources in the community.
Staffing and services of the Fatherhood Initiative. This guidance should include details about identifying the appropriate Head Start staff to lead and implement the Fatherhood Initiative, training elements for staff, methods of engaging fathers in the Work Program and Brief Parenting Interventions, and developing and implementing father-centric activities in FFP components and/or throughout the Head Start program.
Potential for replication. FFP has been developed to serve a specific county in Arkansas, but it probably has broader applicability. Enhancement developers, with assistance from evaluators, should consider and document how the program parameters may vary by community or by particular Head Start program in order to provide thoughtful, comprehensive guidance that can support replication across a diverse set of programs.
Accessing T/TA. Implementation guidance should detail any ongoing technical assistance that will be made available from the enhancement developers or, possibly, from Head Start regions and/or evaluators for the different components of the FFP. For example, FFP developers plan to have web-based training materials available for the Brief Parenting Interventions.
Level 3: Developing Measures
Examples of Brief Parenting Intervention Models and Measures
Activities and Length of Stage 1 for the FFP
English Language Learners Project, Community Development Institute (Denver, Colorado)
Level 1: Defining the Enhancement
Level 2: Documenting Implementation
What percentage of the Head Start student population is non-English speaking? What primary language do these children speak at home?
What does the current child assessment system look like? Does it include an assessment of a child’s status and progress in his/her home language? If it is not coordinated and comprehensive, it may be necessary to provide T/TA to help programs enhance their current assessment system for all children in order to achieve the goal that ELL children are assessed accurately. In the absence of a solid framework for child assessment, programs probably will not be able to achieve the thresholds for quality of ELLP implementation and fidelity that will be necessary to progress to the next stage of evaluation.
What is the Head Start staff’s level of knowledge of ELL assessment and instruction? To design initial T/TA plans that will raise the staff’s ELL knowledge to a standard level, T/TA providers will have to be aware of the range of staff ELL skills and abilities in ELL instruction. T/TA providers may be able to integrate the assistance of staff with strong ELL competency into training efforts.
General implementation guidance. The implementation manual might begin with a section on general implementation information that would outline such items as the center staff who must participate in ELLP training for the center to reap the full benefits of the enhancement, recommended distribution of ELL children across the classrooms in participating Head Start centers, and whether and how the ELLP approach can be adopted for use by programs with smaller numbers of ELL children.
ELL assessment. The assessment section should include examples of assessment tools and methods, accompanied by detailed descriptions of how to use them, when, and by whom.
ELL instruction. The instructional practices section would include specific lesson plans and/or classroom activities, with guidance on the flow and timing of each. This section must provide suggestions on how to integrate ELL methods with the classroom’s existing methods. Head Start teachers have multiple goals, so the ELLP should provide specific examples on how ELL methods can be integrated into existing activities and lessons, rather than displacing them, or how ELL can be targeted to individuals or small groups of children.
Engaging families. The section on engaging families should present the critical topics for communication with parents (for example, child assessment information and activities to promote family literacy activities) and methods for encouraging parental involvement in the Head Start program.
Level 3: Developing Measures
Activities and Length of Stage 1 for the ELLP
Violence, Intervention and Prevention Program, Circles of Care (Melbourne, Florida)
Level 1: Defining the Enhancement
Level 2: Documenting Implementation
Counseling services. Circles of Care should define the framework for counseling services in terms of the qualifications of counselors, the types of counseling that should be made available, and the frequency and duration of services.
Case management services for fathers. Guidance should specify the qualifications of case managers, the ideal ratio of case managers to fathers, and the range of services and supports for fathers. Although services may vary from community to community, documentation should include some protocols for the quality, frequency, and duration of interactions between case managers and fathers. It will also be critical to describe outreach techniques for engaging fathers and sustaining contact with them over time.
Building Strong Families. Documentation for the Building Strong Families component may be the most comprehensive to produce. This component will have to include a training section that details the qualifications of trainers, the content of training, and the intensity and duration of training to prepare staff members who will administer this parent instruction component of VIP. Documentation also will have to include instructional guides for staff who will facilitate Building Strong Families workshops or sessions with parents that detail the topics, learning objectives, and activities to engage parents in discussion.
Level 3: Developing Measures
Activities and Length of Stage 1 for the VIP Project
It is not possible to evaluate rigorously every enhancement idea or strategy. Even though a planned variation evaluation design provides the opportunity to test multiple conditions at once, important decisions and choices about what to evaluate still must be made. A development stage can help to identify the ideas that are backed by well-defined theories of change; feasible measurement frameworks to assess changes; and clear, thorough documentation for implementation in broad and diverse settings. In addition, descriptive outcome studies in Stage 1 should suggest that the enhancement has the potential to improve selected child outcomes.
The development stage ensures that the techniques for full and successful implementation have been refined, and that researchers can accurately measure the quality of implementation and the fidelity to the enhancement’s intended goals. An intervention can fail for two main reasons: (1) the theory underlying the intervention is flawed, or (2) the intervention has been implemented poorly. Understanding the challenges of implementing an enhancement in the real world is important, but a test of a poorly implemented enhancement may provide misleading results. During the development phase, researchers and program developers will test approaches to implementation to ensure that programs can both implement the quality enhancement well and maintain fidelity to the model. Especially if broader implementation is anticipated, a careful study of implementation and fidelity can provide useful information for future training and technical assistance (T/TA) efforts. It can also provide information about the kinds of variation in implementation to be expected across different types of programs.
Stage 1 can also serve as a continuous feedback cycle for positive program improvement. While programs are defining implementation procedures, they also are continuously refining the enhancement model by adjusting implementation in the face of unexpected challenges or circumstances. In this same way, Stage 1 also may demonstrate that an enhancement cannot be replicated in all Head Start programs, but that it has value that could be relevant for specific population or risk subgroups.
The challenge, and the ultimate value, of the development stage is to take the quality enhancement ideas bubbling up from programs, curriculum developers, researchers, and others, and to move them toward greater clarity and replicability. Thoroughly testing, documenting, and measuring implementation during Stage 1 will improve the overall quality of a rigorous evaluation in Stages 2 and 3 by limiting the likelihood of poor implementation, and by increasing the confidence in the results.
The goals of the development stage are to (1) define the enhancement by documenting the theory of change and by detailing the key elements that must be visible at full implementation, (2) refine and document the implementation process, and (3) establish measures that can be used to assess the quality of implementation and fidelity to the enhancement model. At the end of this stage, clear, thorough documentation of the enhancement must have been developed for use by other programs for replication, and by researchers to assess implementation as part of an evaluation. The extent of work required to accomplish these goals by any particular enhancement strategy will vary. Some enhancements may be fairly well documented and may already have been implemented in some Head Start programs. Nevertheless, a development period—the length of which may vary by enhancement—provides an opportunity to refine documentation, and to determine whether the enhancement can be implemented successfully by other Head Start programs.
At the outset, enhancements in Stage 1 should meet the initial criteria of relevance and distinctiveness. As such, each enhancement should be supported by a strong, relevant theory of change and should have key elements that set it apart from current Head Start practice. The first goal of Stage 1 is to ensure that an enhancement has a clear, comprehensive definition through a documented theory of change. A firm understanding of the theory of change will motivate staff, guide implementation, and focus eventual measurement selection and development. A clear and focused theory of change also can serve as a good “sales” pitch to other Head Start programs at the point of replication.
At this stage of enhancement definition, the theory of change should communicate three key components, as shown in Figure III.1:
What is the quality enhancement? What are the essential components that set this enhancement apart from current practice? What should this enhancement look like when fully implemented?
[Figure III.1: Key components of a theory of change and their connections]
Note that Figure III.1 is not an actual example of a logic model, but is a broad representation of the key components of a theory of change and their connections. At this stage, a highly stylized logic model may not be necessary. What is necessary is a fundamental understanding of the underlying theory of the enhancement that can be communicated in clear, focused, descriptive documentation. Documenting the theory of change in Stage 1 is equivalent to telling the “story” of the enhancement—what it is, what changes it should generate in the program, and what improvement it should produce for Head Start children.
After the enhancement has been clearly defined, the next step is to understand the implementation process in detail, refine it, and document it. Implementation documentation refers to implementation manuals, plans for classroom activities, parent-training materials, teacher-training protocols, and other materials that specify the steps necessary to implement the quality enhancement with high fidelity to the enhancement model in a large number of Head Start programs.
Many innovative, exciting enhancement strategies already are under way throughout the Head Start community, but programs may be implementing these new ideas on the basis of their intuition, rather than on documentation. For the field to truly benefit from these innovations through evaluation, greater clarity in documentation is required. Without it, other programs will have difficulty replicating the techniques, which, in turn, may limit the potential for success. In essence, then, Stage 1 documentation activities are an opportunity for programs to share what they believe to be “best practices” while laying a firm foundation for Stages 2 and 3 that follow. The implementation process of quality enhancement strategies generally consists of three key components: (1) initial T/TA, (2) ongoing T/TA, and (3) day-to-day implementation.
In Stage 1, it is the task of enhancement developers and/or program staff to specify each of these steps, and to then document them clearly and consistently for replication. Specification and refinement of the enhancement may be a somewhat experimental process in a development stage in which identifying the most effective and most efficient approaches to implementation is encouraged. In Stage 1, programs can test different approaches to implementation to identify the lowest-cost strategy that is effective in implementing the enhancement initiative to a high level of fidelity. Specification of the enhancement does not mean that, ultimately, the final model will be a one-size-fits-all model. Rather, the documentation will specify the dimensions that can be varied to accommodate differences in program resources, child and family diversity, or staff qualifications and/or composition, as well as the dimensions that are critical and that must be uniform in order to maintain fidelity to the enhancement model. Furthermore, even though Head Start programs share core requirements, each program is unique. As a result there may be a range of starting points for an enhancement, possibly determined through an initial program needs assessment before training begins.
In general, implementation begins with some form of initial T/TA to Head Start program staff who are expected to deliver an enhanced level of services to children and families. For example, delivering teaching services according to a new curriculum will require that teachers receive training in that curriculum. Delivering teaching services with appropriate responses to children’s behavioral challenges will require that teachers receive training in techniques for managing and responding to behavioral issues in the classroom. Improving links between the program and community-based health services could require the training of directors or key administrative staff in techniques for identifying community resources, initiating new relationships with health care providers, and developing partnership agreements to support new services.
A variety of strategies have been used to train Head Start staff to implement enhancement ideas. They include different options for training format, training intensity, and the skill level of the person conducting the training. Various approaches to providing resource materials for participant training have been used as well (for example, manuals, Internet-based resources, and distance-learning programs). These approaches differ in terms of cost (because of differences in the skill level of the trainers, the length of training, and other factors); they also are likely to differ in their effectiveness in conveying essential information about the enhancement to teachers or other staff who will implement the model.
The next step is to observe how well the training worked, and to continue to support effective implementation of the enhancement through ongoing T/TA. Some enhancement components or skills may take time to practice and incorporate into daily routines, but after the most important changes have been made, the program should be delivering an enhanced level of services consistent with the quality enhancement. Accordingly, the purpose of ongoing T/TA is to work with the teachers and program staff as they institute changes in the classroom or program to ensure that the enhancement is implemented to a high degree of fidelity. Technical assistance staff begin this process by assessing program services to determine how well they conform to the model. If inconsistencies are identified, technical assistance staff work with the teacher to further modify the classroom environment, activities, or behavior so that services adhere more closely to the model.
Documentation of both initial and ongoing T/TA produced in Stage 1 should specify the four key dimensions of the T/TA strategy:
The fundamental questions underlying the choice of T/TA strategies involve achieving a balance between the cost of the approach and the approach's effectiveness in ensuring that the quality enhancement initiative is implemented with a high degree of fidelity to the model. Understanding the range of potential implementation strategies and the relative costs and likely effectiveness of each one is essential for developing research designs that produce useful information for the Head Start community. The development stage provides an opportunity to change the intensity, duration, or quality of training in strategic ways to tailor the implementation. Some answers about implementation choices may be suggested in the development stage (for example, because a program that chooses a shorter initial training workshop than another program fails to meet fidelity thresholds). In contrast, other questions could be candidates for experimentation in the evaluation (for example, because two programs vary in the duration and/or intensity of the initial training while continuing to meet fidelity thresholds).
The last key component of the implementation process is day-to-day implementation. Day-to-day implementation is the actual content of all T/TA as well as of the teaching, procedural, and resource guides for ongoing program activities. The content is the heart of the enhancement, and it will take considerable thought to document each aspect of it. The content for a curriculum will include class plans, teaching aids, classroom management techniques, the details of the timing and flow of activities, and details about lessons for a particular day and over time. For some enhancements, implementation may extend beyond the literal walls of the classroom or even the center. For example, some enhancement strategies, such as those designed to increase children’s access to health services, may involve the formation of partnerships with other local service providers.
After implementation has been thoroughly documented, the next goal of Stage 1 is to create a measurement framework to gauge the success of implementation, the effects of the enhancement on children’s environments, and ultimately, the effects on children’s development. This framework will include measures and methods to assess the quality of implementation, the degree of fidelity to the enhancement model, and intermediate and child outcomes (Figure III.2).
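[Figure III.2: Measurement framework covering quality of implementation, fidelity to the enhancement model, intermediate outcomes, and child outcomes]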
The implementation process, as discussed in the preceding section, refers to the inputs and steps necessary for putting the enhancement idea in place. Quality of implementation refers to the extent to which programs are able to bring together all the resources necessary (such as staff with sufficient qualifications and classroom materials) and to carry out all the steps necessary (for example, initial training, the formation of partnerships, and group supervision activities) to effectively reach the targets of change, and to implement the enhanced services as planned. Fidelity to enhancement is the degree to which the enhancement model delivers the enhanced services as intended. In other words, the targets of change look or function as should be expected after the enhancement has been fully implemented. For example, measures of implementation quality for an enhanced classroom intervention might include an assessment of the teachers’ qualifications, the teachers’ training on the enhancement, supervisors’ support, technical assistance from an outside trainer, and materials available in the classroom. To measure fidelity, teachers would be observed in the classroom to determine whether they are implementing the enhanced activities at the levels of quality and frequency expected. Intermediate outcomes differ from implementation and fidelity measures in that they provide more-global measures of change that are known to be related to child outcomes. If an enhancement affects the intermediate outcome measures, then the potential exists to affect child outcomes. Finally, child outcomes capture the changes expected to occur in children’s development and learning as a result of the enhancement.
To assess implementation quality, enhancement developers or evaluators can develop criteria that gauge how well each step is being implemented, based on plans, procedures, and other documentation. Criteria can be developed for each phase of implementation, from initial training to the intensity and duration of the enhanced service provision. Data on the processes of implementation would then be used to assess the extent to which resources have been brought to bear and implementation steps carried out according to plans and procedures. For example, to assess the quality of initial training, evaluators might compare the actual qualifications of trainers, content of training, intensity and duration of sessions, and teacher participation rates with those projected in the enhancement training plan. Some programs already may use available data to conduct their own self-assessments; these data could readily be adapted to assess implementation quality. Alternatively, if new measures are developed to assess implementation quality for an evaluation, they also may be useful for continuous program improvement efforts, and to identify programs’ technical assistance and training needs.
Depending on the resources available for the task, enhancement developers or evaluators could develop measures that are sensitive to fine-grained differences in implementation across programs, and that rely on multiple data sources, or they could develop measures that focus on assessing central features of the enhancement, and that require less-detailed information. Sets of Likert-type scales to rate each implementation step along various dimensions could provide a more in-depth analysis. Such an analysis might show, for example, that qualified trainers covered all the specified training topics, but that the training was shorter than expected, and that teacher participation was somewhat low. Alternative measures might consist of sets of “yes/no” indicators for whether specific steps were completed.
The National Evaluation of Early Head Start provides a useful example of how implementation quality measures can be developed and used in an evaluation. The implementation study conducted as part of the Early Head Start evaluation sought to examine all aspects of the comprehensive services provided through the program; thus, it probably was a larger, more complex effort than may be necessary for most quality enhancements. Nevertheless, the design and methodology of the implementation study could be adapted for more modest efforts. For the Early Head Start implementation study, researchers developed a set of 25 rating scales to assess implementation quality, referred to as “full implementation” in the Early Head Start evaluation (Paulsell et al. 2002). The scales were based on key requirements of the Head Start Program Performance Standards (U.S. Department of Health and Human Services 1996). Each scale contained five levels, ranging from minimal implementation (Level 1) to enhanced implementation (Level 5). Using a similar methodology to assess implementation quality of a Head Start quality enhancement, program developers or evaluators might use training plans and operations manuals to develop rating scales for key aspects of implementation (for example, initial training, support provided to teachers implementing the enhancement, and the frequency and intensity of enhanced services provision).
The rating process for the Early Head Start evaluation drew on multiple data sources (for example, semistructured interviews and focus groups conducted with staff, parents, and community partners; staff surveys; reviews of program records; and service use data) and summarized a large amount of detailed information about program implementation into a concise set of ratings. Evaluators aggregated the 25 ratings into a summary rating for each program area and an overall implementation rating for the program. Similarly, measures developed to assess the quality of implementation of Head Start enhancements in Stage 1 might draw on multiple data sources.
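The roll-up of individual scale ratings into area and overall ratings can be sketched in a few lines. The example below is only an illustration: the area names, the scores, and the use of a simple mean are all assumptions made here, and the Early Head Start evaluators may well have combined ratings by other means, such as consensus judgments or decision rules.

```python
from statistics import mean

# Hypothetical ratings on a 1-5 scale, grouped by program area.
# Area names and scores are illustrative, not taken from the evaluation.
ratings = {
    "initial_training": [4, 5, 3],
    "teacher_support": [3, 3, 4],
    "service_intensity": [2, 3],
}


def summarize(ratings_by_area):
    """Return (per-area summary, overall summary) using simple means.

    Averaging is an assumption made for illustration; an evaluation
    could instead use consensus ratings or explicit decision rules.
    """
    area_summary = {
        area: mean(scores) for area, scores in ratings_by_area.items()
    }
    all_scores = [s for scores in ratings_by_area.values() for s in scores]
    return area_summary, mean(all_scores)


area_summary, overall = summarize(ratings)
print(area_summary)
print(overall)
```

A summary of this kind is easy to compare across programs, but it hides which individual scales pulled a rating down, so the underlying scale ratings are worth keeping alongside it.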
When a quality enhancement is fully implemented, it should be possible to observe changes to usual practice. Aspects of what children experience in Head Start classrooms or home environments should have changed due to the specific behavior, home, or classroom characteristic targeted for intervention. Measures of fidelity to the enhancement model quantify how closely those changes adhere to the “ideal” vision of how the enhancement should be implemented in the classroom or other setting. Because fidelity measures aim to quantify both aspects of the classroom or home environment and the behavior of teachers or others, the fidelity data usually will be collected through observation. Data will be collected to answer the following questions, among others: Did a target behavior occur? How often? How well was the target behavior carried out by the teacher or by an assistant teacher?
Because fidelity measures are designed to document the occurrence of specific behaviors or features of the environment targeted by the enhancement model, they must be tailored to the specific enhancement under study. The range of choices in existing measures will be restricted by the type of enhancement chosen; for example, if the enhancement is a classroom intervention, then evaluators must choose from among existing classroom observation measures. Evaluators will have to determine during Stage 1 whether they will adapt existing measures, or whether they will develop new measures that better tap the important features of the enhancement. The fidelity measurement status at the start of a study may fit one of four main variations: (1) a measure exists that can be used as it is; (2) a measure exists that has to be adapted slightly; (3) no suitable measure exists, but the structure and content of an existing measure can be adapted; or (4) no suitable measure of fidelity exists. Here, we describe these scenarios and their implications at each stage of evaluation.
The development or adaptation of a fidelity measure for a given enhancement, as well as its use in an evaluation, must meet a number of important criteria. The implementation framework described for T/TA is adapted here to apply to the fidelity measures. In addition to determining whether a target behavior occurred during a classroom or a home observation, measures of fidelity must meet the following criteria in order to document whether enhancements are implemented with fidelity at the classroom level:
The QRCs and the PCER grantees provide excellent examples of preschool experiments that have developed fidelity measures. The fidelity measures that each project team is using are tailored to the specific intervention under study and range from existing measures tailored to the study to measures developed specifically for the study. For example, in addition to adapting the CLASSIC measure to study Ready, Set, Leap!, the PCER team studying Building Language for Literacy developed a new fidelity checklist that observers use to code whether a list of specific activities occurred during the observation period. The experiences of the QRC and PCER grantees with these fidelity measures will provide ACF with important information about the challenges to measuring fidelity and the successes on which future Head Start enhancement studies can build. Evaluators studying future enhancements may be able to draw on or to adapt these measures to new research projects.
As we have discussed, evaluators must set a minimum threshold for determining whether children’s experiences are faithful to the enhancement model. An enhancement that has been designed to change a variety of behaviors requires many fidelity measures and either a minimum threshold for each measure or an overall threshold. A narrowly focused enhancement may require only one fidelity measure and one cutoff for determining whether the implementation is faithful to the model.
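The two threshold options described above, a minimum for each measure or a single overall minimum, can be sketched as follows. The function name, the measure names, the scores, and the use of a mean for the overall score are hypothetical choices made for illustration; an actual evaluation would set its own measures and cutoffs.

```python
def meets_fidelity(scores, per_measure_min=None, overall_min=None):
    """Check observed fidelity scores against thresholds.

    scores: dict mapping measure name -> observed score.
    per_measure_min: optional dict mapping measure name -> minimum score.
    overall_min: optional minimum for the mean of all scores
        (using the mean as the overall score is an assumption).
    """
    if per_measure_min is not None:
        for name, minimum in per_measure_min.items():
            if scores[name] < minimum:
                return False
    if overall_min is not None:
        if sum(scores.values()) / len(scores) < overall_min:
            return False
    return True


# Hypothetical classroom scored on two fidelity measures (1-5 scale).
classroom = {"reading_frequency": 4, "reading_quality": 2}
per_measure = {"reading_frequency": 3, "reading_quality": 3}

print(meets_fidelity(classroom, per_measure_min=per_measure))
print(meets_fidelity(classroom, overall_min=3))
```

In this made-up case the classroom fails the per-measure rule because one score falls below its minimum, yet passes the overall rule because the strong score offsets the weak one. The choice between the two rules therefore determines whether strength on one behavior can compensate for weakness on another.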
Cost Considerations. Clearly, the cost of measuring fidelity will be determined by the specific approach taken, whether a preexisting measure of fidelity can be used, the training requirements for the measure, and the frequency of the observations. The following cost-related questions are the main ones:
Measures of fidelity to the enhancement model, based on observations of classrooms and teachers or of parents and home environments, are expensive to develop and implement. Observational measures may be supplemented with teacher or parent logs that record the frequency with which these individuals perform target behaviors every day. Depending on the type of enhancement, reviews of teachers’ lesson plans may provide additional information about fidelity. However, the validity of the self-report measures must be tested against an observational measure. For most enhancements, any existing measures considered for adoption will likely be too general to capture the essence of the enhancement; as we discuss in the following section, they may be more suitable as intermediate outcome measures. For example, the fidelity of a literacy intervention that focused on teachers’ regular use of dialogic reading techniques (Whitehurst et al. 1994) with individual children could not be adequately or efficiently measured by using broad-based measures of quality and the literacy environment, such as the ECERS-R (Harms et al. 1998) or the ELLCO. The PCER research team resolved this problem by working with CIRCLE to develop its measure of language and literacy activities in the classroom for use in a multi-site evaluation. Where possible, adapting existing measures is one way to reduce the measurement development costs of the research phase. Of course, for many types of enhancements, no measures are available, so they will have to be developed on the basis of Stage 1 collaboration among the enhancement developers, Head Start program staff, and evaluators. This process would build on hypotheses about what must take place in classrooms in order to provide evidence that teachers have incorporated the enhancement into their daily activities.
Implementing observational or other qualitative measures of fidelity well can be costly. To ensure that all observers/coders rate fidelity in a similar manner, the observers must establish inter-rater reliability with either the developers of the measure or with someone who knows the measure well enough to be considered a “gold standard” coder. As the fidelity measures are refined through input from Stages 1 and 2, evaluators will determine the most efficient ways to establish inter-rater reliability on these measures. They also will design observer training to address common problem areas. For Stages 2 and 3, the cost of establishing inter-rater reliability can be incorporated into the cost of conducting a centralized training of observers that includes reliability visits to community child care and Head Start facilities during training. For example, High/Scope (2003) reports that training to acceptable levels of inter-rater reliability on its Preschool Program Quality Assessment (PQA), a comprehensive rating instrument “designed to evaluate the quality of early childhood programs and identify staff training needs,” takes three days.3 High/Scope researchers recommend that the first two days include review and time to practice the items using videotapes of early childhood settings and actual visits to preschool programs. The third day is to be used for a full observation (with half the day spent observing a classroom to complete the classroom items, and half spent conducting interviews to inform the agency items).4
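One common way to quantify agreement between a trainee observer and a “gold standard” coder is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The report does not specify which reliability statistic should be used, so the choice of kappa here is an assumption, and the codes below are invented for illustration (1 means the target behavior was coded as present, 0 as absent).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty item set.")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # Both raters used one identical code throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical codes for ten observation intervals.
gold = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
trainee = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]

print(round(cohens_kappa(gold, trainee), 2))
```

Raw percent agreement can look high simply because one code dominates, which is why a chance-corrected statistic is often reported alongside it. The acceptable level of agreement is a decision for the evaluators and the measure's developers.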
Most measures of classroom quality are conducted in two to four hours. Given that Head Start’s center-based services must be offered for at least three and one-half hours per day, observation for two to four hours may be sufficient for measuring fidelity. Some large-scale studies require two half-day observations to ensure reliability, but the majority of studies require only one. For enhancements that focus on what parents do with children, the fidelity measure requires either an observation in the home or the use of a structured interview. When the enhancement is parent-focused, the evaluators must determine how to sample the children in the classroom, as home observations are very expensive.
Reliability and Validity. To choose among existing fidelity measures, evaluators must determine whether the measures have sound psychometric properties. In the absence of an existing measure or after a measure has been adapted, evaluators must gather information about the measure’s reliability and validity. The most important characteristics are:
Intermediate outcomes are outcomes affected by the enhancement prior to its influencing child outcomes. Intermediate outcomes are global measures reflecting characteristics or conditions of the center, classroom, or family environment that are likely to change (intentionally or unintentionally) due to the enhancement. Intermediate outcomes also are theorized to predict child outcomes. Thus, it is through changes in intermediate outcomes resulting from implementing a Head Start enhancement that children are presumed to be affected by the enhancement. For example, intermediate outcomes germane to teacher- and classroom-focused enhancements include changes in the classroom environment and teaching practices that result from the teacher- or classroom-focused enhancement. Intermediate outcomes germane to a center-level intervention (such as management training of Head Start directors) include subsequent changes in center operations and management practices that result from the center-focused enhancement. If teacher- or classroom-level outcomes also change as a result of the center-focused enhancement, then an evaluation of a center-focused enhancement should plan to measure the classroom-level intermediate outcomes as well. Intermediate outcomes relevant to a parent- or family-level enhancement (for example, educating parents about activities and parent-child interactions that support children’s language development and literacy) include subsequent changes in parenting behavior and other aspects of the child’s home environment that result from the family-focused enhancement.
Intermediate outcome measures are designed to (1) capture broader aspects of change that will not be captured by the implementation and fidelity measures, (2) provide data using a measure with a proven link to targeted outcomes in children and adults, and (3) allow for comparison with other studies that have used the measure. In an evaluation of an enhancement that focuses on dialogic reading, evaluators might complement an observational fidelity measure of how often and for how long teachers used dialogic reading with a proven measure of classroom quality, such as the ECERS-R (Harms et al. 1998) or the Assessment Profile (Abbott-Shim and Sibley 2001). Including intermediate outcome measures enables evaluators to study why an enhancement might be effective in one classroom but not in another. For example, a classroom that implemented the dialogic reading enhancement with high fidelity according to a narrow measure of fidelity might not experience any other changes in overall classroom quality. The effect of the enhancement in that classroom might be less than the effect of the enhancement in a classroom that had both high fidelity and an overall increase in observed quality. Implementation study data could be used in this case to determine what else changed in the classroom whose quality increased, compared with the classroom whose quality did not. Choosing an intermediate outcome measure that has been widely used in research also provides a way to benchmark Head Start findings against other studies. For example, it would be possible to determine whether a classroom launching an intervention had a higher score on the ECERS-R than did the typical Head Start classroom measured in the Head Start FACES study, and whether its score was higher than that of the typical preschool classroom studied in a state pre-kindergarten study.
The specific methods and measures for assessing intermediate outcomes will be tailored to the enhancement under study. We expect that the four main categories of intermediate outcomes are those that assess changes in (1) the knowledge and skills of adults (directors, education coordinators, teachers, and parents); (2) classroom quality and what teachers do in the classroom with the children in their care; (3) the home environment and what parents do with their children; and (4) program partnerships related to providing additional services to children and families. Many potential intermediate outcome measures exist, but a goal of the development stage is to identify or create measures that meet several criteria:
As part of Stage 1 activities, evaluators, program staff, and enhancement developers will discuss their theories about the avenues through which the enhancement will affect children, and the ways in which the enhancement might spill over beyond its more narrow targets to affect other aspects of classroom quality, adults’ skills and knowledge, and other important aspects of program quality. After review of existing measures for their coverage of the identified areas, a consensus on which measure or measures to use must be reached. In the absence of an existing measure, evaluators will have to either adapt an existing measure or develop a new one.
A critical first step in selecting outcome measures is to clearly articulate hypotheses about how the Head Start enhancement is expected to affect children. Which child outcomes in which domains (whether targeted or not) are expected to change as a result of a child’s one-year exposure to the Head Start enhancement? The same criteria described for the selection of intermediate outcomes also apply to child outcomes. Specifically, measures of child outcomes must be relevant to school readiness goals; have demonstrated sensitivity to enhancement goals; be appropriate for use with a culturally diverse, low-income population; have adequate psychometric properties; be able to be administered with reasonable cost and limited burden to Head Start children, parents, and program staff; and, preferably, have been used in previously conducted large-scale surveys and intervention evaluations. The child outcome measures also must be valid and reliable for the intended mode of administration.
In addition to using sound, high-quality measures of important child outcomes, evaluations of Head Start enhancements must have a sound overall measurement strategy delineating the types and characteristics of outcomes that are important to measure. Several additional measurement issues should be considered as researchers and Head Start program partners begin identifying or creating a set of outcome measures to gauge the effects of a particular quality enhancement:
Evaluators also face considerations regarding the mode of measurement as they select child outcome measures. Specifically, they must determine the best data collection mode for a given outcome, the cost of collecting the data using the preferred mode, and whether measurement trade-offs exist such that a less preferred mode might be chosen instead of a more-costly preferred one. Some measurement modes are not interchangeable. For example, because some outcomes lack valid, reliable parent report measures, another mode must be used. In addition, the selected mode may constrain the types of outcomes that can be measured.
The timing of measurement must be considered when selecting child outcome measures. The frequency of measurement will affect costs, and measurement duration (with one or more follow-up periods after Head Start completion) will influence the use of particular measures. Some child outcome measures (for example, the PPVT-III, the Woodcock-Johnson Tests of Achievement, and the Child Behavior Checklist) are suitable for use with both preschool children and with elementary school-age children. In other cases, it may be necessary to use measures tapping newly relevant developmental or school performance constructs (for example, actual reading ability, rather than measures of pre-reading skills and knowledge) in order to collect data on relevant child outcomes during the elementary school years. Deciding when to measure child outcomes will depend largely on (1) the particular enhancement being implemented, (2) the theory explaining which aspects of child functioning the enhancement is likely to affect (in the short and the long run), and (3) the value placed on including measures of known predictors of later school success or failure (regardless of whether the enhancement explicitly targets them).
During the development stage, evaluators will have to integrate decisions about the type of child outcome measures (targeted/nontargeted, positive/negative, and narrow/broad), the timing of measurement, and the measurement mode to develop the outcome measurement framework. If suitable outcome measures that are sensitive to the effects of the enhancement do not exist, then evaluators will have to conduct their measurement development work during this stage, before beginning the evaluation.
Members of the Head Start community currently are practicing numerous enhancement strategies, and additional ideas undoubtedly are percolating each day. Because these ideas and strategies are at different points along the continuum of achieving the goals of a development stage, the specific activities and duration of Stage 1 will vary with the enhancement strategy under consideration. In general, we expect that potential enhancement strategies may be at one of four levels in achieving Stage 1 goals. The levels are not necessarily mutually exclusive, and we expect that many enhancements would be placed in more than one of them.
We expect enhancement strategies that are at Level 1 (defining the enhancement) to require between one and one-half and three years to achieve the goals of Stage 1. These enhancements will benefit from a planning phase devoted to defining the details of the key components and to documenting the theory of change. This early planning phase may last six to nine months. Another six to nine months may be necessary to refine documentation and to develop and test measures through implementation in a number of early sites. Additional replication might be necessary for testing implementation and measurement in a variety of settings. Depending on the documentation’s level of thoroughness and clarity and the diversity of early implementation settings, enhancements at Level 2 (documenting implementation) may take one to two years to achieve the goals of Stage 1. Enhancements at Level 3 (developing measures) may require one to one and one-half years to complete measurement work and to test replication. Those at Level 4 (replication) may take as little as six months or as long as one year to assess replication on a small scale before achieving Stage 1 goals.
Enhancements may be considered for a small-scale evaluation (Stage 2) when they have a well-defined theory of change; clear, thorough documentation of implementation; and a sound measurement framework that has been tested on a small, diverse scale. Implementation and outcome studies conducted during Stage 1 will determine the extent to which these criteria are met.
During the development stage, implementation studies will help to refine documentation, and to assess replication. The focus of an implementation study will depend on where the enhancement strategy lies along the spectrum of the four levels of achieving Stage 1 goals.
Levels 1, 2, and 3. The goals of an implementation study for enhancements at the first three levels will be explorative. Their purpose will be to determine the details of the implementation process and the lessons that will help to produce clear, thorough documentation, and to formulate quality and fidelity measures. The key research questions for these initial implementation studies are the following: (1) What are the processes of implementation, including initial training and ongoing technical assistance? and (2) What lessons can be drawn from the program’s implementation experiences?
Collecting information about lessons from a program’s early implementation experiences is especially important during this early stage of evaluation. Conducting interviews and focus groups with staff and technical assistance providers can identify implementation challenges and strategies that have the potential to resolve those challenges. Similarly, staff can provide important information about the usefulness of their training and support, as well as about aspects of implementation in which they need additional support. This information can be used to refine and strengthen implementation plans and strategies for use by other programs, and to determine the types and intensity of T/TA that staff must receive if they are to do a good job of implementing the enhancement strategy.
Level 4. Level 4 enhancements will require a more evaluative implementation study, one that assesses replication and the effectiveness of the quality-of-implementation and fidelity measures. This type of study may look very similar to an initial explorative study, and it will contribute to the feedback cycle for continuous program improvement and refinement of documentation. However, it also will include the use and testing of the specific quality and fidelity measures under development. The following two research questions will be added to the initial implementation studies’ key research questions: (1) What is the quality of implementation? and (2) What is the degree of fidelity to the enhancement model?
Measurement experimentation. The Stage 1 development phase for new fidelity measures can allow for exploration of alternative measurement methods that can cost-effectively provide accurate measurement for the subsequent stages. For example, in the development stage, a fidelity measure that requires direct observation of the classroom at multiple points in time may be used in tandem with a staff survey about attitudes and beliefs. If the two measures demonstrate high correlations with each other, it may be possible to use only the staff survey during the later evaluation stages, when cost parameters may be more restrictive. Alternatively, experimentation in Stage 1 may determine that a short version of a detailed observation tool may function equally or nearly as well as the longer version. Information collected through an evaluative implementation study can inform these decisions.
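As a rough illustration of the comparison described above, the following Python sketch computes the correlation between two fidelity measures collected in the same classrooms. The scores, the variable names, and the 0.80 cutoff for a "high" correlation are all hypothetical; the report does not specify a threshold, and a real decision would also weigh sample size and the purposes of each measure.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a measure does not vary")
    return cov / sqrt(var_x * var_y)


# Hypothetical fidelity scores for eight classrooms (illustrative values only).
observation_scores = [3.1, 4.0, 2.5, 4.6, 3.8, 2.9, 4.2, 3.5]  # direct observation
survey_scores = [3.0, 4.2, 2.8, 4.5, 3.6, 3.1, 4.0, 3.3]       # staff survey

# Hypothetical cutoff; the source does not define what counts as "high".
HIGH_CORRELATION = 0.80

r = pearson_r(observation_scores, survey_scores)
survey_may_substitute = r >= HIGH_CORRELATION
print(f"r = {r:.2f}; survey may substitute for observation: {survey_may_substitute}")
```

The same comparison could be run between a short and a long version of an observation tool by passing the two sets of scores to `pearson_r`.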
Data Collection Methods. Table III.1 provides a detailed set of implementation study topics that could be included in both explorative and evaluative implementation studies. The table indicates the implementation study questions (about process, quality, and lessons learned) that each topic addresses, along with their possible data collection methods. Depending on available resources, one or more data collection methods could be used. The use of multiple methods during the development stage would enable evaluators to collect more-detailed information about key topics from a variety of respondents, and to triangulate findings across several data sources. For example, teachers may report in a staff survey that, after initial training, they still did not feel adequately prepared to implement the enhanced services. During a focus group with the teachers, evaluators could explore the aspects of training that appeared to be inadequate and could seek ideas about ways to improve the training. Interviews with technical assistance providers could yield additional information about teachers’ training needs, perhaps based on perceptions of the teachers’ educational levels and learning styles.
| Implementation Study Topics | Implementation Study Questions | Data Collection Methods | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Implementation Processes | Implementation Quality | Lessons Learned | Direct Observation of Service Delivery | Staff Survey | Parent Survey | Program Records | Semistructured Interviews with Staff or T/TA Providers | Focus Groups with Staff or Parents | ||
| Program Characteristics | Where is the program located? | X | X | X | X | |||||
| What is the program's size? | X | X | X | X | ||||||
| What type of agency operates the program? | X | X | X | X | ||||||
| How many years has the agency participated in Head Start? | X | X | X | X | ||||||
| Target Population | How does the program decide which families receive enhanced services? | X | X | X | X | X | X | |||
| What are the characteristics of the families and children served? | X | X | X | X | X | X | ||||
| Are services provided to children, parents, or both? | X | X | X | X | X | X | X | |||
| Staff Characteristics and Roles | Who provides the enhanced services? | X | X | X | ||||||
| What are their qualifications? | X | X | X | X | ||||||
| Who supervises and supports these staff? | X | X | X | X | ||||||
| Which staff participated in training? | X | X | X | X | ||||||
| What is the rate of staff turnover? | X | X | X | X | ||||||
| How well do staff understand the enhancement strategy? | X | X | X | X | ||||||
| Initial Staff Training | What was the content of training? | X | X | X | X | X | X | |||
| What were the qualifications of trainers? | X | X | X | X | ||||||
| What was the intensity and duration of training? | X | X | X | X | X | |||||
| How well did training prepare staff to provide the enhanced services? | X | X | X | X | X | |||||
| How could training be improved? | X | X | X | |||||||
| Providing Enhanced Services | What enhanced services are provided to parents and children? | X | X | X | X | X | X | X | X | |
| How often are the services provided? | X | X | X | X | X | X | ||||
| What is the intensity of service delivery? | X | X | X | X | X | X | ||||
| How long are the services provided? | X | X | X | X | X | X | ||||
| Are staff able to provide enhanced services at the intended intensity and duration? | X | X | X | X | X | X | ||||
| If not, why not? | X | X | X | |||||||
| Technical Assistance and Support | What support do staff receive in providing enhanced services? | X | X | X | X | X | ||||
| Do supervisors observe service delivery and provide feedback to staff? How often? | X | X | X | X | X | X | ||||
| Does the program receive regular technical assistance? How often? | X | X | X | X | X | |||||
| What topics have been covered? | X | X | X | X | ||||||
| How helpful is the technical assistance? | X | X | X | |||||||
| What other types of support do staff need? | X | X | X | |||||||
| Lessons Learned | Which aspects of implementation have worked well? | X | X | X | X | X | ||||
| What factors facilitate implementation of the enhanced services? | X | X | X | X | ||||||
| What challenges have staff experienced in providing enhanced services? | X | X | X | X | ||||||
| What strategies have staff used to overcome the challenges? | X | X | X | |||||||
| What are staff members’ views on the enhancement strategy? | X | X | X | X | ||||||
| What are families’ views on the enhancement strategy? | X | X | X | |||||||
| How could the enhanced services be improved? | X | X | X | X | ||||||
An initial outcomes study in the development stage will begin to lay the framework for measuring intermediate and child outcomes in a rigorous evaluation. These studies will examine selected intermediate and child outcomes that are expected to change based on implementation of the enhancement. They are likely to use a standard pre/post methodology to measure the outcomes early in implementation (for example, during the fall semester or at program entry), and again at one or more points after implementation (for example, during the spring semester or at the end of the academic year). Although it will not be possible to attribute changes in the intermediate and child outcomes to the presence of the enhancement (or to its absence, if a comparison site design is used), the findings may be suggestive of an enhancement’s potential for success. If numerous sites can be included in the outcomes study component, the analysis also should examine the correlations among the levels of the quality of implementation, fidelity to the enhancement model, and changes in intermediate and child outcomes. In addition, the study should examine how intermediate outcomes are correlated with child outcomes.
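The pre/post comparison and the site-level correlation described above might be sketched as follows. This is a minimal Python illustration under stated assumptions: the site names, scores, and fidelity ratings are invented, and three sites are far too few for a meaningful correlation; as the text notes, this analysis presupposes numerous sites. As the text also notes, gains computed this way cannot be attributed to the enhancement.

```python
from math import sqrt
from statistics import mean

# Hypothetical records: (site, fall_score, spring_score) for one child outcome.
records = [
    ("site_a", 82, 90), ("site_a", 75, 85), ("site_a", 88, 94),
    ("site_b", 80, 83), ("site_b", 79, 84), ("site_b", 91, 93),
    ("site_c", 70, 84), ("site_c", 85, 95), ("site_c", 77, 89),
]

# Hypothetical site-level fidelity ratings (higher means closer to the model).
fidelity = {"site_a": 3.5, "site_b": 2.0, "site_c": 4.5}


def mean_gain_by_site(rows):
    """Average spring-minus-fall gain for the children in each site."""
    gains = {}
    for site, fall, spring in rows:
        gains.setdefault(site, []).append(spring - fall)
    return {site: mean(values) for site, values in gains.items()}


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


site_gains = mean_gain_by_site(records)
sites = sorted(site_gains)

# Correlate site fidelity with mean gains; descriptive only, not causal.
r = pearson_r([fidelity[s] for s in sites], [site_gains[s] for s in sites])

for s in sites:
    print(f"{s}: fidelity {fidelity[s]:.1f}, mean gain {site_gains[s]:.1f}")
print(f"fidelity-gain correlation across sites: {r:.2f}")
```

The same pattern could be applied to an intermediate outcome in place of the child outcome, or to the correlation between intermediate and child outcomes.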
Measurement experimentation. During this stage, when there are few implementation sites and costs can be contained, evaluators should consider using more-intensive methods and several measures to examine, test, and select appropriate outcome measures (for both intermediate and child outcomes). Their goal should be to identify the soundest, most efficient, and, ideally, least costly measures for use in the subsequent, larger stages of evaluation. For example, to measure child outcomes, evaluators may choose to use both a teacher-child interaction measure and a teacher-report measure to tap the children’s attention, engagement in classroom activities, and positive teacher interactions. If the two types of measures demonstrate high correlations with each other and with implementation of the enhancement, the less burdensome, less expensive teacher-report measure may be a good alternative for use in Stages 2 and 3. Furthermore, if new measures have been developed or if existing measures have been substantially revised for the enhancement, Stage 1 provides the opportunity to test the sensitivity and psychometric properties of “created” measures against existing normed measures.
At the end of Stage 1, a number of products should exist that fall into two categories: (1) implementation products and (2) evaluation products. Implementation products are the documentation that will prove useful to other programs that decide to implement the enhancement strategy. Evaluation products should be designed to contribute to the knowledge of enhancements throughout the Head Start community, so that program administrators can make informed decisions about enhancement choices, and evaluators can make informed decisions about which enhancements merit more-rigorous study. Evaluation products should be disseminated more broadly than the detailed implementation ones.
Implementation Products
Evaluation Products. The evaluation products will report findings from the implementation and outcomes studies, either separately or combined.
The evaluation product (or products) should also make recommendations for continued research involving the specific enhancement strategy, such as the potential for rigorous testing based on changes in intermediate and child outcomes, or on potential changes in the enhancement strategy that should be explored before beginning more-rigorous testing.
The activities in the initial development stage would be carried out by a combination of Head Start program staff and/or enhancement developers and outside researchers, possibly connected with a university partner or research firm.
Many enhancements to Head Start programs begin with a planning phase that might resemble the development stage discussed in this chapter. In practice, however, planning phases can look quite different from one initiative to another. The staged approach to evaluation detailed in this report adds guidance and prescription to enhancement development by viewing numerous potential enhancements as part of a holistic research agenda, rather than as many independent initiatives, each following its own course. Although enhancements may develop through different methods and along different timeframes, the development stage adds a degree of uniformity by specifying the types of activities that must occur and the products that must be produced during this early stage. By the end of Stage 1, each enhancement will have the elements in place that are critical for replication and for progression to the next evaluation stage.
The types of activities for Stage 1 that we have detailed are being encouraged and supported through Head Start Innovation and Improvement (I&I) grants. These grants, totaling $2.9 million, have been awarded to a range of national, state, and local organizations to support one-year planning phases for strategies that have the potential to strengthen Head Start programs and services. In the remainder of this chapter, we use three enhancement ideas developed by I&I grantees to discuss the type and flow of activities and the development of measurement frameworks that could be expected to occur in a development stage as outlined in this chapter. Note that these ideas are the basis for our examples and have been elaborated upon to illustrate the steps necessary for Stage 1; the activities we discuss may not actually be under way in exactly the manner presented here.
The University of Arkansas for Medical Sciences is conducting pilot-phase work under an I&I program grant to fully define the key components of the Family Foundations Project (FFP) for eventual implementation in Head Start programs throughout Pulaski County, Arkansas. The FFP has been designed to build family resources and supports by connecting targeted Head Start parents with job opportunities and community support services through a Work Program, by improving parental skills and practices through Brief Parenting Interventions, and by increasing fathers’ involvement through a Fatherhood Initiative. In this section, we describe the steps to be taken to define the enhancement, which is in the very early stages of development; to refine and document implementation; and to develop measures of implementation, fidelity, parent, and child outcomes.
Motivation and goals. The University of Arkansas for Medical Sciences initiated the FFP in response to the specific vulnerabilities of Early Head Start and Head Start families in Pulaski County. The county has higher-than-average teenage birth and child poverty rates (38 births per 1,000 teens and 21.3 percent poverty, respectively). Eighty percent of the children in the targeted Early Head Start and Head Start programs live in single-parent families. Many county residents have poor access to health care; families are at risk for financial instability and experience high unemployment. In addition, 18 percent of pregnant women tested positive for illicit drug use across 16 public health clinics and 2 private clinics in the county. Combined, these factors explain the limited resources of Early Head Start/Head Start families and the low level of the parents’ involvement in the educational lives of their children. FFP developers believe that increasing the resources available to Head Start families through the Work Program and increasing the engagement of families in Head Start will increase the parents’ availability to, engagement with, and responsibility for their children, thereby minimizing the potential effect of multiple risk factors on child outcomes.
The goals of the FFP are to increase parents’ capacity to provide financial support and resources for their children, improve the quality of parent-child interactions and relationships, and increase fathers’ involvement with their children and with the Head Start program. In FFP sites at full implementation, we would expect to observe parents of Head Start children, particularly those in specific subgroups (fathers, mothers in substance abuse treatment programs, and teenage parents), connecting with community support services and jobs through the Work Program’s job development, case management, and job coaching services, and participating in Head Start’s Brief Parenting Interventions.
Enhancement components. At the time of the University of Arkansas for Medical Sciences I&I proposal, the FFP was little more than a conceptual framework. The components of the FFP—the Work Program, the Brief Parenting Interventions, and the Fatherhood Initiative intended to integrate noncustodial fathers into FFP activities—had been outlined, but the proposal for the planning grant focused on the core development work necessary to completely define this enhancement. For example, the details of the Work Program still must be finalized. These details relate to where the Work Program will be housed, how it will be staffed (the composition and qualifications of program staff), which services will be provided and how, and how the program’s funding and resources will be sustained. A Steering Committee will accomplish part of this work by establishing workgroups to target specific development areas (for example, adult education, corporate/marketing, and relationships with employers). As with the Work Program, the Brief Parenting Interventions and the Fatherhood Initiative also need specification. To help them to develop the Brief Parenting Interventions’ content and the Fatherhood Initiative’s methods of parent engagement, FFP developers have proposed conducting focus groups during the planning period with the targeted at-risk parent populations (fathers, mothers in substance abuse treatment programs, and teenage parents); the goal of the focus groups will be to identify parenting beliefs, norms, values, and areas of concern. In addition, the Fatherhood Initiative still has to identify and define father-centric activities that can be incorporated into the Work Program and Brief Parenting Interventions, determine whether father-only activities also may be provided, and identify those activities.
Theory of change. In addition to defining the components of the enhancement, enhancement developers must formulate an initial framework for the theory of change as part of the overall definition of the enhancement. The measurement work takes place later in the development stage and will involve researchers who will thoroughly define and specify a logic model for an evaluation; during this early phase, however, FFP enhancement developers should define their theory’s basic elements about how the FFP could improve child outcomes for Head Start children. For example, FFP developers have suggested that Head Start staff and the highest-risk Head Start families are the primary direct beneficiaries of FFP. According to the theory of change, these groups may therefore be the targets of change. Specifically, the FFP is intended to enhance staff skills in parenting education and staff resources for use in connecting families with jobs and community services. In addition, FFP is to target high-risk Head Start parents in order to increase the parents’ engagement in parenting workshops, improve their links with community and job resources, and increase their involvement in the educational lives of their children in Head Start.
After the targets of change have been identified, FFP developers should consider both the intermediate and child outcomes that they believe FFP will affect. To identify intermediate outcomes, the developers will find it helpful to examine the specific targets of change. FFP’s intermediate outcomes should encompass the program’s goals for Head Start parents who participate in FFP, such as increased job entry and retention rates, income gains, increases in social support networks, increases in parenting skills and knowledge, and improved parent-child interactions. Child outcomes are not specifically targeted by FFP but are likely to result from the intermediate changes in families. For example, increased father involvement can lead to improved child cognitive and social outcomes (Fagan 2000; Nord and West 2001; Nord, Brimhall, and West 1997; Parke et al. 1992; Tamis-LeMonda et al. 2004), and higher family income is associated with larger child vocabulary and better school achievement (Duncan and Brooks-Gunn 1997; Hart and Risley 1995; Future of Children 2005). The FFP intervention seeks to influence the child’s home environment, but not to achieve specific changes in that environment. Because the intervention does not include components that directly focus on the child, changes in child outcomes depend on the extent to which the FFP intervention changes the child’s home environment.
The process of defining the FFP enhancement more fully will provide a framework for the comprehensive documentation necessary to effectively replicate and evaluate the FFP. However, the necessary additions and refinements are likely to become evident only with the implementation of FFP in the Head Start programs throughout Pulaski County.
A development stage for FFP will have to produce a number of “implementation” products to fully describe both the components of FFP and the choices and decisions associated with implementation of each component. These products may or may not be stand-alone documents; we discuss them separately for the purpose of establishing the importance of each one. Ultimately, it will be necessary to integrate these pieces in order to present the holistic approach of FFP as a Head Start program enhancement.
Implementation Manual: FFP developers will have to produce an implementation manual explaining how to establish and administer the FFP by detailing implementation of the three key components—the Work Program, Brief Parenting Interventions, and the Fatherhood Initiative. The Work Program component of FFP may pose specific challenges when replicated among Head Start programs more broadly and will therefore need thorough documentation of the different avenues that programs may pursue to ensure full implementation. An implementation manual for the FFP should include the following elements:
T/TA Guide for Brief Parenting Interventions. In addition to an implementation manual, a T/TA guide will have to be developed for the FFP’s Brief Parenting Interventions. This guide will provide detailed information for the trainers who will train Head Start program staff. Guidance for trainers will have to include details about the necessary qualifications of the trainers themselves, the Brief Parenting Interventions parenting learning strands (training content), which Head Start staff are to be trained, culturally appropriate methods of Brief Parenting Intervention facilitation that may vary with the predominant language/culture of Head Start parents and children, and the intensity and duration of Brief Parenting Interventions training for staff. As with any training, different options should be presented so that trainers may accommodate the needs of different Head Start programs.
After the enhancement has been clearly defined, FFP developers and evaluators can begin creating the measurement framework. This work can be accomplished simultaneously with documentation creation and refinement. After this step has been completed, the measures may undergo early testing in the FFP implementation sites in Pulaski County.
Quality of implementation. Measures of the quality of implementation for FFP should focus on how well the sites meet the parameters of implementation as established by the developers. Although the routes to FFP implementation may differ among the sites, there are likely to be critical standards for the qualifications of staff, the breadth of the network of potential employers and service providers for referrals, the quality of staff training sessions, and the quality of the activities of the Fatherhood Initiative (Table III.2). Measures of the quality of implementation typically can be obtained and observed by conducting a qualitative study of site activities. In the case of FFP, it would be important to observe initial staff training sessions so as to gauge content coverage, the level of staff participation, and the intensity of that participation. Other measures could be obtained through observation and semistructured interviews with program staff, administrators, and community partners. Thresholds for the quality of implementation measures must be set with care, particularly in the case of those that measure the breadth of the potential employer/community support network and methods of engaging targeted Head Start parents. Because the FFP intervention reaches children only through the work of engaging and supporting parents, the performance bar in these areas will have to be set rather high if the intervention is to produce its intended effects on child outcomes.
Fidelity to enhancement. Fidelity to enhancement measures should capture whether the FFP in each site functions as intended, and whether the FFP’s targets of change accomplish what they are intended to do. For example, the staff-parent interactions should have the content, frequency, and duration expected with full FFP implementation; Work Program staff should make the link to jobs and community resources for Head Start families; the appropriate program staff should have a high level of competence in parenting education; Brief Parenting Interventions sessions should address parents’ needs for content and frequency; and the targeted Head Start families should participate in the Work Program and the Brief Parenting Interventions (Table III.2).
In many cases, fidelity measures will have to be created for the specific enhancement, and they will have to rely on direct observation. It may be possible to adapt existing measures for some of the fidelity measures, or to use existing principles when creating them. For example, when developing measures of the quality of interactions between parents and Work Program case managers or job developers, FFP developers and evaluators might refer to studies of employment programs that serve disadvantaged populations for guidance on which key elements to assess, and how to conduct those assessments. If systems already have been developed to track information specific to the Work Program and to engage parents in Brief Parenting Interventions sessions, it may even be possible to capture some FFP fidelity measures through administrative data. For example, evaluators should be able to use administrative data to determine the number and quality of links to jobs and community resources for targeted Head Start parents and to document the degree to which targeted parents participate in FFP components.
| Concept Domain | Potential Measurement Method Existing Tool | |
|---|---|---|
| Quality of Implementation | Work Program and Brief Parenting Interventions staff meet qualifications for positions. | Resume review; interviews with staff during site visits |
| Resources to support the Work Program and/or connections built with necessary community partners are in place to effectively implement and sustain the Work Program. | Interviews during site visits | |
| The work program has established effective links with the intended number/variety of potential employers and community service providers. | File/data review; interviews with staff, employers, and community service providers during site visits | |
| The ratio of Work Program staff to Head Start parents meets parameters. | Observation during site visits | |
| Brief Parenting Interventions staff training sessions meet quality parameters for qualifications of trainer/facilitator, frequency, and duration. | Observation and interviews during site visits | |
| Multiple methods of engaging targeted parent populations are in place. | Observation and interviews with staff during site visits | |
| Fatherhood Initiative activities meet quality parameters for staff qualifications, content, frequency, and duration. | Observation and interviews during site visits | |
| Fidelity to Enhancement | Interactions between Work Program staff (case managers and for job developers) and Head Start parents meet quality parameters for content, frequency, and duration. | Observation during site visits |
| The intended number and quality of job links for Head Start families have been made (staff skills). | File/data review; focus groups with Head Start parents and employers | |
| The intended number and quality of links between Head Start families and community services/resources have been made (staff skills). | File/data review; focus groups with Head Start parents and service providers | |
| Head Start program staff has competency in parenting education. | Post-test of staff trainees tailored to specific Brief Parenting Interventions concepts (must have knowledge of a specified percentage of training content) | |
| Brief Parenting Interventions sessions for parents meet quality parameters for content, frequency, and duration. | Observation during site visits | |
| The overall level of Head Start parent participation in the Work Program and Brief Parenting Interventions is as intended. | Administrative data observation | |
| The level of participation in the Work Program and Brief Parenting Interventions by targeted parent groups (fathers, mothers in treatment programs, and teenage parents) is as intended. | Administrative data observation | |
| Intermediate Outcomes | Employment entry, retention, and income of Head Start parents. | Program administrative data; UI data |
| Level of involvement in Head Start program. | Parent Involvement and Satisfaction with Head Start (FACES Spring 2003) | |
| Breadth of social support network. | Carolina Parent Social Support Scale | |
| Parenting knowledge and skills of Brief Parenting Interventions participants. | Post-test of parent participants tailored to specific Brief Parenting Interventions concepts (must have knowledge of a specified percentage of content) | |
| Increased frequency and duration of parent-child reading time. | FACES Parent Interview "Activities with Your Child" | |
| Parenting stress. | Parenting Stress Index (PSI), Third Edition | |
| Household conflict. | FES Conflict Items | |
| Father presence and involvement with child. | Father Activities with Child-Early Head Start Father Study Measures | |
| Increased parenting support and warmth; decreased detachment and intrusiveness. | Parent-Child Interaction Task (NICHD); Parent-Child Relationship Scale (NICHD) | |
| Child activity level (minutes in brisk activity and/or television viewing, other media). | Parent report; activity diaries | |
| Number of meals from fast food restaurants. | Parent report | |
| Child Outcomes | Child vocabulary. | PPVT-III |
| Book knowledge (interest in books and reading-related activities, such as listening to and retelling stories and pretending to read). | Story and Print Concepts (FACES); COR Language and Literacy Scales | |
| Print knowledge (recognition of words as a unit of print, increased ability to associate spoken words with written words, and increased awareness of the mechanics of reading). | Story and Print Concepts (FACES); Pre-CTOPPP Print Awareness Subtest; Woodcock-Johnson III Tests of Achievement Letter-Word Identification Test; TERA-3 | |
| Alphabet knowledge, phonological processing, early reading. | Letter Naming Task (NRS) | |
| Behavioral problems. | Child Behavior Problem Index (NICHD) | |
| Prosocial behavior. | Howes Peer Play Observation Scale (FACES); Parent-Child Interaction Task (FACES); Social Skills Scale of the Social Skills Rating System; Social Competence Subscale of the Social Competence and Behavior Evaluation | |
| Child weight for height. | Height, weight, BMI | |
| Overall child physical health. | Consult large-scale health studies of low-income children. | |
BMI = Body mass index; COR = Child Observation Record; FACES = Family and Child Experiences Survey; FES = Family Environment Scale; FFP = Family Foundations Project; NICHD = National Institute of Child Health and Human Development; NRS = National Reporting System; PPVT-III = Peabody Picture Vocabulary Test-Third Edition; Pre-CTOPPP = Preschool Comprehensive Test of Phonological and Print Processing; PSI = Parenting Stress Index; TERA-3 = Test of Early Reading Ability-Third Edition.
As before, FFP developers will have to set the thresholds rather high, particularly for participation and for frequency of intervention activities. Meeting the parental participation thresholds in FFP components may be a challenge, but it will be critical when testing the intervention theory. Even if the implementation of FFP is considered “full” and of high quality, it is possible that participation will be low given that many government and social service programs have difficulty engaging the types of individuals targeted by FFP. Factors exogenous to FFP that influence participation may have a stronger, detrimental effect. If FFP sites cannot engage parents to a sufficiently high degree, the intervention will have only limited potential to produce effects on child outcomes of the intended size. Similarly, if the frequency (or dosage) of exposure to Brief Parenting Interventions is too low, it is unlikely that child impacts could be detected.
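To make the threshold idea concrete, the following is a minimal sketch of how a site's participation rate and dosage might be compared with preset thresholds. The record layout, the function name, and both threshold values are hypothetical, chosen only for illustration; the actual thresholds would be set by FFP developers.

```python
from dataclasses import dataclass


@dataclass
class SiteRecord:
    site: str
    eligible_parents: int          # parents targeted by FFP at the site
    participating_parents: int     # parents who took part in an FFP component
    sessions_per_parent: float     # mean Brief Parenting Interventions sessions attended


# Hypothetical thresholds, for illustration only.
MIN_PARTICIPATION_RATE = 0.60
MIN_SESSIONS_PER_PARENT = 4.0


def meets_thresholds(rec: SiteRecord,
                     min_rate: float = MIN_PARTICIPATION_RATE,
                     min_sessions: float = MIN_SESSIONS_PER_PARENT) -> dict:
    """Report whether a site reaches the participation and dosage thresholds."""
    if rec.eligible_parents <= 0:
        raise ValueError("eligible_parents must be positive")
    rate = rec.participating_parents / rec.eligible_parents
    return {
        "participation": rate >= min_rate,
        "dosage": rec.sessions_per_parent >= min_sessions,
    }


if __name__ == "__main__":
    print(meets_thresholds(SiteRecord("Site A", 100, 70, 5.0)))
```

Keeping participation and dosage as separate checks reflects the distinction drawn above: a site could enroll enough parents and still deliver too few sessions for child impacts to be detectable.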
Intermediate and child outcomes. We presented possible intermediate and child outcomes in a previous section. It should be possible to collect some of these intermediate outcomes, specifically the ones for the Work Program, with relative ease and at limited cost by using administrative data. For example, employment and income outcomes of targeted parents could be measured using employment placement data collected by Work Program staff and data from the state’s unemployment insurance records. Other intermediate outcome measures for the Work Program could use or adapt existing tools. For example, the Carolina Parent Social Support Scale may be a measurement option for assessing the breadth and depth of a parent’s social support network prior to and following participation in FFP.
Without a specific definition of the components of FFP, it is difficult to specify additional intermediate and child outcomes. For the sake of example, we propose three Brief Parenting Interventions modules that might be universal among individual programs that are targeting noncustodial fathers, mothers with substance abuse problems, and teenage parents: reading with children; family relationships; and improving nutrition and health. Using these modules, we can propose intermediate and child outcomes and their corresponding considerations for use in an evaluation of FFP. Measurement tools exist to measure many of these concepts. In Table III.2, we make specific suggestions based on these examples, but other measures may be considered as well.
Module 1: Reading with Children. The first Brief Parenting Intervention module would focus on reading with children. The module’s goals would be to establish reading as a daily routine, introduce dialogic reading (in which a parent asks questions to further engage the child with the book), and demonstrate effective read-aloud techniques to parents. The intermediate outcome of interest would be increased frequency and duration of parent-child reading time. To measure this outcome, the FFP evaluation could adapt the “Activities with Your Child” section of the FACES parent interview, in which parents are asked six distinct questions pertaining to their support of and involvement with literacy activities. The child outcomes could include increased vocabulary, increased book and print knowledge, and increased letter identification (or alphabet knowledge).
To measure these child outcomes, FFP evaluators probably would be able to rely on existing measures. Measurement of these child outcomes typically is conducted through direct assessment. Although a number of measures on vocabulary are available and have been used with Head Start children, the PPVT-III is one of the most commonly used measures of children’s vocabulary, and it has strong psychometric properties. The PPVT-III has been used with Head Start populations; for FACES, trained paraprofessionals administered and scored a short version of the PPVT-III in about 10 minutes. The PPVT-III also can be used with elementary school-age (and even older) children, which could prove useful in an evaluation of FFP, given that child impacts may occur after the Head Start year.
Measures for assessing book knowledge, print knowledge, and letter identification are available; many of them capture elements of all three of those concepts. Some of these measures contain only a single relevant item, some have subscales, and still others are more comprehensive. For example, the Child Observation Record (COR) contains two items on children’s knowledge and appreciation of books, and one item on children’s knowledge of letters and numbers. The most in-depth measure of book knowledge and appreciation is the Story and Print Concepts measure, which also is available in Spanish. However, information from FACES on this measure’s psychometric properties suggests less-than-optimal reliability and mixed evidence of its validity with a Head Start population. For print awareness and concepts, the Preschool Comprehensive Test of Phonological and Print Processing (Pre-CTOPPP) includes a “print awareness task” that also contains a few items that tap this domain element. The Conventions Subtest of the Test of Early Reading Ability, Third Edition (TERA-3) is another in-depth assessment that has good reliability, and that can be used with elementary school-age children. (However, it is not available in Spanish.) The Letter-Word Identification Test of the Woodcock-Johnson III has good psychometric properties; it has been used with diverse populations, can be administered in eight minutes, and can be used with elementary school-age children.
Many of the same tools that measure print awareness also can be used to measure alphabet knowledge. The “print awareness task” of the Pre-CTOPPP taps the child’s ability to identify letters. The Alphabet Subtest of the TERA-3 has good psychometric properties and takes about 10 minutes to administer and score. The English and Spanish versions of the Letter-Word Identification Test of the Woodcock-Johnson III can be used with children of all ages to measure alphabet knowledge. A measure specific to alphabet knowledge is the Letter Naming Task (and its Spanish counterpart, Nombrando Las Letras) that was developed by the Head Start Quality Research Centers and is being used in the Head Start National Reporting System. It is a psychometrically sound measure that takes only about five minutes to administer and score.
Module 2: Family relationships. The second module would focus on supporting both the mother-father relationship and the parent-child relationship (both mother-child and father-child). Its goals would be to provide parents with constructive ways to engage each other in support of their children’s development, reduce parent conflict, reduce parenting stress, and increase positive communication between parents and their children. The expected focus on communication and positive social interactions would lead to the intermediate outcomes of reduced parenting stress, reduced household conflict, and increased father presence and involvement with the child. Existing measurement tools may be adapted for an FFP evaluation or may guide development of more refined measures specific to FFP.
Intermediate outcome: reduced parenting stress. The Parenting Stress Index (PSI) covers four major domains (total stress, child domain, parent domain, and life stress) and can be used with parents of children between the ages of one month and 12 years. The parent domain would be most germane to capturing the intermediate outcome of reduced parenting stress that could result from a Brief Parenting Intervention focused on family relationships. This domain consists of seven subscales that measure competence, isolation, attachment, health, feeling of role restriction, depression, and spousal support. FFP evaluators may find that the PSI can be informative in developing a measure of parenting stress. The short form of the PSI could also be explored as an alternative. In addition, although the PSI can be administered and scored by staff who do not have formal training in psychology or social work, interpretation of PSI scores does require such training.
Intermediate outcome: reduced household conflict. The Family Environment Scale (FES) is a tool that might be used in the second module to measure household conflict. The FES measures the family’s social environment and family functioning through 90 true-false items on a paper form that a parent completes. The FES’s instructions are self-explanatory, no training is required for administration, and the tool is easy to score. It is available in a number of languages and was used in the impact study of Early Head Start.
Intermediate outcome: father involvement. The Early Head Start impact evaluation also can be informative as a source of measures for father presence and involvement with children. The study developed four factor scores that measure the frequency with which fathers, or father figures, engage in four types of activities: (1) caregiving activities, such as helping their children to brush their teeth or take baths; (2) social activities, such as taking children to visit friends or relatives and going to restaurants; (3) cognitive activities, such as reading or telling stories and singing to the children; and (4) physical play, ranging from calm activities, such as rolling a ball, to rough and tumble activities, such as chasing games. The Early Head Start father activity scores have good psychometric properties (internal consistency reliability ranging from .72 to .84) and can be administered and scored with relative ease.
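For readers unfamiliar with internal consistency reliability, the sketch below computes Cronbach's alpha, one commonly used coefficient of this kind. It is an assumption that alpha is the coefficient behind the figures cited above; the text does not say which statistic was used. The function name and the toy data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("each respondent needs the same number (>= 2) of items")
    item_columns = list(zip(*scores))
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


if __name__ == "__main__":
    # Toy data: four respondents rating three activity items (hypothetical values).
    ratings = [[3, 4, 3], [1, 2, 1], [4, 4, 5], [2, 3, 2]]
    print(round(cronbach_alpha(ratings), 2))
```

Values closer to 1 indicate that the items in a scale move together, which is the property the reported range of .72 to .84 summarizes.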
Intermediate outcome: parenting support. Increased parenting support and warmth and decreased detachment and intrusiveness are other possible intermediate outcomes of supporting the parent-child relationship. Even though a few measures of parent-child interactions exist, some measurement development work in this area probably would be necessary. The Parent-Child Interaction Task, used by the National Institute of Child Health and Human Development (NICHD), involves observation of a parent and child interacting during a semistructured, 15-minute play interaction. Rating scales are used to assess the quality of the interaction, expressions of affect, and the child’s emotional regulation with the mother in a potentially exciting or frustrating activity. This measure has good reliability, but validity specific to Head Start children has not been determined. A less intensive, but potentially less reliable, measure is the parent report (the Parent-Child Relationship Scale) used by NICHD. This 15-item questionnaire assesses how warmly parents view their relationship with their children by asking parents to rate items on a five-point Likert-type scale.
Child outcomes for Module 2. The expected child outcomes would be reduced behavioral problems and increased prosocial behavior. (“Prosocial behavior” refers to a child’s interest in and ability to develop friendships and positive relationships.) Because many existing measures capture some aspect of children’s behavior and social relationships, FFP evaluators will have to make an informed assessment about what aspect (or aspects) of behavior are likely to be influenced by the Brief Parenting Interventions (or other components of FFP). Two potential broad measures of problem behaviors, both of which have been used in the FACES study, are the Behavior Problems Scale (BPS) and the Child Behavior Problem Index (CBPI). The BPS relies on teachers’ ratings of children’s behavior; the CBPI is a parent report based on 12 items of children’s negative behaviors that are relatively common among preschool children. Both measures are applicable only to preschool children. For the FFP evaluation, however, it may be necessary to identify or develop a measure that can be used with elementary school-age children, as impacts on child outcomes may be expected to occur beyond the Head Start year. The Child Behavior Checklist contains a parent and a teacher rating component of social competence and problem behavior for children aged 4 to 18 years. Measuring prosocial behavior may overlap with measuring problem behavior, as positive social relationships require such traits as good self-control and cooperation. However, prosocial behavior generally is a broader construct than social competence; it includes such behaviors as children’s willingness to talk with and accept guidance and directions from teachers, their ability to develop friendships, and their ability to express empathy and to care for others.
Many measures of prosocial behavior require observational coding of interactions with peers (such as with the Howes Peer Play Observation Scale used in FACES), observational coding of interactions with parents (using the Parent-Child Interaction Task), or some other method to tap the dyadic nature of children’s social relationships (such as with a teacher). Other measures constitute validated scales or subscales of children’s social competence more broadly conceptualized on the basis of teacher or parent reports (for example, the 10-item social competence subscale of the Social Competence and Behavior Evaluation and the Social Skills Scale of the Social Skills Rating System).
Module 3: Improving nutrition and health. The third module would focus on nutrition and health with the goals of educating parents about how to establish food shopping and food preparation goals, sharing tips on stimulating preschoolers’ interest in eating nutritious meals and snacks, and engaging parents in reading their children’s satiety cues and increasing their children’s activity levels. The intermediate outcomes for this module could be measures of activity level, such as minutes per day in outdoor activities or hours per day viewing television, or, possibly, the number of meals per week from fast food restaurants. These intermediate outcomes could be collected through parents’ reports, using questionnaires, interviews, or, possibly, activity diaries. Possible child outcomes of interest include child weight for height (or body mass index for children) and overall child physical health. Our review of potential child outcome measures conducted as part of a previous task in this study highlighted one study among recent large-scale studies and program evaluations involving low-income families with preschool-age children that measured children’s health status and practices. The NICHD Study of Early Child Care assesses children’s height and weight and also asks parents about any hospitalizations, diagnosed health conditions, and severity and impact of any illnesses that their children have experienced. In addition, the Early Childhood Longitudinal Study of a Birth Cohort measured height, weight, and other growth measures and obtained parent reports about health status and medical services. Both the Early Head Start Evaluation and FACES also obtained parent reports of health status and medical care. Measurement work in this area should focus on identifying or adapting existing measures of the specific aspects of child health related to nutrition that can be informed by large-scale health studies of low-income children.
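As a brief illustration of the weight-for-height outcome, body mass index is weight in kilograms divided by the square of height in meters. The sketch below shows only that arithmetic; the function name and example values are hypothetical. For children, a raw BMI value is normally interpreted against age- and sex-specific reference charts, a step that is not shown here.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI = weight (kg) / height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


if __name__ == "__main__":
    # Hypothetical preschooler: 18.0 kg and 1.05 m tall.
    print(round(body_mass_index(18.0, 1.05), 1))  # prints 16.3
```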
The I&I planning phase for the FFP, intended to last nine months, ends with the completion of the enhancement definition through the specification of the FFP’s various components. However, to fully complete the activities of Stage 1, FFP will require additional time to refine documentation, and to develop and conduct preliminary testing of measures. At the end of the initial nine-month planning phase, FFP will be implemented in Head Start programs throughout Pulaski County. Over the course of an additional six to nine months, FFP developers can use the experience of the sites in Pulaski County to learn about the resources required for successful implementation, the different methods that programs use to develop employer and community resource networks and to engage parents, and the challenges to implementation. This information will feed into the enhancement refinement process to assist FFP in developing the comprehensive documentation necessary for expanded replication. During the six- to nine-month period, measurement-development work should be under way as well.
It is feasible and recommended that, as part of Stage 1, FFP be implemented in additional sites outside of Pulaski County. This expansion will enable developers to test replication in sites that may have characteristics and resources that differ from those of Pulaski County. Implementation should occur after an initial period of implementation in the original sites, when documentation can be considered comprehensive, and when measures have been developed or selected. The quality of the implementation and fidelity measures should be tested through implementation studies in both the original sites and the second-phase sites in order to assess threshold levels, and to test multiple measurement approaches, where applicable. In addition, developers should conduct pre-post outcome studies of the intermediate outcomes in all the sites. It is unlikely that any descriptive child outcomes study of FFP would be informative in Stage 1. Because this enhancement would affect child outcomes through the parent, changes in child outcomes are unlikely to occur in the short term.
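A pre-post study of an intermediate outcome can be summarized very simply. The sketch below assumes one pre-score and one post-score per parent on some intermediate measure; it reports the mean change and a standardized change (mean change divided by the standard deviation of the changes). The function name and data are hypothetical, and the standardized change is one common descriptive summary, not a method prescribed by this report. Without a comparison group, such a summary describes change but cannot attribute it to FFP.

```python
from statistics import mean, stdev


def pre_post_summary(pre, post):
    """Summarize paired pre/post scores for the same parents."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    if len(pre) < 2:
        raise ValueError("need at least two paired observations")
    changes = [after - before for before, after in zip(pre, post)]
    spread = stdev(changes)
    return {
        "n": len(changes),
        "mean_change": mean(changes),
        # Undefined when every parent changes by the same amount.
        "standardized_change": mean(changes) / spread if spread > 0 else None,
    }


if __name__ == "__main__":
    # Hypothetical scores for four parents before and after participation.
    print(pre_post_summary([10, 12, 14, 16], [12, 13, 17, 18]))
```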
For FFP to progress to a Stage 2 evaluation, the components of FFP must have been developed fully and documented clearly, and the implementation studies must suggest that replication is feasible, particularly in diverse sites. Two issues will be of particular importance for determining the replicability and feasibility of FFP necessary for progression to Stage 2. First, FFP developers will have to give substantial thought to the options for establishing the Work Program component in order to ensure that Head Start programs serving diverse populations and having varying levels of resources within Head Start and within the community can successfully achieve implementation. Second, the early implementation sites in Pulaski County and elsewhere will have to achieve the thresholds for participation so as to provide important lessons about methods of engaging parents and building participation to the levels necessary for evaluation. The studies of intermediate outcomes for FFP parents should reflect positive improvements of a magnitude that has the potential to create significant changes in child outcomes. As an enhancement in the very early stages of development, it would likely take two to three years for FFP to be ready for a Stage 2 evaluation.
The goal of the English Language Learners Project (ELLP), a project of the Community Development Institute (CDI), is to integrate ELL assessment and instruction into Head Start programs that serve large numbers of children who are learning English as a second language in order to improve the children’s English language acquisition and literacy skills. The ELLP will provide initial intensive training on ELL assessment and instruction for Head Start teachers, home visitors, and administrators, supported by ongoing training and technical assistance for three years. The Community Development Institute, with I&I support, is refining the ELLP through planning and implementation activities with 22 Head Start grantees (2 per region). This enhancement is well-defined, and its Guide to Working with English Language Learners has established a strong framework for documentation. Additional documentation, particularly on training and technical assistance, still must be developed, and measurement work will be necessary. Pilot implementation in the 22 sites will provide important information about the variations and challenges that can arise across different sites. This information will help to refine the enhancement and the supporting documentation, as well as assist in developing and refining measures of implementation, fidelity, and outcomes.
Motivation and goals. More than one-quarter of Head Start children speak a language other than English at home (see Table II.7; and U.S. Department of Health and Human Services 2002). Spanish is the most common non-English language, spoken by 23 percent of Head Start children. Relative to the national average, Migrant and Seasonal Head Start programs and programs in the West and Northeast have substantially higher percentages of Spanish-speaking children.5 Based on demographic projections, the proportion of ELL children in Head Start is projected to increase during the next decade and beyond (Espinosa 2004; Iglesias 2004). According to CDI, recent studies indicate that more ELL children drop out of high school than graduate. Studies also suggest that most U.S. schools do not adequately address the needs of ELL children, and that practical guidance that can be applied to the classroom is lacking. CDI initiated the ELLP with the belief that ELL instruction in Head Start classrooms will improve English language acquisition and will enhance the emerging literacy skills of English-language learners in the short run, thereby setting a course for potential improvement in educational outcomes in the long run.
Therefore, the short-term goals of the ELLP are to improve English language acquisition and literacy skills among Head Start children who are English language learners, and to increase the social and emotional competence of these children. To achieve these goals, Head Start programs in which the ELLP enhancement has been fully implemented should (1) conduct assessments using methods that accurately gauge the skills and abilities of ELL children, (2) provide effective ELL instruction in Head Start classrooms, and (3) use methods to engage families of ELL children in their children’s education. To fully define the enhancement, ELLP developers will have to detail those three key components of the project.
Enhancement components. To improve the accuracy of assessments for ELL children, the ELLP, presumably, will provide a toolkit of resources that can be used to tailor Head Start’s individual child assessments to the needs of ELL children. This toolkit is likely to include nonverbal methods of gauging a child’s abilities. Nonverbal methods generally are viable options only for measuring abilities in domains other than language development and literacy (for example, tasks that can be modeled for the child, such as identifying the “odd” item in a set of items, sorting objects by size, and showing numbers on fingers). However, some basic literacy cues can be gleaned from nonverbal practices, such as whether a child is drawn to looking at books on his or her own, and whether the child holds the book upright and turns the pages from left to right. Other methods that typically are used in an assessment portfolio to gauge early language development and literacy include parent questionnaires/interviews, alphabet knowledge, and word associations with pictures presented on flashcards. The use of these methods will help teachers to gauge the level of English language acquisition; however, unless they are conducted in the child’s primary language, they will provide little information on the level of early literacy. In theory, each Head Start program has a staff member who is able to communicate with children and parents in their primary language. Methods of involving these individuals in the process of assessment may be another aspect of the ELLP’s approach to individual child assessment.
ELLP instructional activities for Head Start teachers, the second component of the ELLP, are not part of a formal curriculum. They consist of a set of lesson plans and activities for classroom use that are designed to integrate into such approaches as the Creative Curriculum or High Scope. The ELLP methods and principles will have to be broad enough and flexible enough to incorporate into any curriculum in use in Head Start classrooms. ELLP instructional activities may be shaped around the four stages of second language acquisition: (1) home language use (children continue to speak in their home language even while they are surrounded by people who are speaking in English); (2) the nonverbal period (children realize that their primary language is not understood, become quiet, and become observant of the uses of English); (3) the use of telegraphic and formulaic speech (children use individual English words or use these words in short, often incomplete, and ungrammatical sequences); and (4) the use of productive language (children begin to speak English relatively well, using phrases and sentences) (Gonzales 2005). Each stage may have specific teaching tips and goals, but ELLP methods are likely to integrate an array of practices into each lesson and/or activity for two reasons. First, children in Head Start classrooms are likely to be at different stages of English language acquisition. Second, children do not necessarily proceed through the stages in a linear fashion but exhibit traits of each stage at various points in time. For these reasons, teachers may be able to “customize” their practices when working with children independently but will need more inclusive ELL approaches for group lessons.
The third component—employing methods of engaging families of ELL children in their child’s education—may be the one that is least sharply defined by ELLP developers at this time. Because the ELLP’s goal is to better integrate families into their children’s educational lives and to promote literacy activities in the home, ELLP developers will need to develop an array of activities designed to reach out to and communicate with families, including home visits by staff members who speak the families’ primary languages, translation of program materials and child progress/assessment information into the primary languages spoken in the home, parent/teacher meetings in which teachers and a translator are present, and informational sessions and parent activities specifically for non-English speaking parents to increase their comfort level.
Theory of change. Through its three key components, the ELLP expects to increase English language acquisition and literacy by having impacts on both the Head Start classroom and the home environment. The enhancement’s targets of change are general staff’s level of knowledge about ELL principles (such as the four stages of second-language acquisition and the use of inclusive teaching practices), teachers’ skills in the areas of ELL assessment and instruction, and the level of skills of all Head Start staff (home visitors, teachers, and administrators) in interacting with the families of ELL children. Intermediate outcomes will measure the extent of change observable in the classroom and in parents’ behaviors that could result from increasing Head Start staffs’ ELL skills. Intermediate classroom-focused outcomes will examine changes in the quality of the language and literacy environment. They also will include a more global measure of classroom quality across a broader range of topics (to assess any “spillover” effects). Intermediate parent-focused outcomes might measure the parents’ level of involvement with the Head Start program and the frequency and duration of literacy activities in the home. Child outcomes of interest—the ELLP enhancement’s final outcomes—will focus on the domains of language development and literacy. In addition, through an improved ability to communicate with their teachers and their peers, ELL children could become more interested and engaged in learning as well as develop better social relationships.
Implementation products for the ELLP will fall into one of two categories: (1) guidance on specific T/TA modules for use by trainers and consultants who will prepare Head Start staff for the ELLP enhancement, and who will then support these staff throughout ELLP implementation; and (2) implementation guidance for use by the Head Start program staff that would describe in detail the assessment methods for ELL children, the ELL instructional methods for the classroom, and the methods of engaging the families of ELL children.
Program needs assessment. Because of the variation in Head Start programs, T/TA necessary to achieve full implementation is likely to vary across programs as well. To determine the content and intensity of T/TA needed to launch the ELLP, ELLP consultants will have to work with Head Start program staff to conduct a program needs assessment. Guidance on conducting this type of assessment should therefore be a primary element of any T/TA guidance. The goal of a program needs assessment is to identify prominent gaps in a program’s ELL knowledge, assessment, and instruction. T/TA guidance should detail the specific areas of interest for a program needs assessment, the sources of data and information, and the steps of the assessment. Three of the more obvious factors for consideration in such an assessment are:
T/TA guide. The ELLP T/TA guidance should specify the training options and set parameters for the critical and required training elements. For example, training on the four stages of language acquisition may be a standardized training module that should have little to no modification because the principles apply across all ELL students. However, training on specific instructional approaches may vary in content to adapt to cultural differences associated with children’s primary language. The T/TA guide could be organized around different topical modules (for example, the stages of language acquisition, assessment, classroom instruction and activities, and interactions with families), with each module detailing the core elements, such as trainers’ qualifications, core content, timing, and flow. Options for each module could offer choices about the intensity and method of training (which would depend on program resources, location, or program staff’s level of existing ELL knowledge), and could provide adaptations to content based on the predominant language of ELL children in the Head Start program.
Implementation manual. The ELLP’s current implementation manual for participating Head Start program staff (the Guide to Working with English Language Learners) gives this enhancement a strong starting framework for the second type of implementation product—program implementation guidance. The implementation manual should essentially be a “user’s guide” to the ELLP enhancement that provides detailed guidance on each of the key components—assessment, ELL instruction, and family engagement. Although all of the information in the implementation manual should complement and reinforce what staff learn through training, the manual differs from T/TA guidance in that it is targeted to direct “users,” rather than to trainers.
The implementation manual for ELLP will need to include detailed guidance for each of the three components. All three components of the ELLP are likely to look like a menu of options from which different Head Start programs can choose the approaches that would work best given their current child assessment practices, primary language of non-English-speaking children and families, size of the non-English-speaking student population, and staffs’ language abilities. The basic elements of the implementation manual should include:
Development and selection of evaluation measures are critical to the ELLP’s development stage. Because the ELLP is a well-defined and focused enhancement, its measurement could take some clear directions (Table III.3).
Quality of implementation. Measures of the quality of implementation assess the extent to which initial and ongoing ELLP T/TA is delivered as prescribed by the program developers at CDI. In addition, an objective of the ELLP enhancement is to develop agreements with local community colleges to offer Head Start staff credit for their ELLP T/TA. Credit may be indicative of the quality of implementation by reflecting comprehensive T/TA coursework and a commitment on the part of the program.
Other measures of implementation should examine the extent to which the components of the ELLP have a solid framework for success. For example, implementation measures should assess the extent to which a comprehensive, coordinated child assessment system is in place by examining the existence of a common set of procedural guidelines for collecting assessment data on children. These guidelines can be an indication of the level of planning and consideration that has been given to a child assessment system. Similarly, a measure of the depth and breadth of methods to engage the parents of ELL children with the Head Start program can reflect the level of implementation of the family component of ELLP. Finally, plans for ongoing T/TA to support ELLP should be examined for an indication of a lasting commitment to the continued professional development of staff on ELL assessment and instruction and to the project as a whole. Implementation information will largely be gathered through direct observation or through interviews during site visits that should be conducted during the period of initial intensive training and during full-scale implementation.
Fidelity to enhancement. Fidelity measures will capture how well participating Head Start programs meet the ELLP’s functional objectives of conducting assessments, delivering ELL instruction to children, and engaging the children’s families. This measurement work will be intensive and will require careful consideration because measures will have to be developed specifically for the ELLP model to capture how well the targets of change (teachers’ skills, teacher/child interactions, and staff/parent interactions) respond and adhere to the ELLP enhancement. Before fidelity measures can be developed, ELLP developers will have to define broader concepts relating to fidelity. For example, they will have to define the critical elements of ELL competency and accuracy in ELL assessments in order to develop measures to capture these concepts. Fidelity measures that focus on the classroom will have to measure the specific actions, practices, methods, and languages that teachers use in relation to the ELLP guidance on instructional practices.
Intermediate outcomes. Intermediate outcome measures could be drawn from existing tools. The Early Language and Literacy Classroom Observation (ELLCO) Toolkit measures the quality of the language and literacy environment in early childhood classrooms, has good psychometric properties, and has been widely used in large-scale studies. The Early Childhood Environment Rating Scale-Extension (ECERS-E), another widely used measure of global classroom quality, examines the four curricular areas of literacy, mathematics, science and environment, and diversity. The ECERS-E will capture indirect effects of the ELLP enhancement that may occur in the classroom environment beyond the direct effects on language and literacy. Other intermediate outcome measures involve changes in parental behavior. Existing tools use parent interviews to gauge the parents’ level of involvement in the Head Start program and the extent of literacy activities in the home. It may be too much to expect changes in the literacy activities that parents engage in with their children. However, these changes depend on the nature and intensity of the outreach activities to parents, and the extent to which the ELLP emphasizes family literacy. Many parents of ELL children may be illiterate not just in English, but in their primary language as well.
| Area of Measurement | Concept/Domain | Potential Measurement Method/Existing Tool |
|---|---|---|
| Quality of Implementation | Duration and intensity of ELL teacher training, staff training, and TA meet intended parameters. | Observation and interviews during site visits |
| | Training covers intended content and activities. | Observation and interviews during site visits |
| | Trainer(s) meet standards for qualifications. | Resume review; observation during site visits |
| | Availability of college credit for ELL training | Review of course curriculum and agreement with local community college |
| | Level of participation in T/TA by "required" Head Start staff | Observation and interviews during site visits |
| | Common set of procedural guidelines for collecting individual child assessment information is in place. | Paper reviews and interviews during site visits |
| | Multiple methods of engaging parents of ELL children are in place. | Observation and interviews with staff during site visits |
| | Plans for continued T/TA to support the ELL enhancement are in place. | Paper reviews and interviews during site visits |
| Fidelity to Enhancement | ELL competency of trained Head Start staff | Post-test of staff trainees tailored to specific ELL concepts (must have knowledge of a specified percentage of training content) |
| | Breadth, frequency, accuracy, and consistency of assessments completed among ELL students meet ELL parameters. | Observation and interviews during site visits |
| | Use of assessment information to inform instructional activities in the classroom | Observation and interviews during site visits |
| | Teacher practice of ELL instructional skills (teacher skills, classroom activities, teacher/child interactions) | Classroom observation |
| | Interactions between Head Start program staff (home visitors, teachers, administrators) and parents of ELL children meet quality parameters in terms of content, frequency, duration. | Observation during site visits |
| Intermediate Outcomes | Language and literacy environment in the classroom | ELLCO |
| | Global classroom quality | ECERS-E |
| | Level of parental involvement in Head Start program | Parent Involvement and Satisfaction with Head Start (FACES Spring 2003) |
| | Increased frequency and duration of parent-child reading time | FACES Parent Interview "Activities with Your Child" |
| Child Outcomes | Listening and understanding | PPVT-III/TVIP; Pre-LAS 2000 Oral Language Component-Simon Says (English and Spanish versions) |
| | Speaking and communicating | Pre-LAS 2000 Oral Language Component-Simon Says or Art Show (English and Spanish versions) |
| | Phonological awareness | Pre-CTOPPP-Elision Task (English and Spanish versions) |
| | Book knowledge and appreciation | Parent Report of Child's Emerging Literacy (Parent Emergent Literacy Scale) (English and Spanish versions); Story and Print Concepts (FACES) (English and Spanish versions) |
| | Print awareness and concepts | Bateria Woodcock-Munoz Pruebas de Aprovechamiento, Identificacion de Letra y Palabras; Woodcock-Johnson III Tests of Achievement-Letter-Word Identification Test; Parent Report of Child's Emerging Literacy (Parent Emergent Literacy Scale) (English and Spanish versions); Pre-CTOPPP-Print Awareness Subtest (English and Spanish versions); Story and Print Concepts (FACES) (English and Spanish versions) |
| | Early writing | Bateria Woodcock-Munoz Pruebas de Aprovechamiento-Revisada, Dictado; Woodcock-Johnson Revised Tests of Achievement-Dictation Test; Parent Report of Child's Emerging Literacy (Parent Emergent Literacy Scale) (English and Spanish versions) |
| | Alphabet knowledge | Bateria Woodcock-Munoz Pruebas de Aprovechamiento, Identificacion de Letra y Palabras; Woodcock-Johnson III Tests of Achievement-Letter-Word Identification Test; Letter Naming Task (NRS) (English and Spanish versions); Parent Report of Child's Emerging Literacy (Parent Emergent Literacy Scale) (English and Spanish versions); Pre-CTOPPP-Print Awareness Subtest (English and Spanish versions) |
| | Initiative and curiosity | COR Initiative Scale; PBLS; Social Skills Scale of the Social Skills Rating System-Teacher |
| | Social relationships | COR Social Relations Scale; Social Skills Scale of the Social Skills Rating System-Teacher; Social Competence Subscale of the Social Competence and Behavior Evaluation |

COR = Child Observation Record; ECERS-E = Early Childhood Environment Rating Scale-Extension; ELL = English language learners; ELLCO = Early Language & Literacy Classroom Observation Toolkit; FACES = Family and Child Experiences Survey; NRS = Head Start National Reporting System; PBLS = Preschool Learning Behavior Scale; PPVT-III = Peabody Picture Vocabulary Test, Third Edition-Revised; Pre-CTOPPP = Preschool Comprehensive Test of Phonological and Print Processing; TA = technical assistance; T/TA = training and technical assistance; TVIP = Test de Vocabulario en Imagenes Peabody.
Child outcomes: language development and literacy. Measuring child outcomes for an enhancement focused on ELL children is particularly challenging. Gauging English language acquisition over the course of the intervention may be somewhat straightforward, but measuring the broader concepts of language development and literacy skills that can occur in the primary, secondary, or both languages is more difficult. Recent reviews of the child outcome measures available in languages other than English reveal that there are fewer than 20 norm-referenced standardized tests in Spanish, and almost none in other languages (Kochanoff 2004; Kochanoff et al. 2003). In addition, some of the existing measures have flaws, such as being normed using data from monolingual speakers in other countries (for example, Mexico), rather than having norms that reflect the experience of English language learners in the United States. Another criticism of the existing tests is that the Spanish in the tests may reflect only one dialect, which makes the test more difficult, and possibly invalid, for speakers of other dialects (for example, Spanish spoken by children in Puerto Rico can be quite different from Spanish spoken by children who are second-generation Mexican immigrants to the United States).
Because few choices are available for standardized assessments in Spanish, many researchers create Spanish versions of assessments and interview questions without completely understanding the importance of equating the level of difficulty across the test versions. Simply translating an English assessment into another language will not solve the problem of the absence of valid, reliable tests in other languages.
In Table III.3 we list the domain elements in the language development and literacy domains, along with existing measurement tools that are available in both English and Spanish versions. However, even if a test has both English and Spanish versions, the two versions may yield data that are not comparable; the two tests may differ completely from one another or the norming samples may not have been comparable. The PPVT-III and its Spanish version, the Test de Vocabulario en Imagenes Peabody, offer one of the best examples of this problem. The Spanish version of this widely used test is much older than the English version and yields data that cannot be combined or compared validly with data from the English version.
Researchers have attempted to avoid these potential problems by testing children who demonstrate competency in English and Spanish in both languages. This approach builds on empirical findings about the course of bilingual children’s language development (August et al. 2002; Iglesias 2004; Tabors et al. 2004). In summary, relative to their monolingual peers, children learning two languages score lower on tests of their abilities in either language. When researchers combine bilingual children’s pass rates on comparable questions (for example, expressive or receptive vocabulary) across both languages, children perform at the same level as their peers. That is, they know as many words across two languages as their peers know in one language. This finding has important implications for studies of Head Start enhancements that focus on supporting language development for English-language learners. However, it is not clear that results from two different tests that are not calibrated at the same level of difficulty can be combined validly. Furthermore, it is not clear how such data would be used in impact analyses.
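The idea of combining results across two languages can be illustrated with a minimal sketch. Everything in it is hypothetical: the function name, the concept labels, and the data layout are assumptions made for illustration and are not drawn from any of the instruments listed in Table III.3. The sketch simply counts a concept as known if the child passes the corresponding item in either language, and it presumes that the items in the two versions are comparable, which, as noted above, cannot be taken for granted.

```python
def combined_concept_score(english_passed, spanish_passed):
    """Count the concepts a child knows in at least one language.

    Each argument is a collection of concept labels (hypothetical) for the
    items the child passed in that language version of the assessment.
    """
    return len(set(english_passed) | set(spanish_passed))


# Hypothetical results for one bilingual child.
english = {"dog", "ball", "cup"}
spanish = {"dog", "shoe", "milk", "cup"}

single_language_scores = (len(english), len(spanish))
combined = combined_concept_score(english, spanish)
print(single_language_scores, combined)
```

In this made-up case the child passes three items in English and four in Spanish, but knows five distinct concepts across the two languages, which is the pattern the cited findings describe.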
These new findings about how vocabulary development progresses among children learning two languages also have practical implications for evaluation design. Assessments cannot be too long if some children in a study must complete twice as many assessments as others (one in each language). Burden on the children also must be taken into account as measures are selected. Approaches such as matrix sampling, where not all children receive every test item, may be particularly useful for reducing burden on children who must take two versions of the assessments.
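One way to picture matrix sampling is the following minimal sketch. It is an illustration under stated assumptions, not a recommended design: the function name, the block rotation scheme, and the child and item identifiers are all hypothetical. Items are split into blocks, and each child is assigned only some of the blocks, rotating the starting block so that every item is still administered to the same number of children.

```python
from collections import Counter


def matrix_sample(child_ids, items, n_blocks, blocks_per_child):
    """Assign each child a rotating subset of item blocks.

    Assumes blocks_per_child <= n_blocks so that no child receives the
    same block twice.
    """
    # Split the item pool into n_blocks interleaved blocks.
    blocks = [items[i::n_blocks] for i in range(n_blocks)]
    assignment = {}
    for k, child in enumerate(child_ids):
        # Rotate the starting block from child to child.
        chosen = [(k + j) % n_blocks for j in range(blocks_per_child)]
        assignment[child] = [item for b in chosen for item in blocks[b]]
    return assignment


# Hypothetical example: 6 children, 12 items, 3 blocks, 2 blocks per child.
children = [f"child_{i}" for i in range(6)]
items = [f"item_{i}" for i in range(12)]
plan = matrix_sample(children, items, n_blocks=3, blocks_per_child=2)

# How many children receive each item.
coverage = Counter(item for assigned in plan.values() for item in assigned)
```

In this hypothetical layout each child answers 8 of the 12 items, while every item is still answered by 4 of the 6 children, so burden per child falls without dropping any item from the study.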
Child outcomes: approaches to learning and social relationships. Measuring child outcomes in other domain elements, such as initiative and curiosity and social relationships, typically is accomplished through parent or teacher reports, rather than through observational assessments. In Table III.3, we list only existing measures that rely on teacher reports, as the use of teacher reports reduces problems associated with questioning a group of parents who speak a variety of languages. “Initiative and curiosity” refers to a child’s eagerness to learn. It includes an increased ability to make independent choices, and to choose to participate in a variety of tasks and activities. The COR’s four-item Initiative Scale taps a child’s ability to express choices, solve problems, engage in complex play, and cooperate with program routines, but it does not contain items relating to a child’s curiosity. The Preschool Learning Behavior Scale, used in FACES 2003, assesses the child’s approaches to learning, including his or her motivation to learn and behaviors that enhance the child’s learning. “Social relationships” refers to a child’s growing interest in and ability to develop friendships and positive relationships with adults and peers. As shown in Table III.3, several existing scales measuring social relationships through teacher reports have been used in large-scale studies of low-income children.
The ELLP expects to spend about three months refining definitions of the enhancement’s key components. Early implementation experimentation will then begin in the 22 test sites. If the ELLP enhancement is intended for broad replication, it will be particularly useful to include in the 22 sites Head Start programs serving children who speak a range of languages, as well as programs with ELL enrollments of varying sizes. The initial intensive T/TA for ELLP should begin well in advance of the trial implementation program year (for example, in January, in preparation for full implementation in the academic year starting the following September). CDI will use the experience of the 22 sites during the initial T/TA and the first year of implementation to further refine the definition and the documentation for the ELLP. Measurement development work also will take place during this time. With early implementation in 22 sites, CDI enhancement developers and selected evaluators may be able to experiment with different approaches and methods of measurement. This type of experimentation could be particularly useful given the range of issues, as discussed earlier, that are unique to the measurement of language development and literacy among preschool children for whom English is a second language.
An evaluative on-site implementation study should be conducted during the initial T/TA period to gather information about the quality of implementation. ELLP developers should conduct a second on-site study during the program year, to gauge fidelity. An outcomes study should measure intermediate classroom outcomes and child outcomes at the beginning and near the end of the program year. Some improvements in language development and literacy would be expected during the course of a normal program year in any Head Start program. Thus, at this early stage, changes in the intermediate outcomes may be more meaningful than any measured changes in child outcomes. The intermediate outcomes have a more direct relation to the activities of the enhancement and therefore may be more suggestive of the potential for change through the ELLP.
The ELLP might be ready for a Stage 2 study at the end of about one and one-half years, a period allowing for initial T/TA and one program year of implementation in the original test sites. However, because the plan for the full enhancement is to provide three years of ongoing T/TA, evaluation plans will have to consider this full implementation period. An important consideration for rigorous study of this enhancement (or of any other ELL-focused enhancement) is the breadth of its replicability and feasibility. First, a decision should be made about the applicability of the ELLP enhancement within the broader Head Start community, including programs with smaller numbers of ELL children. It is possible that the ELLP enhancement could be adapted to meet the needs of these programs, but that the programs will not be able to achieve the thresholds of fidelity if only minimal attention to ELL practices (or more focused attention to individual children) is needed. The decision could be made to test the ELLP enhancement only in Head Start programs that have a predetermined percentage of ELL children. Second, even with testing only in programs with relatively large populations of ELL children, evaluators will have to assess the feasibility of measurement across a range of languages if children will have to be tested in both their primary language and in English.
Circles of Care has developed the Violence, Intervention and Prevention (VIP) enhancement to integrate mental health problem prevention and intervention services into Head Start programs. Circles of Care is a behavioral health care organization in Brevard County, Florida, that provides mental health, alcohol, drug abuse, and related services to county residents through its hospital-based settings and state and county contracted programs. The VIP enhancement has four key components: (1) the Second Step preschool curriculum, which teaches social and emotional skills for violence prevention; (2) the Building Strong Families program for parents, which focuses on intensive child and family social skill building; (3) individual, group, and family counseling services to children and families with specific mental health needs; and (4) case management services for fathers. The VIP program will be a centralized effort, managed and staffed by Circles of Care for implementation in five Head Start programs in Brevard County. The I&I grant to Circles of Care supports an initial planning period to refine the development of the VIP program, and to build the infrastructure for implementation and evaluation.
Motivation and goals. The prevalence of social and emotional problems among preschool children is an issue of growing concern, as these problems can affect academic performance as early as first grade (Raver and Knitzer 2002) and persistent problems can have longer-term effects, including substance abuse, depression, violent behavior, and school failure. The Early Childhood Longitudinal Study found that approximately 10 percent of children in an average kindergarten classroom easily become angry or engage in arguments or fights (West et al. 2000). Teachers in one study reported that about 40 percent of preschoolers in Head Start exhibited at least one disruptive or unsafe behavior each day (Kupersmidt et al. 2000). In response to increasing prevalence rates of problem behaviors in classrooms, clinical disorders, and children in families with risk factors, Circles of Care initiated this early intervention program to address the emotional and mental health needs of Head Start children and their parents.
When fully implemented, the VIP enhancement is intended to increase the social and emotional competence of Head Start children by providing: (1) one or two 30-minute sessions for preschoolers each week using the Second Step curriculum, which teaches prosocial behavior, together with improvement of teachers’ skills in helping children incorporate the practice of emotion management and problem solving throughout the Head Start day; (2) instruction in social skills-building and parenting skills for the parents of Head Start children through Circles of Care’s Building Strong Families program; (3) on-site individual, group, and family counseling services, facilitated by Circles of Care’s mental health specialists, for an estimated 20 percent of Head Start children and families across the five pilot implementation sites; and (4) case management services for fathers of Head Start children through a Circles of Care Outreach Specialist.
Enhancement components. During a development phase, VIP developers will have to specify the targeted population, as well as the content, dosage, and duration for each of the four key components. The first component, the Second Step curriculum, is a pre-packaged curriculum developed by the Committee for Children that teaches empathy, impulse-control, anger-management, and problem-solving skills. It is recommended for use throughout an entire school (or, in this case, throughout an entire Head Start center), rather than for use in a single classroom in a school or center. The preschool lessons last approximately 30 minutes, and children are motivated by poster-sized photo-lesson cards and puppets and soft toys (Impulsive Puppy, Slow-Down Snail, and Be-Calm Bunny) that relate to the cards. The cards depict children expressing emotions in real-life situations. Instructions for the teacher on the back of each card present an overview of the concepts to be covered, list the lesson’s objectives and required materials, and provide a story about the photo that is accompanied by discussion topics and specific questions for the children to answer. Each lesson also contains guidance about supplemental activities, such as role-plays, and offers teachers suggestions about how to model the skills taught in the lesson throughout the week. The Second Step curriculum also provides ideas for extension activities that can be incorporated into classroom activities throughout the week, with minimal preparation or materials.
VIP’s second component, the Building Strong Families program for parents, is designed to build stronger relationships within the family, and to prevent mental health and substance abuse problems. The enhancement will have to define whether this program component will target all parents of Head Start children in the implementation sites, or whether families with specific risk factors (such as single-parent households, parents with low levels of education, or low proficiency in English) will be the primary targeted population. The method of delivery and dosage for this parental component also will need definition, including whether Building Strong Families will provide parental instruction and guidance through small group workshops, one-on-one sessions, or a combination of the two methods, as well as the frequency and length of the sessions. Presumably, the content of Building Strong Families sessions will include such topics as stress management, handling of conflict, parenting skills, and relationship skills.
For the third component, on-site mental health counseling will be provided at each implementation site one day each week. Circles of Care expects that 20 percent of the Head Start families within the VIP program sites will need individual, group, or family counseling. The screening procedures to identify these families should be detailed, as should the specific counseling plan and approach (for example, the method, frequency, and duration of counseling).
Finally, the case management support for fathers to maximize fathers’ participation in the VIP program specifically, and in the Head Start program more broadly, needs definition. Again, any criteria that Circles of Care will use to target fathers with specific characteristics or circumstances for services should be made explicit. In addition, VIP developers will have to describe the specific outreach activities to engage fathers, as well as the breadth and depth of the case management support (for example, employment referrals, social service referrals, and assistance with supportive services, including transportation and child care).
Theory of change. Circles of Care also will have to define explicitly the theory of change for the VIP program in order to guide early evaluation efforts, and to identify the intermediate and child outcomes of interest. As a comprehensive approach, the goal of the VIP enhancement is to increase children’s social and emotional competence through both the classroom and home environment. In this section, we describe the targets of change and the intermediate outcomes for each of the four components of VIP separately. Although each component of the VIP enhancement may have its own targets of change and intermediate outcomes, all four components are intended to work together to positively affect child outcomes in the area of social and emotional development.
The Second Step curriculum is designed to improve teachers’ skills and increase the number of classroom activities that focus on prosocial behavior. The intermediate outcomes that measure progress in this component would focus on the classroom and, possibly, the center environment. For example, the overall quality of the classroom environment could be expected to improve, specific lessons or tasks may take less time if children remain “on task,” and fewer classroom disruptions due to arguments, fights, or other problem behaviors may occur. Other changes may be visible in the center more broadly and could be captured by an intermediate outcome that gauges the climate on the playground, in the lunchroom (if applicable to full-day programs), or during other group activities that integrate children across classrooms (such as an indoor play area or group music program).
For the sake of this example, we assume that the Building Strong Families component will focus on some of the same issues that we presented in our discussion of the Brief Parenting Intervention component of the FFP example (discussed in Section D.1 of this chapter). Specifically, the topics of this intervention would focus on supporting the mother-father relationship and positive parenting skills. We also assume that the Building Strong Families component directly addresses mental health and substance abuse issues by discussing the risk factors and indications of a problem, and by offering routes to screening, counseling, and treatment, if necessary. The targets of change are individual coping skills, parent-to-parent communication skills, and parent-child positive interactions. Intermediate outcomes to measure these changes might include a reduction in household conflict, an increase in father presence and involvement with the child, a reduction in parenting stress, and an increase in parenting support and warmth. Other intermediate outcomes of the Building Strong Families component might measure the percentage of parents screened for mental health and substance abuse problems, as well as a follow-up measure for those who are diagnosed with a problem and subsequently have entered counseling or treatment.
The counseling component of VIP will have different routes to the child depending on the method of delivery. In some cases, a child with significant behavioral problems or a diagnosed problem will receive direct individual counseling from a Circles of Care mental health specialist. In other cases, the entire family may receive counseling; in still others, parents may receive group counseling. The end goal is improved mental health of the individual receiving the counseling. When the parents or the whole family unit are the recipients of counseling, the intermediate outcomes affecting the child might include reductions in household conflict and parenting stress. In addition, an intermediate measure should capture the psychological well-being of the parent.
We assume that the outreach and case management support provided specifically to fathers would have overlapping intermediate outcomes with the Building Strong Families component. Specifically, the intermediate outcomes from the case management support to fathers would include increased father engagement with the child and increased parenting support and warmth specifically between the father and the child. Circles of Care also expects that increased father engagement will lead to increased financial support of the child.
As the curriculum developer, the Committee for Children has developed complete documentation for the Second Step curriculum. The Committee for Children provides the training for teachers and all classroom materials, lesson plans, and extension activities necessary to implement Second Step. Circles of Care will have to produce the detailed documentation necessary for replication of the VIP model as a whole by clearly documenting the three remaining components during the development stage.
Community resource assessment. Implementation guidance should first help Head Start programs to assess the mental health and social service agencies in their communities in order to identify partners for the VIP program. Circles of Care is a unique organization with the ability to provide all the distinct pieces that comprise the integrated VIP program. Aside from the Second Step curriculum, the other program components may or may not be housed within the Head Start program itself. Counseling services (either on- or off-site) must be delivered by mental health specialists wherever the VIP program is implemented. Head Start programs are required by the Program Performance Standards to have relationships with mental health professionals for screening and referrals, and may want to build on these existing partnerships for implementation of the VIP program. Individual Head Start programs can be given the choice of administering the Building Strong Families parenting and family supports and the case management services for fathers themselves or seeking community partners for these services. Cost and resource constraints may prevent a program from adopting the whole model for in-house administration.
Implementation guidance for program components. Circles of Care will have to consider the finer points of implementation that a Head Start program or partnering organization will have to take into account when implementing the components of the VIP program. In fact, each component could have its own stand-alone documentation.
Communication and referrals between components. A key element to replication of the VIP program by other Head Start programs will be achieving the appropriate level of service integration and communication among the program components. If separate entities administer separate pieces of VIP, it will be critical to establish proper protocols for communication to create an integrated system of referrals and supports. For example, a trainer’s experiences with the parents of a Head Start child during the Building Strong Families component may lead the trainer to recommend individual or family counseling with the mental health specialist. Implementation documentation should therefore specify methods to ensure that the communication and referrals among the various components of VIP are created and sustained.
Given the VIP program’s integrated approach to improving the social and emotional competency of Head Start children and families, VIP developers will have to perform a substantial amount of measurement work during the development stage. They will have to develop or identify measures of the quality of implementation and of fidelity to the enhancement for each program component, as well as measures that capture the level of integration among services, where appropriate. This measurement work will benefit to some degree from measures that already have been developed for the Second Step program; as with any enhancement, however, measures in these areas typically must be tailored to the specific enhancement.
Quality of implementation. Quality of implementation measures should capture the extent to which programs are able to garner resources, build partnerships, establish administrative procedures, identify qualified staff, and deliver the T/TA necessary to successfully launch each of the VIP program components (Table III.4). The Committee for Children has developed a Second Step Implementation Checklist that measures whether the planning steps and resources are in place to launch and sustain Second Step, the percentage of teaching and non-teaching staff trained, whether the appropriate level of resources and instructional materials has been obtained for each classroom, and the level of center-wide support for the curriculum. In addition, training participants complete a Second Step Training Evaluation immediately after training. This form, which takes about five minutes to complete, includes nine questions about the trainer’s skill level, the training content, and the value of the session, all rated on a scale of poor, fair, good, and excellent, as well as five open-ended questions about ways in which the training could be improved. This type of participant feedback can be helpful, but VIP developers also may have to develop a tool for use by an objective, nonparticipant evaluator.
VIP developers and/or evaluators will also need to develop tools for measuring the quality of implementation for the other three components of the VIP program. Among these should be an observational tool to measure the quality of staff training for the Building Strong Families component. Observations and interviews during implementation site visits should be performed to gather information about the infrastructure in place to support the Building Strong Families component, the case management services to fathers, and the mental health counseling services, whether these are housed in the Head Start agency or with partnering agencies. Finally, developers should measure other elements that are critical to the success of implementation, including whether staff meet the necessary qualifications, whether the ratio of staff to parents/fathers is appropriate, the extent to which multiple methods of engaging parents in the VIP components have been developed, and the quality and comprehensiveness of communication and referral protocols to support integration of services.
| Area of Measurement | Concept/Domain | Potential Measurement Method/Existing Tool |
|---|---|---|
| Quality of Implementation | Second Step staff training sessions meet quality parameters in terms of qualifications of trainer/facilitator, frequency, duration. | Observation and interviews during site visits; Second Step Training Evaluation |
| | School-wide implementation of Second Step meets parameters advised by developers. | Second Step Implementation Checklist |
| | Level of participation in Second Step T/TA by "required" Head Start staff | Second Step Implementation Checklist |
| | Partnership formed with appropriate entity to provide mental health counseling to individuals and/or families. | Interviews during site visits |
| | Mental health counselors meet qualifications. | Resume review; interviews with staff during site visits |
| | Criteria and methods for engaging individuals and/or families in mental health counseling are in place. | Interviews during site visits |
| | Resources to support Building Strong Families and/or connections built with necessary community partners are in place to effectively implement and sustain this component. | Interviews during site visits |
| | Building Strong Families staff training sessions meet quality parameters in terms of qualifications of trainer/facilitator, frequency, duration. | Observation and interviews during site visits |
| | Case managers/outreach specialist to work with fathers meet qualifications for position. | Resume review; interviews with staff during site visits |
| | Ratio of case managers to Head Start fathers meets parameters. | Observation and interviews with staff during site visits |
| | Multiple methods of engaging targeted parent populations in the various components of VIP are in place. | Observation and interviews with staff during site visits |
| | Communication/referral protocols among program components are in place. | Observation and interviews with staff during site visits |
| Fidelity to Enhancement | Competency of teachers and other trained Head Start staff on the concepts of the Second Step curriculum | Post-test of staff trainees tailored to specific Second Step concepts (must have knowledge of a specified percentage of training content) |
| | Teacher practice of Second Step instructional skills (delivery of specific lessons, teacher skills, classroom activities, teacher/child interactions outside of lessons). | Classroom observation; Lesson-Completion Record; SELC |
| | Individuals/families are effectively engaged in mental health counseling, when appropriate; frequency and consistency of participation meets expectations. | Interviews during site visits |
| | Competency of trained staff on the concepts of the Building Strong Families parental instruction content | Post-test of staff trainees tailored to specific Building Strong Families concepts (must have knowledge of a specified percentage of training content) |
| | Building Strong Families sessions for parents meet quality parameters in terms of content, frequency, duration | Observation during site visits |
| | Interactions between case managers and Head Start fathers meet quality parameters in terms of content, frequency, duration | Observation during site visits |
| | Frequency of referrals/communication among components of VIP | Observation and interviews with staff during site visits |
| | Overall level of Head Start parent participation in Building Strong Families component | Administrative data/observation |
| | Level of participation of fathers in case management services and in Building Strong Families | Administrative data/observation |
| Intermediate Outcomes | Global measure of the quality of the classroom environment | ECERS-R; Preschool Program Quality Assessment |
| | Time on specific classroom activities or child-directed tasks | Observation during site visits |
| | Disruptions in the classroom | Observation during site visits |
| | Playground (or other integrated setting) climate | Playground and Lunchroom Climate Questionnaire |
| | Parenting stress | PSI |
| | Household conflict | FES Conflict Items |
| | Father presence and involvement with child | Father Activities with Child-Early Head Start Father Study Measures |
| | Increased parenting support and warmth; decreased detachment and intrusiveness | Parent-Child Interaction Task (NICHD); Parent-Child Relationship Scale (NICHD); Father-Child Interaction Task for the Three-Bag Task |
| | Financial support of child provided by the father | Interview/questionnaire completed by the mother and the father |
| | Percentage of parents in Building Strong Families screened for mental health and/or substance abuse problems | Program data |
| | Percentage of parents in Building Strong Families with positive screens who are diagnosed with a problem; percentage who seek counseling or treatment | Program data |
| | Parental mental health/psychological well-being | CES-D full version or abbreviated; Pearlin Mastery Scale, Locus of Control |
| Child Outcomes | Children's knowledge about empathy, problem solving, management of strong emotions, and ways to respond to problematic situations with peers | Second Step Knowledge Assessment for Non- or Beginning Readers; Social Problem-Solving Test-Revised, Child Measure |
| | Empathy | Empathy Subscale of the Social Skills Rating System-Teacher Report |
| | Self-control, regulation | Self-Control Subscale of the Social Skills Rating System-Parent or Teacher Report |
| | Problem solving | Social Problem-Solving Test-Revised, Teacher Report |
| | Anger management/behavioral problems | Behavior Problems Scale (or Classroom Conduct Problems) (NICHD)-Teacher Report; Child Behavior Problem Index (NICHD)-Parent Report; Child Behavior Checklist (NICHD)-Parent or Teacher Report |
| | Prosocial behaviors | Howes Peer Play Observation Scale (FACES); Parent-Child Interaction Task (NICHD); Social Competence Subscale of the Social Competence and Behavior Evaluation-Teacher Report; Social Skills Scale of the Social Skills Rating System-Teacher Report |
COR = Child Observation Record; ECERS-R = Early Childhood Environment Rating Scale, Revised Edition; ELL = English language learners; ELLCO = Early Language & Literacy Classroom Observation Toolkit; FACES = Family and Child Experiences Survey; NRS = Head Start National Reporting System; PLBS = Preschool Learning Behavior Scale; PPVT-III = Peabody Picture Vocabulary Test, Third Edition-Revised; Pre-CTOPPP = Preschool Comprehensive Test of Phonological and Print Processing; SELC = Social-Emotional Learning Checklist; TA = technical assistance; T/TA = training and technical assistance.
Fidelity to enhancement. Fidelity measures will capture the extent to which the VIP components look and function as intended by the developers at Circles of Care. Again, each component should be considered separately in measuring fidelity, and at least one measure should assess the frequency and quality of referrals and communication among components (Table III.4).
Second Step developers at the Committee for Children have two tools that may help to measure fidelity for the Second Step curriculum. Both rely on reports by teachers or other staff and may therefore have to be combined with an independent classroom observational tool specific to the Second Step curriculum. The first tool, the Lesson-Completion Record, is used by teachers to track when specific lessons were conducted, which students participated in the lessons’ role-plays, and how many students participated. The second tool, the Social-Emotional Learning Checklist (SELC), assesses how teachers (or other staff) support students’ skill use outside of Second Step lesson instruction. Using a scale of “never, once, 2-3 times, 4+ times,” teachers are asked to rate how often each of nine events occurred outside of specific lesson instruction. Events include such items as asking students for solutions to a problem in the classroom, modeling problem-solving or anger-management strategies, and intervening in a student conflict by prompting students to use problem-solving or anger-management strategies. Second Step developers recommend completion of the SELC once a month throughout the program year.
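To make the SELC description concrete, the following is a minimal sketch of how one monthly form might be tallied. The four response options and the nine-item length come from the description above; the function name `summarize_selc` and the "share of events that occurred at least once" summary are illustrative assumptions, not a scoring rule published by the Second Step developers.

```python
from collections import Counter

# The four response options named in the text, from least to most frequent.
SELC_SCALE = ["never", "once", "2-3 times", "4+ times"]
N_ITEMS = 9  # the SELC asks about nine events


def summarize_selc(responses):
    """Tally one monthly SELC form (hypothetical summary, not official scoring).

    `responses` is a list of nine strings drawn from SELC_SCALE.
    Returns the count of each response option and the share of the nine
    events that the teacher reported at least once.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    unknown = set(responses) - set(SELC_SCALE)
    if unknown:
        raise ValueError(f"unrecognized responses: {sorted(unknown)}")
    counts = Counter(responses)
    occurred = sum(1 for r in responses if r != "never")
    return {
        "counts": {option: counts.get(option, 0) for option in SELC_SCALE},
        "share_occurred": occurred / N_ITEMS,
    }


# Example form: made-up responses for illustration only.
example_form = ["never", "once", "2-3 times", "4+ times", "once",
                "never", "2-3 times", "once", "4+ times"]
example_summary = summarize_selc(example_form)
```

Because the developers recommend monthly completion, summaries like this could be compared across months to see whether reported support for skill use holds steady over the program year.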
Other fidelity measures should assess the competency level of trained teachers or other staff on the concepts of Second Step and Building Strong Families training. These measures are likely to be post-tests tailored to the specific concepts of each training that can be conducted both immediately after training and, ideally, at a later point during implementation to assess the degree of retention.
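As a rough illustration of the post-test idea, the sketch below checks a trainee's score against a competency threshold and compares an immediate post-test with a later follow-up. The 80 percent threshold, the 20-item test length, and the function names are hypothetical; the source says only that trainees must know "a specified percentage" of training content.

```python
def meets_threshold(correct, total, threshold):
    """Return True if the share of items answered correctly reaches the threshold.

    `threshold` is a proportion between 0 and 1; its value is an assumption
    that developers would have to set for each training.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return correct / total >= threshold


def retention_change(post_correct, followup_correct, total):
    """Change in share correct from the immediate post-test to the follow-up.

    A negative value indicates that some training content was not retained.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return followup_correct / total - post_correct / total


# Made-up scores on a hypothetical 20-item post-test.
THRESHOLD = 0.80  # hypothetical competency threshold
passed_after_training = meets_threshold(17, 20, THRESHOLD)
passed_at_followup = meets_threshold(15, 20, THRESHOLD)
change = retention_change(17, 15, 20)
```

In this made-up case the trainee clears the threshold immediately after training but falls below it at follow-up, which is the kind of pattern a delayed post-test is meant to reveal.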
VIP developers also should conduct observations during implementation site visits to assess the quality and frequency of interactions between case managers and fathers and the quality of Building Strong Families sessions. For reasons of confidentiality, evaluators will be unable to observe counseling sessions, but a fidelity measure should capture the percentage of clients engaged in counseling services and the frequency and consistency of the clients’ participation.
Finally, developers will have to develop measures of the level of parents’ participation in the Building Strong Families sessions and the level of fathers’ engagement in case management services. As the FFP example indicates, programs will have to achieve a sufficiently high threshold for participation in order to generate effects of the magnitude that can significantly change child outcomes.
Intermediate outcomes. The range of activities in the VIP program presents program evaluators with a number of potential intermediate outcomes that could be examined (Table III.4). Part of the measurement work during the development stage may be to prioritize the usefulness and value of different intermediate outcomes. In the development stage, evaluators may be interested in collecting a broad array of outcomes to experiment with different measures and methods of data collection. However, they should consider the cost and resource implications that a large data collection effort could have on a larger-scale evaluation.
Center and classroom environment. One tool that has been used in previous process evaluations of Second Step and that could inform development of an intermediate outcome measure is the Playground and Lunchroom Climate Questionnaire. This 23-item, 5- to 10-minute survey is designed to collect information about the structure of, monitoring of, and staff collaboration in unstructured settings in which problems among students may occur. The concept of measuring the climate of unstructured settings is an important one; however, this tool was developed to identify areas of concern and to suggest methods of improving the facilitation of unstructured activities. It may be a useful starting point, but it probably will not achieve the full objective of an intermediate measure, which is to capture the social-emotional climate among children in unstructured settings. Other intermediate measures related to changes in the classroom that could result from Second Step will require some development, including an observational tool that measures the time spent on specific classroom activities and/or child-directed tasks and the number of disruptions in the classroom over a specified period. Evaluators also should consider including a global measure of the quality of the classroom environment; they have the option of choosing from a number of existing tools with good psychometric properties, such as the Early Childhood Environment Rating Scale, Revised Edition and the Preschool Program Quality Assessment.
Home environment. A number of the intermediate outcomes that would assess changes in the home environment were presented in our discussion of FFP. Details about measures of parenting stress, household conflict, father presence, father involvement with the child, and increased parenting support and warmth are also presented there. In addition to those outcomes, evaluators should collect information about the level of financial support that the father provides to the child, as the VIP program developers believe that this level could increase as a result of the case management services provided to fathers.
Mental health and substance abuse problems. Additional intermediate outcomes are process measures that assess the extent to which discussion of mental health and substance abuse issues in Building Strong Families sessions may lead to screening and treatment. These measures can be collected from program data (paper flow or electronic). The last intermediate outcome shown in Table III.4 assesses the change in parents’ psychological well-being that could result from the receipt of mental health counseling. Two existing and tested measurement tools are suggested for this purpose.
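The process measures described above amount to a sequence of percentages computed from program counts. The sketch below shows one way to calculate them; the function name, the field names, and the choice of denominators (for example, treatment entry as a share of those diagnosed) are assumptions for illustration, and the counts are invented.

```python
def percent(numerator, denominator):
    """Return a percentage, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return 100.0 * numerator / denominator


def screening_rates(enrolled, screened, positive, diagnosed, in_treatment):
    """Compute screening and follow-up percentages from program counts.

    Each count is assumed to be a subset of the one before it:
    enrolled parents, those screened, those with positive screens,
    those diagnosed, and those who entered counseling or treatment.
    """
    counts = [enrolled, screened, positive, diagnosed, in_treatment]
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    if any(later > earlier for earlier, later in zip(counts, counts[1:])):
        raise ValueError("each count must not exceed the one before it")
    return {
        "pct_screened": percent(screened, enrolled),
        "pct_positive_diagnosed": percent(diagnosed, positive),
        "pct_diagnosed_in_treatment": percent(in_treatment, diagnosed),
    }


# Invented counts for a single hypothetical site.
rates = screening_rates(enrolled=120, screened=90, positive=30,
                        diagnosed=20, in_treatment=15)
```

Returning `None` for an empty denominator keeps a site with no positive screens from being reported as having a zero percent diagnosis rate, which would be misleading.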
Child outcomes. Particularly when selecting child outcome measures in the area of social and emotional development, evaluators must carefully consider the theory of change underlying the enhancement and must select the measure or measures that best fit with that theory. For example, is change expected on a particular social-emotional outcome, such as empathy or problem solving, or is change expected on social competence more broadly, such as by decreasing problem behaviors and increasing positive social relationships? The answer for an evaluation of VIP may be mixed. The Second Step curriculum focuses on specific skills, such as empathy, impulse control, anger management, and problem solving, so measures that can capture practice of these skills could be important. However, the three other components of VIP are intended to affect the child indirectly through improvements in the home environment that contribute to social and emotional competence more broadly. Certainly, children who do acquire the skills taught through the Second Step curriculum might be expected to improve in broader measures of social and emotional competence as well.
Second Step developers and previous evaluators currently are using two direct child measures that might be useful tools for measuring child outcomes for the VIP program. The Second Step Knowledge Assessment for Non- or Beginning Readers uses black-and-white pictures to depict social situations and to assess children’s social-emotional knowledge and skill. The assessment tool uses a story-and-question format similar to the format of the Second Step lessons. It also has been used in evaluation research with preschool and kindergarten students. The tool could provide useful information as a measure of children’s knowledge about the concepts taught in Second Step lessons: empathy, problem solving, management of strong emotions, and ways to respond to problematic situations with peers. Children who show a learned competency in these social-emotional skills may be more likely to practice these skills on their own. Another option for measurement of this outcome is the Social Problem-Solving Test-Revised, which is another direct child measure designed to assess the quantitative and qualitative dimensions of social problem solving. Although it may be useful to examine this existing tool, the assessment tool that is specific to the Second Step curriculum may be the most productive route.
Many other good options exist for measuring aspects of preschool-age children’s social and emotional development. In Table III.4, we present a few measurement options for the more specific skills of empathy, self-control, and problem solving, as well as broader measures of behavioral problems and prosocial behavior. These outcomes are typically measured through parent and teacher reports. However, when measuring social relationships, observational tools may be used to directly observe a child in play tasks with either another child or a parent. The Empathy Subscale of the Social Skills Rating System is a validated subscale based on teachers’ reports. Measures of self-control often can be thought of as measures of social relationships more broadly; however, the Self-Control Subscale of the Social Skills Rating System measures self-control more directly and is a valid and reliable multi-item measure of children’s self-control at home (the parent report) or in the classroom (the teacher report). Problem-solving skills could be measured using the teacher report of the Social Problem-Solving Test-Revised. This tool complements the direct child assessment discussed above: rather than directly testing a child’s knowledge, the teacher report assesses a child’s actual practice of problem-solving skills. Our example of the FFP included a discussion of the broader measures of behavior problems and prosocial behavior that are presented in Table III.4. We refer you to that discussion for details on these measures.
Circles of Care has developed what it considers to be a comprehensive approach to promoting the social and emotional development of Head Start children, combined with the integration of mental health problem prevention and intervention services for both children and their families. At the end of the six-month planning period that is supported by I&I funding, Circles of Care will have further refined and promoted its model and intends to implement VIP in five Head Start programs in Brevard County, Florida. During implementation, Circles of Care should use the experience of the five sites to develop and refine documentation so that the enhancement is portable to other Head Start programs. Because of the breadth of this enhancement, substantial measurement work must be undertaken. This work can begin at any time, and measures can be tested in the existing sites, as appropriate.
The VIP program currently is administered by one entity—Circles of Care—that is ostensibly capable of providing the range of services comprising the full enhancement. As in FFP, the VIP program will have to be implemented in sites beyond its core service area to ensure that Head Start programs with diverse resources and in diverse communities can identify the appropriate partners to support and sustain the full VIP model. This expanded implementation should occur only after documentation has achieved a sufficient level of depth and clarity to assist other programs in replicating VIP, and only after measurement tools have been identified or developed. The second round of implementation sites should inform replication efforts as well as serve as testing grounds for the measurement framework.
If it can meet the criteria for progression to Stage 2, the VIP enhancement could be an interesting candidate for a planned variation study. A number of evaluation studies already have been completed on the Second Step curriculum that suggest promising results. One pre-post outcomes study of 109 predominantly African-American and Latino three- to seven-year-old children from low-income urban families found that children demonstrated an increased conceptual knowledge of social skills, a decrease in observed levels of physical and verbal aggression, and a decrease in disruptive behavior (McMahon et al. 2000). A rigorous evaluation of the VIP program could include both sites that implement only the Second Step curriculum and sites that implement all the VIP program’s components. This type of approach could test the added value of a full family strategy versus a direct classroom approach.
1 In this example, we assume that the two fidelity measures and their thresholds for determining fidelity are equally rigorous. If they are not, it will be difficult to compare fidelity across the two curricula.

